The Role of High Frequent Maximal Crossing Ambiguities in Chinese Word Segmentation
Sun Maosong¹, Zuo Zhengping¹, Benjamin K Tsou²
1. The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University
2. Language Information Sciences Research Centre, City University of Hong Kong
The resolution of crossing ambiguities remains an open issue in Chinese word segmentation. In this paper, we first introduce the concept of maximal crossing ambiguity and then divide it into two major types: true and pseudo. Observing a Chinese corpus of 100M characters, we find that high-frequency maximal crossing ambiguities have strong coverage (the top 4,619 reach a coverage of 59.20%, and the 4,279 of them that belong to the pseudo type alone reach 53.35%) and remain rather stable across domains. Consequently, we propose a memory-based strategy for high-frequency maximal crossing ambiguities, which is expected to significantly improve the performance of practical Chinese word segmenters.
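The abstract does not spell out how maximal crossing ambiguities are detected or memorized. The Python sketch below illustrates both ideas under simplifying assumptions made here, not taken from the paper: a crossing ambiguity is two lexicon words of length 2 or more that overlap without either containing the other; a maximal crossing ambiguity string is the merged extent of overlapping crossing pairs; and the memory-based strategy is modeled as a plain lookup table from such strings to a stored segmentation. The names `LEXICON`, `MEMORY`, `word_spans`, `crosses`, `maximal_crossing_spans`, `resolve` and the `max_len` limit are all hypothetical. Strings without a stored entry return `None`, which leaves them to some other, context-sensitive method.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # half-open character interval [start, end)

# Hypothetical toy lexicon and memory table, for illustration only.
LEXICON: Set[str] = {"为人", "人民", "服务", "结合", "合成", "成分", "分子"}
MEMORY: Dict[str, List[str]] = {"为人民": ["为", "人民"]}


def word_spans(text: str, lexicon: Set[str], max_len: int = 4) -> List[Span]:
    """Return the spans of all lexicon words of length 2..max_len in text."""
    spans: List[Span] = []
    for i in range(len(text)):
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                spans.append((i, j))
    return spans


def crosses(a: Span, b: Span) -> bool:
    """True if a starts first and b overlaps it without either containing the other."""
    return a[0] < b[0] < a[1] < b[1]


def maximal_crossing_spans(text: str, lexicon: Set[str]) -> List[Span]:
    """Merge the extents of all crossing word pairs into maximal spans."""
    spans = word_spans(text, lexicon)
    covers = sorted((a[0], b[1]) for a in spans for b in spans if crosses(a, b))
    merged: List[Span] = []
    for start, end in covers:
        # Extend the current span if this cover overlaps it; merely touching is not enough.
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def resolve(text: str, lexicon: Set[str],
            memory: Dict[str, List[str]]) -> List[Tuple[str, Optional[List[str]]]]:
    """Pair each maximal crossing ambiguity string with its memorized segmentation, if any."""
    return [(text[s:e], memory.get(text[s:e]))
            for s, e in maximal_crossing_spans(text, lexicon)]


print(resolve("为人民服务", LEXICON, MEMORY))  # [('为人民', ['为', '人民'])]
print(resolve("结合成分子", LEXICON, MEMORY))  # [('结合成分子', None)]
```

In the first call, 为人民 (为人 crossing 人民) is detected and resolved directly from memory. In the second, the chain 结合/合成/成分/分子 is merged into the single maximal string 结合成分子, which has no stored entry in this toy table. Restricting the table to high-frequency strings keeps it small, and per the abstract the top 4,619 such strings already reach a coverage of 59.20%.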
Sun Maosong, Zuo Zhengping, Benjamin K Tsou. The Role of High Frequent Maximal Crossing Ambiguities in Chinese Word Segmentation. Journal of Chinese Information Processing, 1999, 13(1): 28-35.