Review
ZHAO Tie-jun, LV Ya-juan, YU Hao, YANG Mu-yun, LIU Fang
2001, 15(1): 13-18.
Automatic word segmentation of Chinese sentences is difficult when the processing mechanism faces large-scale real texts. The two crucial issues in Chinese segmentation are the identification of unknown words and the resolution of segmentation ambiguities. This paper describes a multi-step processing strategy that reduces these difficulties and improves segmentation accuracy. The processing consists of seven steps: disambiguation of pseudo-ambiguities, full segmentation of a sentence, deterministic segmentation of some words, processing of numeral strings, processing of reduplicated words, statistical identification of unknown words, and final correction of segmentation ambiguities using part-of-speech information, which is integrated into the tagger. The procedure gives promising results, with above 98% accuracy in an open test.
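To illustrate the "full segmentation of a sentence" step, the sketch below enumerates every way of splitting a short string into dictionary words, which exposes the ambiguities that the later steps would have to resolve. It is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `full_segmentation`, the toy lexicon, the `max_len` limit, and the rule that any single character may stand alone are all hypothetical choices made for this example.

```python
from functools import lru_cache

# Hypothetical toy lexicon, chosen only to produce an ambiguous example.
LEXICON = {"研究", "研究生", "生命"}


def full_segmentation(sentence, lexicon, max_len=4):
    """Return every split of `sentence` into lexicon words.

    Assumption for this sketch: a single character is always accepted
    as a word, so that every sentence has at least one segmentation.
    """

    @lru_cache(maxsize=None)
    def splits(start):
        # Reaching the end of the sentence yields one empty segmentation.
        if start == len(sentence):
            return [[]]
        results = []
        last = min(len(sentence), start + max_len)
        for end in range(start + 1, last + 1):
            word = sentence[start:end]
            if end - start == 1 or word in lexicon:
                for rest in splits(end):
                    results.append([word] + rest)
        return results

    return splits(0)


if __name__ == "__main__":
    for candidate in full_segmentation("研究生命", LEXICON):
        print(" / ".join(candidate))
```

For the string 研究生命, the candidates include both 研究 / 生命 and 研究生 / 命. Choosing between such alternatives is the kind of decision the abstract assigns to the later disambiguation steps, including the final part-of-speech based correction.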