Review
JIANG Wenbin1, WU Jinxing1,2, CHANG Qing1,2, Nasanurtu2, LIU Qun1, ZHAO Lili1,3
2011, 25(5): 94-101.
We propose a generative statistical model for Mongolian lexical analysis. The model represents a lexical analysis result as a directed graph in which the nodes are stems, affixes, and their tags, and the edges are the transition or generation relationships between nodes. Specifically, we adopt three kinds of transition or generation probabilities: (a) the probabilities of stem-stem transition, affix-affix transition, and stem-affix generation; (b) the transition or generation probabilities between the corresponding tags; and (c) the generation probabilities between stems or affixes and their tags. Trained on the 3rd-level annotated corpus of about 200 000 words, the model achieves a word-level segmentation accuracy of 95.1% and a word-level joint segmentation and tagging accuracy of 93%.
Key words: Mongolian; lexical analysis; segmentation; POS tagging; stemming; directed graph
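
To make the three kinds of probabilities in the abstract concrete, the following is a minimal sketch of how one candidate analysis (one path through the directed graph) might be scored as a product of morpheme-level, tag-level, and tag-to-morpheme probabilities. It is not the authors' implementation. The table names, the `FLOOR` smoothing constant, the romanized example word, and every probability value are hypothetical, and the sketch merges stem-stem transition, affix-affix transition, and stem-affix generation into a single table for brevity.

```python
import math

# Hypothetical toy tables; none of these values come from the paper.
# (a) morpheme-level transition/generation: P(morpheme | previous morpheme)
MORPH_TRANS = {("<s>", "ger"): 0.2, ("ger", "-iin"): 0.5, ("<s>", "geriin"): 0.01}
# (b) tag-level transition/generation: P(tag | previous tag)
TAG_TRANS = {("<s>", "N"): 0.4, ("N", "GEN"): 0.3}
# (c) generation between a tag and its stem or affix: P(morpheme | tag)
EMIT = {("N", "ger"): 0.05, ("GEN", "-iin"): 0.6, ("N", "geriin"): 0.001}

FLOOR = 1e-6  # crude stand-in for smoothing of unseen events (an assumption)


def log_score(analysis):
    """Sum the log-probabilities along one path.

    `analysis` is a list of (morpheme, tag) pairs: a stem followed by
    zero or more affixes.
    """
    prev_m, prev_t = "<s>", "<s>"
    total = 0.0
    for m, t in analysis:
        total += math.log(MORPH_TRANS.get((prev_m, m), FLOOR))
        total += math.log(TAG_TRANS.get((prev_t, t), FLOOR))
        total += math.log(EMIT.get((t, m), FLOOR))
        prev_m, prev_t = m, t
    return total


def best_analysis(candidates):
    """Return the candidate path with the highest log-score."""
    return max(candidates, key=log_score)


# Two hypothetical analyses of the same surface word.
candidates = [
    [("ger", "N"), ("-iin", "GEN")],  # stem + affix
    [("geriin", "N")],                # unsegmented
]
best = best_analysis(candidates)
```

With these toy numbers the segmented path scores higher than the unsegmented one. A full system would search the whole graph of candidate segmentations and taggings rather than enumerate a hand-written list, and would estimate the tables from the annotated corpus.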