龙从军,刘汇丹,吴 健. 藏语音节标注研究[J]. 中文信息学报, 2017, 31(4): 89-93.
LONG Congjun, LIU Huidan, WU Jian. Research on Tagging of Tibetan Syllables. Journal of Chinese Information Processing, 2017, 31(4): 89-93.
1. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China; 2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Syllables are very important in Tibetan vocabulary construction and text information processing, especially for the segmentation and annotation of out-of-vocabulary words (OOVs). This paper proposes tagging syllables so that the part of speech (POS) of compound words, especially OOVs, can be predicted according to word-construction rules. It presents the definition of the Tibetan syllable and outlines the principles of classification and labeling. The training and test texts are selected from Tibetan-language teaching materials for primary and secondary schools, totaling 240K syllables. Experiments show a precision of 93.5208% for syllable tagging, on the basis of which POS tagging reaches an improved accuracy of 94.1967%. Given gold-standard syllable tags, the accuracy of POS tagging improves further to 97.7754%.
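The idea of predicting a compound word's POS from its syllable tags can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names (`n`, `v`, `a`, `nmlz`), the rule table `CONSTRUCTION_RULES`, and the fallback to the last syllable's tag are hypothetical placeholders, not the tag set, rules, or model used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical word-construction rules: a sequence of syllable tags
# maps to the POS of the whole word. These entries are illustrative
# placeholders, not the rules or tag set defined in the paper.
CONSTRUCTION_RULES: Dict[Tuple[str, ...], str] = {
    ("n", "n"): "n",      # noun syllable + noun syllable -> noun
    ("a", "n"): "n",      # adjective syllable + noun syllable -> noun
    ("v", "nmlz"): "n",   # verb syllable + nominalizing syllable -> noun
    ("v", "v"): "v",      # verb syllable + verb syllable -> verb
}

# Syllable tags that can also serve as a word-level POS in this sketch.
CONTENT_TAGS = {"n", "v", "a"}


def predict_pos(syllable_tags: List[str], default: str = "n") -> str:
    """Predict a word's POS from the tags of its syllables."""
    key = tuple(syllable_tags)
    if key in CONSTRUCTION_RULES:
        return CONSTRUCTION_RULES[key]
    # Fallback assumption: let the last syllable decide when no rule matches.
    if syllable_tags and syllable_tags[-1] in CONTENT_TAGS:
        return syllable_tags[-1]
    return default


if __name__ == "__main__":
    # An unseen compound whose syllables were tagged as verb + nominalizer.
    print(predict_pos(["v", "nmlz"]))
```

A lookup table keeps the sketch readable. A real system would more likely feed syllable tags as features into a statistical tagger, which is how an out-of-vocabulary word can still receive a plausible POS.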