Abstract: Automatic identification and annotation of fixed phrases are essential to Mongolian text processing. Based on the "Mongolian Fixed Phrase Grammatical Information Dictionary", this paper designs and implements an algorithm for recognizing and labeling Mongolian fixed phrases that combines finite state automata with rules. Experiments reveal a recognition rate of more than 90% and an average processing speed of 0.005 milliseconds per word.
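To make the finite-state idea concrete, the following is a minimal sketch and not the algorithm implemented in this paper. It assumes that the dictionary can be reduced to a list of token sequences, each with one label, and that the longest matching phrase is preferred. The class name `PhraseAutomaton`, its methods, and the label strings are hypothetical, and the rule component mentioned above is not modeled.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class PhraseAutomaton:
    """Deterministic finite state automaton (a trie) over word tokens.

    Hypothetical illustration: state 0 is the start state, and a state
    is accepting when it carries a phrase label.
    """

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]
        self.labels: List[Optional[str]] = [None]

    def add_phrase(self, words: Sequence[str], label: str) -> None:
        """Insert one dictionary entry, creating states as needed."""
        state = 0
        for word in words:
            nxt = self.transitions[state].get(word)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.labels.append(None)
                self.transitions[state][word] = nxt
            state = nxt
        self.labels[state] = label

    def annotate(self, tokens: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
        """Scan left to right and keep the longest match at each position."""
        result: List[Tuple[str, Optional[str]]] = []
        i = 0
        while i < len(tokens):
            state = 0
            best_end: Optional[int] = None
            best_label: Optional[str] = None
            j = i
            # Follow transitions as far as the input allows.
            while j < len(tokens) and tokens[j] in self.transitions[state]:
                state = self.transitions[state][tokens[j]]
                j += 1
                if self.labels[state] is not None:
                    best_end, best_label = j, self.labels[state]
            if best_end is None:
                # No phrase starts here: emit the single word unlabeled.
                result.append((tokens[i], None))
                i += 1
            else:
                result.append((" ".join(tokens[i:best_end]), best_label))
                i = best_end
        return result
```

In this sketch each token is consumed by a single dictionary lookup per transition, so the scan is cheap per word. A real system would also have to handle inflected forms and discontinuous phrases, presumably through the rule component, which is outside the scope of this illustration.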