Sediyegvl Enwer¹, Xiang Lu², Zong Chengqing², Akbar Pattar¹, Askar Hamdulla¹
1. Institute of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Uyghur is an agglutinative language with complex morphology, and stem segmentation of Uyghur words plays an important role in Uyghur language information processing. However, the performance of Uyghur stem segmentation still leaves much room for improvement. Based on the constraints of Uyghur word formation, we propose an N-gram-based stem segmentation model for Uyghur that fuses part-of-speech features and context information. Experimental results show that, compared with the baseline system, adding the part-of-speech feature and the stem context information significantly improves Uyghur stem segmentation, raising accuracy to 95.19% and 96.60%, respectively.
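The abstract does not give the model's equations, so the following is only a minimal sketch of how an N-gram stem segmenter could fuse a part-of-speech feature with context, assuming a bigram (N = 2) formulation: a candidate stem is scored given the preceding word's stem (the context information), and the suffix chain is scored starting from the candidate stem's part-of-speech tag (the POS feature). The suffix inventory, the toy training tuples, the `UNK` tag, the helper names (`candidate_splits`, `BigramModel`, `StemSegmenter`), and the smoothing constant `alpha` are all hypothetical illustrations, not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy suffix inventory in Latin transliteration (illustrative only).
SUFFIXES = ["lar", "ler", "ning", "da", "de", "ni", "im"]
END = "</w>"

# Hypothetical training tuples: (previous word's stem, stem, stem POS, suffix chain).
TRAIN = [
    ("<s>", "kitab", "N", ["lar", "da"]),
    ("<s>", "mektep", "N", ["ler", "de"]),
    ("men", "kitab", "N", ["ni"]),
    ("<s>", "kitab", "N", ["lar"]),
]


def candidate_splits(word, suffixes=SUFFIXES, min_stem=2):
    """Enumerate (stem, [suffix, ...]) splits by peeling known suffixes off the right."""
    results = [(word, [])]

    def peel(stem, tail):
        for suf in suffixes:
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                new_stem, new_tail = stem[: -len(suf)], [suf] + tail
                results.append((new_stem, new_tail))
                peel(new_stem, new_tail)

    peel(word, [])
    return results


class BigramModel:
    """Add-alpha smoothed bigram model over arbitrary string tokens."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def add(self, prev, cur):
        self.counts[prev][cur] += 1
        self.vocab.update((prev, cur))

    def logprob(self, prev, cur):
        row = self.counts.get(prev, {})  # .get does not create empty rows
        total = sum(row.values())
        v = len(self.vocab) + 1  # reserve one slot for unseen tokens
        return math.log((row.get(cur, 0) + self.alpha) / (total + self.alpha * v))


def events(prev_stem, stem, pos, suffixes):
    """Bigram events for one word: stem given context, suffix chain given stem POS."""
    chain = ["POS:" + pos] + suffixes + [END]
    yield ("CTX:" + prev_stem, "STEM:" + stem)
    yield from zip(chain, chain[1:])


class StemSegmenter:
    def __init__(self, training, alpha=0.1):
        self.model = BigramModel(alpha)
        self.lexicon = {}  # stem -> POS, learned from the training tuples
        for prev_stem, stem, pos, suffixes in training:
            self.lexicon[stem] = pos
            for prev, cur in events(prev_stem, stem, pos, suffixes):
                self.model.add(prev, cur)

    def score(self, prev_stem, stem, suffixes):
        pos = self.lexicon.get(stem, "UNK")  # unseen stems get a placeholder tag
        return sum(self.model.logprob(p, c)
                   for p, c in events(prev_stem, stem, pos, suffixes))

    def segment(self, word, prev_stem="<s>"):
        """Return the highest-scoring (stem, suffixes) split of `word`."""
        return max(candidate_splits(word),
                   key=lambda c: self.score(prev_stem, c[0], c[1]))


if __name__ == "__main__":
    seg = StemSegmenter(TRAIN)
    print(seg.segment("kitablarda"))                 # ('kitab', ['lar', 'da'])
    print(seg.segment("kitabni", prev_stem="men"))   # ('kitab', ['ni'])
```

Every candidate generates the whole word (one context event plus a suffix chain ending in `</w>`), so candidates with different numbers of suffixes remain comparable; a small `alpha` keeps unseen stems such as `kitablarda` from outscoring a known stem with a well-attested suffix chain. A real system would replace the toy lexicon and suffix list with the word-formation constraints the paper describes.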