Abstract
Uyghur is an agglutinative language with complex morphology, and segmenting Uyghur words into stems and affixes plays an important role in Uyghur language information processing. So far, however, the performance of Uyghur stem segmentation still leaves considerable room for improvement. Taking the N-gram model as the basic framework and following the constraints of Uyghur word formation, we propose a stem segmentation model for Uyghur that incorporates part-of-speech features and contextual stem information. Experimental results show that both part-of-speech features and contextual stem information significantly improve the accuracy of Uyghur stem segmentation: compared with the baseline system, the experiments that incorporate part-of-speech features and contextual stem information reach accuracies of 95.19% and 96.60%, respectively.
Key words
Uyghur /
morphology /
stem segmentation /
N-gram model /
part of speech /
contextual stem information
Funding
National Natural Science Foundation of China (61163032)