Generating the Finite State Machines of Uyghur Verb Aspect Categories

Arzugul·XERIP1,3, Zokre·KADER2, Turghun·IBRAYIM2

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 4: 61-66.
Review


Abstract

Aspect is one of the most complex grammatical categories of the Uyghur verb and one of the harder problems in Uyghur language processing. In computational analysis, aspect is handled only after other grammatical categories such as person, tense, and negation have been processed; the main difficulty is resolving the overlapping of aspect suffixes. Because aspect suffixes attach to the verb stem according to definite rules, the overlapping forms of Uyghur verb aspect can be described formally with finite-state automata. Following the overlapping rules, a right-to-left nondeterministic finite automaton (NFA) is first constructed; it is then converted into a left-to-right NFA, and finally that NFA is converted into a deterministic finite automaton (DFA), which yields a formal description of the Uyghur verb aspect category.
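The three steps named in the abstract (build a right-to-left NFA, reverse it, determinize it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the states `q0` to `q3`, the placeholder suffix labels `S1` and `S2`, and the toy overlapping rules are all hypothetical, and determinization is done with the standard subset construction, which the abstract does not name explicitly.

```python
from collections import defaultdict


def reverse_nfa(transitions, starts, finals):
    """Flip every edge and swap the start and accepting state sets."""
    flipped = defaultdict(set)
    for (src, symbol), targets in transitions.items():
        for dst in targets:
            flipped[(dst, symbol)].add(src)
    return dict(flipped), set(finals), set(starts)


def determinize(transitions, starts, finals):
    """Subset construction for an NFA without epsilon moves.

    Each DFA state is a frozenset of NFA states.
    """
    alphabet = {symbol for (_, symbol) in transitions}
    start = frozenset(starts)
    dfa, dfa_finals = {}, set()
    seen, pending = {start}, [start]
    while pending:
        current = pending.pop()
        if current & finals:
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in transitions.get((s, symbol), ())
            )
            if not target:
                continue  # no move on this symbol: leave the DFA partial
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return dfa, start, dfa_finals


def run_dfa(dfa, start, dfa_finals, symbols):
    """Return True if the DFA accepts the given morpheme sequence."""
    state = start
    for symbol in symbols:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in dfa_finals


# Hypothetical right-to-left NFA: it reads a word from its last suffix
# back to the stem. Toy rule (an assumption): S1 and S2 may each follow
# the stem, and S2 may additionally follow S1.
r2l_transitions = {
    ("q0", "S1"): {"q1"},
    ("q0", "S2"): {"q1", "q2"},  # nondeterministic choice
    ("q2", "S1"): {"q1"},
    ("q1", "STEM"): {"q3"},
}
r2l_starts, r2l_finals = {"q0"}, {"q3"}

# Step 2: right-to-left NFA -> left-to-right NFA.
l2r_transitions, l2r_starts, l2r_finals = reverse_nfa(
    r2l_transitions, r2l_starts, r2l_finals
)

# Step 3: left-to-right NFA -> DFA.
dfa, start, finals = determinize(l2r_transitions, l2r_starts, l2r_finals)

print(run_dfa(dfa, start, finals, ["STEM", "S1", "S2"]))  # True
print(run_dfa(dfa, start, finals, ["STEM", "S2", "S1"]))  # False
```

The resulting DFA reads morphemes in surface order (stem first), so it can be run directly over a segmented word; the right-to-left NFA is only the form in which suffix-stripping rules are most naturally written down.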
Key words

Uyghur language / verb / aspect category / finite state machine / formalization

Cite this article

Arzugul·XERIP, Zokre·KADER, Turghun·IBRAYIM. Generating the Finite State Machines of Uyghur Verb Aspect Categories. Journal of Chinese Information Processing. 2012, 26(4): 61-66


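Funding

Ministry of Education Humanities and Social Sciences Youth Fund Project, 2011 (11YJC740001); National Social Science Fund of China project (10AYY006); Fund of the Key Research Bases of Humanities and Social Sciences in Regular Higher Education Institutions of the Xinjiang Uyghur Autonomous Region (010812B04)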