Chinese-Uyghur Machine Translation Based on Smallest Translation Units of Stems and Suffixes

Miliwan Xuehelaiti, LIU Kai, Turgun Ibrahim

Journal of Chinese Information Processing ›› 2015, Vol. 29 ›› Issue (3): 201-206.
Information Processing for Minority and Neighboring Languages

Abstract

Automatic machine translation from Chinese to Uyghur is of considerable practical importance, yet statistical approaches to Chinese-Uyghur translation have received little research attention. This paper proposes a Chinese-Uyghur machine translation method that operates at the granularity of Uyghur stems and suffixes. The method takes the stems and suffixes produced by Uyghur morphological analysis as the basic translation units and, exploiting the agglutinative nature of the language, introduces a directed-graph-based “stem-suffix” language model for Uyghur. Experiments on open corpora show that the stem-suffix translation model and language model significantly outperform the earlier word-level models.
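To make the idea of stem and suffix translation units concrete, the following is a minimal sketch of how morphologically analysed words might be flattened into a unit sequence and later reassembled into words. It is an illustration under stated assumptions, not the authors' implementation: the `+` suffix marker, the function names `to_units` and `to_words`, and the placeholder morphemes are all hypothetical, and the reassembly step uses plain concatenation, ignoring any sound changes at morpheme boundaries.

```python
from typing import List, Tuple

# Hypothetical marker that tags a token as a suffix (an assumption, not from the paper).
SUFFIX_MARK = "+"

# One analysed word: (stem, [suffix, ...]). The analysis itself is assumed
# to come from an external morphological analyser.
Analysis = Tuple[str, List[str]]


def to_units(analyses: List[Analysis]) -> List[str]:
    """Flatten analysed words into a sequence of stem and suffix tokens."""
    units: List[str] = []
    for stem, suffixes in analyses:
        units.append(stem)
        units.extend(SUFFIX_MARK + s for s in suffixes)
    return units


def to_words(units: List[str]) -> List[str]:
    """Reattach marked suffixes to the preceding stem by plain concatenation."""
    words: List[str] = []
    for unit in units:
        if unit.startswith(SUFFIX_MARK) and words:
            words[-1] += unit[len(SUFFIX_MARK):]
        else:
            # A stem (or a stray suffix with no preceding stem) starts a new word.
            words.append(unit)
    return words


# Placeholder morphemes only; these are not real Uyghur forms.
sentence = [("stemA", ["x", "y"]), ("stemB", []), ("stemC", ["z"])]
units = to_units(sentence)
words = to_words(units)
print(units)
print(words)
```

In a setup like this, the translation and language models would be trained on the unit sequences rather than on whole words, which is one way to reduce the data sparsity caused by the many surface forms of an agglutinative language.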

Key words

Uyghur / machine translation / Chinese-Uyghur translation / stem / suffix / morphological analysis

Cite this article

Miliwan Xuehelaiti, LIU Kai, Turgun Ibrahim. Chinese-Uyghur Machine Translation Based on Smallest Translation Units of Stems and Suffixes. Journal of Chinese Information Processing. 2015, 29(3): 201-206

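Funding

National Natural Science Foundation of China (61063026, 61032008); National Social Science Fund of China (10AYY006); Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology.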