Abstract
This paper studies and analyzes the word deletion problem in phrase-based statistical machine translation. Through human evaluation, word deletion errors are classified into three categories. Two methods, a frequency-based method and a part-of-speech-based method, are used to identify key words in source-language sentences. To alleviate deletion errors, the conventional phrase-pair extraction algorithm is extended with constraints on source-language key words that are unaligned (aligned to null). Automatic and human evaluations show that the approach significantly alleviates word deletion errors on both the Chinese-to-English and the English-to-Chinese translation tasks, while also yielding a more compact phrase table.
Key words
statistical machine translation /
word deletion issue /
human evaluation
Funding
National Natural Science Foundation of China (61272376; 61300097); China Postdoctoral Science Foundation (2013M530131)