Abstract
Lexical information plays an important role in phrase reordering. However, the hierarchical phrase-based (HPB) translation model does not consider the lexical information of the phrases that its variables generalize over, so its reordering decisions are highly ambiguous. To alleviate this, we propose a lexicalized reordering method for the HPB model. We distinguish two orientations of a variable relative to its adjacent words, and use the boundary words of the phrase covered by the variable to guide reordering. On a large-scale Chinese-to-English translation evaluation task, the proposed method improves translation quality by 0.6 to 1.2 BLEU points on the NIST 2003-2005 test sets.
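As a rough illustration of the idea in the abstract, and not the paper's actual model (whose details are not given on this page), the sketch below labels the orientation of a variable relative to a neighbouring source word by comparing their order on the source and target sides of a rule, and extracts the boundary words of the source span that the variable covers. The function names, the rule representation, the labels `monotone` and `swap`, and the example tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def orientation(src: List[str], tgt: List[str], var: str,
                align: Dict[int, int], side: str = "right") -> Optional[str]:
    """Label the order of `var` relative to its neighbouring source word.

    `align` maps source terminal positions to target positions (assumed format).
    Returns None when there is no aligned neighbouring terminal on that side.
    """
    i = src.index(var)                       # variable position on the source side
    j = i + 1 if side == "right" else i - 1  # neighbouring source position
    if not 0 <= j < len(src) or j not in align:
        return None
    src_order = i < j                        # is the variable before the word in the source?
    tgt_order = tgt.index(var) < align[j]    # is it before the word in the target?
    return "monotone" if src_order == tgt_order else "swap"


def boundary_words(sentence: List[str], span: Tuple[int, int]) -> Tuple[str, str]:
    """Return the first and last word of the span [start, end) covered by a variable."""
    start, end = span
    return sentence[start], sentence[end - 1]


# Hypothetical rule: "X1 之一" -> "one of X1"; the variable moves after the word.
label_swap = orientation(["X1", "之一"], ["one", "of", "X1"], "X1", {1: 0})

# Hypothetical rule: "与 X1" -> "with X1"; the order is preserved.
label_mono = orientation(["与", "X1"], ["with", "X1"], "X1", {0: 0}, side="left")

# Boundary words of the span covered by the variable (hypothetical sentence).
bounds = boundary_words(["澳洲", "是", "与", "北韩", "有", "邦交"], (3, 6))
```

In a setup like this, the orientation label and the boundary words could serve as features of a reordering classifier; how the paper actually combines them is not stated in the abstract.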
Key words: statistical machine translation; hierarchical phrase-based; lexical reordering
Funding
Key Program of the National Natural Science Foundation of China (60736014); National Natural Science Foundation of China (60873167)