Abstract
In recent years, discriminative re-ranking has been applied to many areas of natural language processing (NLP), including parsing, part-of-speech tagging, and machine translation, and has improved results under each task's evaluation metric. Taking statistical machine translation (SMT) as an example, this paper explains in detail the principle and procedure of re-ranking translation candidates with the simplex algorithm, the implementation and use of the algorithm, and the method of feature selection in the re-ranking experiments. It reports results on Chinese-to-English machine translation with NIST-2002 as the development set and NIST-2005 as the test set: re-ranking raises BLEU by 1.26% absolute on the development set and 1.16% absolute on the test set.
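To make the idea of simplex-based re-ranking concrete, here is a minimal sketch, not the paper's implementation. It assumes that "simplex algorithm" refers to the downhill simplex (Nelder-Mead) search over feature weights, that each candidate in an n-best list carries a feature vector, and that a per-candidate quality score stands in for BLEU. The feature values, quality scores, and all function names below are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Candidate = Tuple[Vector, float]  # (feature values, quality score); both hypothetical


def rerank(nbest: List[Candidate], w: Vector) -> int:
    """Return the index of the candidate with the highest weighted feature sum."""
    def score(i: int) -> float:
        return sum(wi * fi for wi, fi in zip(w, nbest[i][0]))
    return max(range(len(nbest)), key=score)


def loss(w: Vector, dev: List[List[Candidate]]) -> float:
    """Negative mean quality of the top-ranked candidates (lower is better)."""
    return -sum(nbest[rerank(nbest, w)][1] for nbest in dev) / len(dev)


def nelder_mead(f: Callable[[Vector], float], x0: Vector,
                step: float = 1.0, iters: int = 200) -> Vector:
    """Simplified downhill simplex search: reflect, expand, contract, shrink."""
    n = len(x0)
    # Initial simplex: x0 plus one point displaced along each axis.
    simplex = [list(x0)] + [
        [x + (step if i == j else 0.0) for j, x in enumerate(x0)]
        for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t: float) -> Vector:
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - wv) for c, wv in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [(a + b) / 2 for a, b in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Toy development set: two source sentences, three candidates each.
# Features are made-up log-scores (e.g. translation model, language model).
DEV: List[List[Candidate]] = [
    [([-1.0, -5.0], 0.20), ([-2.0, -2.0], 0.45), ([-3.0, -4.0], 0.10)],
    [([-1.5, -6.0], 0.15), ([-2.5, -3.0], 0.40), ([-2.0, -5.0], 0.30)],
]

initial = [1.0, 0.0]  # start by trusting only the first feature
tuned = nelder_mead(lambda w: loss(w, DEV), initial)
print("loss before:", loss(initial, DEV), "loss after:", loss(tuned, DEV))
```

The objective is piecewise constant in the weights, which is one reason a derivative-free search such as the downhill simplex is a natural fit here. A real system would optimize corpus-level BLEU on the development set rather than an average of per-candidate scores.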
Key words
artificial intelligence /
machine translation /
discriminative re-ranking /
simplex algorithm /
SMT
Funding
Major Research Project of the Key Research Bases for Humanities and Social Sciences, Ministry of Education (05JJD740176)