Re-ranking for Statistical Machine Translation Using Simplex Algorithm

FU Lei, LIU Qun

Journal of Chinese Information Processing, 2007, Vol. 21, Issue (3): 28-33.
Review


Abstract

In recent years, discriminative re-ranking has been applied to many areas of natural language processing (NLP), such as parsing, part-of-speech tagging, and machine translation, and has improved results under each task's own evaluation metric. Taking statistical machine translation (SMT) as an example, this paper explains in detail the principle and procedure of re-ranking translation candidates with the simplex algorithm, the implementation and use of the algorithm, and the method of feature selection in the re-ranking experiments. It reports results on the NIST-2002 (development set) and NIST-2005 (test set) Chinese-to-English test sets: re-ranking yields an absolute BLEU increase of 1.26% on the development set and 1.16% on the test set.
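
As a rough illustration of the procedure the abstract describes, the sketch below tunes feature weights for n-best re-ranking with a simplified downhill simplex (Nelder-Mead) search. It rests on several assumptions that are not taken from the paper: that "simplex algorithm" refers to the downhill simplex method rather than the linear-programming simplex method, that candidates are scored by a weighted sum of features, and that a per-candidate `quality` number can stand in for BLEU. All names and the toy data are hypothetical.

```python
# Illustrative sketch only: toy data, hypothetical names, simplified Nelder-Mead.

def rerank(nbest, weights):
    """For each source sentence, pick the candidate with the highest weighted feature sum."""
    def score(cand):
        return sum(w * f for w, f in zip(weights, cand["features"]))
    return [max(cands, key=score) for cands in nbest]

def loss(nbest, weights):
    """Negative mean quality of the picked candidates (quality is a stand-in for BLEU)."""
    picked = rerank(nbest, weights)
    return -sum(c["quality"] for c in picked) / len(picked)

def nelder_mead(f, start, step=1.0, iters=200):
    """Minimize f by repeatedly moving the worst vertex of a simplex."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):  # initial simplex: start plus one offset point per dimension
        p = list(start)
        p[i] += step
        simplex.append(p)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        reflected = along(-1.0)
        if f(reflected) < f(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Toy 2-best lists for two source sentences; each candidate has two feature values.
nbest = [
    [{"features": [-1.0, -4.0], "quality": 0.2}, {"features": [-2.0, -1.0], "quality": 0.6}],
    [{"features": [-1.5, -3.0], "quality": 0.3}, {"features": [-2.5, -1.5], "quality": 0.5}],
]
start = [1.0, 0.0]  # initial weights: only the first feature counts
tuned = nelder_mead(lambda w: loss(nbest, w), start)
```

With the starting weights the lower-quality candidate is picked for both toy sentences; after tuning, `rerank(nbest, tuned)` selects the higher-quality one in each list. A corpus-level metric such as BLEU is piecewise constant in the weights, which is one reason a derivative-free search of this kind is a natural fit, though a real setup would likely need restarts and a held-out test set.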

Key words

artificial intelligence / machine translation / discriminative re-ranking / simplex algorithm / statistical machine translation (SMT)

Cite this article

FU Lei, LIU Qun. Re-ranking for Statistical Machine Translation Using Simplex Algorithm. Journal of Chinese Information Processing. 2007, 21(3): 28-33

Funding

Major Research Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education (05JJD740176)