A Survey of System Combination for Machine Translation

LI Maoxi, ZONG Chengqing

Journal of Chinese Information Processing, 2010, Vol. 24, Issue 4: 74-85.
Survey

Abstract

This paper surveys system combination methods for machine translation (MT). According to the level at which the outputs of multiple MT systems are combined, we classify the methods into three types: sentence-level combination, phrase-level combination, and word-level combination. For each type, we review representative work, including the methods used, confidence estimation, and decoding algorithms. We give particular attention to the word alignment techniques used to build confusion networks in word-level combination, which has been widely used in recent years. Finally, we compare and summarize the three approaches and discuss prospects for future work on MT system combination.
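
As an illustration of the sentence-level case, the sketch below picks one output from several systems by minimum Bayes-risk selection: it returns the hypothesis with the lowest expected loss against all the others. This is a minimal sketch under stated assumptions, not the method of any particular system covered by the survey. The function names `edit_distance` and `mbr_select` are hypothetical, word-level edit distance stands in for whatever loss a real system would use, and the uniform default weights are an assumption in place of estimated system confidences.

```python
from typing import List, Optional, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mbr_select(hypotheses: List[str],
               weights: Optional[List[float]] = None) -> str:
    """Return the hypothesis with the lowest expected loss.

    Assumption: `weights` plays the role of a posterior over hypotheses;
    when omitted, every hypothesis is weighted equally.
    """
    tokens = [h.split() for h in hypotheses]
    if weights is None:
        weights = [1.0 / len(hypotheses)] * len(hypotheses)

    def risk(i: int) -> float:
        # Expected loss of hypothesis i measured against every hypothesis.
        return sum(w * edit_distance(tokens[i], t)
                   for w, t in zip(weights, tokens))

    best = min(range(len(hypotheses)), key=risk)
    return hypotheses[best]


if __name__ == "__main__":
    outputs = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat is on the mat",
    ]
    print(mbr_select(outputs))
```

With equal weights the selection favors the output closest to the consensus of all systems. Passing non-uniform weights shifts the choice toward the outputs of systems that are trusted more.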

Key words

artificial intelligence / machine translation / system combination / minimum Bayes-risk decoding / confusion network decoding / word alignment

Cite this article

LI Maoxi, ZONG Chengqing. A Survey of System Combination for Machine Translation. Journal of Chinese Information Processing. 2010, 24(4): 74-85

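Funding

Supported by the National Natural Science Foundation of China (60975053, 90820303, 60736014); the National Key Technology R&D Program of China (2006BAH03B02); the National High Technology Research and Development Program of China (863 Program) (206AA010108-4); and the China-Singapore Institute of Digital Media (CSIDM-200804).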