Journal of Chinese Information Processing, 2019, Vol. 33, Issue 3: 42-51
Machine Translation

On Ensemble Learning of Neural Machine Translation

LI Bei1, WANG Qiang1, XIAO Tong1, JIANG Yufan1, ZHANG Zheyang1, LIU Jiqiang1, ZHANG Li1, YU Qing2

Abstract

Ensemble learning is a machine learning method in which multiple learners are combined to make a joint decision. Applied at the inference stage of machine translation, it can effectively integrate the probability distributions predicted by several models and thereby improve translation accuracy. Although the effectiveness of the method has been extensively validated in machine translation evaluation campaigns, strategies for selecting and fusing the sub-models remain little studied. This paper conducts extensive experiments on two ensemble learning methods for machine translation, parameter averaging and model fusion, and investigates ensemble strategies from the perspectives of model versus data and of diversity versus the number of models. Experimental results show that on the WMT Chinese-English news task the best ensemble yields an improvement of 3.19 BLEU points over a single Transformer model.

Keywords

ensemble learning / parameter averaging / model fusion / diversity
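Among these keywords, model fusion is the decoding-time counterpart of parameter averaging: instead of merging weights, the independently trained sub-models are all run at inference and their next-token probability distributions are averaged at every target position before the search selects a token. Below is a minimal greedy-decoding sketch of this idea; it assumes each sub-model is a callable mapping (src, prev_tokens) to next-token logits of shape (batch, vocab), an interface invented here for illustration rather than taken from the paper.

import torch

def fused_next_token_probs(models, src, prev_tokens):
    # Arithmetic mean of the per-model distributions in probability space;
    # averaging log-probabilities (a geometric mean) is a common variant.
    stacked = torch.stack(
        [torch.softmax(m(src, prev_tokens), dim=-1) for m in models]
    )
    return stacked.mean(dim=0)

def greedy_decode(models, src, bos_id, eos_id, max_len=100):
    # Decode one sentence, letting the fused distribution pick each token.
    tokens = [bos_id]
    for _ in range(max_len):
        prev = torch.tensor([tokens])
        probs = fused_next_token_probs(models, src, prev)
        next_id = int(probs.argmax(dim=-1)[0])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

In practice the same averaging is plugged into beam search rather than greedy decoding, and decoding cost grows linearly with the number of sub-models, which is why the paper's questions of how many models to fuse and how diverse they should be matter.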

Cite this article

LI Bei, WANG Qiang, XIAO Tong, JIANG Yufan, ZHANG Zheyang, LIU Jiqiang, ZHANG Li, YU Qing. On Ensemble Learning of Neural Machine Translation. Journal of Chinese Information Processing. 2019, 33(3): 42-51

Funding

National Natural Science Foundation of China (61876035, 61732005, 61562082); the Fundamental Research Funds for the Central Universities; the Program for Liaoning Innovative Talents in Universities