Journal of Chinese Information Processing, 2019, Vol. 33, Issue 3: 42-51
Machine Translation

On Ensemble Learning of Neural Machine Translation

LI Bei1, WANG Qiang1, XIAO Tong1, JIANG Yufan1, ZHANG Zheyang1, LIU Jiqiang1, ZHANG Li1, YU Qing2

Abstract

Ensemble learning is a machine learning method in which multiple learners are combined to make a joint decision. Applied at the inference stage of machine translation, it can effectively integrate the probability distributions predicted by several models and thereby improve translation accuracy. Although the effectiveness of the method has been extensively validated in machine translation evaluation campaigns, strategies for selecting and fusing the sub-models remain little studied. This paper conducts extensive experiments on two ensemble learning methods for machine translation, parameter averaging and model fusion, and investigates ensemble strategies from the perspectives of model versus data and of diversity versus the number of models. Experimental results show that on the WMT Chinese-English news task the best ensemble yields an improvement of 3.19 BLEU points over a single Transformer model.

Keywords

ensemble learning / parameter averaging / model fusion / diversity
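Among these keywords, model fusion is the decoding-time counterpart of parameter averaging: instead of merging weights, the independently trained sub-models are all run at inference and their next-token probability distributions are averaged at every target position before the search selects a token. Below is a minimal greedy-decoding sketch of this idea; it assumes each sub-model is a callable mapping (src, prev_tokens) to next-token logits of shape (batch, vocab), an interface invented here for illustration rather than taken from the paper.

import torch

def fused_next_token_probs(models, src, prev_tokens):
    # Arithmetic mean of the per-model distributions in probability space;
    # averaging log-probabilities (a geometric mean) is a common variant.
    stacked = torch.stack(
        [torch.softmax(m(src, prev_tokens), dim=-1) for m in models]
    )
    return stacked.mean(dim=0)

def greedy_decode(models, src, bos_id, eos_id, max_len=100):
    # Decode one sentence, letting the fused distribution pick each token.
    tokens = [bos_id]
    for _ in range(max_len):
        prev = torch.tensor([tokens])
        probs = fused_next_token_probs(models, src, prev)
        next_id = int(probs.argmax(dim=-1)[0])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

In practice the same averaging is plugged into beam search rather than greedy decoding, and decoding cost grows linearly with the number of sub-models, which is why the paper's questions of how many models to fuse and how diverse they should be matter.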

Cite this article

LI Bei, WANG Qiang, XIAO Tong, JIANG Yufan, ZHANG Zheyang, LIU Jiqiang, ZHANG Li, YU Qing. On Ensemble Learning of Neural Machine Translation. Journal of Chinese Information Processing. 2019, 33(3): 42-51

Funding

National Natural Science Foundation of China (61876035, 61732005, 61562082); the Fundamental Research Funds for the Central Universities; the Program for Liaoning Innovative Talents in Universities