Examination on Part-of-Speech in Neural Machine Translation

ZHENG Yixiong1,2, ZHU Junguo1,2, YU Zhengtao1,2

Journal of Chinese Information Processing, 2023, Vol. 37, Issue (12): 26-35, 43.
Machine Translation


Abstract

The attention-based Transformer model has achieved great success in machine translation tasks, but it remains a "black box" whose internal working mechanism cannot be understood intuitively. It is therefore difficult to further improve the translation quality of a machine translation system on top of an existing translation model. Taking the Transformer model as the research object, this paper analyzes the relationship between words and the hidden-state nodes of the model: we mask individual nodes and node combinations of the neural translation model, and analyze each node's contribution from the changes in the translations before and after masking. In the experiments, to overcome the data sparsity problem, we focus on the changes of words of different parts of speech in the development-set translations, analyze the relationship between part-of-speech information and the nodes, and, according to the contributions, select for masking those negative-effect nodes whose removal can improve translation quality. The experimental results show that masking negative-effect nodes in this way can further improve translation quality on the test set.

Abstract

The attention-based Transformer has achieved great success in machine translation tasks. This paper analyzes the relationship between words and attention heads in the Transformer. We mask individual heads and head groups of the Transformer to obtain different translations, and analyze the contribution of each head by measuring the resulting changes in the translations. Specifically, we focus on the changes of words of particular parts of speech in the translations, analyze the relationship between parts of speech and heads, and improve translation quality by masking selected heads according to their contributions. The experimental results show that translation quality on the test set can be further improved by masking heads that have a negative effect on translation quality.
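To make the masking procedure concrete, below is a minimal sketch, not the authors' code, of the two ideas in the abstract: zeroing out one attention head in a multi-head attention layer, and scoring a head's contribution as the change in a development-set quality metric (e.g., BLEU) when that head is masked. The function names (multi_head_attention, head_contribution), the simplified attention without learned projection matrices, and the generic translate/score callables are illustrative assumptions.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(q, k, v, num_heads, head_mask=None):
        """Simplified multi-head self-attention (no learned projections).

        q, k, v: arrays of shape (seq_len, d_model); head_mask: length-num_heads
        array of 0/1, where 0 zeroes out that head's output.
        """
        d_model = q.shape[1]
        d_head = d_model // num_heads
        if head_mask is None:
            head_mask = np.ones(num_heads)
        outputs = []
        for h in range(num_heads):
            s = slice(h * d_head, (h + 1) * d_head)
            scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)   # (seq_len, seq_len)
            head_out = softmax(scores) @ v[:, s]             # (seq_len, d_head)
            outputs.append(head_mask[h] * head_out)          # masked head -> zeros
        return np.concatenate(outputs, axis=-1)              # (seq_len, d_model)

    def head_contribution(translate, score, sources, references, head_id):
        """Contribution of one head = drop in translation quality when it is masked.

        translate(sources, masked_head) -> list of hypothesis strings (hypothetical
        wrapper around a trained NMT model); score(hypotheses, references) -> float,
        e.g., dev-set BLEU.
        """
        baseline = score(translate(sources, masked_head=None), references)
        masked = score(translate(sources, masked_head=head_id), references)
        return baseline - masked

Under this sign convention, a negative contribution marks a "negative-effect" head: masking it improves the development-set score, which corresponds to the heads selected for masking in the paper's experiments.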

Keywords

neural machine translation / attention mechanism / interpretability / part-of-speech information

Key words

neural machine translation / attention mechanism / interpretability / part-of-speech

Cite this article

ZHENG Yixiong, ZHU Junguo, YU Zhengtao. Examination on Part-of-Speech in Neural Machine Translation. Journal of Chinese Information Processing. 2023, 37(12): 26-35,43


Funding

National Natural Science Foundation of China (62166022); General Program of the Yunnan Provincial Department of Science and Technology (202101AT070077)