Abstract
The encoder-decoder framework is the most widely used architecture for Neural Machine Translation (NMT), and many new models build on it to improve translation quality. Among them, the Transformer is one of the most promising, leveraging the self-attention mechanism to capture semantic dependencies from a global view. However, it cannot distinguish the relative positions of tokens well, e.g., whether a dependent token lies to the left or the right of the current token, nor can it focus on the local context around the current token. To alleviate these problems, we propose a novel attention mechanism named Hybrid Self-Attention Network (HySAN), which applies specifically designed masks to the self-attention network to extract different kinds of semantic information, such as global/local context and left/right-side context. Finally, a squeeze gate is introduced to fuse the different kinds of SANs. Experimental results on three machine translation tasks show that the proposed approach significantly outperforms the Transformer baseline.
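The masking idea in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: the function names, the local window size, and the "squeeze gate" realization (a softmax-weighted average over branches, with learned projections omitted) are illustrative assumptions. It only shows how additive masks restrict a single self-attention head to global, left-only, right-only, or local context, and how several masked branches could be fused.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_masks(n, window=2):
    """Illustrative masks of the kinds the abstract describes.

    Returns additive masks: 0.0 where attention is allowed, a large
    negative value where it is blocked (zeroed out after softmax).
    """
    neg = -1e9
    idx = np.arange(n)
    global_mask = np.zeros((n, n))                                    # unrestricted
    forward_mask = np.where(idx[None, :] <= idx[:, None], 0.0, neg)   # left context only
    backward_mask = np.where(idx[None, :] >= idx[:, None], 0.0, neg)  # right context only
    local_mask = np.where(np.abs(idx[None, :] - idx[:, None]) <= window, 0.0, neg)
    return {"global": global_mask, "forward": forward_mask,
            "backward": backward_mask, "local": local_mask}

def masked_self_attention(x, mask):
    """Single-head self-attention with an additive mask (Q/K/V projections omitted)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d) + mask   # (n, n) masked attention logits
    return softmax(scores) @ x             # (n, d) contextualized representations

def squeeze_gate_fusion(outputs):
    """Toy gate: weight each branch by a softmax over its mean activation."""
    stacked = np.stack(outputs)                  # (branches, n, d)
    gate = softmax(stacked.mean(axis=(1, 2)))    # one scalar weight per branch
    return np.tensordot(gate, stacked, axes=1)   # (n, d) fused output
```

A usage sketch: `masks = attention_masks(5)` followed by `squeeze_gate_fusion([masked_self_attention(x, m) for m in masks.values()])` runs all four masked branches on the same input and merges them into one representation per token.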
Key words
self-attention /
neural machine translation /
deep neural network