Abstract
The encoder-decoder framework is the most widely used architecture for Neural Machine Translation (NMT), and many new models build on it to improve translation quality. Among them, the Transformer is one of the most promising, leveraging the self-attention mechanism to capture semantic dependencies from a global view. However, it cannot distinguish the relative positions of tokens well, e.g., whether a dependent token lies to the left or the right of the current token, nor can it focus on the local context around the current token. To alleviate these problems, we propose a novel attention mechanism named Hybrid Self-Attention Network (HySAN), which applies specifically designed masks to the self-attention network to extract different kinds of semantic information, such as global/local context and left/right-side context. Finally, a squeeze gate is introduced to fuse the different kinds of SANs. Experimental results on three machine translation tasks show that the proposed approach significantly outperforms the Transformer baseline.
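The masking idea in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: the function names, the local window size, and the "squeeze gate" realization (a softmax-weighted average over branches, with learned projections omitted) are illustrative assumptions. It only shows how additive masks restrict a single self-attention head to global, left-only, right-only, or local context, and how several masked branches could be fused.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_masks(n, window=2):
    """Illustrative masks of the kinds the abstract describes.

    Returns additive masks: 0.0 where attention is allowed, a large
    negative value where it is blocked (zeroed out after softmax).
    """
    neg = -1e9
    idx = np.arange(n)
    global_mask = np.zeros((n, n))                                    # unrestricted
    forward_mask = np.where(idx[None, :] <= idx[:, None], 0.0, neg)   # left context only
    backward_mask = np.where(idx[None, :] >= idx[:, None], 0.0, neg)  # right context only
    local_mask = np.where(np.abs(idx[None, :] - idx[:, None]) <= window, 0.0, neg)
    return {"global": global_mask, "forward": forward_mask,
            "backward": backward_mask, "local": local_mask}

def masked_self_attention(x, mask):
    """Single-head self-attention with an additive mask (Q/K/V projections omitted)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d) + mask   # (n, n) masked attention logits
    return softmax(scores) @ x             # (n, d) contextualized representations

def squeeze_gate_fusion(outputs):
    """Toy gate: weight each branch by a softmax over its mean activation."""
    stacked = np.stack(outputs)                  # (branches, n, d)
    gate = softmax(stacked.mean(axis=(1, 2)))    # one scalar weight per branch
    return np.tensordot(gate, stacked, axes=1)   # (n, d) fused output
```

A usage sketch: `masks = attention_masks(5)` followed by `squeeze_gate_fusion([masked_self_attention(x, m) for m in masks.values()])` runs all four masked branches on the same input and merges them into one representation per token.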
Key words
self-attention /
neural machine translation /
deep neural network