Research Progress of Attention Mechanism in Deep Learning
ZHU Zhangli¹, RAO Yuan¹, WU Yuan¹, QI Jiangnan¹, ZHANG Yu²
1. Lab of Social Intelligence & Complex Data Processing, School of Software, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; 2. School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
Abstract The attention mechanism has gradually become one of the most popular methods and research topics in deep learning. By improving the representation of the source sequence and dynamically selecting the relevant source information during decoding, it greatly alleviates the shortcomings of the classic Encoder-Decoder framework. Starting from the limitations of the conventional Encoder-Decoder framework, such as restricted long-term memory, the interdependencies involved in sequence transformation, and the output quality of dynamic model structures, this paper surveys the attention mechanism from several angles: its definition, principle, and classification; the state of the art in attention research; and its applications in image recognition, speech recognition, and natural language processing. The paper further discusses multi-modal attention, the evaluation of attention mechanisms, model interpretability, and the integration of attention with new models, pointing out new research questions and directions for the development of the attention mechanism in deep learning.
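To make the dynamic selection of source information concrete, the following minimal NumPy sketch (an illustration under our own assumptions, not code from the paper) computes Bahdanau-style additive attention for a single decoding step in an Encoder-Decoder model; all names and shapes (encoder_states, dec_state, W_enc, W_dec, v) are hypothetical placeholders for learned quantities.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical setup: T source positions, hidden size H, random stand-ins
# for what would normally be encoder outputs and learned parameters.
T, H = 5, 8
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(T, H))  # source annotations h_1..h_T
dec_state = rng.normal(size=H)            # previous decoder state s_{t-1}
W_enc = rng.normal(size=(H, H))           # additive-scoring parameters
W_dec = rng.normal(size=(H, H))
v = rng.normal(size=H)

# Alignment scores e_{t,j} = v^T tanh(W_dec s_{t-1} + W_enc h_j)
scores = np.tanh(encoder_states @ W_enc.T + dec_state @ W_dec.T) @ v
weights = softmax(scores)            # attention distribution over the source
context = weights @ encoder_states   # weighted summary fed to the decoder

Because the weights are recomputed from the current decoder state at every step, the context vector summarizes a different part of the source sequence each time, which is precisely the dynamic selection the abstract describes.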
|
Received: 06 August 2018
|
|
|
|
|
|
|
|