JIANG Haoquan, ZHANG Ruqing, GUO Jiafeng, FAN Yixing, CHENG Xueqi. A Comparative Study of Graph Convolutional Networks and Self-Attention Mechanism on Text Classification[J]. Journal of Chinese Information Processing, 2021, 35(12): 84-93.
A Comparative Study of Graph Convolutional Networks and Self-Attention Mechanism on Text Classification
JIANG Haoquan1,2, ZHANG Ruqing1,2, GUO Jiafeng1,2, FAN Yixing1,2, CHENG Xueqi1,2
1. Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Graph Convolutional Networks (GCNs) have drawn much attention recently, and the self-attention mechanism has been widely applied as the core of the Transformer and many pre-trained models. We show that the self-attention mechanism can be seen as a generalization of GCNs: it treats all input samples as nodes and constructs a directed, fully connected graph with learnable edge weights over which convolution is performed. Experiments show that the self-attention mechanism achieves higher text classification accuracy than many state-of-the-art GCNs, and that the accuracy gap widens as the amount of data increases. These results indicate that the self-attention mechanism is more expressive and may surpass GCNs, with further room for improvement, on the text classification task.
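To make the relationship concrete, the following is a minimal NumPy sketch, under our own simplified assumptions (a single attention head, one weight matrix per layer, toy random inputs), of how a GCN layer and a self-attention layer differ mainly in where the "adjacency" comes from: the GCN aggregates over a fixed, normalized graph, while self-attention learns a dense, input-dependent edge-weight matrix over a fully connected graph. The shapes and variable names here are illustrative and are not the paper's implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8                      # 5 nodes (e.g., words/documents), 8-dim features
X = rng.normal(size=(n, d))      # node feature matrix

# GCN layer: fixed, symmetrically normalized adjacency (a toy random graph)
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)           # make the toy graph undirected
A_hat = A + np.eye(n)            # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
W_gcn = rng.normal(size=(d, d))
H_gcn = np.maximum(A_norm @ X @ W_gcn, 0.0)   # ReLU(Â X W): convolution over a fixed graph

# Self-attention layer: learned, input-dependent "adjacency" over a complete directed graph
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v
Attn = softmax(Q @ K.T / np.sqrt(d))          # dense, directed, row-normalized edge weights
H_attn = Attn @ V                             # same "aggregate over neighbors" form as Â X W

print(H_gcn.shape, H_attn.shape)              # both (5, 8): same propagation pattern

Read side by side, both update rules have the form H' = f(P X W) with P a normalized propagation matrix; the GCN fixes P from a given graph, whereas self-attention computes P from the inputs themselves, which is the sense in which the abstract calls it a generalization.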