Applications of Graph Neural Network for Natural Language Processing

CHEN Yulong 1,2, FU Qiankun 1,2, ZHANG Yue 2,3

Journal of Chinese Information Processing, 2021, Vol. 35, Issue 3: 1-23.

Survey

Abstract

In recent years, neural networks have gradually overtaken classical machine learning models, owing to their strong representational power, and become the de facto paradigm for natural language processing tasks. However, typical neural network models can only process data in Euclidean space, whereas much linguistic information, such as discourse structure, syntax, and even sentences themselves, is naturally represented as graphs. Graph neural networks have therefore attracted wide attention and have been successfully applied across many areas of natural language processing. This paper presents a systematic survey of these applications. It first introduces the core ideas of graph neural networks and reviews three classical variants: the graph recurrent network, the graph convolutional network, and the graph attention network. It then describes, for a range of concrete tasks, how to construct a graph structure suited to the characteristics of each task and how to apply graph-based representation models to it. This paper argues that, compared with exploring novel graph neural network architectures, exploring how to model the key information of different tasks as graphs is a more universal and academically valuable direction for future research.
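
To make the graph convolutional network category concrete, the sketch below implements one GCN propagation step, H' = ReLU(D^{-1/2}(A+I)D^{-1/2}HW), in the form popularized by Kipf and Welling. It is a minimal illustration only: the `gcn_layer` function, its numpy implementation, and the toy dependency graph are assumptions made for this sketch, not code from any system covered by the survey.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    A: (n, n) adjacency matrix, e.g., the arcs of a dependency tree
    H: (n, d_in) node features, e.g., word embeddings
    W: (d_in, d_out) learnable projection matrix
    """
    A_hat = A + np.eye(A.shape[0])            # self-loops keep each node's own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))  # diagonal of D^-1/2
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # aggregate neighbors, project, ReLU

# Hypothetical 4-word sentence whose (undirected) dependency arcs
# connect word 1 to words 0, 2, and 3.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 0.],
              [0., 1., 0., 0.]])
H = rng.normal(size=(4, 8))      # 8-dimensional word embeddings
W = rng.normal(size=(8, 8))
print(gcn_layer(A, H, W).shape)  # (4, 8): one updated vector per word
```

Stacking k such layers lets each word aggregate information from its k-hop neighborhood in the graph, which is what allows syntax- or discourse-based graphs to inform downstream predictions.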

Key words

survey / natural language processing / graph neural network

Cite this article

CHEN Yulong, FU Qiankun, ZHANG Yue. Applications of Graph Neural Network for Natural Language Processing. Journal of Chinese Information Processing, 2021, 35(3): 1-23.

Funding

National Natural Science Foundation of China (61976180)