Abstract
Machine Reading Comprehension (MRC) is an essential and challenging task in Natural Language Processing (NLP). In recent years, large-scale pre-trained language models represented by BERT have achieved remarkable success in this field. However, constrained by the structure and scale of the sequence model, BERT-based reading comprehension models show clear deficiencies in building long-distance and global semantics, which limits their performance on reading comprehension tasks. To address this problem, this paper proposes a new machine reading comprehension model that combines sequence and graph structures. First, named entities are extracted from the text, and a named-entity co-occurrence graph is constructed using two schemes: sentence co-occurrence and sliding-window co-occurrence. A spatial graph convolutional neural network then learns embedding representations of the named entities. The entity embeddings obtained from the graph structure are fused with the text embeddings obtained from the sequence structure, and the answer is finally produced by span extraction. Experimental results show that, compared with a sequence-structure reading comprehension model based on BERT, the proposed model combining sequence and graph structures improves EM by 7.8% and F1 by 6.6%.
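The pipeline sketched in the abstract (entity extraction, co-occurrence graph construction, spatial graph convolution, fusion with the sequence representation, span extraction) can be made concrete with a short illustration. The Python/NumPy sketch below covers only the graph-construction, graph-convolution, and fusion steps; it assumes named entities have already been extracted per sentence (e.g., with an off-the-shelf NER tool) and that token embeddings come from a separately run BERT encoder. All function names, the window size, and the fusion-by-concatenation choice are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of entity co-occurrence graph construction, a spatial GCN
# layer, and fusion with sequence embeddings. Names and hyperparameters
# (WINDOW_SIZE, fuse-by-concatenation) are hypothetical choices for illustration.
from collections import defaultdict
from itertools import combinations

import numpy as np

WINDOW_SIZE = 3  # sliding-window width in sentences; an assumed hyperparameter


def build_cooccurrence_graph(entity_sents):
    """Build a named-entity co-occurrence graph from per-sentence entity sets.

    entity_sents: list of sets, one set of entity strings per sentence.
    Returns an adjacency dict {entity: set of neighbouring entities} combining
    (1) sentence co-occurrence and (2) sliding-window co-occurrence.
    """
    adj = defaultdict(set)

    def connect(entities):
        for a, b in combinations(sorted(entities), 2):
            adj[a].add(b)
            adj[b].add(a)

    # Scheme 1: entities appearing in the same sentence are linked.
    for ents in entity_sents:
        connect(ents)

    # Scheme 2: entities appearing within a window of consecutive sentences are linked.
    for start in range(len(entity_sents)):
        window = set().union(*entity_sents[start:start + WINDOW_SIZE])
        connect(window)

    return adj


def spatial_gcn_layer(node_feats, adj, weight):
    """One spatial (message-passing) graph convolution layer: each node averages
    its own and its neighbours' feature vectors, applies a linear transform,
    then a ReLU non-linearity."""
    out = {}
    for node, feat in node_feats.items():
        neigh = [node_feats[m] for m in adj.get(node, ()) if m in node_feats]
        agg = np.mean([feat] + neigh, axis=0)
        out[node] = np.maximum(agg @ weight, 0.0)
    return out


def fuse(token_embs, token_entities, entity_embs):
    """Fuse graph-derived entity embeddings into sequence (token) embeddings by
    concatenation; tokens that belong to no entity get a zero vector."""
    dim = next(iter(entity_embs.values())).shape[0]
    fused = []
    for emb, ent in zip(token_embs, token_entities):
        fused.append(np.concatenate([emb, entity_embs.get(ent, np.zeros(dim))]))
    return np.stack(fused)


if __name__ == "__main__":
    # Toy example: three sentences with their extracted entities.
    sents = [{"Einstein", "Princeton"}, {"Princeton"}, {"Einstein", "Nobel Prize"}]
    adj = build_cooccurrence_graph(sents)

    rng = np.random.default_rng(0)
    feats = {e: rng.standard_normal(8) for e in adj}          # stand-in initial features
    ent_embs = spatial_gcn_layer(feats, adj, rng.standard_normal((8, 8)))

    token_embs = rng.standard_normal((5, 16))                 # stand-in BERT token vectors
    token_entities = [None, "Einstein", "Einstein", None, "Princeton"]
    print(fuse(token_embs, token_entities, ent_embs).shape)   # (5, 24)
```

In the setting the abstract describes, the fused token representations would then feed a span-extraction head that predicts answer start and end positions.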
Key words
machine reading comprehension / graph neural network / deep learning
Funding
Sichuan Science and Technology Program (2020YFG0009)