Abstract
Mainstream abstractive summarization methods adopt machine learning models based on the encoder-decoder architecture, usually with a recurrent-neural-network encoder. Such an encoder mainly captures the sequential information of a text and is weak at learning its structural information, which, from a linguistic perspective, plays an important role in identifying the important content of the text. To let the encoder acquire this structural information, this paper proposes a text-structure-aware encoder that encodes the text with a graph convolutional network. A normalization-and-fusion layer is further designed so that the model attends to the sequential information of the text while acquiring its structural information. In addition, a decoder with multi-head attention is adopted to improve the quality of the generated summaries. Experimental results show that adding the proposed text-structure encoder and normalization-and-fusion layer yields a significant improvement in system performance as measured by ROUGE.
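To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch (not the authors' released code) of a structure-aware encoder: a BiGRU supplies sequential features, one graph-convolutional layer propagates them over a document graph, and a layer-normalized residual sum stands in for the normalization-and-fusion layer. All names here (StructureAwareEncoder, GraphConvLayer, adj) are illustrative assumptions; the encoder output would then serve as memory for a multi-head attention decoder.

```python
# A minimal sketch, assuming PyTorch and dense, row-normalized adjacency
# matrices with self-loops. It illustrates the idea only, not the paper's
# exact implementation or hyper-parameters.
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One GCN layer: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (batch, nodes, nodes) normalized adjacency; h: (batch, nodes, in_dim)
        return torch.relu(adj @ self.linear(h))


class StructureAwareEncoder(nn.Module):
    """Sequential (BiGRU) features fused with structural (GCN) features via LayerNorm."""

    def __init__(self, emb_dim: int, hid_dim: int):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.gcn = GraphConvLayer(2 * hid_dim, 2 * hid_dim)
        self.norm = nn.LayerNorm(2 * hid_dim)

    def forward(self, emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        seq, _ = self.rnn(emb)          # sequential information of the text
        struct = self.gcn(seq, adj)     # structural information from the text graph
        return self.norm(seq + struct)  # normalization-and-fusion of both views


# Toy usage: 2 documents, 6 nodes each, 128-dim embeddings.
emb = torch.randn(2, 6, 128)
adj = torch.eye(6).repeat(2, 1, 1)      # identity = self-loops only, for illustration
enc = StructureAwareEncoder(emb_dim=128, hid_dim=64)
memory = enc(emb, adj)                  # (2, 6, 128) states, e.g. for a multi-head attention decoder
print(memory.shape)
```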
Key words
abstractive summarization /
text structure /
graph convolutional neural network
Funding
National Natural Science Foundation of China (61976146, 61806137)