Abstract
Single-document extractive summarization selects the sentences that best represent the core content of a document, while discourse nuclearity analysis uses discourse structure to separate a text's primary (nucleus) content from its secondary (satellite) content. The two tasks are therefore closely related, and nucleus-satellite relations can guide sentence extraction. This paper proposes a single-document extractive summarization method based on nucleus-satellite relations: a neural network model that jointly learns nucleus-satellite relation recognition and text summarization. The model combines semantic information at the word and phrase level with structural information such as nucleus-satellite relations, and extracts summary sentences by optimizing over the content of the whole document. Experimental results show that the method significantly improves ROUGE scores over current mainstream single-document extractive summarization methods.
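As a rough illustration of the ROUGE measure used for evaluation, the sketch below computes ROUGE-N recall, precision, and F1 from clipped n-gram overlap between a candidate summary and a single reference. It is not the official ROUGE package and not the paper's evaluation script; the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the toy sentences are all assumptions made for this example. For Chinese text, the token lists would typically be characters or segmented words rather than whitespace-split strings.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: (recall, precision, F1) from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is credited at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Toy example (hypothetical sentences, whitespace-tokenized).
reference = "the model extracts the core sentences".split()
candidate = "the model selects core sentences".split()

r1, p1, f1 = rouge_n(candidate, reference, n=1)
r2, p2, f2 = rouge_n(candidate, reference, n=2)
print(f"ROUGE-1 recall={r1:.3f} precision={p1:.3f} F1={f1:.3f}")
print(f"ROUGE-2 recall={r2:.3f} precision={p2:.3f} F1={f2:.3f}")
```

Recall is the figure most often quoted for ROUGE-N. Precision and F1 are included here because an extractive system can inflate recall simply by selecting more sentences.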
Keywords
extractive summarization /
nucleus-satellite relation /
neural network
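Funding
National Natural Science Foundation of China (61806137, 61702518, 61836007, 61702149, 61402314); Natural Science Research Project of Jiangsu Higher Education Institutions (18KJB520043)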