An Evolutionary Summarization System Based on Local-global Topic Relationship

WU Renshou, LIU Kai, WANG Hongling

Journal of Chinese Information Processing, 2018, Vol. 32, Issue 9: 75-83.
Information Extraction and Text Mining

Abstract

Evolutionary timeline summarization (ETS), which produces time-stamped summaries, is a recently proposed task in natural language processing. In essence it is a form of multi-document summarization (MDS), and its target is the stream of documents that continuously report a hot news event on the Internet. Online coverage of a news event evolves dynamically, its reports are dynamically related to one another, and much of the information is repeated. To address these characteristics, this paper proposes an evolutionary summarization method based on local-global topic relations. The method divides a news event into several distinct sub-topics, considers topic evolution between sub-topics in addition to evolution over time, and finally outputs news headlines as the summary. Experimental results show that the method is effective. In particular, when news headlines are used as both input and output, it achieves significant improvements in ROUGE scores over current mainstream multi-document summarization and evolutionary summarization methods.
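The abstract names the ingredients (sub-topics, local and global topic relations, headlines as output) and the keywords mention PageRank, but it does not say how the graph is built. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation: sub-topic labels are taken as given, headline similarity is plain word overlap, and the names `rank_headlines`, `build_timeline`, `local_weight`, and `global_weight` are hypothetical. It shows one way a weighted PageRank could score headlines so that links inside a sub-topic (local) and links across sub-topics (global) count differently, and how one headline per date could then be kept.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_headlines(items, local_weight=1.0, global_weight=0.5,
                   damping=0.85, iterations=50):
    """Score headlines with a weighted PageRank (power iteration).

    items: list of (date, subtopic_id, headline) tuples.
    Edges inside one sub-topic are scaled by local_weight and edges
    between different sub-topics by global_weight. Both weights are
    illustrative assumptions, not values from the paper.
    """
    n = len(items)
    if n == 0:
        return []
    tokens = [set(headline.lower().split()) for _, _, headline in items]

    # Build the weighted similarity graph.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_topic = items[i][1] == items[j][1]
            scale = local_weight if same_topic else global_weight
            weights[i][j] = scale * jaccard(tokens[i], tokens[j])

    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Nodes with no outgoing weight pass nothing on (a simplification).
            incoming = sum(scores[i] * weights[i][j] / out_sum[i]
                           for i in range(n) if out_sum[i] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def build_timeline(items, scores):
    """Keep the highest-scoring headline for each date, in date order."""
    best = {}
    for (date, _, headline), score in zip(items, scores):
        if date not in best or score > best[date][0]:
            best[date] = (score, headline)
    return [(date, best[date][1]) for date in sorted(best)]


# Toy data invented for illustration only.
items = [
    ("2018-01-01", 0, "earthquake strikes coastal city"),
    ("2018-01-01", 0, "strong earthquake strikes coastal city overnight"),
    ("2018-01-01", 1, "weather sunny today"),
    ("2018-01-02", 1, "rescue teams reach coastal city"),
]
scores = rank_headlines(items)
timeline = build_timeline(items, scores)
```

In this sketch a headline that shares no words with any other receives only the baseline score, so it is never chosen over a connected headline from the same date. A real system would also need a redundancy control step and a proper sub-topic clustering stage, both of which are left out here.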

Key words

topic relation / PageRank / evolutionary timeline summarization / multi-document summarization

Cite this article

WU Renshou, LIU Kai, WANG Hongling. An Evolutionary Summarization System Based on Local-global Topic Relationship. Journal of Chinese Information Processing. 2018, 32(9): 75-83

Funding

National Natural Science Foundation of China (61402314)