Multi-Document Summarization Based on Event Term Semantic Relation Graph Clustering

LIU Maofu¹, LI Wenjie², JI Donghong³

Journal of Chinese Information Processing, 2010, Vol. 24, Issue 5: 77-85.
Review

Abstract

Event-based extractive summarization selects the sentences that describe important events and reorganizes them into a summary. In this paper, we define an event as an event term together with its associated named entities, and we focus on the semantic relations between event terms derived from an external linguistic resource. First, we construct a graph from the event term semantic relations and group the event terms in the graph into clusters using a revised DBSCAN clustering algorithm. Next, we either select one event term as the representative term of each cluster, or select one cluster of event terms to represent the main topic of the document set. Finally, we generate the summary by extracting from the documents the most informative sentences that contain representative terms. Evaluation on the DUC 2001 document sets shows that taking the semantic relations among event terms into account is necessary and feasible in multi-document summarization, and that our approach based on event term semantic relation graph clustering is effective.
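The abstract names the three steps (graph construction, revised DBSCAN clustering, representative-based sentence extraction) but not their parameters. The Python sketch below illustrates the general idea under explicit assumptions, and is not the authors' implementation: the relation list is a hypothetical toy stand-in for the external linguistic resource; a term's neighbourhood is the term plus its direct graph neighbours, replacing plain DBSCAN's distance-based eps-neighbourhood; and `min_pts`, the highest-degree representative rule, and the count-of-representatives sentence score are illustrative choices, not the paper's revised DBSCAN or ranking.

```python
from collections import defaultdict

NOISE = -1  # label for terms that belong to no cluster

# Hypothetical toy data: event terms, relations standing in for an external
# linguistic resource, and sentences annotated with the event terms they contain.
TERMS = ["attack", "bomb", "explode", "kill", "die", "elect", "vote", "win", "rain"]
RELATIONS = [
    ("attack", "bomb"), ("bomb", "explode"), ("attack", "kill"),
    ("kill", "die"), ("explode", "kill"),
    ("elect", "vote"), ("vote", "win"), ("elect", "win"),
]
SENTENCES = [
    ("A car bomb exploded and killed five people.", ["bomb", "explode", "kill"]),
    ("Turnout was high as citizens voted on Sunday.", ["vote"]),
    ("The opposition leader was elected president.", ["elect"]),
]


def build_graph(terms, relations):
    """Undirected graph: an edge joins two terms whenever a relation links them."""
    graph = {t: set() for t in terms}
    for a, b in relations:
        if a in graph and b in graph and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_dbscan(graph, min_pts):
    """DBSCAN over a graph (assumed variant): a term's neighbourhood is the term
    plus its direct neighbours; terms with >= min_pts such members are core."""
    labels = {}
    cluster_id = 0
    for node in sorted(graph):
        if node in labels:
            continue
        neighbours = {node} | graph[node]
        if len(neighbours) < min_pts:
            labels[node] = NOISE  # may later be relabelled as a border point
            continue
        labels[node] = cluster_id
        queue = sorted(neighbours - {node})
        while queue:
            current = queue.pop()
            if labels.get(current) == NOISE:
                labels[current] = cluster_id  # border point: joins, never expands
            if current in labels:
                continue
            labels[current] = cluster_id
            current_neighbours = {current} | graph[current]
            if len(current_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(sorted(current_neighbours - {current}))
        cluster_id += 1
    clusters = defaultdict(set)
    for node, label in labels.items():
        if label != NOISE:
            clusters[label].add(node)
    return [clusters[k] for k in sorted(clusters)]


def representatives(graph, clusters):
    """Hypothetical rule: the term with the most neighbours inside its cluster
    (ties broken alphabetically) represents that cluster."""
    return [max(sorted(c), key=lambda t: len(graph[t] & c)) for c in clusters]


def summarize(sentences, reps, max_sentences):
    """Hypothetical score: number of representative terms a sentence contains.
    Top sentences are returned in their original document order."""
    rep_set = set(reps)
    scored = []
    for idx, (text, event_terms) in enumerate(sentences):
        score = len(rep_set & set(event_terms))
        if score > 0:
            scored.append((-score, idx, text))
    scored.sort()
    chosen = sorted(scored[:max_sentences], key=lambda item: item[1])
    return [text for _, _, text in chosen]


if __name__ == "__main__":
    graph = build_graph(TERMS, RELATIONS)
    clusters = graph_dbscan(graph, min_pts=3)
    reps = representatives(graph, clusters)
    print("clusters:", [sorted(c) for c in clusters])
    print("representatives:", reps)
    for line in summarize(SENTENCES, reps, max_sentences=2):
        print("-", line)
```

With `min_pts=3`, the isolated term `rain` is left as noise, the violence-related and election-related terms form two clusters, and the two-sentence summary draws one sentence from each topic. A graph-based ranking or explicit redundancy control could replace the simple count score; the abstract does not state which choices the authors made.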
Key words

event-based summarization / event semantic relation graph / DBSCAN clustering algorithm

Cite this article

LIU Maofu, LI Wenjie, JI Donghong. Multi-Document Summarization Based on Event Term Semantic Relation Graph Clustering. Journal of Chinese Information Processing, 2010, 24(5): 77-85.

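Funding

Supported by the Natural Science Foundation of Hubei Province (2009CDB311) and the Major Research Plan of the National Natural Science Foundation of China (90820005).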