Abstract
Multi-document summarization helps people obtain information automatically and quickly, and is an active research topic. Compared with single-document summarization, it must pay more attention to the correlation between documents and to the redundancy of information across them, so controlling information redundancy is a key problem in multi-document summarization. Taking the characteristics of summaries into account, this paper proposes a redundancy control model that decides which sentences to select by computing the similarity between text units over their topic probability distributions, thereby keeping redundancy under control. Experimental results show that the method effectively reduces redundancy and achieves better overall performance than existing automatic summarization systems.
Key words: redundancy control; multi-document summarization; Chinese automatic summarization
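The selection rule summarized in the abstract (choose a sentence only if its topic distribution is not too similar to what has already been chosen) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the sentence scores, or any threshold, so everything concrete here is an assumption made for illustration: similarity is taken as one minus the Jensen-Shannon divergence, sentences arrive with precomputed scores and topic distributions, and the names `topic_similarity`, `select_sentences`, and `threshold` are hypothetical rather than the paper's own.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Assumed similarity: 1 for identical distributions, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)


def select_sentences(candidates, max_sentences, threshold=0.8):
    """Greedily pick sentences in descending score order, skipping redundant ones.

    candidates: list of (sentence, score, topic_distribution) tuples.
    threshold:  hypothetical cutoff; a candidate is rejected when its topic
                similarity to any already selected sentence reaches it.
    """
    selected = []
    for sentence, _score, dist in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(selected) >= max_sentences:
            break
        if all(topic_similarity(dist, d) < threshold for _, d in selected):
            selected.append((sentence, dist))
    return [s for s, _ in selected]


if __name__ == "__main__":
    candidates = [
        ("A", 0.9, [0.90, 0.10, 0.00]),
        ("B", 0.8, [0.85, 0.15, 0.00]),  # nearly the same topics as A
        ("C", 0.7, [0.00, 0.10, 0.90]),  # different topics
    ]
    print(select_sentences(candidates, max_sentences=2))
```

In this toy input, sentence B covers almost the same topics as A and is skipped, so the lower-scored but topically distinct sentence C is selected instead.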
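Funding
Supported by the National Natural Science Foundation of China (60873150, 60970056) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJB520016).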