Topic-Based Multi-document Summarization for News
YUE Dapeng1, RAO Lan2, WANG Ting1
1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China; 2. School of Humanities and Social Sciences, National University of Defense Technology, Changsha, Hunan 410073, China
Abstract: Multi-document summarization, which aims to minimize unnecessary reading time, is of great value today. Because news is now usually organized by topic, this paper exploits that organization and proposes a topic-based multi-document summarization method that employs MMR (maximal marginal relevance). The method scores sentences using the key words of the topic description, together with traditional features such as sentence position. Experimental results on the TDT4 corpus indicate that the proposed method outperforms two baseline systems, especially at a compression ratio of 5%.
Key words: automatic summarization; topic; natural language processing; news
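As a rough illustration of the approach summarized in the abstract, the following Python sketch combines a topic key word feature and a sentence position feature into a relevance score, then selects sentences greedily with MMR until a compression-ratio budget is reached. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the function names (`tokenize`, `cosine`, `relevance`, `summarize`), the feature weights `w_topic` and `w_pos`, the trade-off parameter `lam`, the whitespace tokenizer, and the use of bag-of-words cosine similarity as the redundancy measure. It is not the authors' implementation.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Assumption: lowercase whitespace tokens with surrounding punctuation stripped.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance(tokens, position, keywords, w_topic=0.7, w_pos=0.3):
    # Topic feature: fraction of topic key words that occur in the sentence.
    topic_score = len(set(tokens) & keywords) / len(keywords) if keywords else 0.0
    # Position feature: earlier sentences in a document score higher.
    pos_score = 1.0 / (1 + position)
    # Hypothetical linear combination; the weights are illustrative only.
    return w_topic * topic_score + w_pos * pos_score


def summarize(documents, topic_keywords, ratio=0.05, lam=0.7):
    # documents: list of documents, each a list of sentence strings.
    keywords = {k.lower() for k in topic_keywords}
    candidates = []
    for doc in documents:
        for pos, sentence in enumerate(doc):
            tokens = tokenize(sentence)
            if tokens:
                candidates.append(
                    (sentence, Counter(tokens), len(tokens),
                     relevance(tokens, pos, keywords))
                )

    # Word budget implied by the compression ratio (at least one word).
    budget = max(1, round(ratio * sum(c[2] for c in candidates)))
    selected, used = [], 0

    while candidates and used < budget:
        def mmr(c):
            # MMR: reward relevance, penalize similarity to already chosen sentences.
            redundancy = max((cosine(c[1], s[1]) for s in selected), default=0.0)
            return lam * c[3] - (1 - lam) * redundancy

        best_index = max(range(len(candidates)), key=lambda i: mmr(candidates[i]))
        best = candidates.pop(best_index)
        selected.append(best)
        used += best[2]

    return [s[0] for s in selected]
```

Calling `summarize(documents, ["earthquake", "rescue"], ratio=0.05)` on a list of tokenizable news documents would return the selected sentences in selection order. Lowering `lam` puts more weight on diversity, which matters most at small compression ratios where each redundant sentence consumes a large share of the budget.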