A Topic Based Multi-document Summarization for News

YUE Dapeng1, RAO Lan2, WANG Ting1

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 6: 79-85.
Review

Abstract

Multi-document summarization helps users cut unnecessary reading time and has broad application prospects. Taking news reports as its input and the MMR (Maximal Marginal Relevance) extraction algorithm as its basis, this paper exploits the fact that news today is usually organized and presented by topic, and proposes a topic-based multi-document summarization method. The method scores the importance of each sentence using the key words of the topic description as the basis, together with traditional features such as sentence position. Experiments on the TDT4 news corpus compare the topic-based summarization system with two baseline systems; the proposed method performs better than both, especially at a compression ratio of 5%.
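
The abstract names the ingredients of the method (MMR selection, topic key words, sentence position, a compression ratio) but not its formulas. The following Python sketch shows one way these pieces could fit together; it is not the authors' implementation. The tokenizer, the relevance formula, the weights `position_weight` and `lam`, and the interpretation of the compression ratio as a fraction of the sentence count are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (assumption; Chinese text would need a word segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(tokens, position, topic_keywords, position_weight=0.3):
    """Hypothetical score: topic key word coverage plus a bonus for early sentences."""
    if not tokens or not topic_keywords:
        return 0.0
    keyword_score = len(set(tokens) & topic_keywords) / len(topic_keywords)
    position_score = 1.0 / (1 + position)  # position 0 = first sentence of its document
    return (1 - position_weight) * keyword_score + position_weight * position_score


def mmr_summarize(documents, topic_keywords, ratio=0.05, lam=0.7):
    """Greedy MMR selection over the sentences of all documents.

    documents: list of documents, each a list of sentence strings.
    ratio: fraction of sentences to keep (assumed meaning of "compression ratio").
    lam: trade-off between relevance and redundancy (assumed value).
    """
    topic_keywords = {k.lower() for k in topic_keywords}
    candidates = []
    for doc in documents:
        for pos, sent in enumerate(doc):
            toks = tokenize(sent)
            candidates.append((sent, Counter(toks), relevance(toks, pos, topic_keywords)))
    budget = max(1, round(len(candidates) * ratio))
    selected = []
    while candidates and len(selected) < budget:
        def mmr(c):
            # Penalize similarity to the closest sentence already in the summary.
            redundancy = max((cosine(c[1], s[1]) for s in selected), default=0.0)
            return lam * c[2] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [s[0] for s in selected]
```

Calling `mmr_summarize(docs, ["earthquake", "rescue"], ratio=0.05)` on a list of tokenizable news documents returns the chosen sentences in selection order. The redundancy term is what keeps a sentence repeated across several reports of the same topic from being picked twice.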

Key words

automatic summarization / topic / natural language processing / news

Cite this article

YUE Dapeng, RAO Lan, WANG Ting. A Topic Based Multi-document Summarization for News. Journal of Chinese Information Processing. 2012, 26(6): 79-85

Funding

National Natural Science Foundation of China (61170156); National High Technology Research and Development Program of China (863 Program) (2010AA012505)