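This paper presents a mechanism for query-biased text summarization based on passage matching and distribution density, aiming to meet the need for personalized summaries. First, building on synonym expansion of the keywords, a passage matching method based on profile similarity is used to obtain the set of relevant text passages. Then the distribution density function over text windows is computed to locate the regions where keywords cluster, and the final biased summary is output according to the weights of the sentences in the covered regions. Finally, evaluation experiments using a question-answering test and a similarity comparison gave good results and indicated that biased summaries are more effective for multi-topic texts.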
Abstract
An important issue in text summarization is that a summary should reflect the user's personal information need and provide indicative information. This paper presents a mechanism for query-biased summarization based on passage matching and density distribution. First, each keyword and its synonyms are treated as a query profile, and the relevant passages are retrieved by profile matching. The density of terms in these passages is then calculated with a Hanning window function to identify the areas where keywords concentrate. Taking into account the density distribution and the number of keywords each sentence contains, the important sentences are extracted as the final query-biased summary. Evaluation through a question-answering test and a similarity comparison showed that the mechanism improves the ability to meet personal information needs and is more effective on multi-theme texts.
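The density step described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens, that every keyword hit has weight 1, and that a sentence's score is its peak density plus the number of distinct keywords it contains. The abstract does not give the window width, the term weights, or the exact way the two factors are combined, so the function names, the default `width`, and the scoring rule below are all illustrative assumptions rather than the authors' implementation.

```python
import math


def hanning(offset, width):
    """Hanning window value at a signed offset from the window centre."""
    if abs(offset) > width / 2:
        return 0.0
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * offset / width))


def keyword_density(tokens, keywords, width=8):
    """Density at each token position: window-weighted count of nearby keywords.

    Assumption: each keyword occurrence has weight 1.0.
    """
    hits = [1.0 if t in keywords else 0.0 for t in tokens]
    half = width // 2
    density = []
    for pos in range(len(tokens)):
        lo = max(0, pos - half)
        hi = min(len(tokens), pos + half + 1)
        density.append(
            sum(hits[j] * hanning(pos - j, width) for j in range(lo, hi))
        )
    return density


def rank_sentences(sentences, keywords, width=8):
    """Return sentence indices ordered from most to least query-relevant.

    Assumption: score = peak density inside the sentence
                        + number of distinct keywords it contains.
    """
    tokens = [t for sent in sentences for t in sent]
    density = keyword_density(tokens, keywords, width)
    scored = []
    start = 0
    for idx, sent in enumerate(sentences):
        end = start + len(sent)
        peak = max(density[start:end], default=0.0)
        distinct = len(set(sent) & set(keywords))
        scored.append((peak + distinct, idx))
        start = end
    # Highest score first; ties keep document order.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Calling `rank_sentences` with a list of tokenised sentences and a keyword set (ideally already expanded with synonyms) returns the sentence indices in extraction order. A summary could then take the first few indices and restore document order before output.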
Keywords
computer applications /
Chinese information processing /
text summarization /
biased summarization /
synonym expansion /
passage matching /
distribution density
Key words
computer application /
Chinese information processing /
text summarization /
query-biased summarization /
synonym expansion /
passage matching /
density distribution
Funding
Supported by the National Natural Science Foundation of China (60373095; 60673039)