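Applying graph models to multi-document summarization is an active research topic. Such models take sentences as vertices and use the similarity between sentences as edge weights to construct an undirected graph. However, they do not fully account for the weights of the terms within a sentence or for the document to which a sentence belongs. To address this problem, this paper proposes a tri-layer graph model based on terms, sentences, and documents, which makes full use of term weight information and of the document each sentence belongs to when computing sentence similarity. Experimental results on the DUC2003 and DUC2004 data sets show that the method based on the term-sentence-document tri-layer graph model outperforms the LexRank model and the document-sensitive graph model.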
Abstract
Graph models have been widely applied to document summarization, using sentences as the graph nodes and the similarity between sentences as the edge weights. However, knowledge about terms and documents is neglected in these models. In this paper, we propose a tri-layer graph model based on terms, sentences, and documents to make full use of this knowledge when computing the similarity of sentences. Experimental results on the DUC2003 and DUC2004 data sets show that the proposed model outperforms the state-of-the-art LexRank model and Document Sensitive Ranking model.
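As an illustration of the baseline sentence graph described in the abstract (sentences as nodes, similarity as edge weights), the following is a minimal sketch in the style of LexRank. It is not the tri-layer model proposed in the paper: the whitespace tokenizer, the raw term-frequency cosine similarity, the damping factor of 0.85, and the function names `cosine`, `build_graph`, and `rank` are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences):
    """Undirected sentence graph as a symmetric weight matrix (no self-loops)."""
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(vecs[i], vecs[j])
    return w


def rank(w, damping=0.85, iters=50):
    """PageRank-style power iteration over the weighted sentence graph."""
    n = len(w)
    out = [sum(row) for row in w]  # total edge weight attached to each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # Isolated nodes (out[j] == 0) pass no score on; they keep only
            # the (1 - damping) / n baseline term.
            s = sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            new.append((1 - damping) / n + damping * s)
        scores = new
    return scores


sentences = [
    "graph model ranks sentences",
    "graph model scores sentences",
    "unrelated weather report today",
]
weights = build_graph(sentences)
scores = rank(weights)
```

In this toy input the first two sentences share three of four terms and therefore reinforce each other, while the third sentence has no neighbours and receives only the baseline score. A summary would then be assembled from the highest-scoring sentences.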
Keywords
graph model /
multi-document summarization /
sentence similarity /
term-sentence-document graph
Key words
graph model /
multi-document summarization /
sentence similarity /
term-sentence-document graph
Funding
National Natural Science Foundation of China (61272212, 61163006, 61203313)