Research on a Cloud-Model-Based Method for Selecting Summarization Units

CHEN Jinguang

Journal of Chinese Information Processing ›› 2016, Vol. 30 ›› Issue (5): 187-194.
Survey

A Summarization Unit Selecting Method Based on Cloud Model

  • CHEN Jinguang

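Abstract

This paper proposes a cloud-model-based method for selecting summarization units. The cloud model accounts for both the randomness and the fuzziness of summarization units, with the aim of improving the performance of a query-focused multi-document summarization system. First, the relevance between each summarization unit and the query is computed: the relevance scores between the unit and each individual query word are treated as cloud drops, and the uncertainty of the resulting cloud is used to identify the units that are genuinely relevant to the query. Next, the query-relevance results are adjusted by importance within the document set: the similarities between a candidate summary sentence and every other candidate sentence are treated as cloud drops, and the digital features of the cloud are used to compute sentence importance. This favors sentences that cover as much of the document set's content as possible, rather than answering the query from only one angle. To verify the method, experiments were run on a large-scale public English corpus, and the system took part in an international open evaluation of automatic summarization, where it achieved good results.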
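The abstract does not give the formulas used to turn cloud drops into scores. As a rough illustration only, the sketch below assumes the textbook backward cloud generator (without certainty degrees), which estimates the three digital features of a cloud from its drops: expectation Ex, entropy En, and hyper-entropy He. The function name `backward_cloud` and the sample relevance scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def backward_cloud(drops):
    """Estimate the digital features (Ex, En, He) of a cloud from its drops.

    Assumption: the textbook backward cloud generator (without certainty
    degrees); the paper's own estimator may differ.
    """
    if len(drops) < 2:
        raise ValueError("need at least two cloud drops")
    ex = mean(drops)                                  # expectation Ex
    abs_dev = mean([abs(x - ex) for x in drops])      # first absolute central moment
    en = math.sqrt(math.pi / 2) * abs_dev             # entropy En
    s2 = variance(drops, ex)                          # sample variance
    he = math.sqrt(max(s2 - en * en, 0.0))            # hyper-entropy He, clamped at 0
    return ex, en, he


# Hypothetical relevance scores between one summarization unit and four query words.
even_unit = [0.5, 0.5, 0.5, 0.5]      # moderately relevant to every query word
uneven_unit = [1.0, 1.0, 0.0, 0.0]    # matches two query words, ignores the rest

ex_even, en_even, he_even = backward_cloud(even_unit)
ex_uneven, en_uneven, he_uneven = backward_cloud(uneven_unit)

print(ex_even, en_even, he_even)
print(ex_uneven, en_uneven, he_uneven)
```

Both hypothetical units have the same expectation (Ex = 0.5), but the second has a larger entropy because its relevance is spread unevenly over the query words. An average alone would hide this difference, which an uncertainty-aware selection step could use. How the paper combines Ex, En, and He into a final score is not stated in the abstract.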

Abstract

This paper proposes a summarization unit selection method based on the cloud model. The cloud model is used to account for both the randomness and the fuzziness in the distribution of summarization units. To measure the relevance between a summarization unit and the query, the relevance scores between the unit and each query word are treated as cloud drops. Based on the uncertainty of the resulting cloud, a summarization unit that is more relevant to the query receives a higher score. The importance of a sentence within the document set is then considered, to evaluate how well the sentence summarizes the content of the document set. The similarities between a sentence and all sentences in the document set are treated as cloud drops, and together these drops form a cloud that indicates the sentence's ability to summarize the document set. The effectiveness of the proposed method is demonstrated on a large-scale open benchmark corpus in English. The method was also evaluated at TAC (Text Analysis Conference) 2010, with satisfactory results.

Key words: cloud model; query-focused multi-document summarization; uncertainty

Received: 2016-00-00
Final version: 2016-00-00
Funding: General Project of Humanities and Social Sciences, Ministry of Education (13YJCZH013); Humanities and Social Sciences Pre-research Project, Huzhou University (湖州师范学院) (KY27015A)

Keywords

cloud model / automatic summarization / uncertainty

Cite this article

CHEN Jinguang. A Summarization Unit Selecting Method Based on Cloud Model. Journal of Chinese Information Processing. 2016, 30(5): 187-194

