Abstract: This paper proposes a method for selecting summarization units based on the cloud model. The cloud model accounts for both the randomness and the fuzziness in the distribution of summarization units. To measure the relevance between a summarization unit and the query, the relevance scores between a word and each query word are treated as cloud drops; based on the uncertainty of the resulting cloud, a summarization unit that is more relevant to the query receives a higher score. The importance of a sentence within the document set is then considered in order to evaluate how well the sentence summarizes the content of the document set. The similarities between a sentence and all sentences in the document set are treated as cloud drops, and together these drops form a cloud that indicates the sentence's ability to summarize the document set. The effectiveness of the proposed method is demonstrated on a large-scale open benchmark corpus in English. The method was also evaluated at TAC (Text Analysis Conference) 2010, with satisfactory results.
Key words: cloud model; query-focused multi-document summarization; uncertainty
Received: 2016-00-00. Accepted: 2016-00-00.
Funding: General Project of Humanities and Social Sciences, Ministry of Education (13YJCZH013); Humanities and Social Sciences Pre-research Project of Huzhou University (湖州师范学院) (KY27015A).
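As an illustration of how a set of cloud drops can be turned into a cloud, the sketch below estimates the three numerical characteristics commonly used in the cloud model (expectation Ex, entropy En, hyper-entropy He) from a list of similarity values, using the standard backward cloud generator without certainty degrees. This is a minimal sketch under stated assumptions: the function name `backward_cloud` and the sample similarity values are hypothetical, and the abstract does not specify how the paper combines these characteristics into a final sentence score.

```python
import math


def backward_cloud(drops):
    """Estimate (Ex, En, He) from cloud drops.

    Assumption: the standard backward cloud generator without certainty
    degrees is used; the abstract does not say which estimator the paper uses.
    """
    n = len(drops)
    if n < 2:
        raise ValueError("need at least two cloud drops")

    # Ex: sample mean of the drops.
    ex = sum(drops) / n

    # En: first-order absolute central moment, scaled by sqrt(pi / 2).
    mean_abs_dev = sum(abs(x - ex) for x in drops) / n
    en = math.sqrt(math.pi / 2) * mean_abs_dev

    # He: square root of (sample variance - En^2), clamped at zero
    # because the difference can be negative for small samples.
    variance = sum((x - ex) ** 2 for x in drops) / (n - 1)
    he = math.sqrt(max(variance - en ** 2, 0.0))

    return ex, en, he


# Hypothetical similarities between one sentence and four other sentences.
similarities = [0.2, 0.4, 0.4, 0.6]
ex, en, he = backward_cloud(similarities)
print(f"Ex={ex:.4f}, En={en:.4f}, He={he:.4f}")
```

Under this reading, a larger Ex indicates that the sentence is on average more similar to the rest of the document set, while En and He describe how dispersed and how unstable those similarities are.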