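Opinion summarization aims to condense and refine opinionated text data to produce a summary of the opinions the text expresses, helping users read and understand opinion text more easily. This paper studies multi-document opinion summarization, focusing on extracting summaries from the many reviews of a single product found on the Web. Opinion relevance is an important characteristic of opinion text, and this paper fully accounts for the influence of opinion information on opinion summarization. In addition, in review corpora, high-quality or highly credible reviews help users better understand the object being reviewed, so this paper also accounts for the influence of review quality on opinion summarization. To support research on opinion summarization, we collected and annotated an English multi-document opinion summarization corpus built from product reviews. Experiments show that opinion information and review quality both help multi-document opinion summarization and improve summary quality.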
Abstract
Opinion summarization aims to condense and refine opinionated text so as to generate a summary of the opinions it expresses, helping users read and understand opinion text. This study focuses on multi-document opinion summarization, where the main task is to generate a summary from a large number of reviews of the same product. Opinion relevance is an important feature of opinion text, and our summarization method takes it into account. Meanwhile, high-quality or highly credible reviews help users better understand the objects mentioned in the reviews, so our method also takes review quality into account. We further collect and annotate an English multi-document corpus of product reviews. Empirical studies on the corpus demonstrate that incorporating opinion and quality information is effective for multi-document opinion summarization.
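The abstract does not say how opinion and quality information enter the summarizer. As a rough illustration only, the sketch below assumes a common graph-based extractive setup (a LexRank-style random walk over a sentence-similarity graph, i.e. PageRank with a non-uniform restart vector) and folds both signals into the restart distribution: each sentence's prior weight is a hypothetical opinion-strength score multiplied by the quality score of the review it came from. The functions `rank_sentences` and `select_summary`, the parameters `damping` and `max_sim`, and the toy sentences and scores are all invented for this example; this is not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase bag-of-words tokenizer (a deliberately simple assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, opinion, quality, damping=0.85, iters=200, tol=1e-10):
    """Biased random walk (PageRank with a non-uniform restart vector).

    opinion[i]: hypothetical opinion strength of sentence i (>= 0)
    quality[i]: hypothetical quality score of the review sentence i came from (>= 0)
    Returns one score per sentence; the scores sum to 1.
    """
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(tokenize(s)) for s in sentences]
    sim = [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]

    # Restart distribution: opinion strength x review quality, normalized.
    prior = [o * q for o, q in zip(opinion, quality)]
    total = sum(prior)
    prior = [p / total for p in prior] if total > 0 else [1.0 / n] * n

    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            flow = 0.0
            for i in range(n):
                if row_sums[i] > 0:
                    # Move mass along similarity edges, row-normalized.
                    flow += scores[i] * sim[i][j] / row_sums[i]
                else:
                    # Sentence with no similar neighbors: redistribute via the prior.
                    flow += scores[i] * prior[j]
            new.append((1 - damping) * prior[j] + damping * flow)
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def select_summary(sentences, scores, k=2, max_sim=0.5):
    """Greedy top-k selection that skips sentences too similar to ones already chosen."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        if len(chosen) == k:
            break
        if all(cosine(vecs[i], vecs[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]


# Toy data, invented purely for illustration.
SENTENCES = [
    "The battery life is excellent and lasts two days.",
    "Great battery life, it lasts two days easily.",
    "I bought it on a Tuesday.",
    "The screen is terrible and too dim.",
]
OPINION = [0.9, 0.8, 0.0, 0.9]
QUALITY = [1.0, 0.5, 1.0, 0.2]

scores = rank_sentences(SENTENCES, OPINION, QUALITY)
summary = select_summary(SENTENCES, scores, k=2)
print(summary)
```

With these toy inputs, the opinion-free sentence ranks last, and the second battery-life sentence is skipped by the similarity cutoff, so the two-sentence summary covers both the battery and the screen. How opinion strength and review quality would actually be estimated (for example, from a sentiment lexicon or from helpfulness votes) is left open here.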
Keywords
opinion summarization /
multi-document /
review quality
Key words
opinion summarization /
multi-document /
review quality
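Funding
Supported by the National Natural Science Foundation of China (61003155, 60873150) and the Open Projects Program of the National Laboratory of Pattern Recognition.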