Abstract
The rapid growth of social media, microblogs in particular, provides abundant resources for sentiment analysis while placing greater demands on the performance of sentiment analysis algorithms. One important factor limiting that performance is the small size of existing sentiment lexicons, especially Chinese ones. To address this, this paper applies a simple text-statistics method to a massive collection of microblog posts and constructs a large-scale sentiment lexicon of 100,000 words and phrases. Taking sentiment classification, a basic task in sentiment analysis, as an example, we use the large-scale lexicon as features for Chinese microblog sentiment classification. The experimental results show that the large-scale lexicon helps improve classification performance.
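The abstract does not say which statistic is used to build the lexicon or how the lexicon entries become features, so the following is only a minimal sketch under stated assumptions. It assumes posts that already carry a noisy polarity label (for example, from emoticons), scores each token by the difference of its pointwise mutual information (PMI) with the positive and negative classes, and then turns the scores into a few aggregate features per post. The function names `build_lexicon` and `lexicon_features`, the add-one smoothing, and the feature set are all hypothetical choices made for illustration, not the authors' method.

```python
import math
from collections import Counter


def build_lexicon(labeled_posts, min_count=1):
    """Score tokens by PMI(token, pos) - PMI(token, neg).

    labeled_posts: iterable of (tokens, label) pairs, label in {"pos", "neg"}.
    The PMI difference simplifies to log(P(token | pos) / P(token | neg)).
    Add-one smoothing (an assumption of this sketch) keeps the ratio finite
    for tokens seen in only one class.
    """
    token_counts = {"pos": Counter(), "neg": Counter()}
    totals = Counter()
    for tokens, label in labeled_posts:
        token_counts[label].update(tokens)
        totals[label] += len(tokens)

    vocab = set(token_counts["pos"]) | set(token_counts["neg"])
    vocab_size = len(vocab)
    lexicon = {}
    for token in vocab:
        pos = token_counts["pos"][token]
        neg = token_counts["neg"][token]
        if pos + neg < min_count:
            continue  # skip tokens too rare to score reliably
        p_token_given_pos = (pos + 1) / (totals["pos"] + vocab_size)
        p_token_given_neg = (neg + 1) / (totals["neg"] + vocab_size)
        lexicon[token] = math.log(p_token_given_pos / p_token_given_neg)
    return lexicon


def lexicon_features(tokens, lexicon):
    """Aggregate the lexicon scores of one post into classifier features."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return {
        "num_pos": sum(1 for s in scores if s > 0),
        "num_neg": sum(1 for s in scores if s < 0),
        "sum_score": sum(scores),
        "max_score": max(scores, default=0.0),
        "min_score": min(scores, default=0.0),
    }


if __name__ == "__main__":
    # Toy, hand-made data for illustration only.
    posts = [
        (["好", "开心"], "pos"),
        (["好", "喜欢"], "pos"),
        (["差", "难过"], "neg"),
        (["差", "讨厌"], "neg"),
    ]
    lexicon = build_lexicon(posts)
    print(lexicon_features(["好", "差", "差"], lexicon))
```

In a setup like this, the returned feature dictionary would typically be appended to whatever other features (for example, bag-of-words) the classifier already uses, which is one way to read "lexicon as features" in the abstract.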
Key words
sentiment lexicon /
sentiment analysis /
sentiment classification /
Chinese microblog
Funding
China Postdoctoral Science Foundation (2012M520740, 2013T60373, 2012M520142)