Large-scale Sentiment Lexicon Collection and Its Application in Sentiment Classification
ZHAO Yanyan1, QIN Bing2, SHI Qiuhui2, LIU Ting2
1 Department of Media Technology and Art, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; 2 Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
Abstract: The rapid development of social media, such as microblogs, brings a wealth of information as well as new challenges for sentiment analysis. The limited size of existing Chinese sentiment lexicons is a critical constraint on the performance of sentiment analysis. This paper proposes a simple statistical method for mining large numbers of sentiment words and phrases from microblogs in order to construct a large-scale lexicon of 100,000 words/phrases. We apply this large-scale lexicon to Chinese microblog sentiment classification, and the results show a clear performance improvement.