ZHANG Zhilin, ZONG Chengqing. Sentiment Analysis of Chinese Micro Blog Based on Rich-features[J]. Journal of Chinese Information Processing, 2015, 29(4): 134-143.
Sentiment Analysis of Chinese Micro Blog Based on Rich-features
ZHANG Zhilin, ZONG Chengqing
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract: With the rise of Web 2.0, micro blogs have become a new information-sharing platform that plays an important role in people's daily lives, and micro blog sentiment analysis has attracted growing attention in recent years. This paper analyzes in depth how feature representation and feature selection differ between traditional sentiment classification and micro blog sentiment analysis. To avoid the drawbacks of feature selection in existing methods, we propose three simple but effective approaches to feature representation and selection: the lexicalized hashtag feature, the sentiment word feature, and the probabilistic sentiment lexicon feature. Experimental results show that the proposed methods raise micro blog sentiment classification accuracy from 73.17% to 84.17%, significantly outperforming the state-of-the-art method.