Abstract: Sentiment classification has recently become a hot research topic in natural language processing. This paper focuses on semi-supervised approaches to the task. In contrast to traditional methods based on co-training, we present a semi-supervised sentiment classification method based on voting-based ensemble learning. We construct a set of diverse sub-classifiers by varying the training sets, feature parameters, and classification methods. In each voting round, the samples with the highest confidence are selected and added to the training set, doubling its size, and the model is then updated. The method also allows the sub-classifiers to share useful attribute sets. It has logarithmic time complexity and can be applied to imbalanced corpora. Experiments show that the method achieves good results on sentiment classification corpora that differ in language, domain, and size, including both balanced and imbalanced corpora.
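The voting round described in the abstract can be sketched as follows. This is a minimal sketch under our own assumptions, not the paper's implementation. Each member is a hypothetical multinomial Naive Bayes classifier. Diversity comes only from random feature subspaces, whereas the paper also varies training sets and classification methods. "Confidence" is taken to be the share of members agreeing with the majority vote. The attribute-sharing mechanism is omitted. All function names (`train_nb`, `build_ensemble`, `vote`, `voting_self_training`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

# Sketch only: member type, confidence measure, names and data are assumptions.


def train_nb(docs, labels, vocab):
    """Hypothetical member: multinomial Naive Bayes using only tokens in `vocab`."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        counts[y].update(t for t in doc if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)  # Laplace smoothing
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}

    def predict(doc):
        scores = {c: prior[c] + sum(loglik[c][t] for t in doc if t in vocab)
                  for c in classes}
        return max(scores, key=scores.get)

    return predict


def build_ensemble(docs, labels, n_members, subspace_ratio, rng):
    """Diversify members by training each on a random feature subspace."""
    vocab = sorted({t for d in docs for t in d})
    k = max(1, int(len(vocab) * subspace_ratio))
    return [train_nb(docs, labels, set(rng.sample(vocab, k)))
            for _ in range(n_members)]


def vote(members, doc):
    """Majority label and its vote share (assumed confidence measure)."""
    votes = Counter(m(doc) for m in members)
    label, n = votes.most_common(1)[0]
    return label, n / len(members)


def voting_self_training(labeled, unlabeled, n_members=5,
                         subspace_ratio=0.5, seed=0):
    rng = random.Random(seed)
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    rounds = 0
    while pool:
        members = build_ensemble(docs, labels, n_members, subspace_ratio, rng)
        scored = [vote(members, d) for d in pool]
        # Most confident first; sorted() is stable, so ties keep pool order.
        order = sorted(range(len(pool)), key=lambda i: -scored[i][1])
        # Doubling step: pseudo-label as many samples as are currently labeled.
        take = set(order[:len(docs)])
        for i in sorted(take):
            docs.append(pool[i])
            labels.append(scored[i][0])
        pool = [d for i, d in enumerate(pool) if i not in take]
        rounds += 1
    final = build_ensemble(docs, labels, n_members, subspace_ratio, rng)
    return final, rounds


# Toy tokenized reviews (made-up data for illustration only).
LABELED = [(["good", "great", "movie"], "pos"), (["great", "fun"], "pos"),
           (["bad", "awful", "movie"], "neg"), (["awful", "boring"], "neg")]
UNLABELED = [["good", "plot"], ["great", "acting"], ["bad", "plot"],
             ["awful", "acting"], ["good", "fun"], ["bad", "boring"],
             ["great", "story"], ["awful", "story"], ["good"], ["bad"],
             ["great", "good"], ["awful", "bad"]]
```

For example, `final, rounds = voting_self_training(LABELED, UNLABELED)` returns the final ensemble and the number of retraining rounds. Because each round moves as many pseudo-labeled samples as the training set already holds, a run that starts with L labeled and U unlabeled samples has L * 2^r labeled samples after r rounds and stops after ceil(log2(1 + U/L)) rounds. With the toy data (L = 4, U = 12), that is 2 rounds. This is one plausible reading of the logarithmic time complexity claim, counting retraining rounds.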