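Sentiment-based text categorization (sentiment classification for short) is a text classification task that targets subjective information. Because of its broad application prospects, the task has attracted wide attention in natural language processing research, and a variety of supervised classification methods for it have appeared in succession. This paper studies the application of four different classification methods to Chinese sentiment classification and adopts a Stacking-based combination method to combine them. Experimental results show that the combination method outperforms the best base classification method in every domain. It thereby overcomes the domain dependence of classification methods, where different domains require different base classification methods to obtain better results.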
Abstract
Sentiment-based text categorization (sentiment classification for short) is the task of classifying text according to the subjective information it contains. It has been studied closely in natural language processing (NLP) because of its wide range of real-world applications, and many supervised machine learning approaches have been applied to it. In this paper, we study four classification approaches and propose a new stacking-based combination method to combine them. Experimental results show that the combination method performs better than the best single approach. It therefore removes the need to select a suitable classification approach for each domain.
Key words: computer application; natural language processing; sentiment classification; multiple classifier combination
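The stacking idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the four base approaches, so the two base learners here (a Naive Bayes model and a perceptron), the logistic-regression meta-classifier, the two-fold split, the function names, and the toy English data are all assumptions chosen for illustration. The sketch shows only the general scheme: each base classifier scores held-out documents, and a meta-classifier is trained on those scores.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_nb(docs, labels):
    """Stand-in base learner 1: multinomial Naive Bayes, add-one smoothing."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab_size = len(set(counts[0]) | set(counts[1]))
    prior = {y: (labels.count(y) + 1) / (len(labels) + 2) for y in (0, 1)}
    # The extra "+ 1" reserves probability mass for words unseen in training.
    total = {y: sum(counts[y].values()) + vocab_size + 1 for y in (0, 1)}

    def predict(doc):
        score = {
            y: math.log(prior[y])
            + sum(math.log((counts[y][w] + 1) / total[y]) for w in doc)
            for y in (0, 1)
        }
        return sigmoid(score[1] - score[0])  # P(positive | doc)

    return predict


def train_perceptron(docs, labels, epochs=10):
    """Stand-in base learner 2: perceptron over word features."""
    w = Counter()
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            target = 1 if y == 1 else -1
            if target * sum(w[t] for t in doc) <= 0:  # misclassified: update
                for t in doc:
                    w[t] += target

    return lambda doc: sigmoid(sum(w[t] for t in doc))


def train_logreg(rows, labels, epochs=500, lr=0.5):
    """Assumed meta-classifier: logistic regression fitted by SGD."""
    w = [0.0] * (len(rows[0]) + 1)  # last entry is the bias

    def prob(x):
        return sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))

    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - prob(x)
            for i, xi in enumerate(x):
                w[i] += lr * err * xi
            w[-1] += lr * err
    return prob


def train_stacking(docs, labels, base_trainers, folds=2):
    """Train the meta-classifier on out-of-fold base-classifier scores."""
    meta_rows = [None] * len(docs)
    for k in range(folds):
        rest = [i for i in range(len(docs)) if i % folds != k]
        models = [t([docs[i] for i in rest], [labels[i] for i in rest])
                  for t in base_trainers]
        for i in range(k, len(docs), folds):  # documents held out in fold k
            meta_rows[i] = [m(docs[i]) for m in models]
    meta = train_logreg(meta_rows, labels)
    final = [t(docs, labels) for t in base_trainers]  # refit on all data
    return lambda doc: meta([m(doc) for m in final])


# Toy tokenized reviews (hypothetical data); 1 = positive, 0 = negative.
DOCS = [
    ["good", "great", "love"], ["great", "excellent", "good"],
    ["bad", "poor", "hate"], ["poor", "terrible", "bad"],
    ["love", "excellent", "great"], ["good", "love", "excellent"],
    ["hate", "terrible", "poor"], ["bad", "hate", "terrible"],
]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]

stacked = train_stacking(DOCS, LABELS, [train_nb, train_perceptron])
print(stacked(["great", "love"]), stacked(["bad", "hate"]))
```

Training the meta-classifier on held-out scores rather than on scores for documents the base learners have already seen keeps it from simply trusting whichever base learner overfits most. In a multi-domain setting, one such stacked model would be trained per domain, so the weighting of base classifiers can differ from domain to domain.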