Abstract: In this paper, we study how to apply machine learning techniques to sentiment classification. The main task of sentiment classification is to determine whether a news article or review is positive or negative. We use Naive Bayes and Maximum Entropy classifiers for the sentiment classification of Chinese news and reviews. The experimental results show that these methods perform well, with classification accuracy reaching about 90%. Moreover, we find that three choices improve performance: selecting words with polarity as features, applying negation tagging, and representing test documents as feature presence vectors. We conclude that sentiment classification is a comparatively challenging problem.
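To make the three feature choices named in the abstract concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes the text is already word-segmented and uses English toy tokens for readability. The negation list, punctuation set, polarity lexicon, training documents, and all function names are hypothetical. Negation tagging follows the common convention of prefixing `NOT_` to every token between a negation word and the next punctuation mark. The classifier is a Naive Bayes model with add-one smoothing over presence features, which is an assumed variant.

```python
import math
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = {".", ",", "!", "?", ";"}
POLARITY_WORDS = {"good", "great", "bad", "poor"}
# Features: polarity words plus their negation-tagged variants.
VOCAB = POLARITY_WORDS | {"NOT_" + w for w in POLARITY_WORDS}


def tag_negation(tokens):
    """Prefix NOT_ to each token between a negation word and the next punctuation."""
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False      # punctuation ends the negation scope
            continue              # punctuation itself is not a feature
        if tok in NEGATIONS:
            negating = True       # the negation word itself is dropped
            continue
        tagged.append("NOT_" + tok if negating else tok)
    return tagged


def presence_features(tokens):
    """Feature presence: each vocabulary feature counts once per document."""
    return set(tag_negation(tokens)) & VOCAB


def train_naive_bayes(labeled_docs):
    """Estimate log priors and add-one-smoothed log likelihoods per class."""
    doc_counts, feat_counts = Counter(), {}
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        feat_counts.setdefault(label, Counter()).update(presence_features(tokens))
    total_docs = sum(doc_counts.values())
    model = {}
    for label, counts in feat_counts.items():
        denom = sum(counts.values()) + len(VOCAB)
        log_prior = math.log(doc_counts[label] / total_docs)
        log_like = {f: math.log((counts[f] + 1) / denom) for f in VOCAB}
        model[label] = (log_prior, log_like)
    return model


def classify(model, tokens):
    """Return the class with the highest posterior log score."""
    feats = presence_features(tokens)
    scores = {
        label: log_prior + sum(log_like[f] for f in feats)
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


# Toy training set (fabricated for illustration only, not the paper's data).
TRAIN = [
    (["service", "is", "good", "."], "pos"),
    (["great", "food", ",", "not", "bad", "."], "pos"),
    (["the", "room", "is", "bad", "."], "neg"),
    (["poor", "service", ",", "not", "good", "."], "neg"),
]
MODEL = train_naive_bayes(TRAIN)
```

With this toy data, `classify(MODEL, ["this", "is", "not", "good", "."])` yields `"neg"`, because the tagged feature `NOT_good` was seen only in a negative training document. Without negation tagging, the same document would contribute the feature `good` and be pulled toward the positive class.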