Abstract
This paper studies the application of machine learning methods to the sentiment classification of news text, where the task is to determine whether a news article or review is positive or negative. We apply Naive Bayes and Maximum Entropy classifiers to a corpus of Chinese news and reviews. The experimental results show that machine learning methods also perform well on sentiment-based text classification, with the best accuracy reaching about 90%. We further find that three choices improve classification accuracy: selecting words with semantic polarity as features, handling negation words correctly (negation tagging), and using binary feature weights, that is, representing documents as feature-presence vectors. In conclusion, sentiment-based text classification is a more challenging task.
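The combination of negation tagging, binary (presence) feature weights, and Naive Bayes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the negation word list, the punctuation set, the toy English training data, and all function names are assumptions made for illustration, and real Chinese text would first need word segmentation.

```python
import math
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def tag_negation(tokens):
    """Prefix tokens after a negation word with NOT_ until the next punctuation mark."""
    tagged, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            tagged.append(tok)
        elif tok in NEGATIONS:
            negated = True
            tagged.append(tok)
        else:
            tagged.append("NOT_" + tok if negated else tok)
    return tagged


def presence_features(tokens):
    """Binary weighting: each distinct token counts once, however often it occurs."""
    return set(tag_negation(tokens)) - PUNCTUATION


def train_naive_bayes(labeled_docs):
    """Count documents per class and feature occurrences per class."""
    doc_counts, feat_counts, vocab = Counter(), {}, set()
    for tokens, label in labeled_docs:
        feats = presence_features(tokens)
        doc_counts[label] += 1
        feat_counts.setdefault(label, Counter()).update(feats)
        vocab |= feats
    return doc_counts, feat_counts, vocab


def classify(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    doc_counts, feat_counts, vocab = model
    total_docs = sum(doc_counts.values())
    feats = presence_features(tokens) & vocab  # ignore unseen features
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)
        class_total = sum(feat_counts[label].values())
        for feat in feats:
            count = feat_counts[label][feat]
            score += math.log((count + 1) / (class_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (assumed, not from the paper's corpus).
TRAIN = [
    ("the report is good .".split(), "pos"),
    ("a great and good result .".split(), "pos"),
    ("the report is not good .".split(), "neg"),
    ("a bad result , not great .".split(), "neg"),
]
MODEL = train_naive_bayes(TRAIN)

if __name__ == "__main__":
    print(classify(MODEL, "this is not good".split()))
    print(classify(MODEL, "a good report".split()))
```

With negation tagging, "good" and "NOT_good" become distinct features, so a negated polar word can count as evidence for the opposite class rather than for the class of the bare word.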
Key words: computer application / Chinese information processing / text categorization / sentiment analysis / Naive Bayes / maximum entropy
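Funding
Goal-oriented project under the National High-Tech R&D Program of China (863 Program) (2006AA01Z197); Key Program of the National Natural Science Foundation of China (60435020)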