Research on Semi-Supervised Sentiment Classification Based on Ensemble Learning

GAO Wei, WANG Zhongqing, LI Shoushan

Journal of Chinese Information Processing ›› 2013, Vol. 27 ›› Issue (3): 120-127.
Review


Semi-Supervised Sentiment Classification with an Ensemble Strategy

  • GAO Wei, WANG Zhongqing, LI Shoushan

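Abstract (Chinese version, translated)

Sentiment classification is the task of classifying a text by the polarity of the sentiment it expresses. This paper studies semi-supervised sentiment classification, in which unlabeled samples are used to improve classification performance when only a small set of labeled samples is available. To strengthen semi-supervised learning, the paper proposes an ensemble method based on label consistency that combines two mainstream semi-supervised sentiment classification methods: co-training with random feature subspaces and label propagation. First, the classifiers trained by the two semi-supervised methods each label the unlabeled samples. Second, the unlabeled samples on which the two labelings agree are selected. Finally, the selected samples are used to update the training model. Experimental results show that the method effectively reduces the mislabeling rate on unlabeled samples and therefore achieves better classification performance than either semi-supervised method alone.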

Abstract

Sentiment classification aims to predict the sentiment orientation expressed in a text. In this paper, we investigate semi-supervised approaches to sentiment classification in an ensemble learning framework, where abundant unlabeled data is leveraged together with a small amount of labeled data to enhance classification performance. To improve the performance of semi-supervised learning, we propose a novel ensemble method based on label consistency. Specifically, we combine two popular semi-supervised methods, co-training with random feature subspaces and label propagation, to generate pseudo-labeled data for updating the initial labeled data. First, the unlabeled data are labeled by the two semi-supervised learning approaches separately. Then, the unlabeled samples that receive consistent labels from both approaches are taken as pseudo-labeled data. Finally, the labeled data are updated with the pseudo-labeled data. Experiments show that our approach effectively reduces the labeling error in the pseudo-labeled data and thus achieves much better performance than several other approaches to semi-supervised sentiment classification.
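
The three steps described in the abstract (label the unlabeled data with both methods, keep the samples whose labels agree, update the labeled data) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `select_consistent` and `update_training_set` are hypothetical, the two prediction lists merely stand in for the outputs of co-training with random feature subspaces and of label propagation, and the toy reviews are invented for the example.

```python
def select_consistent(pred_a, pred_b):
    """Return (index, label) pairs for samples on which both labelers agree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both labelers must label the same samples")
    return [(i, a) for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a == b]


def update_training_set(labeled, unlabeled, pred_a, pred_b):
    """Move consistently labeled samples from the unlabeled pool to the labeled set.

    labeled   : list of (sample, label) pairs
    unlabeled : list of samples
    pred_a/b  : labels assigned to `unlabeled` by the two semi-supervised methods
    """
    agreed = select_consistent(pred_a, pred_b)
    agreed_idx = {i for i, _ in agreed}
    # Pseudo-labeled samples are appended to the original labeled data.
    new_labeled = labeled + [(unlabeled[i], label) for i, label in agreed]
    # Samples with conflicting labels stay in the unlabeled pool.
    remaining = [x for i, x in enumerate(unlabeled) if i not in agreed_idx]
    return new_labeled, remaining


# Toy data (invented for illustration): 1 = positive, 0 = negative.
labeled = [("great phone", 1), ("terrible battery", 0)]
unlabeled = ["love it", "waste of money", "works, I guess", "broke in a week"]

# Hypothetical outputs of the two semi-supervised methods on `unlabeled`.
pred_cotraining = [1, 0, 1, 0]
pred_label_propagation = [1, 0, 0, 0]

new_labeled, remaining = update_training_set(
    labeled, unlabeled, pred_cotraining, pred_label_propagation
)
print(len(new_labeled))  # 5
print(remaining)         # ['works, I guess']
```

Samples on which the two methods disagree remain unlabeled. The abstract does not say whether the procedure is repeated over several rounds, so the sketch performs a single pass.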

Key words

sentiment classification / semi-supervised learning / ensemble learning

Cite this article

GAO Wei, WANG Zhongqing, LI Shoushan. Semi-Supervised Sentiment Classification with an Ensemble Strategy (基于集成学习的半监督情感分类方法研究). Journal of Chinese Information Processing. 2013, 27(3): 120-127


        ()()

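Funding

Supported by the National Natural Science Foundation of China (90920004, 61070123, 61003153, 60970056); the Open Project Fund of the National Laboratory of Pattern Recognition; and the National High Technology Research and Development Program of China (863 Program) (2012AA011102)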