Abstract
Sentiment classification has recently become a hot research topic in natural language processing. This paper focuses on the semi-supervised learning paradigm for this task, in which only a small amount of labeled data and a large amount of unlabeled data are available for learning. Specifically, we propose a novel semi-supervised approach to sentiment classification based on dynamically generated random feature subspaces. First, multiple random subspaces of the feature space are dynamically generated. Then, the co-training algorithm is applied, with the subspaces serving as the different views, to select high-confidence samples from the unlabeled data. Finally, the trained model is updated with the newly selected high-confidence samples. Experiments across four product domains show that our approach clearly outperforms static subspace generation and performs much better than many other existing approaches to semi-supervised sentiment classification. In addition, this paper explores the choice of the number of feature subspaces.
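The three steps named in the abstract (dynamic subspace generation, co-training over the subspaces, and model update) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are assumptions made only for illustration: the function names, the Bernoulli naive Bayes base classifier, binary (0/1) word-presence features, a disjoint random partition of the features, and the rule that each view contributes its single most confident positive and negative sample per round.

```python
import math
import random


def random_subspaces(n_features, n_subspaces, rng):
    """Randomly partition the feature indices into disjoint subspaces (views)."""
    order = list(range(n_features))
    rng.shuffle(order)
    return [order[k::n_subspaces] for k in range(n_subspaces)]


def train_nb(X, y, feats):
    """Fit a Bernoulli naive Bayes model that only looks at the features in feats."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        log_prior = math.log((len(rows) + 1) / (len(X) + 2))
        # Laplace-smoothed probability that feature f is present in this class.
        p_on = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2) for f in feats}
        model[label] = (log_prior, p_on)
    return model


def log_odds(model, x):
    """Log-odds of positive (1) versus negative (0); the magnitude acts as confidence."""
    score = {}
    for label, (log_prior, p_on) in model.items():
        score[label] = log_prior + sum(
            math.log(p if x[f] else 1.0 - p) for f, p in p_on.items()
        )
    return score[1] - score[0]


def dynamic_subspace_cotrain(X_lab, y_lab, X_unl, n_subspaces=2, rounds=5, seed=0):
    """Grow the labeled set by co-training over freshly drawn random subspaces."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    n_features = len(X_lab[0])
    for _ in range(rounds):
        if not pool:
            break
        # Dynamic generation: draw a new random partition in every round
        # (a static variant would draw it once, before the loop).
        views = random_subspaces(n_features, n_subspaces, rng)
        chosen = {}  # pool index -> pseudo-label
        for feats in views:
            model = train_nb(X_lab, y_lab, feats)
            odds = [log_odds(model, x) for x in pool]
            best_pos = max(range(len(pool)), key=lambda i: odds[i])
            best_neg = min(range(len(pool)), key=lambda i: odds[i])
            if odds[best_pos] > 0:
                chosen.setdefault(best_pos, 1)
            if odds[best_neg] < 0:
                chosen.setdefault(best_neg, 0)
        if not chosen:
            break
        # Move the selected samples into the labeled set (pop from the end first
        # so that the remaining pool indices stay valid).
        for i in sorted(chosen, reverse=True):
            X_lab.append(pool.pop(i))
            y_lab.append(chosen[i])
    return X_lab, y_lab
```

A final classifier over the full feature space can then be trained on the enlarged labeled set, for example with `train_nb(X_aug, y_aug, range(n_features))`, where `X_aug, y_aug` is the pair returned by `dynamic_subspace_cotrain`. Moving the `random_subspaces` call outside the loop gives the static variant that the abstract uses as a comparison point.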
Key words: sentiment classification; semi-supervised learning; feature subspace method
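Funding
Supported by the National Natural Science Foundation of China (61003155, 60873150) and a project fund of the National Laboratory of Pattern Recognition.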