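An Ensemble Learning Framework for Sentiment Classification of Chinese Online Reviews

HUANG Jiafeng, XUE Yun, LU Xin, LIU Zhihuang, WU Wei, HUANG Yingren, LI Wanli, CHEN Xin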

Journal of Chinese Information Processing ›› 2018, Vol. 32 ›› Issue (9): 113-122.
Sentiment Analysis and Social Computing

Abstract

We propose an ensemble learning framework for sentiment classification of Chinese online reviews. First, to handle the complexity and diversity of Chinese online reviews, we use part-of-speech (POS) combination patterns, frequent word sequence patterns, and order-preserving submatrix (OPSM) patterns as input features. Second, to cope with the large number of text features, we adopt a random subspace algorithm based on information gain, which also improves the performance of the base classifiers. Finally, we construct a base classifier for each product aspect to combine the sentiment information of every aspect in a review and then determine the sentence-level sentiment polarity of the review. Experimental results demonstrate the effectiveness of the framework on sentiment classification of Chinese online reviews; in particular, accuracy reaches 90.3% with the Logistic Regression classifier.
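The abstract does not say how information gain is combined with the random subspace method, so the following is only a minimal sketch under stated assumptions: each feature is a binary indicator (a mined pattern is present or absent in a review), and each subspace is drawn by sampling features with probability proportional to their information gain. All function names and the toy data are hypothetical, and a plain majority vote stands in for whatever combination rule the paper uses across aspect-level base classifiers.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one binary feature (pattern present or absent)."""
    present = [y for row, y in zip(rows, labels) if row[feature]]
    absent = [y for row, y in zip(rows, labels) if not row[feature]]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def sample_subspace(gains, size, rng):
    """Draw `size` distinct feature indices, favoring higher information gain."""
    candidates = list(range(len(gains)))
    # Assumption: a small floor keeps zero-gain features selectable,
    # so that different subspaces remain diverse.
    weights = [g + 1e-9 for g in gains]
    chosen = []
    for _ in range(min(size, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


def majority_vote(predictions):
    """Combine base-classifier outputs; ties resolve to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: 4 reviews, 3 binary pattern features.
rows = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["pos", "pos", "neg", "neg"]

gains = [information_gain(rows, labels, j) for j in range(3)]
# One subspace per base classifier, each drawn with its own seed.
subspaces = [sample_subspace(gains, 2, random.Random(seed)) for seed in range(3)]
```

In the toy data, pattern 0 separates the two classes perfectly, so it receives the highest gain and is drawn into nearly every subspace, while the uninformative patterns fill the remaining slot at random. Each subspace would then be used to train one base classifier (for example Logistic Regression, as named in the abstract), and `majority_vote` would merge their predictions.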

Keywords

online reviews / sentiment classification / ensemble learning / feature extraction

Cite this article

HUANG Jiafeng, XUE Yun, LU Xin, LIU Zhihuang, WU Wei, HUANG Yingren, LI Wanli, CHEN Xin. An Ensemble Learning Framework for Sentiment Classification of Chinese Online Reviews. Journal of Chinese Information Processing. 2018, 32(9): 113-122

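Funding

National Statistical Science Research Project (2016LY98); Guangdong Provincial Science and Technology Plan Projects (2016A010101020, 2016A010101021, 2016A010101022); Shenzhen Science and Technology Innovation Commission Basic Research Project (JCYJ20160527172144272); Guangdong Provincial Engineering Technology Research Center for Data Science Projects (2016KF09, 2016KF10); Guangdong Polytechnic of Science and Technology Research Project (XJSC2016206)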