A Sentiment Classification Method Based on Sentiment-Specific Word Embedding
DU Hui1,2; XU Xueke1; WU Dayong1; LIU Yue1; YU Zhihua1; CHENG Xueqi1
1. CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100190, China
Abstract We present a sentiment classification method based on sentiment-specific word embedding (SSWE). A word embedding is a fixed-length, real-valued distributed vector representation of a word. Algorithms for learning word embeddings, such as word2vec, derive these representations from large unannotated corpora without considering sentiment information. We refine the initial word embeddings with sentiment information to obtain sentiment-specific word embeddings that encode both syntactic and sentiment information. Text representations are then built from these sentiment-specific word embeddings, and the sentiment polarity of each text is determined with machine learning methods. Experiments show that the proposed method outperforms sentiment classification methods that represent texts by words, N-grams, or word2vec word embeddings.
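The abstract does not specify how sentiment information is injected into the embeddings or which classifier is used, so the Python sketch below is purely illustrative of the overall pipeline (base embeddings, sentiment refinement, text representation, classification). Every concrete choice here is an assumption, not the authors' method: the 2-dimensional `BASE_EMB` table is a hypothetical stand-in for word2vec output, the refinement step is approximated by appending one polarity coordinate taken from a hypothetical seed lexicon `SEED_LEXICON`, a text is represented by the average of its word vectors, and a nearest-centroid classifier stands in for the machine learning step. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-d "context" embeddings standing in for word2vec output.
# "good"/"bad" and "great"/"awful" are deliberately close, since words of
# opposite polarity often appear in similar contexts.
BASE_EMB: Dict[str, Vector] = {
    "good": [0.8, 0.6], "great": [0.7, 0.7],
    "bad": [0.8, 0.5], "awful": [0.7, 0.6],
    "movie": [0.1, 0.9], "plot": [0.2, 0.8],
}

# Hypothetical seed polarity lexicon: +1.0 positive, -1.0 negative.
SEED_LEXICON: Dict[str, float] = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def add_sentiment_dim(base: Dict[str, Vector], lexicon: Dict[str, float],
                      weight: float = 1.0) -> Dict[str, Vector]:
    """Assumed refinement: append one polarity coordinate (0.0 if not in lexicon)."""
    return {w: v + [weight * lexicon.get(w, 0.0)] for w, v in base.items()}


def text_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def train_centroids(docs: Sequence[Sequence[str]], labels: Sequence[str],
                    emb: Dict[str, Vector], dim: int) -> Dict[str, Vector]:
    """Nearest-centroid training: the mean text vector for each label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for tokens, label in zip(docs, labels):
        vec = text_vector(tokens, emb, dim)
        acc = sums.setdefault(label, [0.0] * dim)
        for i in range(dim):
            acc[i] += vec[i]
        counts[label] = counts.get(label, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(tokens: Sequence[str], centroids: Dict[str, Vector],
            emb: Dict[str, Vector], dim: int) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = text_vector(tokens, emb, dim)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(vec, centroids[y])))


# Hypothetical toy training data.
TRAIN_DOCS = [["good", "movie"], ["great", "plot"], ["bad", "movie"], ["awful", "plot"]]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    sswe = add_sentiment_dim(BASE_EMB, SEED_LEXICON)
    cents = train_centroids(TRAIN_DOCS, TRAIN_LABELS, sswe, dim=3)
    base_cents = train_centroids(TRAIN_DOCS, TRAIN_LABELS, BASE_EMB, dim=2)
    print(predict(["awful", "movie"], cents, sswe, dim=3))            # neg
    print(predict(["awful", "movie"], base_cents, BASE_EMB, dim=2))  # pos (misclassified)
```

On this toy data, the averaged base embeddings place "awful movie" nearer the positive centroid because "awful" has almost the same context vector as "great", while the appended polarity coordinate separates the two classes. This only illustrates the intuition behind sentiment-specific embeddings; it does not reproduce the paper's refinement procedure or its experimental results.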
Received: 20 September 2015
|
|
|
|
|