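This paper proposes a sentiment classification method based on sentiment word embeddings. Word embeddings represent words as fixed-dimensional vectors over the continuous real numbers and can express rich semantic information about words. Word-embedding learning methods such as word2vec use contextual information to uncover latent semantic associations between words in large-scale corpora. Starting from word embeddings learned from a corpus, which already carry semantic information, we apply a sentiment adjustment to obtain word embeddings that account for both semantics and sentiment orientation. For an input text, we build a feature representation of the text from the sentiment word embeddings and classify its sentiment with machine learning methods. Compared with methods that build text representations from words, N-grams, or the original word2vec embeddings, the proposed method achieves higher sentiment classification accuracy, with better performance and stability.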
Abstract
We present a method for sentiment classification based on sentiment-specific word embeddings (SSWE). A word embedding is a distributed, fixed-length representation of a word as a vector in a continuous real-valued space. Algorithms for learning word embeddings, such as word2vec, obtain this representation from large unannotated corpora without considering sentiment information. We adjust the initial word embeddings with sentiment information to obtain sentiment-specific word embeddings that capture both semantic and sentiment information. Text representations are then built from the sentiment-specific word embeddings, and the sentiment polarity of each text is predicted with machine learning approaches. Experiments show that the presented method outperforms sentiment classification methods that build text representations from words, N-grams, or the original word2vec embeddings.
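The pipeline summarized above (adjust pretrained embeddings with sentiment, build a text representation, classify) can be illustrated with a minimal sketch. The abstract does not specify how the sentiment adjustment, the text representation, or the classifier are defined, so everything below is an assumption made for illustration: the toy 2-dimensional vectors, the polarity lexicon, the helper names (`adjust_embeddings`, `text_vector`, `train_centroids`, `predict`), the choice to shift lexicon words along the axis between the positive and negative centroids, the averaged-vector text representation, and the nearest-centroid classifier. It is not the authors' algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean; returns a zero vector for an empty input."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def adjust_embeddings(emb: Dict[str, Vector], polarity: Dict[str, int],
                      alpha: float = 1.0) -> Dict[str, Vector]:
    """Hypothetical sentiment adjustment (an assumption, not the paper's method):
    shift each lexicon word by alpha along the unit axis pointing from the
    negative-word centroid to the positive-word centroid. Words outside the
    lexicon have polarity 0 and are left unchanged."""
    dim = len(next(iter(emb.values())))
    pos = mean([emb[w] for w, p in polarity.items() if p > 0 and w in emb], dim)
    neg = mean([emb[w] for w, p in polarity.items() if p < 0 and w in emb], dim)
    axis = [a - b for a, b in zip(pos, neg)]
    norm = math.sqrt(sum(x * x for x in axis)) or 1.0  # guard against a zero axis
    axis = [x / norm for x in axis]
    return {w: [x + alpha * polarity.get(w, 0) * a for x, a in zip(v, axis)]
            for w, v in emb.items()}


def text_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Represent a text as the average of the embeddings of its known tokens."""
    return mean([emb[t] for t in tokens if t in emb], dim)


def train_centroids(samples: Sequence[Tuple[Sequence[str], int]],
                    emb: Dict[str, Vector], dim: int) -> Dict[int, Vector]:
    """Compute one centroid of text vectors per sentiment label."""
    by_label: Dict[int, List[Vector]] = {}
    for tokens, label in samples:
        by_label.setdefault(label, []).append(text_vector(tokens, emb, dim))
    return {label: mean(vs, dim) for label, vs in by_label.items()}


def predict(tokens: Sequence[str], emb: Dict[str, Vector],
            centroids: Dict[int, Vector], dim: int) -> int:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    v = text_vector(tokens, emb, dim)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


# Toy data: "good" and "bad" start close together, as context-only vectors often do.
emb = {"good": [1.0, 0.2], "great": [0.8, 0.4], "bad": [0.9, 0.3],
       "awful": [0.7, 0.5], "movie": [0.1, 1.0]}
polarity = {"good": 1, "great": 1, "bad": -1, "awful": -1}  # hypothetical lexicon

sswe = adjust_embeddings(emb, polarity, alpha=1.0)
train = [(["good", "movie"], 1), (["great", "movie"], 1),
         (["bad", "movie"], -1), (["awful", "movie"], -1)]
centroids = train_centroids(train, sswe, dim=2)

print(predict(["great", "good"], sswe, centroids, dim=2))
print(predict(["awful", "bad", "movie"], sswe, centroids, dim=2))
```

In this toy setup the adjustment moves words of opposite polarity apart while leaving neutral words such as "movie" where they were, which is the property the abstract attributes to sentiment-specific embeddings. A real system would replace the toy vectors with word2vec output and the nearest-centroid rule with whatever classifier is being evaluated.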
Key words
sentiment analysis /
sentiment classification /
word embedding /
machine learning
Funding
National 973 Program of China (2014CB340406, 2013CB329602); National 863 Program of China (2014AA015204); National Natural Science Foundation of China (61232010)