Abstract
Text sentiment analysis is one of the active research topics in natural language processing, and words are its foundation. Chinese characters convey meaning through both sound and shape, so this paper builds a sentiment word classification model that takes into account the radical and phoneme information of each character in a word. In the model, three kinds of information (the characters, radicals, and phonemes of a word) are vectorized and fused with the original word vector to generate a new representation of the sentiment word. The polarity of the sentiment word is then classified with a feedforward neural network and a convolutional neural network. Experimental results show that all three fine-grained features effectively improve sentiment word classification. The model's effectiveness is also verified on the COAE evaluation corpus, where it yields better sentiment sentence classification results.
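The fusion step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how the vectors are fused, so concatenation of averaged embeddings is assumed here, and the embedding size, the random lookup tables, the toy radical and phoneme annotations, and every function name are hypothetical. Only an untrained feedforward pass is shown; the convolutional classifier is omitted.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical embedding size


def make_table(keys, dim=DIM):
    """Random lookup table standing in for trained embeddings (assumption)."""
    return {k: [random.gauss(0, 1) for _ in range(dim)] for k in keys}


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse(word, chars, radicals, phonemes, tables):
    """Concatenate the word vector with averaged character, radical,
    and phoneme vectors (concatenation is an assumed fusion method)."""
    return (
        tables["word"][word]
        + mean_vec([tables["char"][c] for c in chars])
        + mean_vec([tables["radical"][r] for r in radicals])
        + mean_vec([tables["phoneme"][p] for p in phonemes])
    )


def dense(x, W, b):
    """Affine layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(x, W1, b1, W2, b2):
    """One hidden tanh layer followed by a two-way softmax (polarity)."""
    h = [math.tanh(v) for v in dense(x, W1, b1)]
    return softmax(dense(h, W2, b2))


def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy, hand-written annotations for one word; purely illustrative.
word = "快乐"
chars = ["快", "乐"]
radicals = ["忄", "丿"]
phonemes = ["k", "uai", "l", "e"]

tables = {
    "word": make_table([word]),
    "char": make_table(chars),
    "radical": make_table(radicals),
    "phoneme": make_table(phonemes),
}

x = fuse(word, chars, radicals, phonemes, tables)
W1, b1 = rand_matrix(8, len(x)), [0.0] * 8
W2, b2 = rand_matrix(2, 8), [0.0] * 2
probs = classify(x, W1, b1, W2, b2)  # untrained, so the values are meaningless
print(len(x), probs)
```

Concatenation keeps each information source in its own slice of the fused vector, which makes it easy to ablate one feature at a time; other fusion schemes (summation, gating) would fit the abstract's description equally well.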
Key words
radical /
phoneme /
neural network
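Funding
National Social Science Fund of China (15BYY028); Natural Science Foundation of Liaoning Province (2015020017, 20170540230, 20170540232); Liaoning Province Outstanding Talents Program (LJQ2014127)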