Abstract
This paper proposes a cross-lingual sentiment classification algorithm based on semantic fusion for product reviews. First, from the perspective of short-text semantic representation, word embeddings are generated in advance with the open-source tool Word2Vec to obtain a representation of the information in each language. Second, based on the statistical correlation between the word vectors of different languages, an auto-associative memory relationship is used to fuse and extract cross-lingual document semantics. Third, the local perception and weight sharing of convolutional neural networks are applied to fuse the complex semantic expressions produced by the auto-associative memory model, yielding fused phrase features of different lengths. The deep neural network thereby learns dense combinations of high-level semantic features for any of the languages and outputs the classification prediction. To verify the effectiveness of the algorithm, the model is compared experimentally with several recent models. The results show that the model is suitable for positive and negative sentiment classification of cross-lingual sentiment corpora and clearly outperforms other existing algorithms.
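The convolution step described in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the function names (`fuse`, `conv_max_pool`, `phrase_features`) are hypothetical, random vectors stand in for Word2Vec embeddings, the pairing of each word with an associated word in the other language is assumed to be given, and fusion is simplified to plain concatenation. The sketch only shows how filters of different widths over a fused sequence, with shared weights and max-over-time pooling, give one feature per phrase length.

```python
import random


def fuse(src_vecs, assoc_vecs):
    """Concatenate each source-language word vector with the vector of its
    associated word in the other language (pairing assumed given)."""
    return [s + a for s, a in zip(src_vecs, assoc_vecs)]


def conv_max_pool(seq, filt, bias=0.0):
    """Slide one filter of width h = len(filt) over seq, apply ReLU,
    and keep the maximum activation (max-over-time pooling)."""
    h = len(filt)
    if len(seq) < h:
        raise ValueError("sequence is shorter than the filter width")
    acts = []
    for i in range(len(seq) - h + 1):
        z = bias
        for k in range(h):
            # Same filter weights are reused at every position i.
            z += sum(w * x for w, x in zip(filt[k], seq[i + k]))
        acts.append(max(0.0, z))
    return max(acts)


def phrase_features(seq, filters):
    """One pooled feature per filter; filters may differ in width."""
    return [conv_max_pool(seq, f) for f in filters]


# Toy data: 6 words, 4-dimensional stand-in embeddings per language.
rng = random.Random(0)
dim, n_words = 4, 6
src = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
assoc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
fused = fuse(src, assoc)  # each fused vector has 2 * dim entries

# One filter each for phrase widths 2, 3, and 4.
filters = [
    [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(h)]
    for h in (2, 3, 4)
]
features = phrase_features(fused, filters)
print(features)
```

In a trained model the filter weights would be learned and many filters per width would be used; the pooled features would then feed a classifier that outputs the positive or negative label.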
Key words
cross-lingual sentiment classification /
auto-associative memory relationship /
word co-occurrence /
convolutional neural networks
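Funding
Scientific Research Project of the State Language Commission under the 13th Five-Year Plan (YB135-76); Yanbian University Research Project for Building Foreign Languages and Literature into a World-Class Discipline (18YLPY13)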