Abstract
Sentiment classification of Vietnamese online comments is the basis for opinion analysis of Vietnamese events. Because Vietnamese is a low-resource language that is difficult to annotate, cross-lingual sentiment classification can draw on annotated Chinese corpora to predict the sentiment polarity of Vietnamese comments. However, existing cross-lingual sentiment classification models ignore the role of topic information in strengthening sentiment representation learning and reducing cross-language differences. This paper therefore proposes a Chinese-Vietnamese cross-lingual sentiment classification model that incorporates topic features. The topic word distributions of Chinese and Vietnamese are introduced into the model as external knowledge, a gating mechanism fuses the topic representations with the semantic representations, and adversarial training drives the model to learn representations that minimize the difference between the two language distributions, on which the final sentiment classification is performed. Experimental results show that the model fits the cross-language distribution gap more quickly and achieves clear macro-F1 improvements over several baseline models.
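As a rough illustration of the two mechanisms named in the abstract, the sketch below shows a sigmoid gate that fuses a topic vector with a semantic sentence vector, and a gradient-reversal layer through which a language discriminator pushes the shared representation toward language invariance. This is a minimal sketch under assumed choices (PyTorch, the class names GradReverse, GatedTopicFusion and SentimentAndLanguageHeads, the toy dimensions, and the lambd weight are all illustrative), not the authors' implementation.

```python
# Minimal, assumed sketch (not the authors' released code) of two ideas from the
# abstract: (1) a sigmoid gate that fuses a topic vector with a semantic vector,
# (2) a gradient-reversal layer so that a language discriminator drives the
# shared representation toward language-invariant features.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # No gradient is needed for the scalar lambd, hence the trailing None.
        return -ctx.lambd * grad_output, None


class GatedTopicFusion(nn.Module):
    """Fuse semantic and topic representations with a learned sigmoid gate."""

    def __init__(self, sem_dim: int, topic_dim: int, hidden_dim: int):
        super().__init__()
        self.sem_proj = nn.Linear(sem_dim, hidden_dim)
        self.topic_proj = nn.Linear(topic_dim, hidden_dim)
        self.gate = nn.Linear(sem_dim + topic_dim, hidden_dim)

    def forward(self, sem_vec, topic_vec):
        g = torch.sigmoid(self.gate(torch.cat([sem_vec, topic_vec], dim=-1)))
        return g * self.sem_proj(sem_vec) + (1.0 - g) * self.topic_proj(topic_vec)


class SentimentAndLanguageHeads(nn.Module):
    """Fused vector -> sentiment logits; reversed gradient -> language logits."""

    def __init__(self, hidden_dim: int, num_classes: int = 2, num_langs: int = 2):
        super().__init__()
        self.sentiment_head = nn.Linear(hidden_dim, num_classes)
        self.language_head = nn.Linear(hidden_dim, num_langs)

    def forward(self, fused, lambd: float = 1.0):
        sentiment_logits = self.sentiment_head(fused)
        language_logits = self.language_head(GradReverse.apply(fused, lambd))
        return sentiment_logits, language_logits


if __name__ == "__main__":
    # Toy shapes only: batch of 4, 768-d semantic vectors, 50-topic distributions.
    fusion = GatedTopicFusion(sem_dim=768, topic_dim=50, hidden_dim=256)
    heads = SentimentAndLanguageHeads(hidden_dim=256)
    fused = fusion(torch.randn(4, 768), torch.rand(4, 50))
    sentiment_logits, language_logits = heads(fused, lambd=0.5)
    print(sentiment_logits.shape, language_logits.shape)  # -> (4, 2) (4, 2)
```

In training, the sentiment head would typically be supervised on labeled Chinese comments while the language head predicts the language of both Chinese and Vietnamese inputs; because its gradient is reversed, lowering the language loss pushes the encoder toward the distribution-invariant representations the abstract describes.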
Key words
cross-lingual sentiment classification /
topic modeling /
social media comments /
adversarial learning
Funding
Major Science and Technology Project of Yunnan Province (202002AD080001); National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830100); National Natural Science Foundation of China (61762056, 61472168, 61972186)