A Cross-lingual Sentiment Analysis Model Based on Sentiment-aware Feature Representation

XU Yuemei, SHI Lingyu, CAI Lianqiao

Journal of Chinese Information Processing, 2022, Vol. 36, Issue (2): 129-141.
Sentiment Analysis and Social Computing

Abstract

Deep learning based cross-lingual sentiment analysis models rely on a pre-trained Bilingual Word Embedding (BWE) dictionary to obtain text vector representations of the source and target languages. To address the difficulty of obtaining such a dictionary, this paper proposes a cross-lingual sentiment analysis model based on sentiment-aware word representations: sentiment supervision from the source language is introduced so that the learned word vectors capture both semantic and sentiment information, and these representations are then used for sentiment prediction on target-language texts. In the experiments, English is the source language and Chinese, French, German, Japanese, Korean and Thai are the six target languages. The results show that the proposed model improves prediction accuracy by about 9.3% over a machine translation based method and by about 8.7% over a cross-lingual sentiment analysis method without sentiment-aware representations. The model performs best on German; as English and German both belong to the Germanic language group and are closer in grammar and semantics, this result matches expectations. The experiments also analyze the factors that affect cross-lingual sentiment analysis performance.
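
To make the idea of sentiment-aware word representations concrete, the sketch below shows one way such an objective could be set up: a skip-gram style semantic loss is combined with a sentiment-classification loss computed on labeled source-language sentences, so that the learned word vectors carry both semantic and sentiment information. This is a minimal illustration in PyTorch, not the authors' implementation; the class name, the 0.5 weighting of the sentiment term, and the averaged-word-vector sentence encoder are assumptions, and the adversarial (GAN based) alignment of target-language embeddings suggested by the keywords is not sketched here.

# A minimal, hypothetical sketch of a sentiment-aware embedding objective.
# It is not the authors' implementation: the joint loss simply adds a
# skip-gram style semantic term and a sentiment-supervision term computed
# on labeled source-language sentences.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentimentAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, dim=300, num_classes=2):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)       # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)      # context-word vectors
        self.sentiment_head = nn.Linear(dim, num_classes)   # polarity classifier

    def semantic_loss(self, center, context, negatives):
        # Skip-gram with negative sampling: pull (center, context) pairs
        # together and push sampled (center, negative) pairs apart.
        v = self.in_embed(center)                                  # (B, D)
        pos = (v * self.out_embed(context)).sum(-1)                # (B,)
        neg = torch.bmm(self.out_embed(negatives),                 # (B, K, D)
                        v.unsqueeze(-1)).squeeze(-1)               # (B, K)
        return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

    def sentiment_loss(self, sentence_ids, labels):
        # Average the word vectors of a labeled source-language sentence and
        # classify its polarity, so the sentiment signal flows back into the
        # embeddings themselves.
        sent_vec = self.in_embed(sentence_ids).mean(dim=1)         # (B, D)
        return F.cross_entropy(self.sentiment_head(sent_vec), labels)


# Toy usage with random ids; the 0.5 weight on the sentiment term is a guess.
model = SentimentAwareEmbedding(vocab_size=10_000)
center    = torch.randint(0, 10_000, (8,))
context   = torch.randint(0, 10_000, (8,))
negatives = torch.randint(0, 10_000, (8, 5))
sentences = torch.randint(0, 10_000, (8, 20))
labels    = torch.randint(0, 2, (8,))

loss = model.semantic_loss(center, context, negatives) \
       + 0.5 * model.sentiment_loss(sentences, labels)
loss.backward()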

Key words

cross-lingual sentiment analysis / sentiment-aware / generative adversarial network

Cite this article

XU Yuemei, SHI Lingyu, CAI Lianqiao. A Cross-lingual Sentiment Analysis Model Based on Sentiment-aware Feature Representation. Journal of Chinese Information Processing, 2022, 36(2): 129-141.

Funding

The Fundamental Research Funds for the Central Universities (2022JJ006)