Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (10): 145-154.
Sentiment Analysis and Social Computing

Multimodal Sentiment Analysis Based on Multilevel Feature Fusion Attention Network

  • WANG Jinghao1, LIU Zhen1, LIU Tingting2, WANG Yuanyi2, CHAI Yanjie1

Abstract

Existing methods for analyzing user sentiment in social media mostly rely on information from a single modality: they neither fuse information from multiple modalities nor model the relations between the hierarchical information structures of those modalities. To address these problems, this paper proposes a multilevel feature fusion attention network. After separately extracting multilevel features from the texts and images in social media posts, the network computes "image-text" and "text-image" features so that the sentiment features of the two modalities complement each other, allowing user sentiment in social media to be perceived accurately. Experimental results on the Yelp and MultiZOL datasets show that the proposed method effectively improves the accuracy of sentiment classification on multimodal data.
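
To make the fusion mechanism concrete, below is a minimal PyTorch sketch of the kind of bidirectional cross-modal attention the abstract describes: text features attend over image features ("text-image") and image features attend over text features ("image-text"), and the two attended representations are concatenated for sentiment classification. All module names, feature dimensions, the mean pooling, and the classifier head are illustrative assumptions, not the paper's exact architecture.

# A sketch of bidirectional cross-modal attention fusion. All dimensions
# and module names are assumptions for illustration, not the paper's model.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, text_dim=300, image_dim=512, hidden_dim=256, num_classes=5):
        super().__init__()
        # Project both modalities into a shared space before attending.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # "text-image": text tokens query image regions; "image-text": the reverse.
        self.text_to_image = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, num_tokens,  text_dim),  e.g. word embeddings
        # image_feats: (batch, num_regions, image_dim), e.g. CNN region features
        t = self.text_proj(text_feats)
        v = self.image_proj(image_feats)
        # Each modality queries the other, yielding complementary features.
        t2v, _ = self.text_to_image(query=t, key=v, value=v)
        v2t, _ = self.image_to_text(query=v, key=t, value=t)
        # Pool over the sequence dimension and fuse by concatenation.
        fused = torch.cat([t2v.mean(dim=1), v2t.mean(dim=1)], dim=-1)
        return self.classifier(fused)

if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    text = torch.randn(2, 20, 300)   # 2 posts, 20 word embeddings each
    image = torch.randn(2, 49, 512)  # 2 posts, 49 image-region features each
    print(model(text, image).shape)  # torch.Size([2, 5])

Concatenation after pooling is only one possible fusion choice; a multilevel design in the spirit of the paper could apply such attention at several feature levels and aggregate the results.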

Key words

sentiment analysis / attention mechanism / multimodal

Cite this article

WANG Jinghao, LIU Zhen, LIU Tingting, WANG Yuanyi, CHAI Yanjie. Multimodal Sentiment Analysis Based on Multilevel Feature Fusion Attention Network. Journal of Chinese Information Processing, 2022, 36(10): 145-154.
