Dynamic Fusion of Multi-modal Heterogeneous Data for Sentiment Analysis

DING Jian, YANG Liang, LIN Hongfei, WANG Jian

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (5): 112-124.
Sentiment Analysis and Social Computing


Abstract

In recent years, sentiment analysis using multi-modal data has become a very active research area, and how to better exploit both the information within each modality and the interactions between modalities remains a question worth investigating. The interaction among modalities is not a static process but changes dynamically, and the relative strength of each modality also varies across tasks; if this is not handled properly, model performance degrades. This paper proposes a heterogeneous dynamic fusion method for temporal multi-modal emotion data: a hierarchical heterogeneous dynamic fusion scheme combines the modalities more completely and captures their interactions dynamically, improving model performance while making the fusion process more interpretable. In addition, a multi-task learning strategy combines the heterogeneous dynamic fusion network with per-modality self-supervised learning networks to obtain both modality-consistent and modality-specific features. Experiments on the CMU-MOSI and CMU-MOSEI datasets show that the model outperforms mainstream models and that its modality fusion process is more interpretable.
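
For orientation, the following is a minimal PyTorch sketch of the general recipe described above; it is not the authors' implementation. It shows per-modality encoders, a sample-wise gated fusion whose weights change with the input (a simple stand-in for the hierarchical heterogeneous dynamic fusion), and auxiliary unimodal heads trained jointly under a multi-task objective. The dimensions (d_text=768, d_audio=74, d_vision=35), module names, gating form, and loss weighting are illustrative assumptions; in particular, the auxiliary heads here simply reuse the ground-truth label, whereas the paper derives unimodal targets via self-supervision.

```python
# Minimal sketch (not the authors' code): per-modality encoders,
# sample-wise (dynamic) gated fusion, and auxiliary unimodal heads
# trained jointly under a multi-task objective.
# All dimensions, names, and the gating form are illustrative assumptions.
import torch
import torch.nn as nn


class DynamicFusionSentiment(nn.Module):
    def __init__(self, d_text=768, d_audio=74, d_vision=35, d_hidden=128):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.enc = nn.ModuleDict({
            "text": nn.Linear(d_text, d_hidden),
            "audio": nn.Linear(d_audio, d_hidden),
            "vision": nn.Linear(d_vision, d_hidden),
        })
        # Gating network: per-sample modality weights, so the fusion adapts
        # to each input instead of being fixed ("dynamic" fusion).
        self.gate = nn.Linear(3 * d_hidden, 3)
        # One regression head for the fused representation, plus one
        # auxiliary head per modality (multi-task learning).
        self.fused_head = nn.Linear(d_hidden, 1)
        self.uni_heads = nn.ModuleDict({m: nn.Linear(d_hidden, 1) for m in self.enc})

    def forward(self, text, audio, vision):
        h = {m: torch.tanh(self.enc[m](x))
             for m, x in zip(self.enc, (text, audio, vision))}
        stacked = torch.stack([h["text"], h["audio"], h["vision"]], dim=1)  # (B, 3, H)
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)      # (B, 3)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)                # (B, H)
        return {
            "fused": self.fused_head(fused).squeeze(-1),
            "uni": {m: self.uni_heads[m](h[m]).squeeze(-1) for m in h},
            "weights": weights,  # per-sample modality weights, inspectable
        }


def multitask_loss(out, target, aux_weight=0.3):
    """Fused prediction loss plus down-weighted auxiliary unimodal losses."""
    mse = nn.functional.mse_loss
    loss = mse(out["fused"], target)
    for pred in out["uni"].values():
        loss = loss + aux_weight * mse(pred, target)
    return loss


if __name__ == "__main__":
    model = DynamicFusionSentiment()
    text, audio, vision = torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35)
    target = torch.rand(4) * 6 - 3  # CMU-MOSI/MOSEI-style score in [-3, 3]
    out = model(text, audio, vision)
    print(multitask_loss(out, target).item(), out["weights"][0])
```

The per-sample weights tensor can be inspected to see which modality dominates a given prediction, which is the sense in which such gated fusion is more interpretable than a fixed concatenation.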

Keywords

multi-modal fusion / multi-task learning / sentiment analysis

Cite this article

DING Jian, YANG Liang, LIN Hongfei, WANG Jian. Dynamic Fusion of Multi-modal Heterogeneous Data for Sentiment Analysis. Journal of Chinese Information Processing, 2022, 36(5): 112-124.

Funding

National Key R&D Program of China (2018YFC0832101); National Natural Science Foundation of China (61702080, 61806038, 61632011, 61772103)