Multi-modal Emotion Recognition Based on Multi-LSTMs Fusion

ZHANG Yawei, WU Liangqing, WANG Jingjing, LI Shoushan

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (5): 145-152.
Sentiment Analysis and Social Computing

Abstract

Sentiment analysis is a popular research topic in natural language processing, and multi-modal sentiment analysis remains a challenge in this field. Existing studies fall short in capturing contextual information and in modeling the interactions between the time series of different modalities. This paper proposes a novel Multi-LSTMs Fusion Network (MLFN), which performs deep fusion of the three modalities of text, speech and image through a hierarchy of LSTMs: an intra-modal feature extraction layer for each single modality, followed by inter-modal fusion layers for the bimodal and trimodal combinations. This hierarchical framework accounts for the features within each modality while deeply capturing the interactions between modalities. Experimental results show that the proposed network fuses multi-modal information effectively and substantially improves the accuracy of multi-modal emotion recognition.
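The abstract describes the hierarchy only in prose. As a rough illustration of how such a layered fusion network could be wired up, the following PyTorch sketch stacks per-modality LSTMs (intra-modal feature extraction), pairwise bimodal fusion LSTMs, and a final trimodal fusion LSTM. The concatenation-based fusion, layer sizes, feature dimensions, and six-way emotion output are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class MLFNSketch(nn.Module):
    """Minimal sketch of a multi-LSTMs fusion network (MLFN-style).
    Assumptions: inputs are aligned sequences of equal length; fusion
    is concatenation of hidden states fed into a further LSTM."""

    def __init__(self, d_text, d_audio, d_video, d_hid=64, n_emotions=6):
        super().__init__()
        # Intra-modal feature extraction: one LSTM per modality.
        self.text_lstm = nn.LSTM(d_text, d_hid, batch_first=True)
        self.audio_lstm = nn.LSTM(d_audio, d_hid, batch_first=True)
        self.video_lstm = nn.LSTM(d_video, d_hid, batch_first=True)
        # Bimodal fusion: LSTMs over pairwise concatenated hidden states.
        self.ta_lstm = nn.LSTM(2 * d_hid, d_hid, batch_first=True)
        self.tv_lstm = nn.LSTM(2 * d_hid, d_hid, batch_first=True)
        self.av_lstm = nn.LSTM(2 * d_hid, d_hid, batch_first=True)
        # Trimodal fusion over the three bimodal streams.
        self.tri_lstm = nn.LSTM(3 * d_hid, d_hid, batch_first=True)
        self.classifier = nn.Linear(d_hid, n_emotions)

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len, feature_dim).
        h_t, _ = self.text_lstm(text)
        h_a, _ = self.audio_lstm(audio)
        h_v, _ = self.video_lstm(video)
        # Keep the time axis so the fusion LSTMs can model cross-modal
        # dynamics step by step rather than only at the final state.
        h_ta, _ = self.ta_lstm(torch.cat([h_t, h_a], dim=-1))
        h_tv, _ = self.tv_lstm(torch.cat([h_t, h_v], dim=-1))
        h_av, _ = self.av_lstm(torch.cat([h_a, h_v], dim=-1))
        h_tri, _ = self.tri_lstm(torch.cat([h_ta, h_tv, h_av], dim=-1))
        # Classify from the last trimodal hidden state.
        return self.classifier(h_tri[:, -1, :])

# Smoke test with illustrative feature sizes (e.g., 300-d word vectors,
# 74-d acoustic features, 35-d facial features).
model = MLFNSketch(d_text=300, d_audio=74, d_video=35)
logits = model(torch.randn(2, 20, 300),
               torch.randn(2, 20, 74),
               torch.randn(2, 20, 35))
print(logits.shape)  # torch.Size([2, 6])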

Key words

multi-modal / emotion analysis / LSTM

Cite this article

ZHANG Yawei, WU Liangqing, WANG Jingjing, LI Shoushan. Multi-modal Emotion Recognition Based on Multi-LSTMs Fusion[J]. Journal of Chinese Information Processing, 2022, 36(5): 145-152.

Funding

National Natural Science Foundation of China (62006166, 61976146, 62076176); China Postdoctoral Science Foundation (2019M661930); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)