Abstract
Humor detection has recently become a popular research topic in natural language processing. Most existing studies focus on humor recognition in text; work on multimodal data remains comparatively scarce, and current methods fall short in learning the interaction information between modalities. This paper proposes a modality fusion model for humor detection based on the attention mechanism. The model first encodes each single-modal context independently to obtain per-modality feature vectors; it then applies attention to the feature sequences of the two modalities, using a hierarchical attention structure to capture the correlations and interactions of multimodal information within the paragraph context. Evaluated on the public UR-FUNNY dataset, the proposed model improves accuracy by 1.37% over the previous best result. The experiments show that the model effectively captures multimodal context, and that introducing multimodal interaction information and paragraph-level context improves humor detection performance.
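The two-level fusion the abstract describes (per-modality summarization, then cross-modal attention) can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, with a shared query vector standing in for the learned attention parameters; it is not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Scaled dot-product attention: weight the values by the
    similarity of each key to the query, then sum."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = softmax(scores)
    return weights @ values

def hierarchical_fusion(text_seq, audio_seq, query):
    """Level 1: summarize each modality's feature sequence into one
    vector. Level 2: attend over the modality summaries to fuse them
    into a single multimodal context representation."""
    text_vec = attend(query, text_seq, text_seq)
    audio_vec = attend(query, audio_seq, audio_seq)
    modal = np.stack([text_vec, audio_vec])  # (2, d)
    return attend(query, modal, modal)       # fused (d,) vector

# Toy usage: 5 text steps and 3 audio steps, feature dimension 4.
rng = np.random.default_rng(0)
fused = hierarchical_fusion(rng.normal(size=(5, 4)),
                            rng.normal(size=(3, 4)),
                            rng.normal(size=4))
```

In the paper the encoders, queries, and attention weights would be learned jointly; the point of the sketch is only the hierarchy, where intra-modality attention precedes cross-modality fusion.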
Keywords
humor detection /
modal fusion /
attention mechanism
Funding
National Natural Science Foundation of China (62076046, 61702080)