Debiased Contrastive Learning for Multimodal Named Entity Recognition

Journal of Chinese Information Processing, 2023, Vol. 37, Issue (11): 49-59.
Information Extraction and Text Mining


ZHANG Xin1, YUAN Jingling1,2, LI Lin1,2, LIU Jia3,4

Abstract

Named entity recognition (NER), a key step in information extraction, is widely applied in natural language processing. With the growing volume of multimodal content on the Internet, studies have shown that visual information helps text achieve more accurate named entity recognition. Existing work typically treats an image as a collection of visual objects and attempts to explicitly align those visual objects with the entities in the text. However, when the two are quantitatively or semantically inconsistent, these methods often fail to cope with the resulting modal bias, making accurate semantic alignment between image and text difficult. To address this problem, this paper proposes a debiased contrastive learning approach for multimodal named entity recognition (DebiasCL). It uses visual object density to guide the mining of visually context-rich image-text pairs as augmented samples, and optimizes the latent semantic space shared by image and text via debiased contrastive learning, achieving implicit alignment between the two modalities. In experiments on Twitter-2015 and Twitter-2017, DebiasCL achieves F1 scores of 75.04% and 86.51%, respectively, with F1 improvements of 5.23% and 5.2% on the "PER." and "MISC." entity categories. The results show that the proposed method effectively alleviates modal bias and thereby improves the performance of multimodal named entity recognition.
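The page does not spell out the paper's loss function. As an illustration only, a minimal sketch of a debiased InfoNCE objective between paired image and text embeddings (following the general debiasing idea of Chuang et al., 2020, not the authors' actual implementation) might look like this; the function name, `tau`, and `tau_plus` are illustrative assumptions:

```python
import numpy as np

def debiased_contrastive_loss(img_emb, txt_emb, tau=0.1, tau_plus=0.1):
    """Debiased InfoNCE over paired image/text embeddings.

    Row i of img_emb and row i of txt_emb form a positive pair; all
    other rows act as negatives. The debiasing term corrects the
    negative-sample mass for the chance that a sampled "negative"
    actually shares the anchor's semantics (class prior tau_plus).
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    sim = np.exp(img @ txt.T / tau)   # (N, N) exponentiated similarities
    pos = np.diag(sim)                # matched image-text pairs
    n_neg = sim.shape[0] - 1
    neg_sum = sim.sum(axis=1) - pos   # raw negative mass per anchor

    # Debiased negative estimate, clipped at its theoretical minimum
    ng = (neg_sum - n_neg * tau_plus * pos) / (1.0 - tau_plus)
    ng = np.maximum(ng, n_neg * np.exp(-1.0 / tau))

    return float(np.mean(-np.log(pos / (pos + ng))))
```

Here `tau_plus` plays the role of the positive-class prior in the debiasing correction; with `tau_plus = 0`, the expression reduces to the standard (biased) InfoNCE loss.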


Key words

multimodal named entity recognition / contrastive learning / modal alignment

Cite This Article

ZHANG Xin, YUAN Jingling, LI Lin, LIU Jia. Debiased Contrastive Learning for Multimodal Named Entity Recognition. Journal of Chinese Information Processing. 2023, 37(11): 49-59


Funding

Supported by the Open Fund of the Hubei Key Laboratory of Big Data in Science and Technology (Wuhan Documentation and Information Center, Chinese Academy of Sciences) (20211h0437); the Key Research and Development Program of Hubei Province (2021BAA030); and the Hubei Province Manufacturing High-Quality Development Project (2206-420118-89-04-959008).