Robust Few-Shot Event Detection Based on Label Augmentation and Contrastive Learning

GAO Yi1, JI Tao1, WU Yuanbin1, MOU Xiaofeng2, WANG Ding2

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (4): 98-108.
Information Extraction and Text Mining

Abstract

Few-shot event detection aims to identify predefined event types from only a few annotated instances. Constrained by the small training scale, existing few-shot event detection systems suffer from poor stability and robustness. To improve both, this paper proposes a few-shot learning algorithm based on label augmentation and contrastive learning. Building on the prototypical network, event label representations are injected through templates as a model prior, reducing the model's sensitivity to the data; contrastive learning is further introduced to optimize sentence representations in the high-dimensional space, improving robustness. Compared with strong few-shot event detection baselines, the proposed model improves F1 by 4.7% on the FewEvent dataset in the 5-way-5-shot setting and by 9.2% on MAVEN. With 40% noise mixed into the data, the model still obtains a 10% gain over other strong baselines. Experiments show that robustness and stability improve markedly while overall performance also improves significantly.
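The paper does not include code here; as a rough illustration of the two ingredients described above, the sketch below (all function names, the interpolation weight `alpha`, and the toy embeddings are assumptions, not the authors' implementation) shows (1) prototypical-network classification where each class prototype is interpolated with a label-text embedding acting as the prior, and (2) an InfoNCE-style contrastive loss that pulls a sentence representation toward a positive and away from negatives.

```python
import numpy as np

def prototypes(support, label_emb, alpha=0.5):
    # support: dict class -> (k, d) array of support-instance embeddings
    # label_emb: dict class -> (d,) embedding of the class label text (the prior)
    # alpha interpolates between the instance prototype and the label prior
    return {c: alpha * e.mean(axis=0) + (1 - alpha) * label_emb[c]
            for c, e in support.items()}

def classify(query, protos):
    # assign the query to the nearest prototype (Euclidean distance)
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))

def info_nce(anchor, positive, negatives, tau=0.1):
    # InfoNCE-style contrastive loss: high similarity to the positive and
    # low similarity to negatives yields a small loss (temperature tau)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

In a 5-way-5-shot episode, `support` would hold 5 classes with 5 embeddings each; the label prior keeps prototypes anchored even when the 5 support instances are noisy, which is one plausible reading of why the method degrades gracefully under label noise.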

Key words

few-shot event detection / prototypical network / label augmentation / contrastive learning

Cite this article

GAO Yi, JI Tao, WU Yuanbin, MOU Xiaofeng, WANG Ding. Robust Few Shot Event Detection Based on Label Augmentation and Contrastive Learning. Journal of Chinese Information Processing, 2023, 37(4): 98-108.
