Event detection is a classic natural language processing task. In practice, however, obtaining high-quality annotated data is labor-intensive, so existing supervised methods perform poorly when confronted with large numbers of undefined new event types. Facing this zero-shot setting, existing approaches either require predefined event types as heuristic rules, or, because autoencoders extract inter-class features poorly, cannot further categorize the unknown events they discover. This paper therefore proposes a zero-shot event detection method based on contrastive learning and data augmentation, which automatically supplies training samples for unsupervised contrastive learning by reconstructing and rewriting event descriptions. Given labeled data for only a subset of known event categories, the model can automatically discover and categorize new event types from large amounts of text. Experiments show that the method significantly improves accuracy on unknown event categories while preserving its ability to recognize known ones.
Abstract
Event detection is a classic natural language processing task. In practice, acquiring high-quality labelled data is labor-intensive, which makes existing supervised-learning-based methods underperform when dealing with large numbers of new, undefined event types. In this paper, we propose a zero-shot event detection model based on contrastive learning and data augmentation, which automatically provides training examples for unsupervised contrastive learning by reconstructing and rewriting event descriptions. Our model can automatically discover and categorize new event types from a large amount of text, requiring labeled data for only a subset of the known event categories. Experiments show that our approach significantly improves the accuracy of identifying unknown event categories while maintaining the ability to identify known ones.
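The abstract describes instance-level contrastive learning in which a rewritten copy of each event description serves as the positive view, while the other descriptions in the batch serve as negatives. As an illustrative sketch only (the function name, embedding dimension, and temperature value here are assumptions, not the paper's implementation), an InfoNCE-style objective over such pairs can be written as:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: for row i, column i of the
    similarity matrix is the positive pair (the rewritten
    description); all other columns act as in-batch negatives."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # -log p(correct pair)

# Toy batch: 4 event-description embeddings and their "rewritten" views
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z_rewrite = z + 0.01 * rng.normal(size=(4, 8))   # near-identical augmentation
loss_pos = info_nce_loss(z, z_rewrite)            # low: positives align
loss_rand = info_nce_loss(z, rng.normal(size=(4, 8)))  # high: no alignment
```

Minimizing this loss pulls each description toward its rewritten view and pushes it away from other descriptions, which is what lets the learned representations cluster unseen event types without labels.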
Key words
zero-shot /
event extraction /
contrastive learning
Funding
National Natural Science Foundation of China (61972155); Science and Technology Commission of Shanghai Municipality (20DZ1100300)