A Survey of Document-level Event Extraction

WANG Renyu, XIANG Wei, WANG Bang, DAI Lu

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 6: 1-14.

Survey


Abstract

Event extraction aims to extract event information of interest from unstructured text and represent it in a structured form. It has wide applications, including question answering, machine translation, recommendation systems, information retrieval, and knowledge graph construction. Existing event extraction surveys mainly cover the task and its methods at the sentence level. However, since an event, its arguments, and their roles are usually described across multiple sentences in a document, more complete event extraction must be performed at the document level, i.e., document-level event extraction. Recently, with advances in deep learning and the public release of several document-level event extraction datasets, document-level event extraction has attracted extensive attention. This paper provides a comprehensive and up-to-date survey of document-level event extraction: we first introduce the task definition and commonly used datasets, then review and analyze representative approaches, and finally discuss future research directions.


Key words

document-level event extraction / information extraction / natural language processing / neural networks

Cite this article

WANG Renyu, XIANG Wei, WANG Bang, DAI Lu. A Survey of Document-level Event Extraction. Journal of Chinese Information Processing, 2023, 37(6): 1-14.


Funding

National Natural Science Foundation of China (62172167)