Clinical discovery event extraction detects and extracts the attributes of target events from electronic medical records. The diversity of event attributes, the overlap of attributes across multiple events, the specialized vocabulary of the domain corpus, and the imbalanced sample distribution make the task complex, and conventional methods handle it poorly. To cope with this complexity, this paper proposes a pipeline event extraction method for clinical discovery that divides extraction into three modules: trigger extraction based on sequence labeling, argument extraction based on a pointer network, and event polarity prediction based on matching. The method achieves an F1-score of 0.4303 on the CHIP 2021 Track 2 dataset, ranking first in the shared task.
Abstract
Clinical discovery oriented event extraction detects and extracts the attributes of target events from electronic medical records. The task is challenging due to the diversity of event attributes, the overlap of attributes across multiple events, the specialized domain corpus, and the imbalanced sample distribution, and conventional methods cannot solve it well. This paper proposes a clinical discovery oriented event extraction method with three modules: a trigger extraction module based on sequence labeling, an argument extraction module based on a pointer network, and a polarity prediction module based on matching. Evaluated on CHIP 2021 Track 2, "Evaluation of Chinese Clinical Discovery Event Extraction", the method achieves an F1-score of 0.4303, the top score in the competition.
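The decoding logic of the first two pipeline modules can be illustrated with a toy sketch. Everything below is illustrative: the paper's actual system produces BIO tags and start/end probabilities with a learned encoder and classification heads, whereas here the tags, probabilities, and the 0.5 threshold are hypothetical inputs chosen for the example.

```python
def decode_bio(tags):
    """Turn a BIO tag sequence (trigger module) into (start, end) token spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:          # close a span that is still open
                spans.append((start, i - 1))
            start = i                      # open a new span
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
        # "I" simply continues the current span
    if start is not None:                  # span running to the end of the sequence
        spans.append((start, len(tags) - 1))
    return spans


def decode_pointer(start_probs, end_probs, threshold=0.5):
    """Pair each confident start position with the nearest confident end
    position at or after it, as in pointer-network argument extraction."""
    spans = []
    for i, sp in enumerate(start_probs):
        if sp < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans
```

Pointer-style decoding of this kind also handles overlapping arguments naturally, since independent start/end probabilities allow one token to participate in several spans.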
Key words
clinical discovery /
event extraction /
pipeline model
Funding
National Key Research and Development Program of China (2020AAA0106600); Technical Field Fund of the Foundation Strengthening Program (2021-JCJQ-JJ-0059); Beijing Natural Science Foundation (4212026); Beijing Institute of Technology Science and Technology Innovation Program (23CX13027)