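A Platform for Entity and Entity Relationship Labeling in Medical Texts

ZHANG Kunli, ZHAO Xu, GUAN Tongfeng, SHANG Baiyu, LI Yumeng, ZAN Hongying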

Journal of Chinese Information Processing, 2020, Vol. 34, Issue 6: 36-44.
Section: Language Resource Construction

A Platform for Entity and Entity Relationship Labeling in Medical Texts

  • ZHANG Kunli1,2, ZHAO Xu1,2, GUAN Tongfeng1,2, SHANG Baiyu1,2, LI Yumeng1,2, ZAN Hongying1,2

Abstract

Medical text is an important data foundation for intelligent healthcare, but it is semi-structured or unstructured and therefore hard to use directly. Annotating the entities and entity relationships in medical text is a key step in structuring it, and it underpins research on named entity recognition and automatic relation extraction. Traditional manual annotation is laborious and time-consuming and can no longer keep pace with the demands of big data. Driven by the task of constructing a Chinese medical knowledge graph, this paper builds a semi-automated entity and relationship labeling platform. The platform integrates multiple algorithms and supports text pre-labeling, progress control, quality control, and data analysis. The platform was used to annotate entities and relationships for the medical knowledge graph. The results show that it can control the annotation process during text resource construction, ensure annotation quality, and improve annotation efficiency. The platform has also been applied to other text annotation tasks, which indicates that it ports well across tasks.
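The abstract does not say which algorithms the platform uses for pre-labeling. As a minimal sketch of what a semi-automated pre-labeling step could look like, the example below uses dictionary-based forward maximum matching, a common baseline that is swapped in here for illustration and is not claimed to be the authors' method. The function name `pre_label`, the toy `LEXICON`, and the entity type names are all hypothetical. Matching works on characters, so the same code applies to unsegmented Chinese text.

```python
from typing import Dict, List, Tuple

# (start offset, end offset, surface text, entity type)
Span = Tuple[int, int, str, str]


def pre_label(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Forward maximum matching: at each position, take the longest lexicon term."""
    if not lexicon:
        return []
    max_len = max(len(term) for term in lexicon)
    spans: List[Span] = []
    i = 0
    while i < len(text):
        match_end = None
        # Try the longest candidate first so longer terms win over their substrings.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match_end = j
                break
        if match_end is None:
            i += 1  # no term starts here; move one character forward
        else:
            surface = text[i:match_end]
            spans.append((i, match_end, surface, lexicon[surface]))
            i = match_end  # continue after the match, so spans never overlap
    return spans


# Hypothetical toy lexicon; a real one would come from a curated medical vocabulary.
LEXICON = {
    "pneumonia": "Disease",
    "mycoplasma pneumonia": "Disease",
    "fever": "Symptom",
    "azithromycin": "Drug",
}

if __name__ == "__main__":
    sentence = "mycoplasma pneumonia with fever, treated with azithromycin"
    for span in pre_label(sentence, LEXICON):
        print(span)
```

In a workflow of this kind, the suggested spans would be shown to annotators as editable candidates rather than final labels, so that human review still decides what enters the corpus.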

Key words

text annotation / labeling platform / entity annotation / relationship annotation / data analysis

Cite this article

ZHANG Kunli, ZHAO Xu, GUAN Tongfeng, SHANG Baiyu, LI Yumeng, ZAN Hongying. A Platform for Entity and Entity Relationship Labeling in Medical Texts. Journal of Chinese Information Processing. 2020, 34(6): 36-44

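Funding

National Key Research and Development Program of China (2017YFB1002101); National Social Science Fund of China (18ZDA315); China Postdoctoral Science Foundation (2019TQ0286); Henan Province Science and Technology Research Project (192102210260); Henan Province Medical Science and Technology Research Program, Provincial-Ministerial Co-construction Project (SB201901021); Key Scientific Research Projects of Higher Education Institutions of Henan Province (19A520003, 20A520038)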