ZAN Hongying, LIU Tao, NIU Changyong, ZHAO Yueshu, ZHANG Kunli, SUI Zhifang. Construction and Application of Named Entity and Entity Relations Corpus for Pediatric Diseases[J]. Journal of Chinese Information Processing, 2020, 34(5): 19-26.
Abstract: In existing medical corpora, the classification systems for entities and entity relations struggle to meet the needs of precision medicine. This paper focuses on pediatric diseases. Under the guidance of medical experts, it builds an annotation system and detailed annotation schemes for named entities and entity relations. Drawing on relevant medical standards, it uses annotation tools for machine pre-annotation, manual annotation, and manual proofreading of entities and entity relations in more than 2.98 million words of pediatric medical text, producing a corpus of medical entities and entity relations that covers 504 common pediatric diseases. The corpus contains 23,603 annotated named entities and 36,513 annotated entity relations; after multiple rounds of annotation, annotation consistency reached 0.85 for entities and 0.82 for relations. Based on this corpus, the paper also constructs a pediatric medical knowledge graph and develops a pediatric medical knowledge question-answering (QA) system.
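The abstract reports consistency scores of 0.85 (entities) and 0.82 (relations) but does not state how they are computed. A common choice for span annotation, where true negatives are ill-defined, is the pairwise F-measure between two annotators. The sketch below assumes that measure with exact matching: an entity counts as agreed only if both annotators mark the same span with the same type, and a relation only if its head, type, and tail all match. The function name `pairwise_f1`, the offset-based tuples, and the entity and relation labels are hypothetical and are not taken from the paper's annotation scheme.

```python
from typing import Hashable, Iterable


def pairwise_f1(annotator_a: Iterable[Hashable], annotator_b: Iterable[Hashable]) -> float:
    """Agreement between two annotators as the F-measure of exact matches.

    Treating annotator A as the reference, precision = matches / |B| and
    recall = matches / |A|; their harmonic mean simplifies to
    2 * matches / (|A| + |B|), which is symmetric in A and B.
    """
    set_a, set_b = set(annotator_a), set(annotator_b)
    if not set_a and not set_b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    matches = len(set_a & set_b)
    return 2 * matches / (len(set_a) + len(set_b))


# Hypothetical entities: (start_offset, end_offset, entity_type) in one text.
entities_a = [(0, 4, "disease"), (10, 12, "symptom"), (20, 23, "drug")]
entities_b = [(0, 4, "disease"), (10, 12, "symptom"), (20, 23, "examination")]

# Hypothetical relations: (head_span, relation_type, tail_span).
relations_a = [((0, 4), "has_symptom", (10, 12)), ((0, 4), "treated_by", (20, 23))]
relations_b = [((0, 4), "has_symptom", (10, 12)), ((0, 4), "complication", (20, 23))]

print(round(pairwise_f1(entities_a, entities_b), 3))
print(pairwise_f1(relations_a, relations_b))
```

Because 2 * matches / (|A| + |B|) is symmetric, the score is the same whichever annotator is treated as the reference. Exact matching is the strictest option; a scheme that gives credit for partially overlapping spans would generally yield higher scores on the same data.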