Clinical Entity Normalization Using Deep Generative Model

Journal of Chinese Information Processing, 2021, Vol. 35, Issue 5: 77-85

Information Extraction and Text Mining

YAN Jinghui^1, XIANG Lu^2,5, ZHOU Yu^2,3,5, SUN Jian^2,5, CHEN Si^2,5, XUE Chen^3,4

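Abstract

Clinical term normalization is an indispensable part of medical statistics. In practice, a single standard clinical term may have several colloquial and non-standard descriptions, and applications such as building a clinical knowledge base cannot avoid the problem of normalizing these descriptions. This paper focuses on Chinese clinical term normalization: mapping non-standard descriptions of Chinese clinical terms to the standard terms in a given clinical terminology base. Some deep discriminative models have achieved a degree of success in normalizing medical terms with simple textual structure, such as disease and drug names. In Chinese clinical term normalization, however, the descriptions to be normalized often have missing information or a "one-to-many" correspondence (one description matching several standard terms), so a discriminative model alone cannot capture their complete semantics and performs poorly. This paper treats clinical term normalization as analogous to a translation task. It introduces a deep generative model that generates the core semantics of a description to obtain a candidate set of standard terms, and then reranks the candidate set with a BERT-based semantic similarity algorithm to select the final standard term. Experiments on the evaluation data of the 5th China Conference on Health Information Processing (CHIP2019) show that the method achieves good results.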
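The generate-then-rerank pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The deep generative model is replaced by a hypothetical lookup `toy_generator`, and the BERT-based similarity model by a hypothetical token-prefix scorer `prefix_overlap`. Mapping each generated phrase onto the terminology base with an LCS-based string similarity is also an assumption about how generated text becomes a candidate set. All names and the toy terms in `TERMS` are illustrative.

```python
from typing import Callable, List, Sequence

# Hypothetical terminology base (toy data, not from the CHIP2019 set).
TERMS = ["appendectomy", "cholecystectomy", "laparoscopic cholecystectomy", "hip replacement"]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length normalised by the longer string; 1.0 means identical."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def retrieve_candidates(generated: Sequence[str], terms: Sequence[str], k: int = 2) -> List[str]:
    """Stage 1: map each generated phrase to its k closest standard terms (deduplicated union)."""
    candidates: List[str] = []
    for phrase in generated:
        ranked = sorted(terms, key=lambda t: lcs_similarity(phrase, t), reverse=True)
        for term in ranked[:k]:
            if term not in candidates:
                candidates.append(term)
    return candidates


def normalize(
    mention: str,
    generator: Callable[[str], List[str]],
    scorer: Callable[[str, str], float],
    terms: Sequence[str],
    k: int = 2,
    top_n: int = 1,
) -> List[str]:
    """Generate core semantics, build a candidate set, then rerank it against the mention."""
    candidates = retrieve_candidates(generator(mention), terms, k)
    # Stage 2: rerank candidates by the (stand-in) semantic similarity score.
    reranked = sorted(candidates, key=lambda c: scorer(mention, c), reverse=True)
    return reranked[:top_n]


# Hypothetical stand-in for the deep generative model: a fixed lookup table
# that falls back to echoing the mention.
TOY_GENERATIONS = {"lap chole removal": ["laparoscopic cholecystectomy"]}


def toy_generator(mention: str) -> List[str]:
    return TOY_GENERATIONS.get(mention, [mention])


def prefix_overlap(mention: str, candidate: str) -> float:
    """Hypothetical stand-in for a BERT similarity model: share of mention
    tokens that are a prefix of some candidate token."""
    tokens = mention.split()
    cand_tokens = candidate.split()
    if not tokens:
        return 0.0
    hits = sum(any(ct.startswith(t) for ct in cand_tokens) for t in tokens)
    return hits / len(tokens)


if __name__ == "__main__":
    print(normalize("lap chole removal", toy_generator, prefix_overlap, TERMS))
    # ['laparoscopic cholecystectomy']
```

Because the generator may emit several phrases for one description, the union of their matches gives one plausible way to cover the "one-to-many" case; raising `top_n` is a simple (assumed) way to return more than one standard term after reranking.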


Cite this article

YAN Jinghui, XIANG Lu, ZHOU Yu, SUN Jian, CHEN Si, XUE Chen. Clinical Entity Normalization Using Deep Generative Model. Journal of Chinese Information Processing. 2021, 35(5): 77-85


Received: 2020-06-15
Published: 2021-05-20
Issue date: 2021-05-20
