Clinical Term Normalization Based on BERT

SUN Yuejun, LIU Zhiqiang, YANG Zhihao, LIN Hongfei

Journal of Chinese Information Processing, 2021, Vol. 35, Issue (4): 75-82.
Information Extraction and Text Mining

Abstract

Clinical terms in electronic medical records are written in diverse and non-standard forms, which hinders the analysis and utilization of medical data, so research on clinical term normalization is of practical importance. At present, clinical term normalization in medical institutions in China is done mainly by hand, which is inefficient and costly. This paper proposes a BERT-based method for clinical term normalization. The method uses Jaccard similarity to select candidate terms from the standard term set, then uses a BERT model to match the original term against the candidates to obtain the normalized result. On the dataset of the CHIP2019 clinical term normalization evaluation task, the method achieves an accuracy of 90.04%. The experimental results show that the method is effective for the clinical term normalization task.
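
The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It assumes Jaccard similarity is computed over the sets of characters in each term, which is a common choice for Chinese text; the abstract does not say which granularity the authors use. The function names, the `top_k` parameter, and the sample terms are all hypothetical and are not taken from the paper or the CHIP2019 data.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the character sets of two strings.

    Assumption: character-level sets; the paper may tokenize differently.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # define similarity of two empty strings as 0
    return len(set_a & set_b) / len(set_a | set_b)


def select_candidates(mention: str, standard_terms: List[str], top_k: int = 10) -> List[str]:
    """Return the top_k standard terms ranked by Jaccard similarity to the mention."""
    ranked = sorted(standard_terms, key=lambda term: jaccard(mention, term), reverse=True)
    return ranked[:top_k]


# Hypothetical example: a raw mention and a tiny made-up standard term set.
standard_terms = ["高血压", "肺恶性肿瘤", "胃恶性肿瘤"]
candidates = select_candidates("胃癌恶性", standard_terms, top_k=2)
print(candidates)
```

In a pipeline of the kind the abstract outlines, each (original term, candidate) pair would then be scored by a BERT matching model, and the best-scoring candidate taken as the normalized term. That second stage is not sketched here because the abstract gives no details of the model configuration.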

Key words

clinical term / normalization / BERT

Cite this article

SUN Yuejun, LIU Zhiqiang, YANG Zhihao, LIN Hongfei. Clinical Term Normalization Based on BERT. Journal of Chinese Information Processing. 2021, 35(4): 75-82

Funding

National Key Research and Development Program of China, 13th Five-Year Plan (2016YFC0901902)