Term Normalization System Based on BERT Entailment Reasoning

CHONG Weifeng, LI Hui, LI Xue, REN He, YU Dong, WANG Yehan

Journal of Chinese Information Processing, 2021, Vol. 35, Issue 5: 86-90.
Information Extraction and Text Mining

Abstract

Clinical term normalization maps any term written by a doctor to its corresponding standard term in a standard terminology set. The task is challenging because the standard terms are numerous and highly similar to one another, and because many of them have little or no training data (the few-shot and zero-shot problems). Using the dataset provided by CHIP 2019 (China Health Information Processing Conference) Track 1, "Evaluation of Chinese Clinical Term Normalization", this paper designs and implements a clinical term normalization system based on ranking by BERT entailment scores. The system consists of four modules: data preprocessing, BERT entailment scoring, BERT quantity prediction, and logistic regression-based reranking. With accuracy as the evaluation metric, the system achieves a final score of 0.94825, ranking first in Track 1.
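The score-then-rank flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_entailment_score` (token coverage) stands in for a fine-tuned BERT sentence-pair scorer, `toy_predict_count` stands in for the BERT quantity predictor, the logistic regression reranking step is omitted, and all function names and example terms are hypothetical.

```python
from typing import Callable, List


def preprocess(term: str) -> str:
    """Toy preprocessing (assumption): trim whitespace and lowercase."""
    return term.strip().lower()


def toy_entailment_score(original: str, standard: str) -> float:
    """Hypothetical stand-in for a BERT entailment score.

    Returns the fraction of the standard term's tokens that also
    appear in the original term (1.0 means fully covered).
    """
    original_tokens = set(original.split())
    standard_tokens = set(standard.split())
    if not standard_tokens:
        return 0.0
    return len(original_tokens & standard_tokens) / len(standard_tokens)


def toy_predict_count(original: str) -> int:
    """Hypothetical stand-in for the quantity predictor.

    Assumes '+' separates the conditions mentioned in one written term.
    """
    return original.count("+") + 1


def normalize(
    original: str,
    standard_terms: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
    count_fn: Callable[[str], int] = toy_predict_count,
) -> List[str]:
    """Score every standard term, rank by score, keep the top k."""
    query = preprocess(original)
    scored = [(std, score_fn(query, preprocess(std))) for std in standard_terms]
    # sorted() is stable, so ties keep their original list order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    k = count_fn(query)
    return [std for std, _ in ranked[:k]]


standard_terms = ["gastric ulcer", "duodenal ulcer", "acute appendicitis"]
print(normalize("Gastric Ulcer ", standard_terms))
print(normalize("gastric ulcer + acute appendicitis", standard_terms))
```

In a real system, `score_fn` would wrap a sentence-pair classifier that takes the written term and a candidate standard term as input, and the top-ranked candidates would be passed to a reranker before the final selection.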

Key words

BERT / term normalization / entailment

Cite this article

CHONG Weifeng, LI Hui, LI Xue, REN He, YU Dong, WANG Yehan. Term Normalization System Based on BERT Entailment Reasoning. Journal of Chinese Information Processing. 2021, 35(5): 86-90
