Clinical term normalization assigns, to each term written by a doctor, the corresponding term in a standard term set. The task is challenging because the standard terms are numerous and highly similar to one another, and because training data are scarce, which makes it a zero-shot or few-shot learning problem. This paper designs and implements a clinical term normalization system based on BERT entailment ranking. The system consists of four modules: data preprocessing, BERT entailment scoring, BERT quantity prediction, and logistic regression-based reranking. Tested in CHIP 2019 Track 1, "Evaluation of Chinese Clinical Term Normalization", it achieved a final accuracy of 0.94825, the top score in the evaluation.
CHONG Weifeng, LI Hui, LI Xue, REN He, YU Dong, WANG Yehan.
Term Normalization System Based on BERT Entailment Reasoning. Journal of Chinese Information Processing. 2021, 35(5): 86-90
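The four modules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two BERT models are replaced by toy stand-ins (character overlap for entailment scoring, separator counting for quantity prediction), and the function names, features, weights, and bias are all hypothetical and untrained. It only shows how the stages could fit together.

```python
import math
import re


def entailment_score(mention: str, candidate: str) -> float:
    """Toy stand-in for the BERT entailment scorer: character-set Jaccard overlap."""
    a, b = set(mention), set(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_quantity(mention: str) -> int:
    """Toy stand-in for the BERT quantity predictor: count separator-delimited parts."""
    parts = [p for p in re.split(r"[,;+/、，；]", mention) if p.strip()]
    return max(1, len(parts))


def rerank_score(entail: float, length_ratio: float,
                 weights=(4.0, 1.0), bias=-2.0) -> float:
    """Logistic-regression-style rescoring; weights and bias are assumed, not learned."""
    z = weights[0] * entail + weights[1] * length_ratio + bias
    return 1.0 / (1.0 + math.exp(-z))


def normalize(mention: str, standard_terms: list) -> list:
    """Return the top-k standard terms, where k is the predicted term count."""
    mention = mention.strip().lower()          # module 1: preprocessing
    k = predict_quantity(mention)              # module 3: how many terms to output
    scored = []
    for term in standard_terms:
        e = entailment_score(mention, term.lower())   # module 2: candidate scoring
        ratio = min(len(mention), len(term)) / max(len(mention), len(term), 1)
        scored.append((rerank_score(e, ratio), term)) # module 4: reranking
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [term for _, term in scored[:k]]
```

For example, `normalize("gastric ulcer", ["gastric ulcer", "duodenal ulcer", "hypertension"])` returns a single term, since the mention contains no separator, while a mention listing two conditions separated by a comma returns two. In a real system each stand-in would be replaced by a trained model.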