A Method for Surgery Term Normalization Based on Text Similarity and BERT Model

YANG Feihong, SUN Haixia, LI Jiao

Journal of Chinese Information Processing ›› 2021, Vol. 35 ›› Issue (4): 44-50.
Information Extraction and Text Mining

Abstract

This paper studies how to build a normalization method for surgical procedure terms. It first analyzes the characteristics of the surgical term normalization dataset, then surveys related term normalization methods, and finally builds a normalization model informed by both the surveyed techniques and the dataset characteristics. The model combines text similarity ranking with BERT sentence-pair matching. In the surgical term normalization evaluation of the 2019 China Conference on Health Information Processing (CHIP2019), the method achieved an accuracy of 88.35% on the validation set and 88.51% on the test set, ranking 5th among all participating teams.
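The two-stage scheme named in the abstract (similarity-based ranking followed by sentence-pair matching) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the character-bigram Jaccard measure is an assumed stand-in for whatever similarity ranking the paper uses, `pair_scorer` is a hypothetical placeholder for a fine-tuned BERT sentence-pair model, and the vocabulary and mention are invented English examples.

```python
from typing import Callable, List, Set


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams of a string."""
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (assumed stand-in measure)."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve_candidates(mention: str, vocabulary: List[str], top_k: int = 3) -> List[str]:
    """Stage 1: rank standard terms by text similarity and keep the top_k."""
    ranked = sorted(vocabulary, key=lambda term: jaccard(mention, term), reverse=True)
    return ranked[:top_k]


def normalize(
    mention: str,
    vocabulary: List[str],
    pair_scorer: Callable[[str, str], float],
    top_k: int = 3,
) -> str:
    """Stage 2: rescore each (mention, candidate) pair and return the best candidate."""
    candidates = retrieve_candidates(mention, vocabulary, top_k)
    return max(candidates, key=lambda term: pair_scorer(mention, term))


# Hypothetical standard vocabulary and raw mention, for illustration only.
VOCABULARY = [
    "total knee replacement",
    "partial knee replacement",
    "total hip replacement",
    "appendectomy",
]
MENTION = "total knee replacement surgery"

# In practice pair_scorer would wrap a BERT sentence-pair classifier;
# here the Jaccard function is reused so the sketch runs without a model.
candidates = retrieve_candidates(MENTION, VOCABULARY)
best = normalize(MENTION, VOCABULARY, pair_scorer=jaccard)
```

Keeping the first stage cheap narrows a large standard vocabulary to a short candidate list, so the more expensive pair model only has to score a few pairs per mention.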

Key words

surgery terms / normalization / BERT / text similarity

Cite this article

YANG Feihong, SUN Haixia, LI Jiao. A Method for Surgery Term Normalization Based on Text Similarity and BERT Model. Journal of Chinese Information Processing. 2021, 35(4): 44-50


Funding

CAMS Innovation Fund for Medical Sciences (2018-I2M-AI-016); Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (2018PT33024)