A Survey of Chinese Large Language Models in Medicine: Progress, Evaluation and Challenges

JI Xinmeng, ZAN Hongying, CUI Tingting, ZHANG Kunli

Journal of Chinese Information Processing ›› 2024, Vol. 38 ›› Issue (11): 1-12.

Survey

Abstract

Large language models (LLMs) have received widespread attention and achieved remarkable success across many fields. Applying LLMs to the medical domain, for example to assist with medical diagnosis or to generate imaging reports, is a promising research direction for both artificial intelligence and clinical medicine. Because Chinese and foreign doctors differ in how they diagnose and treat patients, including treatment methods, medication habits, and drug dosages, and especially in the field of traditional Chinese medicine, it is important to build large-scale, real-world Chinese medical datasets and to develop Chinese medical LLMs that meet the needs of the Chinese medical field. This paper surveys current Chinese medical LLMs from three perspectives: medical consultation, medical imaging, and mental health. It also introduces the existing evaluation benchmarks for Chinese medical LLMs and discusses the challenges they face, such as hallucination and value alignment. Future research will aim to address these problems and to extend the application scenarios of medical LLMs.

Keywords

large language models / medical large language models / artificial intelligence

Cite this article

JI Xinmeng, ZAN Hongying, CUI Tingting, ZHANG Kunli. A Survey of Chinese Large Language Models in Medicine: Progress, Evaluation and Challenges. Journal of Chinese Information Processing, 2024, 38(11): 1-12.

Funding

National Natural Science Foundation of China (U23A20316); Science and Technology Research Project of Henan Province (232102211041); Zhengzhou Collaborative Innovation Major Project (20XTZX11020)