Abstract
Building medical dialogue systems can help alleviate the current shortage and uneven distribution of medical resources. Within dialogue system construction, how to incorporate acquired knowledge into generated utterances is an important research topic. A prompt is a sequence of characters or an encoding vector pre-input to a language model, from which subsequent inference proceeds, thereby steering the content of the generated text. This paper first fine-tunes a pretrained language model on a medical-domain corpus to learn the latent semantics of medical sentences, and then designs a prompt scheme that introduces medical entities into the dialogue generation model, so that the generated dialogue carries preset knowledge and controlled dialogue generation is achieved. Experiments on the medical dialogue dataset MedDG verify that the proposed scheme effectively improves the quality of generated medical dialogue.
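The prompt scheme described above conditions generation on medical entities by placing them ahead of the dialogue history, so the decoder can attend to the preset knowledge at every step. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation; the function name `build_prompt` and the `[ENT]`/`[SEP]` marker tokens are assumptions:

```python
def build_prompt(entities, history, ent_tag="[ENT]", sep="[SEP]"):
    """Serialize preset medical entities and dialogue history into a single
    prompt string for a fine-tuned language model to continue from.

    Placing the entities before the dialogue lets the decoder attend to
    them at every generation step, so the reply tends to carry the
    preset knowledge (the goal of controlled dialogue generation).
    """
    # Mark each entity so the model can distinguish knowledge from dialogue.
    entity_span = "".join(ent_tag + e for e in entities)
    # Concatenate the dialogue turns in order.
    dialogue_span = sep.join(history)
    return entity_span + sep + dialogue_span

# Example: condition the reply on two preset medical entities.
prompt = build_prompt(
    entities=["gastric ulcer", "omeprazole"],
    history=["Patient: I have had stomach pain recently. What should I do?"],
)
# The resulting string would then be tokenized and passed to the
# fine-tuned model's generation routine to decode the doctor's reply.
```

In practice the markers would be added to the tokenizer's vocabulary as special tokens during fine-tuning, so the model learns their role from the medical corpus rather than treating them as ordinary text.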
Keywords
medical dialogue generation / controlled text generation / dialogue system
Funding
National Natural Science Foundation of China Youth Program (72001010); China Postdoctoral Science Foundation Special Funding, 2nd batch (2020TQ0024); China Postdoctoral Science Foundation (2016M601435, 2020M670105); Beijing Institute of Technology Young Faculty Academic Start-up Program