Natural language understanding in a task-oriented dialog system aims to parse the user's natural-language input and extract structured information that a computer can process; it comprises two subtasks, intent detection and slot filling. BERT is a recently proposed pretrained model for natural language processing, and BERT-based natural language understanding models for task-oriented dialog systems have already been proposed. Building on this work, this paper proposes an improved natural language understanding model whose encoder is BERT and whose decoder is built from an LSTM with an attention mechanism. The paper also proposes two tuning techniques for this model: a training method that freezes the model parameters, and the use of the case-sensitive version of the pretrained model. Both techniques significantly improve performance on the baseline model as well as on the improved model. Experimental results show that, with the improved model and the tuning techniques, sentence-level accuracies of 0.8833 and 0.9251 are obtained on the ATIS and Snips datasets, respectively.
Abstract
The purpose of natural language understanding in a task-oriented dialog system is to parse sentences entered by the user in natural language, extracting structured information for subsequent processing. This paper proposes an improved natural language understanding model, using BERT as the encoder, while the decoder is built with an LSTM and an attention mechanism. Furthermore, this paper proposes two tuning techniques for this model: a training method with fixed model parameters, and the use of the case-sensitive version of the pretrained model. Experiments show that the improved model and tuning techniques achieve 0.8833 and 0.9251 sentence-level accuracy on the ATIS and Snips datasets, respectively.
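The two ideas at the core of the abstract, an attention-weighted context over encoder outputs feeding the decoder, and a training step that leaves frozen (encoder) parameters untouched, can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the attention here is plain dot-product scoring, the parameter names are invented, and the update is bare SGD.

```python
import numpy as np

def attention_context(dec_state, enc_outputs):
    """Score each encoder position against the current decoder state,
    then mix the encoder outputs by the softmax weights.
    Shapes: dec_state (d,), enc_outputs (T, d) -> context (d,)."""
    scores = enc_outputs @ dec_state              # one score per position, (T,)
    weights = np.exp(scores - scores.max())       # stable softmax
    weights /= weights.sum()
    return weights @ enc_outputs                  # weighted sum of encoder states

def sgd_step(params, grads, frozen, lr=0.1):
    """Update only the unfrozen parameters; frozen ones (e.g. the
    encoder's, per the 'fixed model parameters' technique) are returned
    unchanged."""
    return {name: (p if name in frozen else p - lr * grads[name])
            for name, p in params.items()}

# toy usage: 4 encoder positions, hidden size 3
enc = np.arange(12.0).reshape(4, 3)
ctx = attention_context(np.ones(3), enc)          # leans toward the last, highest-scoring row

params = {"encoder.w": np.ones(2), "decoder.w": np.ones(2)}
grads  = {"encoder.w": np.ones(2), "decoder.w": np.ones(2)}
new = sgd_step(params, grads, frozen={"encoder.w"})
```

In a real setup the same effect is usually obtained by excluding the frozen tensors from the optimizer rather than filtering inside the update, but the filtering form keeps the sketch self-contained.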
Key words
task-oriented dialog system /
natural language understanding /
BERT
Funding
National Natural Science Foundation of China (61672081, U1636211, 61370126, 61602025); National Key R&D Program of China (2016QY04W0802); Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2019ZX-17); Beijing Advanced Innovation Center for Imaging Theory and Technology (BAICIT-2016001)