面向任务型的对话系统研究进展 (Progress in Task-oriented Dialogue System)

杨帆,饶元,丁毅,贺王卜,丁紫凡

中文信息学报 (Journal of Chinese Information Processing), 2021, Vol. 35, Issue (10): 1-20.
综述 (Survey)


Progress in Task-oriented Dialogue System

  • YANG Fan1,2, RAO Yuan1,2,3, DING Yi1,2, HE Wangbo1,2, DING Zifang1,2

摘要 (Abstract)

Human-machine dialogue systems based on artificial intelligence techniques are increasingly widely applied in human-computer interaction, intelligent assistants, intelligent customer service, question answering and consulting, and many other fields. This has greatly promoted the development of related theories and techniques such as natural language understanding and generation, dialogue state tracking, and end-to-end deep learning model construction, and has become one of the research hotspots jointly followed by industry and academia. This paper focuses on task-oriented dialogue systems in specific scenarios. Building on a formal definition of the basic concepts, and centered on the goal of obtaining the dialogue content that best matches the user's needs in the fewest dialogue turns, it systematically compares, analyzes, and surveys the current techniques and research progress on three major technical problems and challenges: accurately understanding and recognizing natural-language user intent in complex business scenarios; the dependence on annotated training data and the insufficient interpretability of model results; and the personalized generation of dialogue content under multimodal conditions, thereby laying a foundation for further research. Finally, it summarizes the key future research directions and tasks for a new generation of task-oriented human-machine dialogue systems.

Abstract

Recently, artificial intelligence-based dialogue systems have been widely applied in human-computer interaction, intelligent assistants, smart customer service, Q&A consulting, and so on. This paper proposes a definition of the task-oriented dialogue system, whose goal is to satisfy the user's requirements and complete certain tasks with the fewest response turns in the dialogue between human and machine. Furthermore, three critical technical problems and challenges are summarized: user intent detection in complex contexts, the limited availability of annotated data, and personalized responses in multi-modal situations. The research progress on these three challenges is discussed in the paper. Finally, we outline future research directions and key issues for the next generation of task-oriented dialogue systems.
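The modular pipeline that runs through the surveyed work, natural language understanding (intent detection and slot filling), dialogue state tracking, and a dialogue policy that tries to complete the task in as few turns as possible, can be illustrated with a minimal rule-based sketch. Everything below (the intents, slots, and keyword rules) is a hypothetical toy for illustration, not a method from the paper:

```python
# Toy task-oriented dialogue pipeline: NLU -> state tracking -> policy/NLG.
# The keyword-based NLU is a stand-in for the neural joint intent/slot
# models surveyed in the paper; all intents and slots here are made up.
from dataclasses import dataclass, field
from typing import Optional, Tuple, Dict

@dataclass
class DialogueState:
    intent: Optional[str] = None
    slots: Dict[str, object] = field(default_factory=dict)

def nlu(utterance: str) -> Tuple[Optional[str], Dict[str, object]]:
    """Rule-based intent detection and slot filling for one user turn."""
    intent, slots = None, {}
    text = utterance.lower()
    if "book" in text or "table" in text:
        intent = "book_restaurant"
    for cuisine in ("italian", "chinese", "thai"):
        if cuisine in text:
            slots["cuisine"] = cuisine
    for word in text.split():
        if word.isdigit():
            slots["party_size"] = int(word)
    return intent, slots

def track(state: DialogueState, intent: Optional[str],
          slots: Dict[str, object]) -> DialogueState:
    """Dialogue state tracking: accumulate intent and slot values over turns."""
    if intent:
        state.intent = intent
    state.slots.update(slots)
    return state

def policy_and_nlg(state: DialogueState) -> str:
    """Request the first missing required slot, otherwise confirm the booking,
    so the task finishes in as few turns as possible."""
    for slot in ("cuisine", "party_size"):
        if slot not in state.slots:
            return f"request({slot})"
    return (f"confirm(cuisine={state.slots['cuisine']}, "
            f"party_size={state.slots['party_size']})")

if __name__ == "__main__":
    state = DialogueState()
    for turn in ["I want to book a table", "Italian food for 4 people"]:
        state = track(state, *nlu(turn))
        print(policy_and_nlg(state))
```

Each user turn updates the tracked state, and the policy's next action depends on the accumulated state rather than on the last utterance alone, which is the core difference between multi-turn task-oriented systems and single-turn question answering.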

关键词 (Keywords)

task-oriented dialogue system / natural language processing / artificial intelligence / deep learning / human-machine dialogue system

引用本文 (Cite this article)

YANG Fan, RAO Yuan, DING Yi, HE Wangbo, DING Zifang. Progress in Task-oriented Dialogue System[J]. Journal of Chinese Information Processing, 2021, 35(10): 1-20.


基金 (Funding)

National Key Research and Development Program of the Ministry of Science and Technology (2019YFB2102300); Shenzhen Science and Technology Innovation Project (JCYJ20180306170836595); Joint Project of the NavInfo-Xi'an Intelligent Spatio-temporal Data Analysis Engineering Laboratory (C2020103); Major Project in Social Sciences of the Ministry of Education (18JZD022); Special Guiding Funds for Central Universities Building World-Class Universities (Disciplines) and Distinctive Development (PY3A022); Fundamental Research Funds for the Central Universities, Xi'an Jiaotong University Key Project (zdyf2017006); Ministry of Education "Cloud-Data Fusion" Fund (2017B00030)