Chinese Multi-turn Dialogue Tasks Based on HRED Model

WANG Mengyu, YU Dingyao, YAN Rui, HU Wenpeng, ZHAO Dongyan

Journal of Chinese Information Processing ›› 2020, Vol. 34 ›› Issue (8): 78-85.
Question Answering and Dialogue

Chinese Multi-turn Dialogue Tasks Based on HRED Model

  • WANG Mengyu, YU Dingyao, YAN Rui, HU Wenpeng, ZHAO Dongyan

Abstract

Multi-turn dialogue is among the most practically valuable tasks in natural language processing: the system must produce fluent responses while attending to the dialogue context. In recent years, many multi-turn dialogue models have been built on HRED (hierarchical recurrent encoder-decoder), which encodes context with hierarchical recurrent neural networks and has achieved good results on English dialogue datasets such as Movie-DiC. For its 2018 Chinese multi-turn dialogue competition, Jingdong (JD.com) released a high-quality corpus of real customer-service dialogues to contestants. This paper experiments on that corpus, analyzes the weaknesses of the HRED model and its behavior on Chinese data, and proposes combining HRED with attention and a cross-step fusion (residual) mechanism. Experimental results show that the proposed scheme achieves a substantial performance improvement.
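The hierarchical encoding that HRED performs can be sketched in a few lines of numpy. This is a minimal illustration only: the tanh-RNN cells, weight sizes, and toy dialogue below are assumptions for demonstration, not the paper's trained model (HRED implementations typically use GRU or LSTM cells).

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(inputs, W_x, W_h):
    """Encode a sequence of vectors with a simple tanh RNN; return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

emb, hid = 8, 16
# word-level (utterance) encoder weights -- illustrative random values, not trained
Wxu = 0.1 * rng.normal(size=(hid, emb))
Whu = 0.1 * rng.normal(size=(hid, hid))
# utterance-level (context) encoder weights
Wxc = 0.1 * rng.normal(size=(hid, hid))
Whc = 0.1 * rng.normal(size=(hid, hid))

# a toy three-turn dialogue: each turn is a list of word-embedding vectors
dialogue = [[rng.normal(size=emb) for _ in range(n)] for n in (4, 6, 3)]

# level 1: encode each utterance into a fixed-size vector
utt_vecs = [rnn_encode(turn, Wxu, Whu) for turn in dialogue]

# level 2: encode the sequence of utterance vectors into a dialogue context state;
# an HRED decoder is initialized from this state to generate the next turn
context = rnn_encode(utt_vecs, Wxc, Whc)
print(context.shape)  # (16,)
```

The key design point is that the context RNN consumes one vector per utterance rather than the raw word sequence, so the context length grows with the number of turns, not the number of words.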

Key words

multi-turn dialogue / generative model / natural language processing

Cite this article

WANG Mengyu, YU Dingyao, YAN Rui, HU Wenpeng, ZHAO Dongyan. Chinese Multi-turn Dialogue Tasks Based on HRED Model. Journal of Chinese Information Processing, 2020, 34(8): 78-85.

References

[1] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv: 1301.3781, 2013.
[2] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the Advances in Neural Information Processing Systems, 2014: 3104-3112.
[3] Sordoni A, Bengio Y, Vahabi H, et al. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion[C]//Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 2015: 553-562.
[4] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the Advances in Neural Information Processing Systems, 2017: 5998-6008.
[5] Serban I V, Sordoni A, Bengio Y, et al. Building end-to-end dialogue systems using generative hierarchical neural network models[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016.
[6] Targ S, Almeida D, Lyman K. Resnet in resnet: Generalizing residual architectures[J]. arXiv preprint arXiv: 1603.08029, 2016.
[7] Vinyals O, Le Q. A neural conversational model[J]. arXiv preprint arXiv: 1506.05869, 2015.
[8] Shang L, Lu Z, Li H. Neural responding machine for short-text conversation[J]. arXiv preprint arXiv: 1503.02364, 2015.
[9] Serban I V, Sordoni A, Lowe R, et al. A hierarchical latent variable encoder-decoder model for generating dialogues[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017.
[10] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[11] Cho K, van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv: 1406.1078, 2014.
[12] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv: 1409.0473, 2014.
[13] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[14] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[15] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2014.
[16] Papineni K, Roukos S, Ward T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002: 311-318.

Funding

Beijing Municipal Innovation Project (20180630)