|
|
DQN-based Policy Learning for Open Domain Multi-turn Dialogues
SONG Haoyu, ZHANG Weinan, LIU Ting
Research Center of Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
|
|
Abstract Conducting effective multi-turn dialogues remains a challenge for open-domain dialogue systems. Current neural dialogue generation models tend to fall into conversational black holes by producing safe responses, without taking future information into account. Inspired by the global view of reinforcement learning methods, we present an approach that learns a multi-turn dialogue policy with a DQN (deep Q-network). We introduce a deep neural network to evaluate each candidate sentence and choose the sentence with the maximum future reward, instead of the highest generation probability, as the response. The results show that our method increases the average number of dialogue turns by 2 in the automatic evaluation and outperforms the baseline model by 45% in the human evaluation.
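The following Python sketch (not the paper's implementation) illustrates the selection rule described above: each candidate response is scored by a learned Q-network, and the reply is the candidate with the highest estimated future reward rather than the highest generation probability. The network architecture, the state and candidate encodings, and the example sentences are illustrative assumptions.

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Scores a (dialogue state, candidate response) pair with a scalar Q-value."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, state_vec, cand_vec):
        # Concatenate the encoded dialogue state and the encoded candidate,
        # then map the pair to a single estimated future reward.
        return self.mlp(torch.cat([state_vec, cand_vec], dim=-1)).squeeze(-1)

def select_response(q_net, state_vec, candidates, cand_vecs):
    """Return the candidate with the highest predicted future reward,
    ignoring the generator's own likelihoods."""
    with torch.no_grad():
        q_values = q_net(state_vec.expand(len(candidates), -1), cand_vecs)
    return candidates[int(q_values.argmax())]

if __name__ == "__main__":
    dim = 64
    q_net = QNet(dim)
    state = torch.randn(1, dim)                    # encoded dialogue history (assumed encoder)
    candidates = ["I don't know.", "Which city are you visiting next week?"]
    cand_vecs = torch.randn(len(candidates), dim)  # encoded candidate responses (assumed encoder)
    print(select_response(q_net, state, candidates, cand_vecs))

In practice the state and candidate vectors would come from a sentence encoder, and the Q-network would be trained with DQN-style targets over recorded dialogue transitions; the random vectors above only demonstrate the argmax-Q selection step.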
|
Received: 24 November 2017
|
|
|
|
|
|
|
|