Adversarial Learning for Neural Dialogue Generation
In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning (RL) problem where we jointly train two systems, a generator and a discriminator…
- Open-domain dialogue generation (Ritter et al., 2011; Sordoni et al., 2015; Xu et al., 2016; Wen et al., 2016; Li et al., 2016b; Serban et al., 2016c, 2017) aims at generating meaningful and coherent dialogue responses given the dialogue history.
- For example, phrase-based machine translation systems (Ritter et al., 2011; Sordoni et al., 2015) and end-to-end neural systems (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016a; Yao et al., 2015; Luan et al., 2016) approximate this goal by predicting the dialogue utterance given the dialogue history using the maximum likelihood estimation (MLE) objective.
- Despite its success, this over-simplified training objective leads to problems: responses are dull, generic (Sordoni et al., 2015; Serban et al., 2016a; Li et al., 2016a), repetitive, and short-sighted (Li et al., 2016d).
- It is widely acknowledged that manually defined reward functions cannot cover all crucial aspects of dialogue quality and can lead to suboptimal generated utterances.
- For the rest of this section, we report results obtained by the Hierarchical Neural setting due to its end-to-end nature, despite its inferiority to SVM+Neural+multi-features.
- We find that MMI+p(t|s) outperforms MLE with greedy decoding, which in turn outperforms MLE with beam search (BS).
- In this paper, drawing intuitions from the Turing test, we propose using an adversarial training approach for response generation
- We cast the model in the framework of reinforcement learning and train a generator based on the signal from a discriminator to generate response sequences indistinguishable from human-generated dialogues
- In preliminary experiments applying the same training paradigm to machine translation, we did not observe a clear performance boost. We conjecture that this is because the adversarial training strategy is more beneficial to tasks in which there is a big discrepancy between the distributions of the generated sequences and the reference target sequences
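The adversarial REINFORCE idea summarized above — using a discriminator's "probability the response is human" as the reward for a policy-gradient update of the generator — can be sketched in a toy setting. The per-position generator, the hand-written discriminator (which simply penalizes a "dull" token), and all hyperparameters below are illustrative stand-ins, not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH = 5, 4
theta = np.zeros((LENGTH, VOCAB))   # per-position logits: a toy "generator"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_response(theta):
    """Sample one token per position from the generator's distribution."""
    return [int(rng.choice(VOCAB, p=softmax(theta[t]))) for t in range(LENGTH)]

def discriminator(tokens):
    """Stand-in for a trained discriminator: its score simply penalizes
    the 'dull' token 0 (a made-up preference for illustration)."""
    return float(np.mean([tok != 0 for tok in tokens]))

# REINFORCE: the discriminator's score for the whole sampled sequence is
# the reward; a moving-average baseline reduces gradient variance.
lr, baseline = 0.5, 0.0
for step in range(500):
    tokens = sample_response(theta)
    reward = discriminator(tokens)
    baseline = 0.9 * baseline + 0.1 * reward
    for t, tok in enumerate(tokens):
        grad = -softmax(theta[t])
        grad[tok] += 1.0                 # d log p(tok) / d theta[t]
        theta[t] += lr * (reward - baseline) * grad

# The trained generator now rarely emits the penalized token 0.
p_token0 = np.array([softmax(theta[t])[0] for t in range(LENGTH)])
print(p_token0.round(3))
```

In the paper's full setup the discriminator is itself trained jointly against the generator; here it is frozen only to keep the sketch self-contained.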
- The authors detail experimental results on adversarial success and human evaluation.
- What first stands out is decoding using sampling, which achieves a significantly higher AdverSuc score than all other models.
- This does not indicate the superiority of sampling-based decoding, however, since its machine-vs-random accuracy is at the same time significantly lower.
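The MMI+p(t|s) setting mentioned above reranks candidate responses so that generic replies, which explain the source message poorly, lose to more specific ones. A minimal sketch of bidirectional-MMI reranking follows; the candidate strings and all log-probability scores are made up for illustration, not outputs of a real model:

```python
# Hypothetical candidates with toy scores: log_p_t_given_s is the forward
# seq2seq score, log_p_s_given_t the backward ("does t explain s?") score.
candidates = [
    {"text": "i don't know",
     "log_p_t_given_s": -2.0, "log_p_s_given_t": -9.0},
    {"text": "the game is at eight",
     "log_p_t_given_s": -4.0, "log_p_s_given_t": -3.0},
]

def mmi_bidi_score(cand, lam=0.5):
    """Bidirectional MMI reranking: log p(t|s) + lambda * log p(s|t).
    Generic responses score well forward but poorly backward."""
    return cand["log_p_t_given_s"] + lam * cand["log_p_s_given_t"]

best_mle = max(candidates, key=lambda c: c["log_p_t_given_s"])
best_mmi = max(candidates, key=mmi_bidi_score)
print(best_mle["text"])   # the generic response wins under plain MLE
print(best_mmi["text"])   # the specific response wins under MMI
```

The design point is that MMI changes only the ranking criterion at decoding time; the underlying forward and backward models are still trained with MLE.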
- Conclusion and Future Work
In this paper, drawing intuitions from the Turing test, the authors propose an adversarial training approach for response generation.
- In preliminary experiments applying the same training paradigm to machine translation, no clear performance boost was observed; the authors conjecture that adversarial training is more beneficial to tasks in which there is a large discrepancy between the distributions of the generated sequences and the reference target sequences.
- Exploring this relationship further is a focus of future work.
- Table 1: Sampled responses from different models (more in Appendix Tables 5 and 6)
- Table 2: ERE scores obtained by different models
- Table 3: AdverSuc and machine-vs-random scores achieved by different training/decoding strategies
- Table 4: The gain of the proposed adversarial model over the mutual information system, based on pairwise human judgments
- Table 5: Appendix: responses sampled from different models
- Table 6: Appendix: more responses sampled from different models
- Dialogue generation. Response generation for dialogue can be viewed as a source-to-target transduction problem. Ritter et al. (2011) frame response generation as a machine translation problem, and Sordoni et al. (2015) improve on this by rescoring the outputs of a phrasal MT-based conversation system with a neural model that incorporates prior context. Recent progress in SEQ2SEQ models has inspired several efforts (Vinyals and Le, 2015; Serban et al., 2016a,d; Luan et al., 2016) to build end-to-end conversational systems that first apply an encoder to map a message to a distributed vector representing its meaning and then generate a response from that vector.
Our work adapts the encoder-decoder model to RL training, and can thus be viewed as an extension of Li et al. (2016d), but with more general RL rewards. Li et al. (2016d) simulate dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity, coherence, and ease of answering. Our work is also related to recent efforts to integrate the SEQ2SEQ and reinforcement learning paradigms, drawing on the advantages of both (Wen et al., 2016). For example, Su et al. (2016) combine reinforcement learning with neural generation on tasks with real users. Asghar et al. (2016) train an end-to-end RL dialogue model using human users.
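The encoder-decoder pattern described above — map the message to one distributed vector, then generate a response from it — can be sketched with a forward pass in a few lines. The random embeddings, the mean-pooling encoder, and the greedy decoder below are deliberate simplifications (real systems use trained LSTM/attention models and MLE-learned weights):

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, HIDDEN = 10, 8

# Toy parameters (random; a real system learns these with MLE training).
E = rng.normal(size=(VOCAB, HIDDEN))        # token embeddings
W_out = rng.normal(size=(HIDDEN, VOCAB))    # decoder output projection

def encode(message_tokens):
    """Map the message to one distributed vector (mean of embeddings;
    real encoders use a recurrent network over the token sequence)."""
    return E[message_tokens].mean(axis=0)

def decode(context, length=3):
    """Greedily emit tokens from the context vector, folding each emitted
    token back into the state, as an autoregressive decoder would."""
    out, h = [], context
    for _ in range(length):
        tok = int(np.argmax(h @ W_out))
        out.append(tok)
        h = 0.5 * h + 0.5 * E[tok]
    return out

response = decode(encode([1, 4, 7]))
print(response)   # a toy (meaningless) token sequence
```

With untrained weights the output is arbitrary; the point is only the two-stage shape that the RL and adversarial objectives in this paper build on.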
- Jiwei Li is supported by a Facebook Fellowship, which we gratefully acknowledge
- This work is also partially supported by the NSF under award IIS-1514268, and the DARPA Communicating with Computers (CwC) program under ARO prime contract no
- V. M. Aleksandrov, V. I. Sysoyev, and V. V. Shemeneva. 1968. Stochastic optimization. Engineering Cybernetics 5:11–16.
- Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, and Yoshua Bengio. 2016. Professor forcing: A new algorithm for training recurrent networks. In Advances In Neural Information Processing Systems. pages 4601–4609.
- Nabiha Asghar, Pascal Poupart, Xin Jiang, and Hang Li. 2016. Online sequence-to-sequence reinforcement learning for open-domain conversational agents. arXiv preprint arXiv:1612.03929.
- Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proc. of NAACL-HLT.
- Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. ICLR.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
- Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. CoNLL.
- Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016a. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances In Neural Information Processing Systems. pages 2172–2180.
- Xilun Chen, Ben Athiwaratkun, Yu Sun, Kilian Weinberger, and Claire Cardie. 2016b. Adversarial deep averaging networks for cross-lingual sentiment classification. arXiv preprint arXiv:1606.01614.
- Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016b. A persona-based neural conversation model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany, pages 994–1003. http://www.aclweb.org/anthology/P16-1094.
- Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. 2015. A hierarchical neural autoencoder for paragraphs and documents. ACL.
- Jiwei Li, Will Monroe, and Dan Jurafsky. 2016c. A simple, fast diverse decoding algorithm for neural generation. arXiv preprint arXiv:1611.08562.
- Jiwei Li, Will Monroe, Alan Ritter, and Dan Jurafsky. 2016d. Deep reinforcement learning for dialogue generation. EMNLP.
- Chia-Wei Liu, Ryan Lowe, Iulian V Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. EMNLP.
- Emily L Denton, Soumith Chintala, Rob Fergus, et al. 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems. pages 1486–1494.
- Peter W Glynn. 1990. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM 33(10):75–84.
- Ryan Lowe, Michael Noseworthy, Iulian Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau. 2017. Towards an automatic Turing test: Learning to evaluate dialogue responses. ACL.
- Ryan Lowe, Iulian V Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. On the evaluation of dialogue systems with next utterance classification. SIGDIAL.
- Yi Luan, Yangfeng Ji, and Mari Ostendorf. 2016. LSTM based conversation models. arXiv preprint arXiv:1603.09457.
- Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. ACL.
- Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. ICLR.
- Alan Ritter, Colin Cherry, and William B Dolan. 2011. Data-driven response generation in social media. In Proceedings of EMNLP 2011. pages 583–593.
- Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans. In Advances in Neural Information Processing Systems. pages 2226–2234.
- Iulian V Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016a. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of AAAI.
- Iulian V Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016b. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16).
- Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, and Aaron Courville. 2016c. Multiresolution recurrent neural networks: An application to dialogue response generation. arXiv preprint arXiv:1606.00776.
- Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, and Joelle Pineau. 2016d. Generative deep neural networks for dialogue: A short review.
- Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI.
- Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of ACL-IJCNLP. pages 1577–1586.
- Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, and Ray Kurzweil. 2017. Generating long and diverse responses with neural conversational models. ICLR.
- Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Minimum risk training for neural machine translation. ACL.
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
- Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of NAACL-HLT.
- Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Continuously learning neural dialogue management. arXiv preprint.
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. pages 3104–3112.
- Alan M Turing. 1950. Computing machinery and intelligence. Mind 59(236):433–460.
- Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In Proceedings of ICML Deep Learning Workshop.
- Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, and Steve Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.
- Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
- Sam Wiseman and Alexander M Rush. 2016. Sequence-to-sequence learning as beam-search optimization. ACL.
- Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. 2016. Incorporating loose-structured knowledge into LSTM with recall gate for conversation modeling. arXiv preprint arXiv:1605.05110.
- Kaisheng Yao, Geoffrey Zweig, and Baolin Peng. 2015. Attention with intention for a neural network conversation model. In NIPS workshop on Machine Learning for Spoken Language Understanding and Interaction.
- Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2016a. SeqGAN: Sequence generative adversarial nets with policy gradient. arXiv preprint arXiv:1609.05473.
- Zhou Yu, Ziyu Xu, Alan W Black, and Alex I Rudnicky. 2016b. Strategy and policy learning for nontask-oriented conversational systems. In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. page 404.
- Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. 2017. Aspect-augmented adversarial networks for domain adaptation. arXiv preprint arXiv:1701.00188.