Adversarial Learning for Neural Dialogue Generation.

EMNLP 2017


Abstract

In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning (RL) problem where we jointly train two systems: a generative model that produces response sequences, and a discriminator that distinguishes them from human-generated dialogues.

Introduction
Highlights
  • Open-domain dialogue generation (Ritter et al., 2011; Sordoni et al., 2015; Xu et al., 2016; Wen et al., 2016; Li et al., 2016b; Serban et al., 2016c, 2017) aims at generating meaningful and coherent dialogue responses given the dialogue history
  • For the rest of this section, we report results obtained by the Hierarchical Neural setting due to its end-to-end nature, despite its inferiority to SVM+Neural+multi-features
  • We find that MMI+p(t|s) performs better than maximum likelihood estimation (MLE) with greedy decoding, which is in turn better than MLE with beam search (BS)
  • In this paper, drawing intuitions from the Turing test, we propose using an adversarial training approach for response generation
  • We cast the model in the framework of reinforcement learning and train a generator based on the signal from a discriminator to generate response sequences indistinguishable from human-generated dialogues
  • In preliminary experiments applying the same training paradigm to machine translation, we did not observe a clear performance boost. We conjecture that this is because the adversarial training strategy is more beneficial to tasks in which there is a big discrepancy between the distributions of the generated sequences and the reference target sequences
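The RL formulation in the bullets above — a generator updated using the discriminator's judgment as the reward signal — can be sketched as a toy REINFORCE loop. Everything below is illustrative, not the paper's implementation: the one-token "responses", the stub reward standing in for a trained discriminator, and the fixed baseline are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; the "generator" is a single softmax over next tokens.
VOCAB = ["i", "don't", "know", "great", "thanks"]
logits = np.zeros(len(VOCAB))

def discriminator_reward(token):
    # Hypothetical stand-in for a trained discriminator's probability
    # that the response is human-generated.
    return 0.9 if token in ("great", "thanks") else 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline = 0.5  # variance-reducing baseline (an assumed constant here)
lr = 0.5

for step in range(200):
    p = softmax(logits)
    idx = rng.choice(len(VOCAB), p=p)      # sample a one-token "response"
    r = discriminator_reward(VOCAB[idx])   # reward from the discriminator
    # REINFORCE: grad of log p(idx) w.r.t. logits is one_hot(idx) - p
    grad_logp = -p
    grad_logp[idx] += 1.0
    logits += lr * (r - baseline) * grad_logp   # policy-gradient update

p = softmax(logits)
print(p[VOCAB.index("great")] + p[VOCAB.index("thanks")])
```

After training, the probability mass shifts toward tokens the stub discriminator rewards, which is the mechanism — not the scale — of the adversarial setup described above.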
Results
  • The authors detail experimental results on adversarial success and human evaluation.
  • For the rest of this section, the authors report results obtained by the Hierarchical Neural setting due to its end-to-end nature, despite its inferiority to SVM+Neural+multi-features.
  • What first stands out is decoding using sampling, which achieves a significantly higher AdverSuc score than all the other models.
  • This does not indicate the superiority of sampling-based decoding, since its machine-vs-random accuracy is at the same time significantly lower.
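As a toy illustration of the AdverSuc metric discussed above — the fraction of machine-generated responses that the evaluator judges to be human — with entirely made-up evaluator judgments:

```python
# Made-up judgments for ten machine-generated responses:
# 1 = evaluator judged the response human, 0 = judged it machine.
evaluator_judgments = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# AdverSuc: fraction of machine outputs that fooled the evaluator.
adver_suc = sum(evaluator_judgments) / len(evaluator_judgments)
print(adver_suc)
```

The machine-vs-random check mentioned above guards against a degenerate evaluator: a high AdverSuc only means something if the same evaluator can still reliably tell machine responses from randomly sampled utterances.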
Conclusion
  • In this paper, drawing intuitions from the Turing test, the authors propose using an adversarial training approach for response generation.
  • In preliminary experiments applying the same training paradigm to machine translation, the authors did not observe a clear performance boost.
  • The authors conjecture that this is because the adversarial training strategy is more beneficial to tasks in which there is a big discrepancy between the distributions of the generated sequences and the reference target sequences.
  • Exploring this relationship further is a focus of future work
Tables
  • Table1: Sampled responses from different models. More in Appendix Tables 5 and 6
  • Table2: ERE scores obtained by different models
  • Table3: AdverSuc and machine-vs-random scores achieved by different training/decoding strategies
  • Table4: The gain from the proposed adversarial model over the mutual information system based on pairwise human judgments
  • Table5: Appendix: Responses sampled from different models.
  • Table6: Appendix: More responses sampled from different models
Related Work
  • Dialogue generation Response generation for dialogue can be viewed as a source-to-target transduction problem. Ritter et al. (2011) framed the generation problem as a machine translation problem. Sordoni et al. (2015) improved Ritter et al.’s system by rescoring the outputs of a phrasal MT-based conversation system with a neural model incorporating prior context. Recent progress in SEQ2SEQ models has inspired several efforts (Vinyals and Le, 2015; Serban et al., 2016a,d; Luan et al., 2016) to build end-to-end conversational systems that first apply an encoder to map a message to a distributed vector representing its meaning and then generate a response from the vector.

    Our work adapts the encoder-decoder model to RL training, and can thus be viewed as an extension of Li et al. (2016d), but with more general RL rewards. Li et al. (2016d) simulate dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity, coherence, and ease of answering. Our work is also related to recent efforts to integrate the SEQ2SEQ and reinforcement learning paradigms, drawing on the advantages of both (Wen et al., 2016). For example, Su et al. (2016) combine reinforcement learning with neural generation on tasks with real users. Asghar et al. (2016) train an end-to-end RL dialogue model using human users.
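The encoder-decoder pipeline described above — encode the message into a vector, then generate the response from that vector — can be caricatured with an untrained stand-in. All names and parameters below are illustrative assumptions: a mean-of-embeddings encoder and a greedy decoder replace the learned RNN/attention components the cited systems actually train end-to-end.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["<eos>", "hello", "how", "are", "you", "fine"]
V, H = len(VOCAB), 8

# Toy parameters (random; real systems learn these end-to-end).
E = rng.normal(size=(V, H))        # embedding table
W_out = rng.normal(size=(H, V))    # decoder output projection

def encode(tokens):
    # Encoder: map the message to a single vector (mean of embeddings;
    # the cited systems use RNN encoders, this is only a stand-in).
    ids = [VOCAB.index(t) for t in tokens]
    return E[ids].mean(axis=0)

def decode(state, max_len=5):
    # Greedy decoder: emit the argmax token, fold its embedding back
    # into the state, and stop at <eos> or max_len.
    out = []
    for _ in range(max_len):
        scores = state @ W_out
        idx = int(np.argmax(scores))
        if VOCAB[idx] == "<eos>":
            break
        out.append(VOCAB[idx])
        state = 0.5 * state + 0.5 * E[idx]   # crude state update
    return out

print(decode(encode(["how", "are", "you"])))
```

With random parameters the output is of course meaningless; the point is only the two-stage message-to-vector-to-response shape shared by the encoder-decoder systems surveyed above.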
Funding
  • Jiwei Li is supported by a Facebook Fellowship, which we gratefully acknowledge
  • This work is also partially supported by the NSF under award IIS-1514268, and the DARPA Communicating with Computers (CwC) program under ARO prime contract no
References
  • V. M. Aleksandrov, V. I. Sysoyev, and V. V. Shemeneva. 1968. Stochastic optimization. Engineering Cybernetics 5:11–16.
  • Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, and Yoshua Bengio. 2016. Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems. pages 4601–4609.
  • Nabiha Asghar, Pascal Poupart, Jiang Xin, and Hang Li. 2016. Online sequence-to-sequence reinforcement learning for open-domain conversational agents. arXiv preprint arXiv:1612.03929.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proc. of NAACL-HLT.
  • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. ICLR.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. CoNLL.
  • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016a. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems. pages 2172–2180.
  • Xilun Chen, Ben Athiwaratkun, Yu Sun, Kilian Weinberger, and Claire Cardie. 2016b. Adversarial deep averaging networks for cross-lingual sentiment classification. arXiv preprint arXiv:1606.01614.
  • Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016b. A persona-based neural conversation model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany, pages 994–1003. http://www.aclweb.org/anthology/P16-1094.
  • Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. 2015. A hierarchical neural autoencoder for paragraphs and documents. ACL.
  • Jiwei Li, Will Monroe, and Dan Jurafsky. 2016c. A simple, fast diverse decoding algorithm for neural generation. arXiv preprint arXiv:1611.08562.
  • Jiwei Li, Will Monroe, Alan Ritter, and Dan Jurafsky. 2016d. Deep reinforcement learning for dialogue generation. EMNLP.
  • Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. EMNLP.
  • Emily L. Denton, Soumith Chintala, Rob Fergus, et al. 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems. pages 1486–1494.
  • Peter W. Glynn. 1990. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM 33(10):75–84.
  • Ryan Lowe, Michael Noseworthy, Iulian Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau. 2017. Towards an automatic Turing test: Learning to evaluate dialogue responses. ACL.
  • Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. On the evaluation of dialogue systems with next utterance classification. SIGDIAL.
  • Yi Luan, Yangfeng Ji, and Mari Ostendorf. 2016. LSTM based conversation models. arXiv preprint arXiv:1603.09457.
  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. ACL.
  • Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  • Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. ICLR.
  • Alan Ritter, Colin Cherry, and William B. Dolan. 2011. Data-driven response generation in social media. In Proceedings of EMNLP 2011. pages 583–593.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Advances in Neural Information Processing Systems. pages 2226–2234.
  • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016a. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of AAAI.
  • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016b. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16).
  • Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, and Aaron Courville. 2016c. Multiresolution recurrent neural networks: An application to dialogue response generation. arXiv preprint arXiv:1606.00776.
  • Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, and Joelle Pineau. 2016d. Generative deep neural networks for dialogue: A short review.
  • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of ACL-IJCNLP. pages 1577–1586.
  • Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, and Ray Kurzweil. 2017. Generating long and diverse responses with neural conversational models. ICLR.
  • Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Minimum risk training for neural machine translation. ACL.
  • David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of NAACL-HLT.
  • Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Continuously learning neural dialogue management. arXiv preprint.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. pages 3104–3112.
  • Alan M. Turing. 1950. Computing machinery and intelligence. Mind 59(236):433–460.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In Proceedings of ICML Deep Learning Workshop.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, and Steve Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
  • Sam Wiseman and Alexander M. Rush. 2016. Sequence-to-sequence learning as beam-search optimization. ACL.
  • Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. 2016. Incorporating loose-structured knowledge into LSTM with recall gate for conversation modeling. arXiv preprint arXiv:1605.05110.
  • Kaisheng Yao, Geoffrey Zweig, and Baolin Peng. 2015. Attention with intention for a neural network conversation model. In NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction.
  • Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2016a. SeqGAN: Sequence generative adversarial nets with policy gradient. arXiv preprint arXiv:1609.05473.
  • Zhou Yu, Ziyu Xu, Alan W. Black, and Alex I. Rudnicky. 2016b. Strategy and policy learning for non-task-oriented conversational systems. In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. page 404.
  • Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. 2017. Aspect-augmented adversarial networks for domain adaptation. arXiv preprint arXiv:1701.00188.