
Counterfactual Off Policy Training for Neural Dialogue Generation

EMNLP 2020, pp. 3438–3448 (2020)

Abstract

Open-domain dialogue generation suffers from the data insufficiency problem due to the vast size of potential responses. In this paper, we propose to explore potential responses by counterfactual reasoning. Given an observed response, the counterfactual reasoning model automatically infers the outcome of an alternative policy that could h…

Introduction
  • Open-domain dialogue generation (Shang et al, 2015a; Vinyals and Le, 2015; Sordoni et al, 2015a) intends to produce coherent responses given dialogue history.
  • The authors cast a dialogue generation model as an SCM over two random variables: dialogue history X and response Y
  • This is achieved by converting the conditional distribution P(Y|X) into a deterministic function Y = fπ(X, U).
  • The dialogue generation SCM makes it possible to sample counterfactual responses in the scenario where observed responses occur
  • This improves the quality of synthesized responses and subsequently helps the model to explore the high-reward area of the potential response space in the training process
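The conversion of P(Y|X) into a deterministic function Y = fπ(X, U) can be sketched with the Gumbel-max trick for a single decoding step. This is a minimal illustration with toy logits, not the paper's implementation; the function names and vocabulary size are ours:

```python
import numpy as np

def sample_gumbel(rng, shape):
    # Exogenous SCM noise: U ~ Gumbel(0, 1), independent of the policy.
    return -np.log(-np.log(rng.uniform(1e-12, 1.0, size=shape)))

def f_pi(logits, u):
    # Deterministic mechanism f_pi(X, U): with the noise U fixed, the
    # policy's logits (a function of dialogue history X) fully
    # determine the sampled token (Gumbel-max trick).
    return int(np.argmax(np.asarray(logits) + u))

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 0.5])   # toy next-token scores for 3 tokens
u = sample_gumbel(rng, logits.shape)
token = f_pi(logits, u)              # marginally distributed as softmax(logits)
```

Re-running f_pi with the same u always yields the same token; that determinism is what makes counterfactual reasoning over observed responses possible.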
Highlights
  • Open-domain dialogue generation (Shang et al, 2015a; Vinyals and Le, 2015; Sordoni et al, 2015a) intends to produce coherent responses given dialogue history
  • Both reward for every generation step (REGS) and StepGAN outperform HRED in distinct-1 and distinct-2, indicating that adversarial learning is beneficial for improving the diversity of responses
  • The results demonstrate the effectiveness of the counterfactual response in exploring the high-reward area of the potential response space during the training process
  • We propose a model-agnostic approach, counterfactual off-policy training (COPT), that can be applied to any adversarial learning-based dialogue generation models
  • In contrast to existing approaches, it learns on counterfactual responses inferred from the structural causal model, taking advantage of observed responses
  • An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model as well as the conventional adversarial learning approaches
  • Experiments show that COPT significantly improves the quality of the generated responses, demonstrating the effectiveness of the approach
Methods
  • The authors cast a dialogue generation model as an SCM to explore potential responses by counterfactual reasoning during the training process.
  • The authors will first review the concept of the SCM (Sec. 3.2), and introduce the COPT approach (Sec. 3.3).
  • The authors denote the response generated by COPT as counterfactual response.
  • The response of standard adversarial learning-based dialogue generation (i.e., REGS; Li et al, 2017a) is denoted as the standard response
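A single-step sketch of how a counterfactual response can be inferred, following the Gumbel-max SCM of Oberst and Sontag (2019) cited by the paper: first sample noise consistent with the token the behavior policy μ actually produced (top-down truncated-Gumbel construction), then replay that noise through the target policy π. All logits are toy numbers and the helper names are ours, not the paper's code:

```python
import numpy as np

def gumbel(rng, loc=0.0):
    # One Gumbel(loc) draw.
    return loc - np.log(-np.log(rng.uniform(1e-12, 1.0)))

def posterior_noise(rng, logits_mu, observed):
    # Sample SCM noise U consistent with the behavior policy mu having
    # produced `observed` (top-down truncated-Gumbel construction).
    top = gumbel(rng, np.logaddexp.reduce(logits_mu))  # max perturbed logit
    g = np.empty_like(logits_mu)
    for i, phi in enumerate(logits_mu):
        if i == observed:
            g[i] = top
        else:
            # Gumbel(phi) truncated to lie below the observed maximum.
            g[i] = -np.log(np.exp(-gumbel(rng, phi)) + np.exp(-top))
    return g - logits_mu  # perturbed logits are logits + U, so U = g - logits

rng = np.random.default_rng(1)
logits_mu = np.array([0.2, 1.5, 0.3])  # behavior policy scores (toy)
logits_pi = np.array([0.1, 0.4, 2.0])  # target policy scores (toy)
observed = 1                           # token mu actually produced

u = posterior_noise(rng, logits_mu, observed)
counterfactual = int(np.argmax(logits_pi + u))  # what pi would have said
```

By construction argmax(logits_mu + u) equals the observed token, so the counterfactual token is sampled in the scenario where the observed response occurs.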
Results
  • Table 3 shows the results of automatic evaluation
  • Both REGS and StepGAN outperform HRED in distinct-1 and distinct-2, indicating that adversarial learning is beneficial for improving the diversity of responses.
  • After introducing COPT, both distinct-1 and distinct-2 of REGS and StepGAN further increase, and the improvement is significant (t-test, p < 0.01)
  • This suggests that COPT is model-agnostic with respect to adversarial learning-based approaches and helps to promote diversity.
  • The less significant result on BLEU-3 and BLEU-4 is mainly due to the sparsity of tri-grams and four-grams, which are less likely to be covered by the references than uni-grams and bi-grams
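Distinct-1 and distinct-2, the diversity metrics reported in Table 3, are the ratio of unique n-grams to total n-grams over the generated responses. A minimal sketch of the standard formulation (the paper's exact tokenization may differ; the example responses are ours):

```python
def distinct_n(responses, n):
    # Ratio of unique n-grams to total n-grams across all responses;
    # higher values indicate more diverse generations.
    ngrams = [tuple(tokens[i:i + n])
              for tokens in (r.split() for r in responses)
              for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

responses = ["i do not know", "i do not think so", "that sounds great"]
dist1 = distinct_n(responses, 1)  # 9 unique / 12 total unigrams = 0.75
dist2 = distinct_n(responses, 2)  # 7 unique / 9 total bigrams
```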
Conclusion
  • The authors propose a model-agnostic approach, COPT, that can be applied to any adversarial learning-based dialogue generation models.
  • In contrast to existing approaches, it learns on counterfactual responses inferred from the structural causal model, taking advantage of observed responses.
  • This helps the model to explore the high-reward area of the potential response space.
  • Experiments show that COPT significantly improves the quality of the generated responses, demonstrating the effectiveness of the approach
Summary
  • Objectives:

    The update from the behavior policy μ that generates observed responses to the target policy π that the authors aim to learn is the intervention of replacing fμ(X, U) with fπ(X, U).
  • π is the target policy that the authors aim to learn.
  • μ is the behavior policy that generates observed responses.
  • Note that there are two policies in COPT: the target policy that the authors aim to learn and the behavior policy used for the reasoning of scenarios
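The intervention of replacing fμ(X, U) with fπ(X, U) while holding the scenario noise U fixed can be made concrete in a few lines. A toy sketch with illustrative logits (not the paper's models):

```python
import numpy as np

def f(logits, u):
    # Shared SCM mechanism: deterministic given the scenario noise U.
    return int(np.argmax(np.asarray(logits) + u))

rng = np.random.default_rng(2)
u = -np.log(-np.log(rng.uniform(1e-12, 1.0, size=3)))  # one scenario's Gumbel noise U

y_mu = f([2.0, 0.1, 0.1], u)  # behavior policy mu in this scenario
y_pi = f([0.1, 0.1, 2.0], u)  # intervention: swap in pi, keep U fixed
```

Because U is shared, y_pi answers "what would π have said in the very scenario where μ said y_mu", rather than in an independently sampled one.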
Tables
  • Table 1: Statistics of the DailyDialog dataset
  • Table 2: The average training time (in seconds per epoch) on a single GPU
  • Table 3: Automatic evaluation results of distinct-1 (Dist-1), distinct-2 (Dist-2), and BLEU scores
  • Table 4: An example of generated responses given dialogue history between person A and B
  • Table 5: Wins, losses, and ties (in %) of our approach against baselines based on the human evaluation
Related work
  • Dialogue Generation Data-driven dialogue systems can be roughly divided into two categories: retrieval-based (Leuski et al, 2006; Ji et al, 2014; Yan et al, 2016) and generation-based (Shang et al, 2015b; Sordoni et al, 2015b; Vinyals and Le, 2015). Responses of retrieval-based methods come from a fixed candidate response set and thus cannot be customized. Generation-based methods can create new responses, but the vanilla sequence-to-sequence model tends to produce generic responses (Li et al, 2016).

    One way to address the generic response problem is to introduce external knowledge, such as keywords (Mou et al, 2016; Zhu et al, 2019b), topics (Xing et al, 2017), persona information (Zhang et al, 2019; Song et al, 2019), and retrieved candidate responses (Song et al, 2018; Wu et al, 2019; Zhu et al, 2019a). Another way is to optimize the architecture of networks. Two architectures are widely employed in this research line: the variational auto-encoder (Bowman et al, 2016; Zhao et al, 2017) and the generative adversarial network (Goodfellow et al, 2014; Li et al, 2017a; Zhang et al, 2018; Xu et al, 2018; Tuan and Lee, 2019). Our approach falls into the latter category. The differences between our approach and other adversarial learning-based approaches are as follows. First, we cast the dialogue generation model as an SCM to explore potential responses in the environment where observed responses occur. Second, we learn on counterfactual responses inferred from the SCM. Third, a pre-trained behavior policy is involved during the generation process, making our approach an off-policy algorithm and benefiting the exploration of potential responses.
Funding
  • This paper is supported by the National Natural Science Foundation of China under Grant Nos. 62076081, 61772153, and 61936010
Study subjects and analysis
AMT workers: 5
Human Evaluation The human evaluation is conducted on 200 instances randomly sampled from the test set. We create a project on Amazon Mechanical Turk (Buhrmester et al, 2016) (AMT) and employ five AMT workers to give a preference between two responses, one generated by our approach and one by a baseline. To maintain the quality of the evaluation, the task is visible only to workers whose approval rate is greater than 95% and whose number of approved tasks is greater than 500

References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the Third International Conference on Learning Representations.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 10–21.
  • Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sebastien Racaniere, Arthur Guez, and Jean-Baptiste Lespiau. 2019. Woulda, coulda, shoulda: Counterfactually-guided policy search. In Proceedings of the Seventh International Conference on Learning Representations.
  • Pei Ke, Jian Guan, Minlie Huang, and Xiaoyan Zhu. 2018. Generating informative responses with controlled sentence function. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1499–1508, Melbourne, Australia. Association for Computational Linguistics.
  • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72.
  • Anton Leuski, Ronakkumar Patel, David Traum, and Brandon Kennedy. 2006. Building effective question answering characters. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pages 18–27.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119.
  • Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. 2017a. Adversarial learning for neural dialogue generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2157–2169.
  • Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017b. DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 986–995.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132, Austin, Texas. Association for Computational Linguistics.
  • R. Duncan Luce. 2012. Individual choice behavior: A theoretical analysis. Courier Corporation.
  • Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421.
  • Chris J. Maddison, Daniel Tarlow, and Tom Minka. 2014. A* sampling. In Advances in Neural Information Processing Systems, pages 3086–3094.
  • Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, and Zhi Jin. 2016. Sequence to backward and forward sequences: A content-introducing approach to generative short-text conversation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3349–3358.
  • Michael Oberst and David Sontag. 2019. Counterfactual off-policy evaluation with Gumbel-max structural causal models. In International Conference on Machine Learning, pages 4881–4890.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  • Yookoon Park, Jaemin Cho, and Gunhee Kim. 2018. A hierarchical latent structure for variational conversation modeling. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1792–1801, New Orleans, Louisiana. Association for Computational Linguistics.
  • Judea Pearl and Dana Mackenzie. 2018. The book of why: The new science of cause and effect. Basic Books.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
  • Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, and Yejin Choi. 2019. Counterfactual story reasoning and generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5043–5053, Hong Kong, China. Association for Computational Linguistics.
  • Neal J. Roese. 1997. Counterfactual thinking. Psychological Bulletin, 121(1):133.
  • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015a. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1577–1586, Beijing, China. Association for Computational Linguistics.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015b. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1), pages 1577–1586.
  • Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, and Ting Liu. 2019. Exploiting persona information for diverse generation of conversational responses. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence.
  • Yiping Song, Rui Yan, Cheng-Te Li, Jian-Yun Nie, Ming Zhang, and Dongyan Zhao. 2018. An ensemble of retrieval-based and generation-based human-computer conversation systems. In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015a. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 196–205, Denver, Colorado. Association for Computational Linguistics.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015b. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 196–205.
  • Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Online active reward learning for policy optimisation in spoken dialogue systems. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2431–2441, Berlin, Germany. Association for Computational Linguistics.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Twenty-Eighth Conference on Neural Information Processing Systems, pages 3104–3112.
  • Yi-Lin Tuan and Hung-Yi Lee. 2019. Improving conditional sequence generative adversarial networks by stepwise evaluation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(4):788–798.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4):229–256.
  • Sewall Wright. 1920. The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences of the United States of America, 6(6):320.
  • Yu Wu, Furu Wei, Shaohan Huang, Zhoujun Li, and Ming Zhou. 2019. Response generation by context-aware prototype editing. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
  • Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, and Wei-Ying Ma. 2017. Topic aware neural response generation. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, volume 17, pages 3351–3357.
  • Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun. 2018. Diversity-promoting GAN: A cross-entropy based generative adversarial network for diversified text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949.
  • Rui Yan, Yiping Song, and Hua Wu. 2016. Learning to respond with deep neural networks for retrieval-based human-computer conversation system. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55–64.
  • Wei-Nan Zhang, Qingfu Zhu, Yifa Wang, Yanyan Zhao, and Ting Liu. 2019. Neural personalized response generation as domain adaptation. World Wide Web, 22(4):1427–1446.
  • Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, and Bill Dolan. 2018. Generating informative and diverse conversational responses via adversarial information maximization. In Proceedings of the Thirty-Second Conference on Neural Information Processing Systems, pages 1810–1820.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 654–664, Vancouver, Canada. Association for Computational Linguistics.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Commonsense knowledge aware conversation generation with graph attention. In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, pages 4623–4629.
  • Qingfu Zhu, Lei Cui, Wei-Nan Zhang, Furu Wei, and Ting Liu. 2019a. Retrieval-enhanced adversarial training for neural response generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3763–3773, Florence, Italy. Association for Computational Linguistics.
  • Qingfu Zhu, Weinan Zhang, Lei Cui, and Ting Liu. 2019b. Order-sensitive keywords based response generation in open-domain conversational systems. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(2):1–18.