Regularizing Dialogue Generation by Imitating Implicit Scenarios

EMNLP 2020, pp. 6592–6604.


Abstract

Human dialogues are scenario-based and appropriate responses generally relate to the latent context knowledge entailed by the specific scenario. To enable responses that are more meaningful and context-specific, we propose to improve generative dialogue systems from the scenario perspective, where both dialogue history and future conversa…

Introduction
  • Neural dialogue generation has drawn increasing attention due to its vast commercial values and practical demands.
  • Different from other sequence generation tasks such as machine translation and paraphrase generation, dialogue generation is a loosely coupled task that allows much freedom in the semantic and linguistic aspects of the generated responses.
  • It is often hard for existing models to handle such freedom, while humans have no trouble doing so.
Highlights
  • Neural dialogue generation has drawn increasing attention due to its vast commercial values and practical demands
  • By combining the dialogue history and its corresponding future conversation, we introduce the implicit conversation scenario into existing dialogue models to provide more semantic constraints and reduce the difficulty of prediction
  • We introduce the future conversation together with the corresponding dialogue history to learn the implicit conversation scenario, which entails latent context knowledge and specifies how people interact in the real world
  • To incorporate such scenario knowledge without requiring the future conversation at inference time, we propose an imitation learning framework (see the sketch after this list)
  • The scenario-based teacher model first learns to generate responses with access to both the future conversation and the dialogue history, and a conventional student model is trained to imitate the teacher through hierarchical supervision signals
  • Our model achieves better results than state-of-the-art baselines on four datasets
  • Evaluation on four datasets demonstrates the effectiveness and scalability of our approach compared to state-of-the-art baselines, enabling responses that pertain more closely to the scenario indicated by the given dialogue history
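The imitation learning framework described in these highlights can be pictured with a minimal sketch (PyTorch is assumed; this is an illustration, not the authors' implementation): a teacher network reads the dialogue history concatenated with the future conversation, a student network reads only the history, and the student's loss combines the usual response NLL with hidden-state and output-distribution matching against the frozen teacher. All module names, sizes, and the equal loss weighting below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Seq2Seq(nn.Module):
    """Toy GRU encoder-decoder; the sizes are placeholders, not the paper's."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, context, dec_in):
        _, h = self.encoder(self.emb(context))             # h: (1, B, H)
        states, _ = self.decoder(self.emb(dec_in), h)      # (B, T, H)
        return states, self.proj(states)                   # states, logits


def student_loss(student, teacher, history, future, response):
    """Response NLL plus hierarchical imitation of the frozen teacher."""
    dec_in, target = response[:, :-1], response[:, 1:]     # teacher forcing
    s_states, s_logits = student(history, dec_in)          # history only
    with torch.no_grad():                                   # teacher stays fixed
        scenario = torch.cat([history, future], dim=1)      # implicit scenario
        t_states, t_logits = teacher(scenario, dec_in)

    vocab = s_logits.size(-1)
    nll = F.cross_entropy(s_logits.reshape(-1, vocab), target.reshape(-1))
    state_match = F.mse_loss(s_states, t_states)            # hidden-state level
    dist_match = F.kl_div(F.log_softmax(s_logits, dim=-1),
                          F.softmax(t_logits, dim=-1),
                          reduction="batchmean")            # distribution level
    return nll + state_match + dist_match                   # equal weights assumed


# Toy usage with random token ids; the vocabulary size is an assumption.
V = 1000
teacher, student = Seq2Seq(V), Seq2Seq(V)
history, future = torch.randint(0, V, (2, 12)), torch.randint(0, V, (2, 10))
response = torch.randint(0, V, (2, 8))
student_loss(student, teacher, history, future, response).backward()
```

In the paper's setup the teacher is trained first with access to the future conversation and then held fixed while the student imitates it; wrapping the teacher pass in torch.no_grad() mirrors that, and at inference only the student, which never needs the future conversation, is used.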
Results
  • Automatic evaluation: the evaluation of open-domain dialogue generation has no well-defined automatic metrics.
  • The models compared on DailyDialog, PersonaChat, and OpenSubtitles are Seq2Seq+Att, VHRED+BOW, NEXUS, Transformer, ReCoSa, CHMAM, and RegDG (the proposed model).
  • Embedding-based metrics (greedy matching (GRE) and embedding extrema (EXT)) (Liu et al., 2016) and coherence (COH) (Xu et al., 2018b) are widely adopted to reflect the grammaticality and semantic relevance of the responses (Serban et al., 2017; Csaky et al., 2019).
  • Please refer to the appendix for the detailed settings of the automatic metrics; an illustrative sketch of the embedding-based metrics follows this list
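For concreteness, here is a rough sketch of the embedding-based metrics named above, assuming each sentence has already been mapped to an (n_words, dim) array of pretrained word vectors (e.g., GloVe); this illustrates the standard definitions and is not the paper's evaluation scripts.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def greedy_match(hyp_vecs, ref_vecs):
    """Greedy matching (GRE): average best cosine match, taken in both directions."""
    def one_way(a, b):
        return np.mean([max(cosine(x, y) for y in b) for x in a])
    return 0.5 * (one_way(hyp_vecs, ref_vecs) + one_way(ref_vecs, hyp_vecs))

def extrema(vecs):
    """Embedding extrema (EXT): per-dimension value with the largest magnitude."""
    idx = np.abs(vecs).argmax(axis=0)
    return vecs[idx, np.arange(vecs.shape[1])]

def coherence(ctx_vecs, resp_vecs):
    """Coherence (COH): cosine between averaged context and response embeddings."""
    return cosine(ctx_vecs.mean(axis=0), resp_vecs.mean(axis=0))
```

EXT is then scored as cosine(extrema(hyp_vecs), extrema(ref_vecs)); the exact tokenization and embedding choices are those detailed in the paper's appendix.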
Conclusion
  • The authors introduce the future conversation together with the corresponding dialogue history to learn the implicit conversation scenario, which entails latent context knowledge and specifies how people interact in the real world.
  • To incorporate such scenario knowledge without requiring the future conversation at inference time, the authors propose an imitation learning framework.
  • The authors will incorporate pre-trained models into the framework (e.g., BERT as a teacher and GPT as a student) to further unlock performance improvements, and will explore how to balance diverse prior knowledge from multiple teachers
Tables
  • Table1: The automatic evaluation results at the lowest point of the validation loss. The proposed approach achieves substantial improvements across all the dialogue datasets. “↑” means higher is better. “↓” means lower is better
  • Table2: The human evaluation results. Our model has higher percentages of win than baselines
  • Table3: Examples of the generated responses. The responses generated by our model imply the implicit conversation scenario and contain meaningful information
  • Table4: Results of the ablation study
  • Table5: The average of all metrics improvements on Class 1 set and Class 2 set. “∗” and “∗∗” indicate p ≤ 0.05 and p ≤ 0.01 respectively
  • Table6: Results on DailyDialog (1-1-1) and (3-1-3)
  • Table7: Effect of regularization: cosine similarity between the generated and real-world word distribution
  • Table8: Results of hard transfer operations. With more hard transfer operations, the diversity gradually improves, and the relevance gradually weakens
  • Table9: Results of multiple teachers on DailyDialog. With the help of LM, the performance gains consistent improvements
  • Table10: Examples of the generated responses
  • Table11: The automatic evaluation results at the lowest point of the validation loss. “↑” means higher is better. “↓” means lower is better
  • Table12: The results on DailyDialog (1-1-1) and DailyDialog (3-1-3)
  • Table13: The results on DailyDialog after 50 epochs of training
  • Table14: The results about the hard transfer operation on DailyDialog
  • Table15: The results about multiple teachers on DailyDialog
Related work
  • Recently, a variety of studies have focused on neural dialogue models that generate diverse, informative, and relevant responses. One line of research attempts to accurately extract relevant contexts from the redundant dialogue history (Xing et al., 2018; Tao et al., 2018; Zhang et al., 2019a). Another line tries to explicitly incorporate a latent variable to inject response variability into the decoding process (Serban et al., 2017; Zhao et al., 2017); Shen et al. (2018), Gu et al. (2019), and Gao et al. (2019) further enriched the latent variable approach. Some works redesigned the objective function or learned it automatically via adversarial learning (Li et al., 2016, 2017a; Xu et al., 2018a; Feng et al., 2020), which improves diversity but makes training fragile. Finally, some researchers have incorporated external knowledge, such as topic information (Xing et al., 2017), persona (Zhang et al., 2018a), and knowledge bases (Ghazvininejad et al., 2018). Unlike the above models, which predict responses given only the dialogue history, our method combines the future conversation with the dialogue history as the implicit conversation scenario, which contains comprehensive background information to guide response generation.
Funding
  • This research was supported by Beijing Natural Science Foundation (No. L181010, 4172054), National Key R&D Program of China (No. 2016YFB0801100), and National Basic Research Program of China (No. 2013CB329605)
Study subjects and analysis
datasets: 4
We also demonstrate why imitation learning works and how to enhance it. Our model achieves better results than state-of-the-art baselines on four datasets. Result analysis demonstrates the effectiveness and scalability of the implicit conversation scenario and our imitation learning framework

datasets: 4
As a result, the student is effectively regularized to reach a robust local minimum that represents better generalization performance. Evaluation on four datasets demonstrates the effectiveness and scalability of our approach compared to state-of-the-art baselines, enabling the generation of responses that pertain more closely to the scenario indicated by the given dialogue history. Moreover, detailed analyses illustrate how imitating implicit scenarios regularizes the student model

Reference
  • Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. CoRR, abs/1607.06450.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, pages 1171–1179.
  • Pawel Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Inigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gasic. 2018. MultiWOZ - A large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In EMNLP, pages 5016–5026.
  • Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, and Sunghun Kim. 2019. Dialogwae: Multimodal response generation with conditional wasserstein autoencoder. In ICLR (Poster).
  • Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, and Tie-Yan Liu. 2019. Non-autoregressive neural machine translation with enhanced decoder input. In AAAI, pages 3723–3730.
  • Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the knowledge in a neural network. CoRR, abs/1503.02531.
  • Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In ACL (1), pages 328–339.
  • Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2017. On large-batch training for deep learning: Generalization gap and sharp minima. In ICLR. OpenReview.net.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR (Poster).
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In EMNLP, pages 1173–1182.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In HLT-NAACL, pages 110–119.
  • Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. 2017a. Adversarial learning for neural dialogue generation. In EMNLP, pages 2157–2169.
  • Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017b. Dailydialog: A manually labelled multi-turn dialogue dataset. In IJCNLP(1), pages 986–995.
  • Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, and Tie-Yan Liu. 2019a. Hint-based training for non-autoregressive machine translation. CoRR, abs/1909.06708.
  • Ziming Li, Julia Kiseleva, and Maarten de Rijke. 2019b. Dialogue generation: From imitation learning to inverse reinforcement learning. In AAAI, pages 6722–6729.
  • Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In LREC. European Language Resources Association (ELRA).
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In EMNLP, pages 2122–2132.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL, pages 311–318.
  • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. Fitnets: Hints for thin deep nets. In ICLR (Poster).
  • Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI, pages 3776–3784.
  • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues. In AAAI, pages 3295–3301.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL (1), pages 1577–1586.
  • Xiaoyu Shen, Hui Su, Wenjie Li, and Dietrich Klakow. 2018. Nexus network: Connecting the preceding and the following in dialogue generation. In EMNLP, pages 4316–4327.
  • Siqi Sun, Yu Cheng, Zhe Gan, and Jingjing Liu. 2019. Patient knowledge distillation for BERT model compression. CoRR, abs/1908.09355.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112.
  • Chongyang Tao, Shen Gao, Mingyue Shang, Wei Wu, Dongyan Zhao, and Rui Yan. 2018. Get the point of my utterance! learning towards effective responses with multi-head attention mechanism. In IJCAI, pages 4418–4424.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
  • Oriol Vinyals and Quoc V. Le. 2015. A neural conversational model. CoRR, abs/1506.05869.
  • Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, and Xu Sun. 2019. Imitation learning for non-autoregressive neural machine translation. In ACL (1), pages 1304–1312.
  • Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, and Wei-Ying Ma. 2017. Topic aware neural response generation. In AAAI, pages 3351–3357.
  • Chen Xing, Yu Wu, Wei Wu, Yalou Huang, and Ming Zhou. 2018. Hierarchical recurrent attention network for response generation. In AAAI, pages 5610–5617.
  • Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun. 2018a. Diversity-promoting GAN: A cross-entropy based generative adversarial network for diversified text generation. In EMNLP, pages 3940–3949.
  • Xinnuo Xu, Ondrej Dusek, Ioannis Konstas, and Verena Rieser. 2018b. Better conversations by modeling, filtering, and optimizing for coherence and diversity. In EMNLP, pages 3981–3991.
  • Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo, and Xueqi Cheng. 2019a. Recosa: Detecting the relevant contexts with self-attention for multi-turn dialogue generation. In ACL (1), pages 3721–3730.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018a. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL (1), pages 2204–2213.
  • Wen Zhang, Yang Feng, Fandong Meng, Di You, and Qun Liu. 2019b. Bridging the gap between training and inference for neural machine translation. In ACL (1), pages 4334–4343.
  • Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, and Bill Dolan. 2018b. Generating informative and diverse conversational responses via adversarial information maximization. In NeurIPS, pages 1815–1825.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL (1), pages 654–664.
  • MultiWOZ This dataset is a large-scale multi-turn conversation corpus that contains highly natural conversations across 7 goal-oriented scenarios written by humans (Budzianowski et al., 2018). It is split into 58K, 15K, and 5K pairs for training, validation, and testing, respectively.
  • (2) The lengths of the response, dialogue history, and future conversation are limited to [5, 25], [25, 80], and [25, 80], respectively.
  • Seq2Seq+Att We use a vanilla Seq2Seq model (Sutskever et al., 2014) with an attention mechanism (Bahdanau et al., 2015). The encoder consists of a 2-layer bidirectional LSTM with 256 hidden units. The decoder is based on a 4-layer unidirectional LSTM with 256 hidden units (a size-only sketch of these dimensions appears at the end of this section). VHRED+BOW VHRED is proposed by Serban et al. (2017), which introduces a conditional variational auto-encoder (CVAE) into the HRED model (Serban et al., 2016) with a continuous latent variable attached to the response. We also adopt the BOW loss (Zhao et al., 2017) as a complement, with KL annealing. The latent variable dimension is 256. NEXUS NEXUS (Shen et al., 2018) enriches the latent variable with both dialogue history and future conversation through mutual information maximization.
  • Transformer Transformer (Vaswani et al., 2017) is based solely on the attention mechanism. The numbers of blocks and heads are 2 and 4, respectively. The hidden size is set to 256, and the dimension of the feed-forward layer is 1024 (also covered in the sketch at the end of this section). ReCoSa ReCoSa is proposed by Zhang et al. (2019a) and consists of a word-level LSTM encoder, a self-attention based context-level encoder, and a self-attention based context-response decoder. CHMAM CHMAM (Tao et al., 2018) applies a Multi-Head Attention Mechanism (MHAM) to capture multiple semantic aspects of the dialogue history, with a regularizer penalizing redundancy among the attention weight vectors across different aspects of the source sequence.
  • We adopt residual connections, layer normalization (Ba et al., 2016), and dropout in the LSTM-based baselines, which significantly boosts the performance of Seq2Seq+Att, VHRED+BOW, and NEXUS. RegDG and Transformer-IF use the same settings as Transformer. The parameters of Transformer-IF are kept fixed during imitation learning.
  • as one long sentence and then calculates the perplexity (e.g., the official code of Zhao et al. (2017)).
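To make the baseline sizes quoted above concrete, the following size-only sketch (PyTorch assumed) instantiates layers with the stated dimensions for Seq2Seq+Att and the Transformer; the attention mechanism, decoding logic, and all training code are omitted, the vocabulary size is a placeholder, and this is not the authors' implementation.

```python
import torch.nn as nn

VOCAB, HIDDEN = 30000, 256          # vocabulary size is an assumed placeholder

# Seq2Seq+Att sizes as stated: 2-layer bidirectional LSTM encoder and
# 4-layer unidirectional LSTM decoder, both with 256 hidden units.
seq2seq_embedding = nn.Embedding(VOCAB, HIDDEN)
seq2seq_encoder = nn.LSTM(HIDDEN, HIDDEN, num_layers=2,
                          bidirectional=True, batch_first=True)
seq2seq_decoder = nn.LSTM(HIDDEN, HIDDEN, num_layers=4, batch_first=True)
seq2seq_output = nn.Linear(HIDDEN, VOCAB)

# Transformer sizes as stated: 2 blocks, 4 heads, hidden size 256,
# feed-forward dimension 1024 (encoder side shown).
transformer_layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4,
                                               dim_feedforward=1024,
                                               batch_first=True)
transformer_encoder = nn.TransformerEncoder(transformer_layer, num_layers=2)
```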