PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

ACL 2020, pp. 85–96 (2020)

Abstract

Pre-training models have been proved effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge grounded dialogues, and conversational question answering. In this framework, we ...
Introduction
  • Dialogue generation is a challenging task due to the limited corpus of human conversations, complex background knowledge, and diverse relationships between utterances.
  • Pre-trained large-scale language models, such as BERT (Devlin et al, 2019) and XLNet (Yang et al, 2019), have achieved prominent success in natural language processing
  • Such models are usually constructed based on a massive scale of general text corpora, like English Wikipedia or BooksCorpus (Zhu et al, 2015), where distributed representations can be learned automatically from the raw text.
  • Possible reasons why such general pre-trained models do not transfer directly to dialogue generation are three-fold: 1) the underlying linguistic patterns in human conversations can be highly different from those in general text, which suggests a potentially large gap in knowledge or data distribution; 2) the training mode of uni-directional dialogue generation is distinct from that of bi-directional natural language understanding as applied in BERT; 3) unlike most general NLP tasks, there exists a one-to-many relationship in dialogue generation, where the same dialogue context may correspond to multiple appropriate replies (a minimal attention-mask sketch for point 2 follows this list).
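The following is a minimal NumPy sketch (illustrative only, not the authors' released code) of the kind of flexible self-attention mask that addresses point 2: context tokens attend bi-directionally within the context, while response tokens attend to the full context plus earlier response tokens only. The function name and toy lengths are assumptions made for this example.

import numpy as np

def flexible_attention_mask(context_len: int, response_len: int) -> np.ndarray:
    """Return a (Lc+Lr) x (Lc+Lr) mask where 1 = may attend, 0 = blocked."""
    total = context_len + response_len
    mask = np.zeros((total, total), dtype=np.int32)
    # Context rows: bi-directional attention over the context span.
    mask[:context_len, :context_len] = 1
    # Response rows: full view of the context ...
    mask[context_len:, :context_len] = 1
    # ... plus a causal (lower-triangular) view of the response generated so far.
    mask[context_len:, context_len:] = np.tril(
        np.ones((response_len, response_len), dtype=np.int32)
    )
    return mask

if __name__ == "__main__":
    # With a 3-token context and a 2-token response, the last two rows show
    # the left-to-right restriction on response positions.
    print(flexible_attention_mask(context_len=3, response_len=2))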
Highlights
  • Dialogue generation is a challenging task due to the limited corpus of human conversations, complex background knowledge, and diverse relationships between utterances
  • Pre-trained large-scale language models, such as BERT (Devlin et al, 2019) and XLNet (Yang et al, 2019), have achieved prominent success in natural language processing. Such models are usually constructed based on a massive scale of general text corpora, like English Wikipedia or BooksCorpus (Zhu et al, 2015), where distributed representations can be learned automatically from the raw text. With these representations being fine-tuned, breakthroughs have been continuously reported for various downstream tasks, especially those on natural language understanding, such as question answering, natural language inference, and so on
  • We propose to encode discrete latent variables into transformer blocks for one-to-many relationship modeling, where two reciprocal tasks of response generation and latent act recognition are collaboratively carried out (a schematic sketch of this setup follows this list)
  • A novel pre-training model for dialogue generation is introduced in this paper, incorporated with latent discrete variables for one-to-many relationship modeling
  • The results demonstrate that our model obtains significant improvements over the other state-of-the-art methods
  • Our work can be potentially improved with more fine-grained latent variables
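To make the discrete-latent bullet concrete, here is a schematic PyTorch sketch under toy assumptions (small vocabulary, a generic transformer encoder standing in for the shared network, hypothetical names such as ToyLatentDialogueModel). It illustrates the structure of the two reciprocal tasks, not the paper's exact training objective: one embedding per latent value z in {1, ..., K} is prepended for generation, and a separate recognition pass estimates which latent value explains a (context, response) pair.

import torch
import torch.nn as nn

K_LATENT, D_MODEL, VOCAB = 20, 64, 1000   # toy sizes, chosen for illustration

class ToyLatentDialogueModel(nn.Module):
    """One shared network serves both reciprocal tasks."""

    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(VOCAB, D_MODEL)
        self.latent_emb = nn.Embedding(K_LATENT, D_MODEL)       # one vector per discrete latent value
        self.recog_query = nn.Parameter(torch.zeros(1, 1, D_MODEL))  # stand-in for a special mask token
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)        # task 1: response generation
        self.latent_head = nn.Linear(D_MODEL, K_LATENT) # task 2: latent act recognition

    def forward(self, context_ids, response_ids, z):
        tokens = self.token_emb(torch.cat([context_ids, response_ids], dim=1))

        # Task 2: latent act recognition -- estimate which latent value explains the
        # (context, response) pair; the sampled z itself is not visible in this pass.
        recog_in = torch.cat([self.recog_query.expand(tokens.size(0), -1, -1), tokens], dim=1)
        recog_logits = self.latent_head(self.encoder(recog_in)[:, 0, :])        # [B, K_LATENT]

        # Task 1: response generation conditioned on (z, context); a real model would
        # also apply a uni-directional mask over the response and shift targets by one.
        z_vec = self.latent_emb(z).unsqueeze(1)                                 # [B, 1, D]
        gen_hidden = self.encoder(torch.cat([z_vec, tokens], dim=1))
        gen_logits = self.lm_head(gen_hidden[:, 1 + context_ids.size(1):, :])   # [B, Lr, VOCAB]
        return gen_logits, recog_logits

if __name__ == "__main__":
    model = ToyLatentDialogueModel()
    ctx = torch.randint(0, VOCAB, (2, 8))
    resp = torch.randint(0, VOCAB, (2, 5))
    z = torch.randint(0, K_LATENT, (2,))   # e.g. drawn from the recognised posterior
    gen_logits, recog_logits = model(ctx, resp, z)
    print(gen_logits.shape, recog_logits.shape)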
Methods
  • The following models have been compared in the experiments.
  • Baseline: sequence to sequence with attention (Seq2Seq) (Vinyals and Le, 2015) is employed as the baseline for the experiments on Persona-Chat and Daily Dialog.
  • Persona-Chat was utilized in the ConvAI2 challenge (Dinan et al, 2019a), where the team of Lost in Conversation (LIC) (Golovanov et al, 2019) obtains the best performance.
  • In DSTC7-AVSD, the team of CMU (Sanabria et al, 2019) obtains the best performance across all the evaluation metrics.
  • To better analyze the effect of the discrete latent variable, the authors also compare with a version of their model without the latent variable (w/o Latent).
Conclusion
  • A novel pre-training model for dialogue generation is introduced in this paper, incorporated with latent discrete variables for one-to-many relationship modeling.
  • To pre-train the model, two reciprocal tasks of response generation and latent act recognition are carried out simultaneously on large-scale conversation datasets.
  • The authors' pre-trained model is flexible enough to handle various down-stream tasks of dialogue generation.
  • The authors will explore boosting the latent selection policy with reinforcement learning and extending the pre-training to support dialogue generation in other languages.
Tables
  • Table 1: Summary of datasets used in the experiments
  • Table 2
  • Table 3: Experimental results on DSTC7-AVSD with automatic evaluation, with highest value written in bold
  • Table 4: Examples of response generation with our pre-trained model
  • Table 5
  • Table 6
  • Table 7: Case analysis of response generation on Daily Dialog
  • Table 8: Case analysis of response generation on DSTC7-AVSD
Related Work
  • Related work involves pre-trained language models and one-to-many modeling in dialogue generation.
  • Pre-trained Language Models. Pre-trained language models, which are trained on massive general text, have brought many breakthroughs on various NLP tasks. These models can be roughly divided into two categories according to their attention mechanisms. GPT (Radford et al, 2018) and GPT-2 (Radford et al, 2019) are representative uni-directional language models, where each token is only allowed to attend to its previous tokens and the objective is to maximize the left-to-right generation likelihood. BERT (Devlin et al, 2019) and XLNet (Yang et al, 2019) are bi-directional language models, where bi-directional context attention is enabled for token prediction. The latest unified language model UniLM (Dong et al, 2019) is able to support both uni- and bi-directional attention with flexible self-attention mask designs. Recently, some attempts (Golovanov et al, 2019; Wolf et al, 2019; Zhang et al, 2019) have been made to adapt the generative language models GPT or GPT-2 for dialogue generation. However, the special issues of conversations, such as the impact of background knowledge and the one-to-many relationship problem, are not fully considered or tackled in these adaptations.
  • One-to-many Modeling. Given one piece of context, there exist multiple appropriate responses, which is known as the one-to-many mapping problem. To model this one-to-many relationship, CVAE (Zhao et al, 2017) employs a Gaussian latent distribution to capture discourse-level variation among responses (a small sampling sketch contrasting continuous and discrete latents follows this section).
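For contrast, a small Python sketch (illustrative only, not taken from any of the cited papers) of the two latent-variable styles mentioned above: a CVAE-style continuous Gaussian latent sampled via the reparameterisation trick, versus a discrete categorical latent over K values of the kind this paper uses for one-to-many modeling.

import torch

def sample_gaussian_latent(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """CVAE-style continuous latent: z = mu + sigma * eps (reparameterisation trick)."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def sample_discrete_latent(logits: torch.Tensor) -> torch.Tensor:
    """Discrete latent: draw one of K latent values from a categorical distribution."""
    return torch.distributions.Categorical(logits=logits).sample()

if __name__ == "__main__":
    mu, logvar = torch.zeros(2, 16), torch.zeros(2, 16)
    print(sample_gaussian_latent(mu, logvar).shape)    # torch.Size([2, 16])
    print(sample_discrete_latent(torch.zeros(2, 20)))  # two integers in [0, 20)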
Funding
  • This work was supported by the National Key Research and Development Project of China (No. 2018AAA0101900).
References
  • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K Marks, Chiori Hori, Peter Anderson, et al. 2019. Audio visual scene-aware dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7558–7567.
  • Siqi Bao, Huang He, Fan Wang, Rongzhong Lian, and Hua Wu. 2019. Know more about each other: Evolving dialogue strategy via compound assessment. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5382–5391.
  • Boxing Chen and Colin Cherry. 2014. A systematic comparison of smoothing techniques for sentence-level BLEU. In Proceedings of the 9th Workshop on Statistical Machine Translation, pages 362–367.
  • Chaotao Chen, Jinhua Peng, Fan Wang, Jun Xu, and Hua Wu. 2019. Generating multiple diverse responses with multi-mapping and posterior mapping selection. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 4918–4924.
  • Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, and C Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724–1734.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186.
  • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, et al. 2019a. The second conversational intelligence challenge (ConvAI2). arXiv preprint arXiv:1902.00098.
  • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019b. Wizard of Wikipedia: Knowledge-powered conversational agents. International Conference on Learning Representations.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197.
  • Le Fang, Chunyuan Li, Jianfeng Gao, Wen Dong, and Changyou Chen. 2019. Implicit deep latent variable models for text generation. arXiv preprint arXiv:1908.11527.
  • Joseph L Fleiss and Jacob Cohen. 1973. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. In Educational and Psychological Measurement, pages 613–619.
  • Michel Galley, Chris Brockett, Xiang Gao, Jianfeng Gao, and Bill Dolan. 2019. Grounded response generation task at DSTC7. In AAAI Dialog System Technology Challenge Workshop.
  • Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, and Bill Dolan. 2019. Jointly optimizing diversity and relevance in neural response generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1229–1238.
  • Sergey Golovanov, Rauf Kurbanov, Sergey Nikolenko, Kyryl Truskovskyi, Alexander Tselousov, and Thomas Wolf. 2019. Large-scale transfer learning for natural language generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6053–6058.
  • Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, and Sunghun Kim. 2019. DialogWAE: Multimodal response generation with conditional Wasserstein autoencoder. International Conference on Learning Representations.
  • Chenyang Huang, Osmar Zaiane, Amine Trabelsi, and Nouha Dziri. 2018. Automatic dialogue generation with expressed emotions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 49–54.
  • Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119.
  • Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017. DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the 8th International Joint Conference on Natural Language Processing, pages 986–995.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Technical report, OpenAI.
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Technical report, OpenAI.
  • Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. 2019. Towards empathetic open-domain conversation models: A new benchmark and dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5370–5381.
  • Ramon Sanabria, Shruti Palaskar, and Florian Metze. 2019. CMU Sinbad's submission for the DSTC7 AVSD challenge. In AAAI Dialog System Technology Challenge Workshop.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 2204–2213.
  • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2019. DialoGPT: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536.
  • Tiancheng Zhao, Kyusong Lee, and Maxine Eskenazi. 2018. Unsupervised discrete sentence representation learning for interpretable neural dialog generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 1098–1107.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 654–664.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Commonsense knowledge aware conversation generation with graph attention. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4623–4629.
  • Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, pages 19–27.