Knowledge-Grounded Dialogue Generation with Pre-trained Language Models

EMNLP 2020, pp. 3377–3390 (2020)


Abstract

We study knowledge-grounded dialogue generation with pre-trained language models. To leverage the redundant external knowledge under capacity constraint, we propose equipping response generation defined by a pre-trained language model with a knowledge selection module, and an unsupervised approach to jointly optimizing knowledge selection...
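The abstract describes the architecture only at a high level. Below is a minimal, hypothetical inference-time sketch, assuming a knowledge selection module that scores candidates against the dialogue context and a GPT-2 generator that conditions on the single selected passage so the input stays within the model's capacity. The selector, separator scheme, and function names are illustrative assumptions, not the authors' released code.

# Hypothetical end-to-end inference sketch: score candidates, keep the best,
# and condition GPT-2 on it. Separator scheme and selector are assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
generator = GPT2LMHeadModel.from_pretrained("gpt2")

def select_knowledge(context, candidates, selector):
    """Pick the candidate that the (separately trained) selector scores highest."""
    scores = [selector(context, cand) for cand in candidates]  # one scalar per candidate
    return candidates[max(range(len(candidates)), key=lambda i: scores[i])]

def generate_response(context, knowledge, max_new_tokens=40):
    """Condition GPT-2 on the selected knowledge plus the dialogue context."""
    prompt = knowledge + tokenizer.eos_token + context + tokenizer.eos_token
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = generator.generate(
        input_ids,
        max_length=input_ids.shape[1] + max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
    )
    return tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)

# Toy usage with a trivial word-overlap stand-in for the learned selector.
def toy_selector(context, candidate):
    return len(set(context.lower().split()) & set(candidate.lower().split()))

context = "I just discovered star trek and I really like watching star trek"
candidates = ["Star Trek is an American science fiction franchise.",
              "Basketball is a team sport played on a rectangular court."]
knowledge = select_knowledge(context, candidates, toy_selector)
print(generate_response(context, knowledge))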

Introduction
  • Context: "I just discovered Star Trek and I really like watching Star Trek."
  • "If I remember, Captain Kirk was not the original captain."
  • "I watched a little of the Generation but could not get into it like I did with the original show."
  • Response: "These adventures went on but were short-lived, and six feature films."
Highlights
  • [Figure] Dialogue turns between speakers A and B, comparing a human response with a DialoGPT response for the context "I just discovered star trek and I really like watching star trek".
  • We propose an unsupervised approach where learning of knowledge selection and fine-tuning of response generation are jointly conducted with unlabeled dialogues
  • Evaluation results indicate that our model can significantly outperform state-of-the-art methods as well as a few pre-trained models used in heuristic ways, and achieve new state-of-the-art results on the benchmarks
  • We further explore the application of pre-training to the task of open domain dialogue generation by equipping the pre-trained language models with external knowledge
  • Evaluation results on two benchmarks indicate that our model can significantly outperform state-of-the-art methods
Methods
  • Learning g(U, D) without human annotations is not trivial.
  • Since knowledge selection and response generation are entangled, ideally the authors hope that g(U, D) and the GPT-2 model can enhance each other during learning.
  • Because the parameters of g(U, D) are far from optimal at the early stage of training, noise from g(U, D) is likely to be fed to the GPT-2 model and flow back into the learning procedure of g(U, D), resulting in inferior models on both sides (a joint-learning sketch follows this list).
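The bullets above describe the coupling but leave the training loop implicit. The following is a minimal, hypothetical PyTorch-style sketch of one way such joint learning could be set up without knowledge labels: the selector g(U, D) samples a knowledge candidate, GPT-2 is fine-tuned on the gold response conditioned on that candidate, and the generator's likelihood of the gold response is fed back to the selector as a REINFORCE-style reward (Sutton et al., 2000). The module design, input packing, and reward shaping are assumptions for illustration, not the authors' released implementation.

# Minimal, hypothetical sketch of unsupervised joint learning of knowledge
# selection and response generation. The selector architecture, packing scheme,
# and REINFORCE-style reward are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
generator = GPT2LMHeadModel.from_pretrained("gpt2")

class KnowledgeSelector(nn.Module):
    """Scores each knowledge candidate d_i against the dialogue context U."""
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # bag-of-words encoder (simplification)
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, context_ids, candidate_ids):
        u = self.embed(context_ids.unsqueeze(0))                                          # (1, dim)
        d = torch.stack([self.embed(c.unsqueeze(0)).squeeze(0) for c in candidate_ids])   # (K, dim)
        logits = self.score(u.expand_as(d), d).squeeze(-1)                                # (K,)
        return torch.log_softmax(logits, dim=-1)

selector = KnowledgeSelector(vocab_size=len(tokenizer))
opt_sel = torch.optim.Adam(selector.parameters(), lr=1e-4)
opt_gen = torch.optim.Adam(generator.parameters(), lr=5e-5)

def train_step(context, candidates, response):
    ctx_ids = torch.tensor(tokenizer.encode(context))
    cand_ids = [torch.tensor(tokenizer.encode(c)) for c in candidates]

    # 1) Sample a knowledge candidate from g(U, D).
    log_probs = selector(ctx_ids, cand_ids)
    k = int(torch.distributions.Categorical(logits=log_probs).sample())

    # 2) Fine-tune GPT-2 on the gold response, conditioned on the sampled knowledge.
    #    (A real implementation would likely mask the LM loss to response tokens only.)
    text = candidates[k] + tokenizer.eos_token + context + tokenizer.eos_token + response
    ids = torch.tensor([tokenizer.encode(text)])
    gen_loss = generator(ids, labels=ids).loss
    opt_gen.zero_grad(); gen_loss.backward(); opt_gen.step()

    # 3) Reward the selector with the generator's negative loss, so candidates that
    #    make the gold response more likely are reinforced (Sutton et al., 2000).
    #    In practice a baseline would be subtracted from the reward to reduce variance.
    reward = -gen_loss.detach()
    sel_loss = -(reward * log_probs[k])
    opt_sel.zero_grad(); sel_loss.backward(); opt_sel.step()

As the Methods bullets note, feeding the raw samples of an untrained selector into GPT-2 is noisy at the start of training, which is why some form of warm-up or scheduling would be needed on top of a bare loop like this.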
Results
  • Table 2 and Table 3 report evaluation results on Wizard and CMU DoG respectively.
  • GPT-2trunc is worse than KnowledGPT, due to (1) knowledge loss: the authors find that in 53% of test examples (Test Seen + Test Unseen), the ground-truth knowledge is cut off by truncation.
  • In this case, GPT-2trunc only relies on the context, the related knowledge in other candidates, and the knowledge packed into the parameters of GPT-2 for responding, which explains the comparable performance (a sketch of the truncation step follows this list).
  • Human evaluation compares coherence and relevance on Wizard and CMU DoG, with kappa measuring annotator agreement (Table 4).
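Returning to the GPT-2trunc comparison above: the following is a small, assumption-laden illustration of how a truncation-based baseline might pack all knowledge candidates and the context into a fixed token budget, which is how ground-truth knowledge ends up being cut. The budget value, separator scheme, and truncation direction are illustrative guesses, not details taken from the paper.

# Hypothetical illustration of a GPT-2trunc-style input construction: every
# knowledge candidate is concatenated in front of the dialogue context and the
# sequence is cut at a fixed token budget, so candidates that fall outside the
# budget (possibly the ground-truth one) are silently dropped.
# KnowledGPT avoids this by selecting knowledge before it reaches the generator.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def build_truncated_input(candidates, context, max_tokens=1024):
    """Pack knowledge + context into at most max_tokens GPT-2 token ids."""
    knowledge = tokenizer.eos_token.join(candidates)
    ids = tokenizer.encode(knowledge + tokenizer.eos_token + context)
    # Keep the last max_tokens ids so the context itself is never cut;
    # whichever knowledge overflows the budget is lost to the model.
    return ids[-max_tokens:]

# Example: with many long candidates, the earliest ones do not survive truncation.
candidates = ["Star Trek is an American science fiction franchise. " * 20,
              "The original series was created by Gene Roddenberry. " * 20]
context = "I just discovered star trek and I really like watching star trek"
print(len(build_truncated_input(candidates, context, max_tokens=256)))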
Conclusion
  • The authors apply large-scale pre-trained language models to the task of knowledge-grounded dialogue generation.
  • To this end, the authors devise a knowledge selection module, and propose an unsupervised approach to jointly optimizing knowledge selection and response generation.
  • Evaluation results on two benchmarks indicate that the model can significantly outperform state-of-the-art methods
Tables
  • Table1: An example from the test set (Test Seen) of Wizard of Wikipedia (Dinan et al., 2019)
  • Table2: Evaluation results on Wizard. Models that leverage human labels are marked with *. Numbers in bold mean that the improvement to the best baseline is statistically significant (t-test with p-value < 0.01)
  • Table3: Evaluation results on CMU DoG. Numbers in bold mean that the improvement to the best baseline is statistically significant (t-test with p-value < 0.01)
  • Table4: Human evaluation results on Wizard and CMU DoG
  • Table5: Ablation study on Wizard and CMU DoG
  • Table6: Performance of KnowledGPT under different values of T_max
  • Table7: Statistics of the two datasets
  • Table8: Comparison with DialoGPT on Wizard and CMU DoG
  • Table9: Performance of GPT-2trunc under different maximum tokens with ground-truth knowledge involved
  • Table10: Performance of KnowledGPT under different sizes of GPT-2
  • Table11: A case from Test Seen of Wizard of Wikipedia
  • Table12: A case from Test Unseen of Wizard of Wikipedia
Related work
Funding
  • This work was supported by the National Key Research and Development Program of China (No. 2020AAA0105200) and the National Science Foundation of China (NSFC No. 61876196 and No. 61672058)
  • Rui Yan was sponsored as a young fellow of the Beijing Academy of Artificial Intelligence (BAAI)
Study subjects and analysis
well-educated native speakers: 3
Automatic metrics are computed with the ParlAI implementation (.../blob/master/parlai/core/metrics.py). Besides automatic evaluation, we randomly sample 300 examples from Test Seen, Test Unseen, and the test set of CMU DoG, respectively, and recruit 3 well-educated native speakers as annotators for human evaluation. Each annotator is presented with an example consisting of a context, the associated external knowledge, and model responses (top 1 in greedy search) that are randomly shuffled to hide their sources (a sketch of this blind setup follows).
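As a concrete companion to the protocol above, here is a small sketch of a blind presentation helper plus Fleiss' kappa, the agreement statistic the paper cites (Fleiss, 1971), for the kappa numbers reported in Table 4. The data layout, rating scale, and function names are assumptions for illustration, not the authors' evaluation scripts.

# Minimal sketch of the blind human-evaluation setup: model responses are
# shuffled so annotators cannot tell which system produced them, and agreement
# is measured with Fleiss' kappa (Fleiss, 1971). Layout and names are assumed.
import random
from collections import Counter

def blind_example(context, knowledge, responses_by_model, seed=None):
    """Return the responses in random order plus the hidden source mapping."""
    rng = random.Random(seed)
    items = list(responses_by_model.items())      # [(model_name, response), ...]
    rng.shuffle(items)
    shuffled = [resp for _, resp in items]
    hidden_sources = [name for name, _ in items]  # kept aside for scoring later
    return {"context": context, "knowledge": knowledge, "responses": shuffled}, hidden_sources

def fleiss_kappa(ratings, categories=(0, 1, 2)):
    """ratings: list of per-item lists, each holding one label per annotator."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    p_j = Counter()   # total votes per category across all items
    P_bar = 0.0       # mean per-item agreement
    for item in ratings:
        counts = Counter(item)
        p_j.update(counts)
        P_bar += sum(c * (c - 1) for c in counts.values()) / (n_raters * (n_raters - 1))
    P_bar /= n_items
    total = n_items * n_raters
    P_e = sum((p_j[c] / total) ** 2 for c in categories)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Example: 3 annotators scoring 4 items on a {0, 1, 2} scale.
print(fleiss_kappa([[2, 2, 2], [1, 2, 1], [0, 0, 1], [2, 1, 2]]))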

Reference
  • Daniel Adiwardana, Minh-Thang Luong, David R So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, et al. 2020. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977.
  • Kevin Clark and Christopher D Manning. 2016. Deep reinforcement learning for mention-ranking coreference models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2256–2262.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. Wizard of wikipedia: Knowledge-powered conversational agents. In ICLR.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pages 13042–13054.
  • Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1243–1252. JMLR.org.
  • Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tur, and Amazon Alexa AI. 2019. Topical-chat: Towards knowledge-grounded open-domain conversations. Proc. Interspeech 2019, pages 1891–1895.
  • Bernd Huber, Daniel McDuff, Chris Brockett, Michel Galley, and Bill Dolan. 2018. Emotional dialogue generation using image-grounded language models. In CHI, page 277. ACM.
  • Byeongchang Kim, Jaewoo Ahn, and Gunhee Kim. 2020. Sequential latent knowledge selection for knowledge-grounded dialogue. arXiv preprint arXiv:2002.07510.
  • Seonhoon Kim, Jin-Hyuk Hong, Inho Kang, and Nojun Kwak. 2018. Semantic sentence matching with densely-connected recurrent and co-attentive information. arXiv preprint arXiv:1805.11360.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
  • Guillaume Lample and Alexis Conneau. 2019. Crosslingual language model pretraining. arXiv preprint arXiv:1901.07291.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL, pages 110–119.
  • Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A persona-based neural conversation model. In ACL, pages 994–1003.
  • Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 12–21.
  • Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, and Hua Wu. 2019. Learning to select knowledge for response generation in dialog systems. arXiv preprint arXiv:1902.04911.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132.
  • Alan Ritter, Colin Cherry, and William B Dolan. 2011. Data-driven response generation in social media. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 583–593.
  • Abigail See, Stephen Roller, Douwe Kiela, and Jason Weston. 2019. What makes a good conversation? how controllable attributes affect human judgments. arXiv preprint arXiv:1902.08654.
  • Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI, volume 16, pages 3776–3784.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron C Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues. In AAAI, pages 3295–3301.
  • Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Advances in Neural Information Processing Systems, pages 13–23.
  • Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. Opendialkg: Explainable conversational reasoning with attention-based walks over knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 845–854.
  • Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios Spithourakis, and Lucy Vanderwende. 2017. Image-grounded conversations: Multimodal context for natural question and response generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 462–472.
  • Yifan Qiao, Chenyan Xiong, Zhenghao Liu, and Zhiyuan Liu. 2019. Understanding the behaviors of bert in ranking. arXiv preprint arXiv:1904.07531.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL, pages 1577–1586.
  • Kurt Shuster, Samuel Humeau, Antoine Bordes, and Jason Weston. 2018. Engaging image chat: Modeling personality in grounded dialogue. arXiv preprint arXiv:1811.00945.
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and TieYan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. In International Conference on Machine Learning, pages 5926–5936.
  • Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2019. Vl-bert: Pretraining of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530.
  • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019a. Videobert: A joint model for video and language representation learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 7464–7473.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Chi Sun, Luyao Huang, and Xipeng Qiu. 2019b. Utilizing bert for aspect-based sentiment analysis via constructing auxiliary sentence. In Proceedings of NAACL-HLT, pages 380–385.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.
  • Pengjie Ren, Zhumin Chen, Christof Monz, Jun Ma, and Maarten de Rijke. 2019. Thinking globally, acting locally: Distantly supervised global-to-local knowledge selection for background based conversation. arXiv preprint arXiv:1908.09528.
  • Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057–1063.
  • Chongyang Tao, Shen Gao, Mingyue Shang, Wei Wu, Dongyan Zhao, and Rui Yan. 2018. Get the point of my utterance! learning towards effective responses with multi-head attention mechanism. In IJCAI, pages 4418–4424.
  • Yi-Lin Tuan, Yun-Nung Chen, and Hung-yi Lee. 2019. Dykgchat: Benchmarking dialogue generation grounding on dynamic knowledge graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1855– 1865.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Yansen Wang, Chenyi Liu, Minlie Huang, and Liqiang Nie. 2018. Learning to ask questions in opendomain conversational systems with typed decoders. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2193–2203.
  • Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. Transfertransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.
  • Chen Xing, Wei Wu, Jie Liu, Yalou Huang, Ming Zhou, and Wei-Ying Ma. 2017a. Topic aware neural response generation. In AAAI, pages 3351–3357.
  • Chen Xing, Wei Wu, Yu Wu, Ming Zhou, Yalou Huang, and Wei-Ying Ma. 2017b. Hierarchical recurrent attention network for response generation. arXiv preprint arXiv:1701.07149.
  • Can Xu, Wei Wu, Chongyang Tao, Huang Hu, Matt Schuerman, and Ying Wang. 2019. Neural response generation with meta-words. arXiv preprint arXiv:1906.06050.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo, and Xueqi Cheng. 2019a. Recosa: Detecting the relevant contexts with self-attention for multi-turn dialogue generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3721–3730.
  • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, and Xueqi Cheng. 2018a. Learning to control the specificity in neural response generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1108–1117.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018b. Personalizing dialogue agents: I have a dog, do you have pets too? arXiv preprint arXiv:1801.07243.
  • Xingxing Zhang, Furu Wei, and Ming Zhou. 2019b. Hibert: Document level pre-training of hierarchical bidirectional transformers for document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5059–5069.
  • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2019c. Dialogpt: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL, pages 654–664.
  • Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, and Rui Yan. 2020. Low-resource knowledge-grounded dialogue generation. arXiv preprint arXiv:2002.10348.
  • Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, and Bing Liu. 2017. Emotional chatting machine: Emotional conversation generation with internal and external memory. arXiv preprint arXiv:1704.01074.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018a. Commonsense knowledge aware conversation generation with graph attention. In IJCAI, pages 4623–4629.
  • Kangyan Zhou, Shrimai Prabhumoye, and Alan W Black. 2018b. A dataset for document grounded conversations. arXiv preprint arXiv:1809.07358.