KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

ACL, pp. 7098-7108, 2020.

Abstract:

The research of knowledge-driven conversational systems is largely limited due to the lack of dialog data which consist of multi-turn conversations on multiple topics and with knowledge annotations. In this paper, we propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs.
Introduction
  • It has been a long-term goal of artificial intelligence to deliver human-like conversations, where background knowledge plays a crucial role in the success of conversational systems (Shang et al., 2015; Li et al., 2016a; Shao et al., 2017).
  • CMU DoG (Zhou et al., 2018b), India DoG (Moghe et al., 2018), and Wizard of Wikipedia (Dinan et al., 2018) demonstrate attempts at generating informative responses grounded in topic-related Wikipedia articles.
  • However, these datasets are not suitable for modeling topic transition or knowledge planning across multi-turn dialogs based on the relations between topics.
  • These knowledge-grounded dialog datasets still have limitations in modeling knowledge interactions in multi-turn conversations.
Highlights
  • It has been a long-term goal of artificial intelligence to deliver human-like conversations, where background knowledge plays a crucial role in the success of conversational systems (Shang et al., 2015; Li et al., 2016a; Shao et al., 2017).
  • Background knowledge is defined as slot-value pairs, which provide key information for question answering or recommendation; it has been well defined and thoroughly studied (Wen et al., 2015; Zhou et al., 2016). A toy slot-value example appears after this list.
  • We provide benchmark models on this corpus to facilitate further research, and conduct extensive experiments
  • In order to explore the role of knowledge annotation, we evaluated the models with/without access to the knowledge graph of our dataset
  • We provide generation- and retrieval-based benchmark models to facilitate further research
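    As a rough illustration of slot-value background knowledge in task-oriented settings, a toy example; the slot names and values are invented for illustration, not drawn from any dataset.

        # Hypothetical slot-value background knowledge for a task-oriented
        # restaurant domain; slot names and values are illustrative only.
        knowledge = {"name": "全聚德", "food": "烤鸭", "area": "前门", "price": "偏贵"}

        # A system grounds its response by copying values, e.g. answering a
        # price question with knowledge["price"].
        print(f"这家店价格{knowledge['price']}。")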
Methods
  • 4.1 Models

    To provide benchmark models for knowledge-driven conversation modeling, the authors evaluated both generation- and retrieval-based models on the corpus.
  • The input of the encoder was the concatenation of the past k − 1 utterances, while the target output of the decoder was the k-th utterance.
  • If there were fewer than k − 1 utterances in the dialogue history, all the past utterances were used as input.
  • The adapted model generated the k-th utterance based on the past k − 1 utterances, where k was set to 8 for a fair comparison with Seq2Seq (a minimal sketch of this pairing follows).
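    A minimal sketch of how such (context, response) training pairs could be constructed, assuming each dialogue is a list of utterance strings; the function name, the separator token, and the joining scheme are illustrative assumptions, not the authors' exact preprocessing.

        # Build (context, response) pairs: the encoder input concatenates at
        # most the past k-1 utterances; the decoder target is the next one.
        from typing import List, Tuple

        def build_pairs(dialogue: List[str], k: int = 8) -> List[Tuple[str, str]]:
            pairs = []
            for i in range(1, len(dialogue)):
                history = dialogue[max(0, i - (k - 1)):i]  # all past utterances if history is short
                context = " <eos> ".join(history)          # the "<eos>" separator is an assumption
                pairs.append((context, dialogue[i]))
            return pairs

        # A three-utterance dialogue yields two training pairs:
        print(build_pairs(["你好", "你好，最近看电影了吗", "看了《活着》"]))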
Results
  • The authors analyze the results from the following perspectives. The influence of knowledge: after introducing knowledge, all models improved on all metrics except PPL, in all domains (one such metric is sketched after this list).
  • The Coherence scores of both HRED and knowledge-aware HRED are higher than 1.00 but still far below 2.00, indicating that in most cases the generated responses are relevant to the context but not coherent with the knowledge information.
  • After incorporating the knowledge information into HRED, the Coherence score improves significantly in all three domains.
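    Both BLEU (Papineni et al., 2002) and Distinct-n (Li et al., 2016a), cited in the references below, are standard automatic metrics for this kind of evaluation. As one concrete example, a minimal Distinct-n sketch, assuming responses are already tokenized; the function name is an illustrative choice.

        # Distinct-n: the ratio of unique n-grams to total n-grams over all
        # generated responses; higher values indicate more diverse output.
        from typing import List

        def distinct_n(responses: List[List[str]], n: int) -> float:
            total, unique = 0, set()
            for tokens in responses:
                for i in range(len(tokens) - n + 1):
                    unique.add(tuple(tokens[i:i + n]))
                    total += 1
            return len(unique) / total if total else 0.0

        responses = [["这", "部", "电影", "很", "好"], ["这", "部", "电影", "很", "感人"]]
        print(distinct_n(responses, 2))  # 5 unique bigrams out of 8 -> 0.625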
Conclusion
  • The authors propose a Chinese multi-domain corpus for knowledge-driven conversation generation, KdConv.
  • It contains 86K utterances and 4.5K dialogues, with an average of 19.0 turns per dialogue.
  • Each dialogue covers multiple topics and carries sentence-level annotations that map each utterance to its related knowledge triples (illustrated in the sketch after this list).
  • The dataset provides a benchmark to evaluate the ability to model knowledge-driven conversations.
  • The authors provide generation- and retrieval-based benchmark models to facilitate further research.
  • Extensive experiments demonstrate that these models can be enhanced by introducing knowledge, while there is still much room for improvement in knowledge-grounded conversation modeling.
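    To make the sentence-level annotation concrete, here is a hypothetical record in the spirit of the description above; the field names and schema are illustrative, not the actual KdConv release format (see the dataset link in the Related work section).

        # Hypothetical dialogue record: each utterance carries the knowledge
        # triples it is grounded on, enabling sentence-level knowledge tracing.
        dialogue = {
            "domain": "film",
            "messages": [
                {"utterance": "你看过《活着》这部电影吗？",
                 "triples": [("活着", "导演", "张艺谋")]},
                {"utterance": "看过，张艺谋导演的，很感人。",
                 "triples": [("活着", "导演", "张艺谋")]},
            ],
        }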
Tables
  • Table 1: Comparison between our corpus and other human-labeled knowledge-grounded dialogue corpora
  • Table 2: Statistics of the knowledge graphs used in constructing KdConv
  • Table 3: Statistics of KdConv
  • Table 4: Top-3 topic transitions in the film domain, where Tn denotes the n-th topic of a dialog and Tn −X→ Tn+1 denotes that relation X holds between Tn and Tn+1
  • Table 5: Automatic evaluation. The best results of generative and retrieval models are in bold and underlined, respectively. “+ know” means the model is enhanced by the knowledge base
  • Table 6: Manual evaluation. The best results (t-test, p-value < 0.005) are in bold. Between the two generative models, significantly better results are italic underlined (t-test, p-value < 0.005) or underlined (t-test, p-value < 0.05). κ is the Fleiss’ kappa value. “+ know” means the model is enhanced by knowledge information
Related work
  • Recently, open-domain conversation generation has been greatly advanced by the growing availability of public dialogue data (Godfrey et al., 1992; Ritter et al., 2010; Shang et al., 2015; Lowe et al., 2015). However, the lack of annotated background information or related knowledge leads to significantly degenerate conversations, where the text is bland and strangely repetitive (Holtzman et al., 2019). Such models produce conversations substantially different from human conversations, which rely heavily on background knowledge.

    Dataset: https://github.com/thu-coai/KdConv

    To facilitate the development of conversational models that mimic human conversations, several knowledge-grounded corpora have been proposed. Some datasets (Zhou et al., 2018b; Ghazvininejad et al., 2018; Liu et al., 2018; Tuan et al., 2019; Qin et al., 2019) collect dialogues and label the knowledge annotations using NER, string matching, artificial scoring, and filtering rules based on external knowledge resources (Liu et al., 2018). However, mismatches between dialogues and knowledge resources introduce noise into these datasets. To obtain high-quality knowledge-grounded datasets, some studies construct dialogues from scratch with human annotators, based on unstructured text or structured knowledge graphs. For instance, several datasets (Zhou et al., 2018b; Dinan et al., 2018; Gopalakrishnan et al., 2019) collect human conversations where one or both participants have access to the unstructured text of related background knowledge, while OpenDialKG (Moon et al., 2019) and DuConv (Wu et al., 2019) build their corpora on structured knowledge graphs. Table 1 presents a survey of existing human-labeled knowledge-grounded dialogue datasets.
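    As a rough illustration of why such automatic annotation introduces noise, consider a deliberately naive string matcher; this is a sketch, not any specific dataset's pipeline. A triple is attached only when its head entity and tail value both appear verbatim, so paraphrases and coreferent mentions are silently missed.

        # Naive string-match annotation: attach a triple to an utterance only
        # if its head entity and tail value both appear verbatim in the text.
        from typing import List, Tuple

        Triple = Tuple[str, str, str]  # (head, relation, tail)

        def annotate(utterance: str, triples: List[Triple]) -> List[Triple]:
            return [t for t in triples if t[0] in utterance and t[2] in utterance]

        triples = [("活着", "导演", "张艺谋"), ("活着", "主演", "葛优")]
        # The second triple is missed: the actor is not mentioned verbatim.
        print(annotate("活着是张艺谋导演的电影", triples))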
Funding
  • This work was jointly supported by the NSFC projects (Key Project No. 61936010 and Regular Project No. 61876096) and the National Key R&D Program of China (Grant No. 2018YFC0830200).
Reference
  • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Hannah Bast, Florian Bäurle, Björn Buchhold, and Elmar Haußmann. 2014. Easy access to the Freebase dataset. In WWW, pages 95–98. ACM.
  • Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155.
  • Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP, pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186.
  • Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, and Dilek Hakkani-Tür. 2019. Topical-Chat: Towards knowledge-grounded open-domain conversations. In Proc. Interspeech 2019, pages 1891–1895.
  • Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, and Kyunghyun Cho. 2018. Meta-learning for low-resource neural machine translation. In EMNLP, pages 3622–3631.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In NAACL, pages 110–119.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016b. A diversity-promoting objective function for neural conversation models. In NAACL-HLT, pages 110–119, San Diego, California. Association for Computational Linguistics.
  • Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu, and Dawei Yin. 2018. Knowledge diffusion for neural dialogue generation. In ACL (Volume 1: Long Papers), pages 1489–1498.
  • Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909.
  • Fei Mi, Minlie Huang, Jiyong Zhang, and Boi Faltings. 2019. Meta-learning for low-resource natural language generation in task-oriented dialogue systems. In IJCAI.
  • Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. In EMNLP, pages 1400–1409.
  • Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. 2018. Towards exploiting background knowledge for building conversation systems. In EMNLP, pages 2322–2332.
  • Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. OpenDialKG: Explainable conversational reasoning with attention-based walks over knowledge graphs. In ACL, pages 845–854.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL, pages 311–318.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch.
  • Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. arXiv preprint arXiv:1906.02738.
  • Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In NAACL, pages 172–180.
  • Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.
  • Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI, pages 3776–3784.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL, pages 1577–1586.
  • Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, and Ray Kurzweil. 2017. Generating long and diverse responses with neural conversation models. arXiv preprint arXiv:1701.03185.
  • Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional skip-gram: Explicitly distinguishing left and right context for word embeddings. In NAACL, pages 175–180.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In NAACL-HLT, pages 196–205, Denver, Colorado. Association for Computational Linguistics.
  • Robert Speer and Catherine Havasi. 2013. ConceptNet 5: A large semantic network for relational knowledge. In The People's Web Meets NLP, pages 161–176. Springer.
  • Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112.
  • Yi-Lin Tuan, Yun-Nung Chen, and Hung-yi Lee. 2019. DyKgChat: Benchmarking dialogue generation grounding on dynamic knowledge graphs. In EMNLP-IJCNLP, pages 1855–1865.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
  • Zhigang Wang, Juanzi Li, Zhichun Wang, Shuangjie Li, Mingyang Li, Dongsheng Zhang, Yao Shi, Yongbin Liu, Peng Zhang, and Jie Tang. 2013. XLore: A large-scale English-Chinese bilingual knowledge graph. In International Semantic Web Conference (Posters & Demos), volume 1035, pages 121–124.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In EMNLP, pages 1711–1721.
  • Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, and Haifeng Wang. 2019. Proactive human-machine conversation with explicit conversation goal. In ACL, pages 3794–3804, Florence, Italy. Association for Computational Linguistics.
  • Yu Wu, Wei Wu, Chen Xing, Ming Zhou, and Zhoujun Li. 2017. Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. In ACL (Volume 1: Long Papers), pages 496–505, Vancouver, Canada. Association for Computational Linguistics.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL (Volume 1: Long Papers), pages 2204–2213, Melbourne, Australia. Association for Computational Linguistics.
  • Hao Zhou, Minlie Huang, and Xiaoyan Zhu. 2016. Context-aware natural language generation for spoken dialogue systems. In COLING, pages 2032–2041.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018a. Commonsense knowledge aware conversation generation with graph attention. In IJCAI, pages 4623–4629.
  • Kangyan Zhou, Shrimai Prabhumoye, and Alan W. Black. 2018b. A dataset for document grounded conversations. In EMNLP, pages 708–713.
  • Wenya Zhu, Kaixiang Mo, Yu Zhang, Zhangbin Zhu, Xuezheng Peng, and Qiang Yang. 2017. Flexible end-to-end dialogue system for knowledge grounded conversation. arXiv preprint arXiv:1709.04264.