We introduce Topical-Chat, an open-domain knowledge-grounded conversation dataset without explicit roles for conversation partners, offering depth and breadth of topical coverage with transitions in conversations
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations
INTERSPEECH, pp. 1891–1895, 2019
Building socialbots that can have deep, engaging open-domain conversations with humans is one of the grand challenges of artificial intelligence (AI). To this end, bots need to be able to leverage world knowledge spanning several domains effectively when conversing with humans who have their own world knowledge. Existing knowledge-grounded…
- Building conversational bots that can interact with humans in natural language has been of interest to researchers since the early days of computing, as exemplified by text-based systems such as ELIZA [1].
- Task-oriented bots aim to help humans accomplish a specific task through multi-turn interactions, whereas open-domain bots aim to serve as social conversation partners with whom humans can have natural and engaging conversations.
- In addition to mastering traditional language skills like comprehension, open-domain bots need to perfect several conversational skills that come naturally to humans: recalling from world knowledge, reasoning in conjunction with conversational history and constructing valid responses.
- The authors introduce Topical-Chat, a dataset of ∼11K human-human conversations about knowledge spanning 8 broad topics.
- Partners do not have explicitly defined roles they need to serve during a conversation, and the dataset contains depth and breadth of topical coverage with transitions in conversations.
- All models were trained using ParlAI [16].
- The authors randomly initialized 300-dimensional word embeddings, which are learned during training.
- The authors do not learn positional embeddings and encode position using one-hot vectors.
- The authors use a batch size of 32 and stochastic gradient descent with a gradient clip of 0.1, along with a learning-rate scheduler that decays by a factor of 0.5 with patience 3.
- The authors stop training when perplexity on the validation frequent set does not decrease for 10 epochs.
- The authors use beam search with a beam size of 5 for decoding
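The optimization recipe above (SGD with gradient clipping at 0.1, learning-rate decay 0.5 with patience 3, early stopping after 10 stagnant epochs) can be sketched in PyTorch. The stand-in model, synthetic loss, and simulated validation perplexities below are illustrative assumptions, not the authors' actual ParlAI configuration:

```python
import torch
from torch import nn

model = nn.Linear(300, 300)  # stand-in for the Transformer model
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
# Decay the LR by 0.5 when validation perplexity plateaus for 3 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

def training_step(loss):
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients at 0.1 before the SGD update, as in the text.
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
    optimizer.step()

best_ppl, stagnant = float("inf"), 0
# Simulated validation perplexities standing in for real evaluation.
for val_ppl in [50.0, 40.0, 38.0] + [38.0] * 12:
    x = torch.randn(8, 300)
    training_step(nn.functional.mse_loss(model(x), x))
    scheduler.step(val_ppl)
    if val_ppl < best_ppl:
        best_ppl, stagnant = val_ppl, 0
    else:
        stagnant += 1
    if stagnant >= 10:  # stop after 10 epochs without improvement
        break

final_lr = optimizer.param_groups[0]["lr"]
```

With the simulated plateau, the scheduler fires and the learning rate ends below its initial value before early stopping triggers.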
- Table 2 reports conversation statistics, including the number of utterances and the average number of turns per conversation.
- In order to decide on an appropriate WH , the authors tried training a Transformer that uses knowledge with varying WH and evaluated them on automated metrics described below (Table 5).
- The authors observe that WH = 32 works best.
- The authors believe this reflects the knowledge model’s inability to attend to important tokens in the dialog context when a large WH is used.
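Reading WH as a cap on the number of grounding-knowledge tokens the model retains, the truncation step can be sketched as follows. Whitespace tokenization is a simplifying assumption here (the actual pipeline would use subword tokens):

```python
def truncate_knowledge(knowledge, wh=32):
    """Keep only the first `wh` tokens of the grounding knowledge
    before it is encoded alongside the dialog context.
    Whitespace tokenization is a simplification for illustration."""
    return knowledge.split()[:wh]

# Example grounding fact (80 whitespace tokens), truncated to WH = 32.
fact = ("The Washington Post was founded in 1877 and is one of the most "
        "widely circulated newspapers in the United States ") * 4
tokens = truncate_knowledge(fact, wh=32)
```

A smaller WH leaves more of the model's attention budget for the dialog context, consistent with the observation that WH = 32 works best.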
- The authors introduce Topical-Chat, an open-domain knowledge-grounded conversation dataset without explicit roles for conversation partners, offering depth and breadth of topical coverage with transitions in conversations.
- The authors train simple Transformer-based models for response generation and evaluate them using automated metrics for benchmarking.
- The authors provide evidence of qualitative value through human evaluation of these models.
- The authors hope that the release of Topical-Chat fosters data-driven research in open-domain knowledge-grounded conversational AI
- Table 1: Topics and their entity budgets
- Table 2: Topical-Chat conversation stats
- Table 3: Automated metrics on test set (Frequent/Rare)
- Table 4: Human evaluation metrics for 150 test frequent snippets
- Table 5: Effect of varying WH for TF (w/ k.) on the test frequent set
- Recent research interest in knowledge-grounded conversations has led to the public release of multiple datasets. [6] released a dataset of ∼4K conversations where Wikipedia articles about 30 movies served as the knowledge base; the collection was performed with portions of the articles shown to conversation partners in a scheduled way. [7] released a similar dataset of conversations about movies, where the knowledge base comprises Wikipedia articles, reviews and comments mined from the web about ∼1K movies; the collection involved self-dialogues, where one crowdworker generates utterances for both sides. More recently, the Wizard of Wikipedia (WoW) dataset [5] was released, where the focus, similar to ours, is on collecting open-domain knowledge-grounded conversations. A key difference is that their knowledge base comprises Wikipedia articles, whereas we relied on multiple data sources, specifically Washington Post articles and Reddit fun facts in addition to Wikipedia articles about entities, to enable lively interactions.
Sequence-to-sequence generative modeling approaches have become popular for response generation, where the goal is to generate a response given the previous turn in a conversation [2, 3]. However, responses generated by these sequence-to-sequence models are not always coherent or contextually appropriate and are noted to be often generic and lacking interesting content. Such approaches do not explicitly ground responses on relevant knowledge, which has led to work on approaches that incorporate world knowledge into conversational response generation. [8] used end-to-end memory networks to condition the generated responses on knowledge, where attention over the knowledge relevant to the conversation context is estimated, and multiple knowledge representations are included as input during response decoding. [9] retrieves relevant knowledge graphs given the conversation context and encodes the graphs with a static graph attention mechanism; the decoder attentively reads the retrieved knowledge graphs and the knowledge triples within each graph. More recently, [5] use a Transformer Memory Network to encode knowledge sentences and conversation context and decode a response.
- Introduces Topical-Chat, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners do not have explicitly defined roles, to help further research in open-domain conversational AI
- Demonstrates the ability of our models to have engaging conversations grounded in knowledge through automated and human evaluation
- Considered the frequency distribution of the 8 topics across all user utterances to allocate an entity budget Bi for each topic i
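The budget allocation above can be sketched as a proportional split of a fixed entity total across topics by utterance frequency. The topic counts below are invented for illustration; only the total of 300 entities and the 8-topic split come from the paper:

```python
from collections import Counter

def allocate_budgets(topic_counts, total_entities=300):
    """Allocate an entity budget B_i to each topic i in proportion to
    its frequency in user utterances, summing to total_entities."""
    total = sum(topic_counts.values())
    budgets = {t: round(total_entities * c / total)
               for t, c in topic_counts.items()}
    # Fix rounding drift so budgets sum exactly to total_entities.
    drift = total_entities - sum(budgets.values())
    budgets[max(budgets, key=budgets.get)] += drift
    return budgets

# Hypothetical utterance counts per topic (not from the paper).
counts = Counter(Music=500, Sports=400, Politics=300, Movies=250,
                 Fashion=150, Science=150, Books=150, General=100)
budgets = allocate_budgets(counts)
```

More frequently discussed topics receive proportionally larger entity budgets, while the correction step guarantees the budgets sum to the fixed total.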
Article Selection: We fetched Washington Post articles from 2018 that each referenced 3 or more of the 300 entities and contained 600–1000 words. We removed articles with profane language and then applied the topic-entity budgets to finalize 3088 articles, ensuring adequate coverage for all topics.
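The article-selection filter reduces to two checks per article: entity coverage and length. A minimal sketch, where the entity set and the sample text are hypothetical stand-ins (the profanity filter is omitted):

```python
# A tiny stand-in for the 300-entity set used in the paper.
ENTITIES = {"Washington Post", "Reddit", "ELIZA"}

def keep_article(text, entities=ENTITIES, min_refs=3,
                 min_words=600, max_words=1000):
    """Keep an article if it references at least min_refs entities
    and its word count falls within [min_words, max_words]."""
    n_words = len(text.split())
    n_refs = sum(1 for e in entities if e in text)
    return min_words <= n_words <= max_words and n_refs >= min_refs

# Invented example article: mentions 3 entities and is ~633 words long.
article = ("The Washington Post reported that Reddit users compared the "
           "bot to ELIZA. " + "filler " * 620)
```

Applying `keep_article` over a 2018 article dump, followed by the per-topic budget check, would yield the final pool of selected articles.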
The partial conversation corresponding to each snippet came from a distinct conversation in the Topical-Chat test frequent set. For each rc in each snippet, we asked two humans to separately annotate [20, 21] (possible values in parentheses) whether rc is comprehensible (0/1), on-topic (0/1) and interesting (0/1). We also asked them to annotate how effectively k is utilized in rc (0–3) and whether they would have liked to continue the conversation after rc (0/1).
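Agreement between the two annotators on these binary judgments can be quantified with Cohen's kappa, the κ statistic reported alongside the human evaluation metrics. This is a generic stdlib implementation, not the authors' evaluation code, and the example labels are invented:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences:
    observed agreement corrected for chance agreement derived from
    each annotator's marginal label distribution."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented binary judgments (e.g. "comprehensible") from two annotators.
ann1 = [1, 1, 0, 1, 0, 1, 1, 0]
ann2 = [1, 1, 0, 1, 1, 1, 0, 0]
kappa = cohens_kappa(ann1, ann2)
```

Values near 1 indicate near-perfect agreement; the κ values in the human evaluation (0.62–0.83) indicate substantial to near-perfect agreement on this scale.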
Table 3 (automated metrics on test set, Frequent/Rare):

| Model | F1 | Div. (n=1) | Div. (n=2) |
|---|---|---|---|
| TF | 0.16 / 0.16 | 0.85 / 0.84 | 0.86 / 0.86 |
| TF (w/ p.t.) | 0.16 / 0.15 | 0.86 / 0.85 | 0.86 / 0.85 |
| TF (w/ k.) | 0.22 / 0.20 | 0.84 / 0.80 | 0.83 / 0.81 |
| TF (w/ k. p.t.) | 0.22 / 0.19 | 0.85 / 0.82 | 0.84 / 0.82 |

Table 4 (human evaluation metrics):

| Metric | Human | TF | TF (w/ p.t.) | TF (w/ k.) | TF (w/ k. p.t.) |
|---|---|---|---|---|---|
| comp. (κ = 0.83) | 0.99 | 0.87 | 0.88 | 0.78 | 0.71 |
| o.t. (κ = 0.67) | 0.93 | 0.60 | 0.62 | 0.69 | 0.66 |
| l.k. (κ = 0.62) | 1.92 | 0.08 | 0.12 | 0.63 | 0.80 |
- J. Weizenbaum et al., “Eliza—a computer program for the study of natural language communication between man and machine,” Communications of the ACM, vol. 9, no. 1, pp. 36–45, 1966.
- O. Vinyals and Q. Le, “A neural conversational model,” arXiv preprint arXiv:1506.05869, 2015.
- A. Ritter, C. Cherry, and B. Dolan, “Unsupervised modeling of twitter conversations,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010, pp. 172–180.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
- E. Dinan, S. Roller, K. Shuster, A. Fan, M. Auli, and J. Weston, “Wizard of wikipedia: Knowledge-powered conversational agents,” arXiv preprint arXiv:1811.01241, 2018.
- K. Zhou, S. Prabhumoye, and A. W. Black, “A dataset for document grounded conversations,” arXiv preprint arXiv:1809.07358, 2018.
- N. Moghe, S. Arora, S. Banerjee, and M. M. Khapra, “Towards exploiting background knowledge for building conversation systems,” 2018.
- M. Ghazvininejad, C. Brockett, M.-W. Chang, B. Dolan, J. Gao, W.-t. Yih, and M. Galley, “A knowledge-grounded neural conversation model,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- H. Zhou, T. Young, M. Huang, H. Zhao, J. Xu, and X. Zhu, “Commonsense knowledge aware conversation generation with graph attention.” in IJCAI, 2018, pp. 4623–4629.
- J. E. Weston, “Dialog-based language learning,” in Advances in Neural Information Processing Systems, 2016, pp. 829–837.
- M. Lewis, D. Yarats, Y. N. Dauphin, D. Parikh, and D. Batra, “Deal or no deal? end-to-end learning for negotiation dialogues,” arXiv preprint arXiv:1706.05125, 2017.
- S. Zhang, E. Dinan, J. Urbanek, A. Szlam, D. Kiela, and J. Weston, “Personalizing dialogue agents: I have a dog, do you have pets too?” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 2204–2213.
- C. Khatri, B. Hedayatnia, A. Venkatesh, J. Nunn, Y. Pan, Q. Liu, H. Song, A. Gottardi, S. Kwatra, S. Pancholi, M. Cheng, Q. Chen, L. Stubel, K. Gopalakrishnan, K. Bland, R. Gabriel, A. Mandal, D. Hakkani-Tur, G. Hwang, N. Michel, E. King, and R. Prasad, “Advancing the state of the art in open domain dialog systems through the alexa prize,” in Alexa Prize Proceedings (https://developer.amazon.com/alexaprize/challenges/pastchallenges/2018/), 2018.
- Reddit, “r/todayilearned,” https://www.reddit.com/r/todayilearned/.
- R. Mihalcea and P. Tarau, “Textrank: Bringing order into text,” in Proceedings of the 2004 conference on empirical methods in natural language processing, 2004.
- A. H. Miller, W. Feng, A. Fisch, J. Lu, D. Batra, A. Bordes, D. Parikh, and J. Weston, “Parlai: A dialog research software platform,” arXiv preprint arXiv:1705.06476, 2017.
- BookCorpus, https://github.com/soskek/bookcorpus/.
- A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.
- R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” arXiv preprint arXiv:1508.07909, 2015.
- A. Venkatesh, C. Khatri, A. Ram, F. Guo, R. Gabriel, A. Nagar, R. Prasad, M. Cheng, B. Hedayatnia, A. Metallinou, R. Goel, S. Yang, and A. Raju, “On evaluating and comparing open domain dialog systems,” 2018.
- A. See, S. Roller, D. Kiela, and J. Weston, “What makes a good conversation? how controllable attributes affect human judgments,” 2019.