Towards Conversational Recommendation over Multi-Type Dialogs

ACL, pp. 1036-1049, 2020.

Keywords:
recommendation dialog, open-domain conversation, human-bot, future study, multi-type dialog

Abstract:

We focus on the study of conversational recommendation in the context of multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog (e.g., QA) to a recommendation dialog, taking into account the user's interests and feedback. To facilitate the study of this task, we create a human-t...
Introduction
Highlights
  • In recent years, there has been a significant increase in work on conversational recommendation due to the rise of voice-based bots (Christakopoulou et al., 2016; Li et al., 2018; Reschke et al., 2013; Warnestal, 2005).
  • We present a novel task, conversational recommendation over multi-type dialogs, in which the bot should proactively and naturally lead a conversation from a non-recommendation dialog to a recommendation dialog.
  • Each seeker has an explicit profile for modeling personalized recommendation, and has multiple dialogs with the recommender to mimic real-world application scenarios. To address this task, inspired by the work of Xu et al. (2020), we present a multi-goal driven conversation generation framework (MGCG) that handles multiple dialog types, such as QA, chitchat, recommendation, and task-oriented dialogs, simultaneously.
  • The retrieval-based variant of the framework, MGCG_R, performs better than the generation-based variant, MGCG_G, in terms of Hits@k and DIST-2, but worse in terms of knowledge F1.
  • We identify the task of conversational recommendation over multi-type dialogs, and create a dataset, DuRecDial, with multiple dialog types and multi-domain use cases.
  • To further analyze the relationship between knowledge usage and goal completion, we report the number of failed goals, completed goals, and used knowledge for each method across dialog types in Table 6.
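The multi-goal driven framework (MGCG) described above can be illustrated with a simplified Python sketch. As a simplifying assumption, goal planning is split into two steps: estimating whether the current goal (a dialog type plus a topic) is completed, and, if so, predicting the next goal. The response generator then receives the goal, related knowledge, and dialog context as one concatenated input, mirroring the "S2S +gl.+kg." input described in the Table 4 caption. The names `Goal`, `plan_next_goal`, `build_generator_input`, the `[SEP]` separator, and the keyword-matching stand-ins for learned classifiers are all hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Goal:
    dialog_type: str  # e.g. "QA", "chitchat", "recommendation"
    topic: str        # e.g. a movie or star name


def plan_next_goal(
    context: List[str],
    current: Goal,
    is_completed: Callable[[List[str], Goal], bool],
    predict_next: Callable[[List[str], Goal], Goal],
) -> Goal:
    """Keep the current goal until it is judged complete, then predict the next one."""
    if is_completed(context, current):
        return predict_next(context, current)
    return current


def build_generator_input(
    goal: Goal, knowledge: List[str], context: List[str], sep: str = " [SEP] "
) -> str:
    """Concatenate the goal, related knowledge and dialog context into one string."""
    goal_text = f"{goal.dialog_type}: {goal.topic}"
    return sep.join([goal_text, *knowledge, *context])


# Hypothetical toy stand-ins for learned goal-completion and goal-prediction models.
def toy_is_completed(context: List[str], goal: Goal) -> bool:
    # Treat a goal as completed once its topic appears in the latest utterance.
    return goal.topic in context[-1]


def toy_predict_next(context: List[str], goal: Goal) -> Goal:
    # Always move toward a fixed recommendation target (placeholder for a classifier).
    return Goal("recommendation", "Movie C")


if __name__ == "__main__":
    ctx = ["Who stars in Movie A?", "Star B stars in Movie A."]
    goal = plan_next_goal(ctx, Goal("QA", "Movie A"), toy_is_completed, toy_predict_next)
    print(goal)
    print(build_generator_input(goal, ["Movie C | genre | comedy"], ctx))
```

In the toy run, the QA goal about "Movie A" is judged complete after the bot's answer, so the planner switches to a recommendation goal; in a real system both decisions would come from trained models rather than string matching.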
Results
  • 5.1 Experimental Setting

    The authors split DuRecDial into train/dev/test sets by randomly sampling 65%/10%/25% of the data at the level of seekers rather than individual dialogs, so all dialogs of a given seeker fall into the same split.
  • As shown in Table 5, the two MGCG systems (MGCG_R and MGCG_G) outperform S2S by a large margin, especially in terms of appropriateness, informativeness, goal success rate, and coherence.
  • The retrieval-based model performs better in terms of fluency, since its responses are selected from original human utterances rather than generated automatically.
  • However, it performs worse than the generation-based model on all other metrics.
  • There is still much room for improvement in appropriateness and goal success rate, which is left for future work.
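The seeker-level split described in the experimental setting can be sketched as follows. This is a minimal illustration, assuming each dialog is tagged with a string seeker ID; the function name `split_by_seeker`, its parameters, the fixed random seed, and the input format are hypothetical. Because the 65%/10%/25% ratios are applied to seekers, the resulting dialog counts only approximate those ratios when seekers have different numbers of dialogs.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_by_seeker(
    dialogs: Sequence[Tuple[str, str]],
    train_ratio: float = 0.65,
    dev_ratio: float = 0.10,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split (seeker_id, dialog) pairs so that each seeker lands in exactly one split."""
    by_seeker: Dict[str, List[str]] = defaultdict(list)
    for seeker_id, dialog in dialogs:
        by_seeker[seeker_id].append(dialog)

    # Shuffle seekers (not individual dialogs) with a fixed seed for reproducibility.
    seekers = sorted(by_seeker)
    random.Random(seed).shuffle(seekers)

    n_train = round(len(seekers) * train_ratio)
    n_dev = round(len(seekers) * dev_ratio)
    groups = {
        "train": seekers[:n_train],
        "dev": seekers[n_train:n_train + n_dev],
        "test": seekers[n_train + n_dev:],  # the remainder (about 25%) goes to test
    }
    # Expand each group of seekers back into the dialogs they own.
    return {name: [d for s in ids for d in by_seeker[s]] for name, ids in groups.items()}


if __name__ == "__main__":
    data = [(f"s{i}", f"s{i}-d{j}") for i in range(20) for j in range(3)]
    splits = split_by_seeker(data)
    print({name: len(ds) for name, ds in splits.items()})
```

With 20 seekers of 3 dialogs each, this yields 13/2/5 seekers, i.e. 39/6/15 dialogs. Splitting by seeker keeps every dialog of a seeker, and therefore the same profile, within one split, so the test set does not contain seekers whose dialogs were used for training.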
Conclusion
  • The authors identify the task of conversational recommendation over multi-type dialogs, and create a dataset, DuRecDial, with multiple dialog types and multi-domain use cases.
  • The complexity of DuRecDial makes it a great testbed for other tasks, such as knowledge grounded conversation (Ghazvininejad et al., 2018), domain transfer for dialog modeling, target-guided conversation (Tang et al., 2019a), and multi-type dialog modeling (Yu et al., 2017).
  • The study of these tasks is left for future work.
Summary
  • Introduction:

    There has been a significant increase in work on conversational recommendation due to the rise of voice-based bots (Christakopoulou et al., 2016; Li et al., 2018; Reschke et al., 2013; Warnestal, 2005).
  • These works focus on how to provide high-quality recommendations through dialog-based interactions with users.
  • To the authors' knowledge, there is little previous work on conversational recommendation over multi-type dialogs.
Tables
  • Table 1: Comparison of our dataset DuRecDial with recommendation dialog datasets and knowledge grounded dialog datasets. “Rec.” stands for recommendation.
  • Table 2: One of our task templates, used to guide the workers in annotating the dialog in Figure 1. We require that the recommendation target (the long-term goal) be consistent with the user’s interests and the topics mentioned by the user, and that short-term goals provide natural topic transitions toward the long-term goal.
  • Table 3: Statistics of the knowledge graph and DuRecDial.
  • Table 4: Automatic evaluation results. +(-)gl. denotes “with (without) conversational goals”; +(-)kg. denotes “with (without) knowledge”. For “S2S +gl.+kg.”, we simply concatenate the goal predicted by our model, all related knowledge, and the dialog context as its input.
  • Table 5: Human evaluation results at the level of turns and dialogs.
  • Table 6: Analysis of goal completion and knowledge usage across different dialog types.
  • Table 7: Model parameter settings.
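Two of the automatic metrics mentioned above, Hits@k and DIST-2, can be sketched as below. This assumes Hits@k checks whether the gold response is ranked within the top k of a scored candidate set, and computes DIST-2 at the corpus level as distinct bigrams divided by total bigrams, following the distinct-n idea of Li et al. (2016). The candidate-set construction, tokenization, and the function names `hits_at_k` and `distinct_n` are assumptions, not the paper's exact evaluation script.

```python
from typing import List, Sequence


def hits_at_k(scores: Sequence[float], gold_index: int, k: int) -> int:
    """Return 1 if the gold candidate is among the k highest-scoring candidates, else 0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return int(gold_index in ranked[:k])


def distinct_n(responses: List[List[str]], n: int = 2) -> float:
    """Corpus-level distinct-n: unique n-grams divided by total n-grams over all responses."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in responses
        for i in range(len(tokens) - n + 1)
    ]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.4, 0.9]  # model scores for four candidate responses
    print(hits_at_k(scores, gold_index=2, k=1), hits_at_k(scores, gold_index=2, k=3))  # 0 1
    responses = [["i", "like", "it"], ["i", "like", "this", "movie"]]
    print(distinct_n(responses, n=2))  # 0.8
```

Averaging `hits_at_k` over all test turns gives a corpus-level Hits@k; a higher DIST-2 means the model repeats fewer bigrams across its responses.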
Related work
  • Datasets for Conversational Recommendation. To facilitate the study of conversational recommendation, multiple datasets have been created in previous work, as shown in Table 1. The first recommendation dialog dataset was released by Dodge et al. (2016); it is a synthetic dialog dataset built from the classic MovieLens ratings dataset and natural language templates. Li et al. (2018) create a human-to-human multi-turn recommendation dialog dataset that combines elements of social chitchat and recommendation dialogs. Kang et al. (2019) provide a recommendation dialog dataset with clear goals, and Moon et al. (2019) collect a parallel Dialog↔KG corpus for recommendation. Compared with these, our dataset contains multiple dialog types, multi-domain use cases, and rich interaction variability.

    Datasets for Knowledge Grounded Conversation. As shown in Table 1, CMU DoG (Zhou et al., 2018a) explores two scenarios for Wikipedia-article grounded dialogs: either only one participant has access to the document, or both do. IIT DoG (Moghe et al., 2018) is another dialog dataset for movie chats, in which only one participant has access to background knowledge, such as IMDB facts and plots or Reddit comments. Dinan et al. (2019) create a dataset of multi-domain, multi-turn conversations grounded on Wikipedia articles. OpenDialKG (Moon et al., 2019) provides a chit-chat dataset between two agents, aimed at modeling dialog logic through walks over the Freebase knowledge graph. Wu et al. (2019) provide a Chinese dialog dataset, DuConv, in which one participant can proactively lead the conversation with an explicit goal. KdConv (Zhou et al., 2020) is a Chinese dialog dataset in which each dialog contains in-depth discussions on multiple topics. In comparison, our dataset contains multiple dialog types, clear goals to achieve during each conversation, and user profiles for personalized conversation.
Funding
  • This work was supported by the National Key Research and Development Project of China (No. 2018AAA0101900) and the Natural Science Foundation of China (No. 61976072).
Reference
  • Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Inigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gasic. 2018. MultiWOZ - a large-scale multi-domain wizard-of-Oz dataset for task-oriented dialogue modelling. In EMNLP.
  • Qibin Chen, Junyang Lin, Yichang Zhang, Ming Ding, Yukuo Cen, Hongxia Yang, and Jie Tang. 2019. Towards knowledge-based recommender dialog system. In ACL.
  • Konstantina Christakopoulou, Alex Beutel, Rui Li, Sagar Jain, and Ed H. Chi. 2018. Q&R: A two-stage approach toward interactive recommendation. In KDD.
  • Konstantina Christakopoulou, Katja Hofmann, and Filip Radlinski. 2016. Towards conversational recommender systems. In KDD.
  • Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. Wizard of Wikipedia: Knowledge-powered conversational agents. In Proceedings of ICLR.
  • Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander H. Miller, Arthur Szlam, and Jason Weston. 2016. Evaluating prerequisite qualities for learning end-to-end dialog systems. In ICLR.
  • Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, and Michel Galley. 2018. A knowledge-grounded neural conversation model. In AAAI.
  • Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, and Jason Weston. 2019. Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue. In EMNLP.
  • Sunhwan Lee, Robert Moore, Guang-Jie Ren, Raphael Arar, and Shun Jiang. 2018. Making personalized recommendation through conversation: Architecture design and recommendation methods. In AAAI Workshops.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL-HLT, pages 110–119.
  • Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In NIPS.
  • Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. 2018. Towards exploiting background knowledge for building conversation systems. In EMNLP.
  • Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. OpenDialKG: Explainable conversational reasoning with attention-based walks over knowledge graphs. In ACL.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311–318.
  • Ashwin Ram, Rohit Prasad, Chandra Khatri, Anu Venkatesh, Raefer Gabriel, Qing Liu, Jeff Nunn, Behnam Hedayatnia, Ming Cheng, Ashish Nagar, Eric King, Kate Bland, Amanda Wartick, Yi Pan, Han Song, Sk Jayadevan, Gene Hwang, and Art Pettigrue. 2018. Conversational AI: The science behind the Alexa Prize. In CoRR, abs/1801.03604.
  • Kevin Reschke, Adam Vogel, and Daniel Jurafsky. 2013. Generating recommendation dialogs by extracting information from user reviews. In ACL.
  • Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In SIGIR.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
  • Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, and Zhiting Hu. 2019a. Target-guided open-domain conversation. In ACL.
  • Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, and Zhiting Hu. 2019b. Target-guided open-domain conversation. In Proceedings of ACL.
  • Zhuoran Wang, Hongliang Chen, Guanchun Wang, Hao Tian, Hua Wu, and Haifeng Wang. 2014. Policy learning for domain selection in an extensible multi-domain spoken dialogue system. In EMNLP.
  • Pontus Warnestal. 2005. Modeling a dialogue strategy for personalized movie recommendations. In The Beyond Personalization Workshop.
  • Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, and Haifeng Wang. 2019. Proactive human-machine conversation with explicit conversation goal. In ACL.
  • Jun Xu, Haifeng Wang, Zheng-Yu Niu, Hua Wu, and Wanxiang Che. 2020. Knowledge graph grounded goal planning for open-domain conversation generation. In AAAI.
  • Lili Yao, Yaoyuan Zhang, Yansong Feng, Dongyan Zhao, and Rui Yan. 2017. Towards implicit content-introducing for generative short-text conversation systems. In EMNLP.
  • Zhou Yu, Alexander I. Rudnicky, and Alan W. Black. 2017. Learning conversational systems that interleave task and non-task content. In IJCAI.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018a. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL.
  • Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018b. Towards conversational search and recommendation: System ask, user respond. In CIKM.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL.
  • Hao Zhou, Chujie Zheng, Kaili Huang, Minlie Huang, and Xiaoyan Zhu. 2020. KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation. In Proceedings of ACL.
  • Kangyan Zhou, Shrimai Prabhumoye, and Alan W Black. 2018a. A dataset for document grounded conversations. In Proceedings of EMNLP.
  • Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2018b. The design and implementation of XiaoIce, an empathetic social chatbot. In CoRR, abs/1812.08989.