Towards Persona Based Empathetic Conversational Models

EMNLP 2020, pp. 6556-6566, 2020.

We present a new task and a large-scale multidomain dataset, Persona-based Empathetic Conversation (PEC), towards persona-based empathetic conversations.

Abstract:

Empathetic conversational models have been shown to improve user satisfaction and task outcomes in numerous domains. In psychology, persona has been shown to be highly correlated to personality, which in turn influences empathy. In addition, our empirical analysis also suggests that persona plays an important role in empathetic conversati...
Introduction
  • Affective empathy refers to the capacity to respond with an appropriate emotion to another’s mental states (Rogers et al, 2007).
  • In NLP, empathetic conversational models have been shown to improve user satisfaction and task outcomes in numerous domains (Klein, 1998; Liu and Picard, 2005; Wright and McCarthy, 2008; Fitzpatrick et al, 2017; Zhou et al, 2018a).
  • Neural network based conversational models (Vinyals and Le, 2015; Lowe et al., ...
  • Rashkin et al (2019) presented a new dataset and benchmark towards empathetic conversations and found that both Transformer-based generative models (Vaswani et al, 2017) and BERT-based retrieval models (Devlin et al, 2019) relying on this dataset exhibit stronger empathy.
Highlights
  • Empathy, in particular affective empathy, refers to the capacity to respond with an appropriate emotion to another’s mental states (Rogers et al, 2007)
  • We investigate whether persona improves empathetic responding more when CoBERT is trained on empathetic conversations than non-empathetic ones
  • This result reveals an empirical link between persona and empathy in human conversations and may suggest that persona has a greater impact on empathetic conversations than non-empathetic ones
  • We observe that persona consistently improves performance on both validation sets for all ratios
  • We propose CoBERT, an effective and efficient model that obtains substantially better performance than competitive baselines on Persona-based Empathetic Conversation (PEC), including the state-of-the-art Poly-encoder and several BERT-adapted models
Methods
  • The authors evaluate models on PEC and its two subdomains, i.e., happy and offmychest.
  • Note that the BoW, HLSTM (Lowe et al, 2015) and Bi-encoder (Humeau et al, 2020) baselines share the same Tri-encoder architecture, where the final matching score is the dot product between the average of context and persona representations and the response representation.
  • BoW: The context, persona and response encoders compute the averaged word embedding.
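The shared Tri-encoder scoring described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes pre-tokenized inputs and a toy 2-dimensional embedding table (`TOY_EMB`), and the helper names (`bow_encode`, `tri_encoder_score`) are hypothetical. Context, persona and response are each encoded as the average of their word embeddings (the BoW baseline), and the matching score is the dot product between the average of the context and persona representations and the response representation.

```python
from typing import Dict, List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def bow_encode(tokens: List[str], emb: Dict[str, Vector]) -> Vector:
    """BoW encoder: average the embeddings of the tokens found in `emb`."""
    return average([emb[t] for t in tokens if t in emb])


def tri_encoder_score(context: List[str], persona: List[str],
                      response: List[str], emb: Dict[str, Vector]) -> float:
    """Score = dot(mean(context_repr, persona_repr), response_repr)."""
    c = bow_encode(context, emb)
    p = bow_encode(persona, emb)
    r = bow_encode(response, emb)
    return dot(average([c, p]), r)


# Toy embeddings invented purely for illustration (assumption).
TOY_EMB = {
    "sad": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "sorry": [1.0, 0.0],
    "pizza": [-1.0, 0.0],
}

score_empathetic = tri_encoder_score(["sad"], ["dog"], ["sorry"], TOY_EMB)
score_unrelated = tri_encoder_score(["sad"], ["dog"], ["pizza"], TOY_EMB)
```

In a retrieval setting, every candidate response would be scored this way and the highest-scoring one selected. Swapping `bow_encode` for a recurrent or Transformer encoder leaves `tri_encoder_score` unchanged, which is the sense in which the baselines share one architecture.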
Results
  • The posts in the happy and offmychest domains are mostly positive and negative, respectively.
  • Both domains are significantly more empathetic than the control group (p < 0.001, one-tailed t-test).
  • The persona improvement on empathetic responding consistently increases as more PEC training examples are used (3.77% when trained on all 150K CASUAL conversations versus 6.32% when trained on all 150K PEC conversations), showing that persona improves empathetic responding significantly more when CoBERT is trained on empathetic conversations than non-empathetic ones (p < 0.001, one-tailed t-test)
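The persona improvement quoted above is a difference between two R@1 values. The sketch below assumes the usual response-selection definition of R@1 (the fraction of examples whose top-scored candidate is the gold response) and takes the improvement to be R@1 with persona minus R@1 without persona. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_1(scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of examples whose highest-scored candidate is the gold one."""
    hits = 0
    for candidate_scores, gold_index in zip(scores, gold):
        best = max(range(len(candidate_scores)), key=candidate_scores.__getitem__)
        hits += int(best == gold_index)
    return hits / len(gold)


def persona_improvement(r1_with_persona: float, r1_without_persona: float) -> float:
    """Gain in R@1 attributable to conditioning on persona."""
    return r1_with_persona - r1_without_persona


# Toy candidate scores for four examples with three candidates each (assumption).
gold = [1, 0, 0, 2]
scores_without = [[0.2, 0.9, 0.1], [0.8, 0.3, 0.1], [0.4, 0.5, 0.6], [0.7, 0.2, 0.6]]
scores_with = [[0.2, 0.9, 0.1], [0.8, 0.3, 0.1], [0.9, 0.5, 0.6], [0.7, 0.2, 0.6]]

r1_without = recall_at_1(scores_without, gold)
r1_with = recall_at_1(scores_with, gold)
gain = persona_improvement(r1_with, r1_without)
```

Under this reading, the reported figures correspond to a gain of 3.77 points for the model trained on CASUAL and 6.32 points for the model trained on PEC.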
Conclusion
  • The authors investigate whether persona improves empathetic responding more when CoBERT is trained on empathetic conversations than non-empathetic ones.
  • The authors compare the persona improvement, i.e., R@1 with persona minus R@1 without persona, on the PEC validation set and the CASUAL validation set for different replacement ratios.
  • The authors present a new task and a large-scale multidomain dataset, PEC, towards persona-based empathetic conversations.
  • The results reveal an empirical link between persona and empathy in human conversations and may suggest that persona has a greater impact on empathetic conversations than non-empathetic ones
Tables
  • Table 1: Statistics of PEC. #Avg.PS and #Std.PS denote average and standard deviation of the number of persona sentences per speaker, respectively. #Avg.U denotes the average utterance length. #Avg.P denotes the average persona sentence length.
  • Table 2: Sentiment and empathy of PEC and the control group based on human ratings. Sentiment ranges from -1 (negative) to 1 (positive). Empathy ranges from 0 (non-empathetic) to 1 (empathetic). Ratings are aggregated by majority voting (averaging shows similar results). The inter-annotator agreement, measured by Fleiss’ kappa (Fleiss, 1971), for sentiment and empathy are 0.725 and 0.617, respectively. Both agreement statistics indicate “substantial agreement”.
  • Table 3: Two example conversations with personas from PEC. The persona sentences correspond to the last speakers in the conversations.
  • Table 4: Comparisons between PEC and related datasets. ED denotes EMPATHETICDIALOGUES (Rashkin et al, 2019). PC denotes PERSONA-CHAT (Zhang et al, 2018a). PCR denotes the persona-based conversations from Reddit (Mazare et al, 2018). CS denotes crowd-sourced. The size denotes the number of expanded conversations.
  • Table 5: Test performance (in %) of CoBERT and all baselines. Values in bold denote best results.
  • Table 6: Transfer test of CoBERT in R@1 (in %).
  • Table 7: Validation performance (in %), inference time (InfTime) and memory usage (RAM) for baselines, BERT-adapted models and ablation studies on PEC. InfTime and RAM are relative to the Bi-encoder.
  • Table 8: Validation R@1 (in %), inference time (InfTime) and memory usage (RAM) on PEC against different number of persona sentences nP.
  • Table 9: Test R@1 (in %) on PEC against examples with seen or unseen personas. nP denotes the number of persona sentences.
  • Table 10: Case study.
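The Table 2 caption reports inter-annotator agreement as Fleiss’ kappa. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard formula (Fleiss, 1971), assuming every item is rated by the same number of annotators. The function name and the toy rating counts are illustrative and are not the paper's annotation data.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    # Undefined (division by zero) if all ratings fall into a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 4 items, 3 raters, 2 categories (e.g. empathetic / non-empathetic).
perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
split = fleiss_kappa([[2, 1], [1, 2], [2, 1], [1, 2]])
```

On the commonly used Landis and Koch interpretation scale, values between 0.61 and 0.80 are labelled "substantial", which matches the wording in the caption for the reported 0.725 and 0.617.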
Related work
  • Empathetic Conversational Models: Despite the growing number of studies in neural conversational models, less attention has been paid to making conversations empathetic until recently (Siddique et al, 2017; Morris et al, 2018; Shi and Yu, 2018; Lin et al, 2019b; Shin et al, 2019; Rashkin et al, 2019; Li et al, 2019; Lin et al, 2019a; Zandie and Mahoor, 2020), possibly due to the lack of empathetic conversation datasets. Rashkin et al (2019) proposed EMPATHETICDIALOGUES (ED), the first empathetic conversation dataset comprising 25K conversations in 32 emotions. Conversational models trained on the role of the listener in the dataset exhibited stronger empathy than models trained on non-empathetic datasets. We compare ED and PEC in the last paragraph of Section 3.
  • Persona-Based Conversational Models: In recent years, personalized conversational models have been emerging (Li et al, 2016; Zhang et al, 2018a; Wolf et al, 2019; Chan et al, 2019; Madotto et al, 2019; Zheng et al, 2019). Li et al (2016) proposed persona embeddings in a response generation model and achieved improved generation quality and persona consistency. Zhang et al (2018a) proposed PERSONA-CHAT (PC), a crowd-sourced conversation dataset with persona information, to improve model engagingness and consistency. Mazare et al (2018) further presented a much larger persona-based conversation dataset collected from Reddit (PCR) and showed that persona consistently improves model performance even when a large number of conversations is available for training. We compare PC, PCR, and PEC in the last paragraph of Section 3. Recently, Gu et al (2019) proposed DIM, a personalized response selection model with interactive matching and hierarchical aggregation, and achieved state-of-the-art performance on PC.
  • Retrieval-based Conversational Models: Recent neural retrieval-based conversational models gener-
Funding
  • This research is supported, in part, by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI) (Alibaba-NTU-AIR2019B1), Nanyang Technological University, Singapore
  • This research is also supported, in part, by the National Research Foundation, Prime Minister’s Office, Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No NRF-NRFI05-2019-0002)
  • This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017)
Reference
  • Scott Brave, Clifford Nass, and Kevin Hutchinson. 2005. Computers that care: investigating the effects of orientation of emotion exhibited by an embodied computer agent. International Journal of Human-Computer Studies, 62(2):161–178.
  • Zhangming Chan, Juntao Li, Xiaopeng Yang, Xiuying Chen, Wenpeng Hu, Dongyan Zhao, and Rui Yan. 2019. Modeling personalization in continuous space for response generation via augmented wasserstein autoencoders. In EMNLP-IJCNLP, pages 1931–1940.
  • Qian Chen and Wen Wang. 2019. Sequential attention-based network for noetic end-to-end response selection. arXiv preprint arXiv:1901.02609.
  • Patricio Costa, Raquel Alves, Isabel Neto, Pedro Marvao, Miguel Portela, and Manuel Joao Costa. 2014. Associations between medical student empathy and personality: a multi-institutional study. PloS one, 9(3).
  • Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, and Guoping Hu. 2017. Attention-over-attention neural networks for reading comprehension. In ACL, pages 593–602.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186.
  • Jiazhan Feng, Chongyang Tao, Wei Wu, Yansong Feng, Dongyan Zhao, and Rui Yan. 2019. Learning a matching model with co-teaching for multi-turn response selection in retrieval-based dialogue systems. In ACL, pages 3805–3815.
  • Kathleen Kara Fitzpatrick, Alison Darcy, and Molly Vierhile. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial. JMIR Mental Health, 4(2):e19.
  • Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378.
  • Jia-Chen Gu, Zhen-Hua Ling, Xiaodan Zhu, and Quan Liu. 2019. Dually interactive matching network for personalized response selection in retrieval-based chatbots. In EMNLP-IJCNLP, pages 1845–1854.
  • Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In ICLR.
  • Carl Jung. 2016. Psychological types. Taylor & Francis.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Jonathan Tarter Klein. 1998. Computer response to user frustration. Ph.D. thesis, Massachusetts Institute of Technology.
  • Mark R Leary and Ashley Batts Allen. 2011. Personality and persona: Personality processes in self-presentation. Journal of Personality, 79(6):1191–1218.
  • Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A persona-based neural conversation model. In ACL, pages 994–1003.
  • Qintong Li, Hongshen Chen, Zhaochun Ren, Zhumin Chen, Zhaopeng Tu, and Jun Ma. 2019. EmpGAN: Multi-resolution interactive empathetic dialogue generation. arXiv preprint arXiv:1911.08698.
  • Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, and Pascale Fung. 2019a. Moel: Mixture of empathetic listeners. In EMNLP-IJCNLP, pages 121–132.
  • Zhaojiang Lin, Peng Xu, Genta Indra Winata, Zihan Liu, and Pascale Fung. 2019b. Caire: An end-to-end empathetic chatbot. arXiv preprint arXiv:1907.12108.
  • K Liu and Rosalind W Picard. 2005. Embedded empathy in continuous, interactive health assessment. In CHI Workshop on HCI Challenges in Health Assessment, volume 1, page 3.
  • Ryan Lowe, Nissan Pow, Iulian Vlad Serban, and Joelle Pineau. 2015. The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In SIGDIAL, pages 285–294.
  • Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In NIPS, pages 289–297.
  • Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, and Pascale Fung. 2019. Personalizing dialogue agents via meta-learning. In ACL, pages 5454–5459.
  • Pierre-Emmanuel Mazare, Samuel Humeau, Martin Raison, and Antoine Bordes. 2018. Training millions of personalized dialogue agents. In EMNLP, pages 2775–2779.
  • Robert R Morris, Kareem Kouddous, Rohan Kshirsagar, and Stephen M Schueller. 2018. Towards an artificially empathic conversational agent for mental health applications: system design and user perceptions. Journal of Medical Internet Research, 20(6):e10148.
  • Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. In NIPS, pages 8024–8035.
  • Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. 2019. Towards empathetic open-domain conversation models: A new benchmark and dataset. In ACL, pages 5370–5381.
  • Nadine R Richendoller and James B Weaver III. 1994. Exploring the links between personality and empathic response style. Personality and Individual Differences, 17(3):303–311.
  • Kimberley Rogers, Isabel Dziobek, Jason Hassenstab, Oliver T Wolf, and Antonio Convit. 2007. Who cares? revisiting empathy in asperger syndrome. Journal of Autism and Developmental Disorders, 37(4):709–715.
  • Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M Smith, et al. 2020. Recipes for building an open-domain chatbot. arXiv preprint arXiv:2004.13637.
  • Weiyan Shi and Zhou Yu. 2018. Sentiment adaptive end-to-end dialog systems. In ACL, pages 1509–1519.
  • Jamin Shin, Peng Xu, Andrea Madotto, and Pascale Fung. 2019. Happybot: Generating empathetic dialogue responses by improving user experience lookahead. arXiv preprint arXiv:1906.08487.
  • Farhad Bin Siddique, Onno Kampman, Yang Yang, Anik Dey, and Pascale Fung. 2017. Zara returns: Improved personality induction and adaptation by an empathetic virtual agent. In ACL, pages 121–126.
  • Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In NIPS, pages 2440–2448.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. Transfertransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.
  • Peter Wright and John McCarthy. 2008. Empathy and experience in hci. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 637–646.
  • Yu Wu, Wei Wu, Chen Xing, Ming Zhou, and Zhoujun Li. 2017. Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. In ACL, pages 496–505.
  • Chunyuan Yuan, Wei Zhou, Mingming Li, Shangwen Lv, Fuqing Zhu, Jizhong Han, and Songlin Hu. 2019. Multi-hop selector network for multi-turn response selection in retrieval-based chatbots. In EMNLP-IJCNLP, pages 111–120.
  • Rohola Zandie and Mohammad H Mahoor. 2020. Emptransfo: A multi-head transformer architecture for creating empathetic dialog systems. arXiv preprint arXiv:2003.02958.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018a. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL, pages 2204–2213.
  • Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, and Gongshen Liu. 2018b. Modeling multi-turn conversation with deep utterance aggregation. In COLING, pages 3740–3752.
  • Yinhe Zheng, Rongsheng Zhang, Xiaoxi Mao, and Minlie Huang. 2019. A pre-training based personalized dialogue generation model with persona-sparse data. arXiv preprint arXiv:1911.04700.
  • Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2018a. The design and implementation of xiaoice, an empathetic social chatbot. Computational Linguistics, 0:1–62.
  • Xiangyang Zhou, Daxiang Dong, Hua Wu, Shiqi Zhao, Dianhai Yu, Hao Tian, Xuan Liu, and Rui Yan. 2016. Multi-view response selection for human-computer conversation. In EMNLP, pages 372–381.
  • Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu, and Hua Wu. 2018b. Multi-turn response selection for chatbots with deep attention matching network. In ACL, pages 1118–1127.