EmoElicitor: An Open Domain Response Generation Model with User Emotional Reaction Awareness

IJCAI 2020, pp. 3637–3643.

We propose a novel variational model named EmoElicitor to generate appropriate responses that can elicit a specific emotion from the user.

Abstract:

Generating emotional responses is crucial for building human-like dialogue systems. However, existing studies have focused only on generating responses by controlling the agents' emotions, while the feelings of the users, which are the ultimate concern of a dialogue system, have been neglected. In this paper, we propose a novel variational model named EmoElicitor to generate appropriate responses that can elicit a specific emotional reaction from the user.

Introduction
  • Emotional interaction is a key factor in interpersonal communication and has become a crucial concern in building humanlike dialogue agents [Picard, 1997].
  • Although previous methods have achieved promising results, these models have attempted only to control the emotion of the agent’s response.
  • The feelings of the user during the interaction, which are the ultimate concern when designing a dialogue agent, are neglected.
Highlights
  • Emotional interaction is a key factor in interpersonal communication and has become a crucial concern in building humanlike dialogue agents [Picard, 1997]
  • Emotional response generation (ERG) is an emotional interaction task for generating appropriate conversational responses conditioned on a given emotion
  • We focus on the task of open domain response generation for emotion elicitation (RGEE), in which, given the backward context and a desired emotional reaction, the objective is to generate a topic-coherent response that can elicit that specific emotion from the user
  • Our three main contributions can be summarized as follows: (i) we formulate response generation for emotion elicitation as the problem of generating a response that can elicit a specific emotion from the user, conditioned on the backward context and the next-round utterance; (ii) we propose a variational model, EmoElicitor, which, to the best of our knowledge, is the first to leverage sequential latent variables to capture context information and guide response generation with the help of a pre-trained language model (see the sketch after this list); (iii) we construct a large-scale dataset for the response generation for emotion elicitation task
  • The T-CVAE baseline (Transformer-based conditional VAE) always generates the same words at the beginning of each sentence (e.g., "They are")
  • We investigate emotional reactions, including next-round utterances and the corresponding elicited emotion labels, in dyadic dialogue generation and propose a novel model called EmoElicitor to generate responses with emotional reaction awareness
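
The per-timestep latent variable at the heart of EmoElicitor can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical rendering of sequential-latent decoding under assumed dimensions and a plain GRU decoder (the actual model builds on a pre-trained language model); the approximate posterior and KL term required for variational training are omitted, and all names are illustrative rather than the authors' code.

```python
import torch
import torch.nn as nn

class SequentialLatentDecoder(nn.Module):
    """Toy decoder that samples a latent variable z_t at every time step."""

    def __init__(self, vocab_size, emb_dim=64, hid_dim=128, z_dim=16, cond_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim + z_dim + cond_dim, hid_dim)
        # Prior network p(z_t | h_{t-1}, cond): outputs mean and log-variance.
        self.prior = nn.Linear(hid_dim + cond_dim, 2 * z_dim)
        self.out = nn.Linear(hid_dim, vocab_size)
        self.hid_dim = hid_dim

    def forward(self, tokens, cond):
        # tokens: (batch, seq_len) response token ids (teacher forcing).
        # cond:   (batch, cond_dim) a single vector standing in for the encoded
        #         backward context, next-round utterance, and desired emotion.
        batch, seq_len = tokens.shape
        h = torch.zeros(batch, self.hid_dim)
        logits = []
        for t in range(seq_len):
            # Sample z_t at every step so the latent can modulate each word.
            mu, logvar = self.prior(torch.cat([h, cond], dim=-1)).chunk(2, dim=-1)
            z_t = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            h = self.rnn(torch.cat([self.embed(tokens[:, t]), z_t, cond], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, seq_len, vocab_size)
```

In full training, an approximate posterior q(z_t | ·) conditioned on the target response would replace the prior sample, and the loss would combine reconstruction with the KL divergence between posterior and prior, as in standard sequential latent-variable models [Chung et al., 2015].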
Results
  • Acc is the prediction accuracy for the emotion e that is elicited by the generated response Ye in context C.
  • Pro@5 is the sum of the probabilities of the top 5 emotion labels for every context C (a sketch of both metrics follows this list).
  • The authors built another classifier to predict the emotional reaction e based on a given context C.
  • The authors' model generates diverse responses for different emojis, demonstrating that the latent variable zt at time step t is able to capture the dependency between words and emotional reactions
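
As a concrete reading of the two metrics above, the sketch below computes them from an emotion classifier's softmax outputs. The array `probs`, the top-1 convention for Acc, and the literal top-k reading of Pro@5 are assumptions based on the descriptions in this section, not the authors' evaluation code.

```python
import numpy as np

def acc(probs, gold):
    """Acc: fraction of examples whose top-1 predicted emotion matches the
    desired emotion label (top-1 is an assumption; the source is ambiguous)."""
    return float(np.mean(probs.argmax(axis=1) == gold))

def pro_at_k(probs, k=5):
    """Pro@5, read literally: for each context, sum the probabilities of the
    k most probable emotion labels, then average over contexts."""
    topk = np.sort(probs, axis=1)[:, -k:]
    return float(topk.sum(axis=1).mean())

# Toy usage: 3 examples, 6 emotion labels.
probs = np.array([[0.50, 0.20, 0.10, 0.10, 0.05, 0.05],
                  [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],
                  [0.30, 0.30, 0.20, 0.10, 0.05, 0.05]])
gold = np.array([0, 1, 2])
print(acc(probs, gold))      # 0.667: the third example's top label is wrong
print(pro_at_k(probs, k=5))  # 0.95: only mass outside the top 5 is excluded
```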
Conclusion
  • The authors investigate emotional reactions, including next-round utterances and the corresponding elicited emotion labels, in dyadic dialogue generation and propose a novel model called EmoElicitor to generate responses with emotional reaction awareness.
  • By incorporating a latent variable zt at each time step, the authors can capture the dependency between words and emotional reactions, allowing the model to generate coherent, diverse responses with the intent of eliciting different emotional reactions.
Tables
  • Table 1: Automatic evaluation results for the generated responses. The symbol * marks methods that do not use the pre-trained language model
  • Table 2: Automatic evaluation results for emotion elicitation
  • Table 3: Manual evaluation based on Twitter word embeddings [Godin, 2019]. Ue-Avg and Ue-Gre calculate the semantic similarity between the generated response and the ground-truth reaction utterance Ue based on average and greedy matching, respectively (a sketch of these two metrics follows this list)
  • Table 4: Examples of generated responses. The emojis represent the emotions to be elicited from user b by the responses
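
The Ue-Avg and Ue-Gre columns in Table 3 correspond to the standard embedding-average and greedy-matching similarities [Liu et al., 2016]. The sketch below shows one common way to compute them; `emb`, a word-to-vector dictionary (e.g., built from the Twitter embeddings of [Godin, 2019]), is assumed, and each sentence is assumed to contain at least one in-vocabulary word.

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def embedding_average(hyp, ref, emb):
    """Ue-Avg: cosine similarity between the mean word vectors of the
    generated response and the reference next-round utterance."""
    h = np.mean([emb[w] for w in hyp if w in emb], axis=0)
    r = np.mean([emb[w] for w in ref if w in emb], axis=0)
    return _cos(h, r)

def greedy_matching(hyp, ref, emb):
    """Ue-Gre: match each word to its best counterpart by cosine similarity,
    average the maxima, and symmetrize over both directions."""
    hv = [emb[w] for w in hyp if w in emb]
    rv = [emb[w] for w in ref if w in emb]
    def one_way(src, tgt):
        return np.mean([max(_cos(s, t) for t in tgt) for s in src])
    return float((one_way(hv, rv) + one_way(rv, hv)) / 2)

# Toy usage with random 3-dimensional "embeddings".
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=3) for w in "i am so happy glad today".split()}
print(embedding_average("i am happy".split(), "so glad today".split(), emb))
print(greedy_matching("i am happy".split(), "so glad today".split(), emb))
```

Embedding average compares sentence-level mean vectors, while greedy matching rewards the best word-level alignments, so the two capture complementary notions of semantic similarity.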
Related work
  • Emotion-aware dialogue systems have become an emerging area of research in recent years. Zhou et al. [2018] first incorporated emotional factors into a dialogue generation task using an end-to-end neural learning framework. Zhong et al. [2019] considered the VAD affect model and the effects of negators and intensifiers via an attention mechanism in conversation modeling. Zhou and Wang [2018] proposed a reinforced CVAE-based model called Mojitalk, which could generate responses based on emojis. Rashkin et al. [2019] focused on empathetic dialogue generation, in which each conversation contained only one emotion label.

    Most previous studies, however, have considered only the emotion of the agent’s response while neglecting the user’s emotional reaction. Lubis et al. [2018; 2019] generated responses that could elicit positive emotions. Hasegawa et al. [2013] leveraged a statistical machine translation model to generate responses that could elicit predicted emotions from users. In contrast to the above two methods, our model not only utilizes finer-grained emoji labels but also considers the user’s next-round utterance to generate more topic-coherent and emotionally consistent responses.
Funding
  • The work was supported by the National Key R&D Program of China under grant 2018YFB1004700 and by the National Natural Science Foundation of China under grants 61872074 and 61772122.
References
  • [Chung et al., 2015] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In NIPS, pages 2980–2988, 2015.
  • [Fleiss, 1971] Joseph L Fleiss. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378, 1971.
  • [Glorot and Bengio, 2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, pages 249–256, 2010.
  • [Godin, 2019] Frederic Godin. Improving and Interpreting Neural Networks for Word-Level Prediction Tasks in Natural Language Processing. PhD thesis, Ghent University, Belgium, 2019.
  • [Hasegawa et al., 2013] Takayuki Hasegawa, Nobuhiro Kaji, Naoki Yoshinaga, and Masashi Toyoda. Predicting and eliciting addressee’s emotion in online dialogue. In ACL Volume 1: Long Papers, pages 964–972, 2013.
  • [Kingma and Welling, 2014] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In ICLR, 2014.
  • [Lample and Conneau, 2019] Guillaume Lample and Alexis Conneau. Cross-lingual language model pretraining. CoRR, abs/1901.07291, 2019.
  • [Li et al., 2016] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting objective function for neural conversation models. In NAACL, pages 110–119, 2016.
  • [Liu et al., 2016] Chia-Wei Liu, Ryan Lowe, Iulian Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In EMNLP, pages 2122–2132, 2016.
  • [Lubis et al., 2018] Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, and Satoshi Nakamura. Eliciting positive emotion through affect-sensitive dialogue response generation: A neural network approach. In AAAI, pages 5293–5300, 2018.
  • [Lubis et al., 2019] Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, and Satoshi Nakamura. Positive emotion elicitation in chat-based dialogue systems. IEEE/ACM Trans. Audio, Speech & Language Processing, 27(4):866–877, 2019.
  • [Papineni et al., 2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL, pages 311–318, 2002.
  • [Partala and Surakka, 2004] Timo Partala and Veikko Surakka. The effects of affective interventions in human-computer interaction. Interacting with Computers, 16(2):295–309, 2004.
  • [Picard and Klein, 2001] Rosalind W. Picard and Jonathan Klein. Computers that recognise and respond to user emotion: theoretical and practical implications. Interacting with Computers, 14(2):141–169, 2001.
  • [Picard, 1997] Rosalind W. Picard. Affective Computing. MIT Press, 1997.
  • [Prendinger and Ishizuka, 2005] Helmut Prendinger and Mitsuru Ishizuka. The empathic companion: A characterbased interface that addresses users’ affective states. Applied Artificial Intelligence, 19(3-4):267–285, 2005.
  • [Rashkin et al., 2019] Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. Towards empathetic open-domain conversation models: A new benchmark and dataset. In ACL Volume 1: Long Papers, pages 5370–5381, 2019.
  • [Serban et al., 2017] Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. In AAAI, pages 3295–3301, 2017.
  • [Skowron, 2009] Marcin Skowron. Affect listeners: Acquisition of affective states by means of conversational systems. In Development of Multimodal Interfaces: Active Listening and Synchrony, Second COST 2102 International Training School, Dublin, Ireland, March 23-27, 2009, Revised Selected Papers, pages 169–181, 2009.
  • [Song et al., 2019] Zhenqiao Song, Xiaoqing Zheng, Lu Liu, Mu Xu, and Xuanjing Huang. Generating responses with a specific emotion in dialog. In ACL, pages 3685–3695, 2019.
  • [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112, 2014.
  • [Wang and Wan, 2019] Tianming Wang and Xiaojun Wan. T-CVAE: transformer-based conditioned variational autoencoder for story completion. In IJCAI, pages 5233–5239, 2019.
  • [Yang et al., 2019] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. CoRR, abs/1906.08237, 2019.
  • [Zhao et al., 2017] Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL Volume 1: Long Papers, pages 654–664, 2017.
  • [Zhong et al., 2019] Peixiang Zhong, Di Wang, and Chunyan Miao. An affect-rich neural conversational model with biased attention and weighted cross-entropy loss. In AAAI, pages 7492–7500, 2019.
  • [Zhou and Wang, 2018] Xianda Zhou and William Yang Wang. Mojitalk: Generating emotional responses at scale. In ACL Volume 1: Long Papers, pages 1128–1137, 2018.
  • [Zhou et al., 2018] Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, and Bing Liu. Emotional chatting machine: Emotional conversation generation with internal and external memory. In AAAI, pages 730–739, 2018.