CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts

Abstract:

Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we...

Introduction
  • [Table 1, flattened by extraction; see the Tables section for the full caption.] Message: "Why do you avoid starbucks?"; gold response: "I don't like their coffee."
  • EmoDS (Neutral / Joy / Sadness): "I'm not a gamer, so it's just that I'm a bit more tired of them." / "That's pretty cool. I like that." / "That sucks."
  • CARE (Neutral / Joy / Sadness): "It's just a general dislike of the company." / "Tea is my favorite. I'm not a big fan of coffee." / "It's too bitter for me."
Highlights
  • Concept-Aware Top-K Decoding: During inference, we propose a concept-aware top-K decoding (CATD) method to encourage the generation of words that are more related to the associated latent concepts
  • We propose CARE as the first attempt to test the hypothesis that combining rationality and emotion into conversational agents can improve response quality and human ratings
  • We build an emotion-aware commonsense knowledge graph (EA-CKG) and leverage its TransE embeddings to allow CARE to reason over the EA-CKG and construct both relational and emotional latent concepts
  • We further propose three methods to collaboratively incorporate the latent concepts into response generation
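The CATD highlight above can be sketched roughly as follows. The exact scoring function is not reproduced in this summary, so the cosine-similarity bias, the `beta` weight, and every name below are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def concept_aware_top_k(logits, emb, concept_ids, k=40, beta=1.0, rng=None):
    """Sample a token with top-K decoding, biased toward latent concepts.

    logits:      (V,) raw next-token logits from the decoder
    emb:         (V, d) word-embedding matrix
    concept_ids: token ids of the relational/emotional latent concepts
    beta:        strength of the concept bias (hypothetical knob)
    """
    rng = rng or np.random.default_rng()
    # Cosine similarity of every vocabulary word to its nearest latent concept.
    e = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
    sim = (e @ e[concept_ids].T).max(axis=1)          # (V,)
    biased = logits + beta * sim                      # boost concept-related words
    # Conventional top-K: keep the K largest, renormalize, sample.
    top = np.argpartition(biased, -k)[-k:]
    p = np.exp(biased[top] - biased[top].max())
    p /= p.sum()
    return int(rng.choice(top, p=p))
```

Raising `beta` shifts probability mass toward words near the latent concepts, while the top-K cutoff still bounds how far decoding can drift from the decoder's own distribution.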
Results
  • Evaluation Metrics

    The authors conduct both automatic and human evaluations. Automatic evaluation metrics include 1) Fluency: perplexity (PPL), where lower values indicate more confident, fluent generations; 2) Diversity: distinct-1 (dist-1) and distinct-2 (dist-2) (Li et al. 2016a), which measure the percentage of unique unigrams and bigrams in the generated responses, respectively; 3) Emotion Accuracy (EA): the emotion accuracy of the generated responses, measured by the trained emotion classifier; and 4) Commonsense Awareness (CA): the average number of commonsense triplets per message-response pair, measured against ConceptNet.

    8 https://www.reddit.com/r/CasualConversation/
    9 https://files.pushshift.io/reddit/comments/
    10 https://github.com/Marsan-Ma/chat_corpus/

    Following Zhou et al. (2018a), the authors conduct human evaluations to measure both content quality (rating scale {0, 1, 2}) and emotion quality (rating scale {0, 1}) of the generated responses.
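The distinct-1/distinct-2 metrics above are simple to compute; a minimal sketch, assuming whitespace tokenization (which may differ from the authors' preprocessing):

```python
def distinct_n(responses, n):
    """distinct-n (Li et al. 2016a): unique n-grams divided by total
    n-grams, pooled over all generated responses."""
    ngrams = []
    for r in responses:
        toks = r.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# Example: two responses sharing the bigram "i like".
print(distinct_n(["i like tea", "i like coffee"], 2))  # 3 unique / 4 total = 0.75
```

Higher values mean less repetition across the generated responses; generic models tend to score low because they reuse the same short phrases.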
Conclusion
  • The authors propose CARE as the first attempt to test the hypothesis that combining rationality and emotion into conversational agents can improve response quality and human ratings.
  • Extensive ablation studies show that the proposed methods of constructing and incorporating latent concepts outperform alternative methods.
  • Both automatic and human evaluations show that CARE can produce more accurate and commonsense-aware emotional responses than state-of-the-art commonsense-aware models and emotional models.
  • The authors plan to extend this work to other aspects of rationality, e.g., logical reasoning.
Tables
  • Table 1: Sample responses from EmoDS (Song et al. 2019) and our model. EmoDS generates generic or unrelated emotional responses. Our model extracts the message concept "starbucks" and generates more commonsense-aware emotional responses by referring to our constructed relational (in bold) and emotional (in italic) latent concepts, e.g., company, coffee and bitter
  • Table 2: EA-CKG statistics. Reddit and Twitter are two conversation datasets used in our experiments
  • Table 3: Dataset statistics
  • Table 4: Automatic evaluation results. Size denotes model size. IT denotes inference time relative to Seq2Seq
  • Table 5: Human evaluation results. Cont and Emot denote content quality and emotion quality, respectively. The inter-annotator agreements, measured by Fleiss' Kappa (Fleiss and Cohen 1973), are 0.441 and 0.626 for content and emotion on Reddit, and 0.479 and 0.673 for content and emotion on Twitter, respectively. Both datasets obtain "moderate agreement" for content and "substantial agreement" for emotion
  • Table 6: Ablation study. -ET+EL: replace the tails of the extracted emotional triplets (ET) by randomly sampled corresponding emotional words from an emotion lexicon (EL) (Mohammad and Turney 2013). -TransE: instead of using TransE, search neighbors with a growing neighborhood size (up to 3) on the EA-CKG to find latent concepts. -EAGA: remove the emotion-aware graph attention. -DLS: remove the dynamic label smoothing. -DLS+LS: replace the dynamic label smoothing with conventional label smoothing (LS) of 0.1. -CATD: replace the concept-aware top-K decoding with conventional top-K decoding
  • Table 7: Case studies. Words in bold and italic denote relational and emotional latent concepts, respectively
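Table 5's agreement numbers are Fleiss' Kappa scores. As a generic illustration of how such scores are computed (not the authors' evaluation code), kappa can be derived from an items x categories matrix of rating counts:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) matrix of rating counts,
    where each row sums to the same number of raters n."""
    counts = np.asarray(counts, dtype=float)
    N, n = counts.shape[0], counts[0].sum()
    p_j = counts.sum(axis=0) / (N * n)                  # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()       # observed vs. chance
    return (P_bar - P_e) / (1 - P_e)
```

Values of 0.41-0.60 are conventionally read as "moderate agreement" and 0.61-0.80 as "substantial agreement", which matches how the caption interprets the reported scores.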
Funding
  • This research is supported, in part, by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI) (Alibaba-NTU-AIR2019B1), Nanyang Technological University, Singapore
  • This research is also supported, in part, by the National Research Foundation, Prime Minister’s Office, Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No NRF-NRFI052019-0002)
  • This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017)
Reference
  • Asghar, N.; Poupart, P.; Hoey, J.; Jiang, X.; and Mou, L. 2018. Affective Neural Response Generation. In ECIR, 154–166.
  • Banerjee, P.; Pal, K. K.; Mitra, A.; and Baral, C. 2019. Careful Selection of Knowledge to Solve Open Book Question Answering. In ACL, 6120–6129.
  • Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In NIPS, 2787–2795.
  • Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; and Choi, Y. 2019. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. In ACL, 4762–4779.
  • Church, K. W.; and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1): 22–29.
  • Colman, A. M. 2003.
  • De Sousa, R. 1990. The rationality of emotion. MIT Press.
  • Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL, 4171–4186.
  • Ekman, P. 1992. An argument for basic emotions. Cognition & Emotion 6(3-4): 169–200.
  • Felbo, B.; Mislove, A.; Søgaard, A.; Rahwan, I.; and Lehmann, S. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In EMNLP, 1615–1625.
  • Fleiss, J. L.; and Cohen, J. 1973. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement 33(3): 613–619.
  • Ghazvininejad, M.; Brockett, C.; Chang, M.-W.; Dolan, B.; Gao, J.; Yih, W.-t.; and Galley, M. 2018. A knowledge-grounded neural conversation model. In AAAI, 5110–5117.
  • Ghosh, S.; Chollet, M.; Laksana, E.; Morency, L.-P.; and Scherer, S. 2017. Affect-LM: A Neural Language Model for Customizable Affective Text Generation. In ACL, 634–642.
  • Han, S.; Bang, J.; Ryu, S.; and Lee, G. G. 2015. Exploiting knowledge base to generate responses for natural language dialog listening agents. In SIGDIAL, 129–133.
  • Hasegawa, T.; Kaji, N.; Yoshinaga, N.; and Toyoda, M. 2013. Predicting and eliciting addressee's emotion in online dialogue. In ACL, 964–972.
  • Hu, Z.; Yang, Z.; Liang, X.; Salakhutdinov, R.; and Xing, E. P. 2017. Toward controlled generation of text. In ICML, 1587–1596.
  • Keltner, D.; and Haidt, J. 1999. Social functions of emotions at four levels of analysis. Cognition & Emotion 13(5): 505–521.
  • Keskar, N. S.; McCann, B.; Varshney, L. R.; Xiong, C.; and Socher, R. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  • Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kullback, S.; and Leibler, R. A. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22(1): 79–86.
  • Levy, O.; Goldberg, Y.; and Dagan, I. 2015. Improving distributional similarity with lessons learned from word embeddings. TACL 3: 211–225.
  • Li, J.; Galley, M.; Brockett, C.; Gao, J.; and Dolan, B. 2016a. A Diversity-Promoting Objective Function for Neural Conversation Models. In NAACL, 110–119.
  • Li, J.; Galley, M.; Brockett, C.; Spithourakis, G.; Gao, J.; and Dolan, B. 2016b. A Persona-Based Neural Conversation Model. In ACL, 994–1003.
  • Li, J.; and Sun, X. 2018. A syntactically constrained bidirectional-asynchronous approach for emotional conversation generation. arXiv preprint arXiv:1806.07000.
  • Li, P.; and Tuzhilin, A. 2019. Towards Controllable and Personalized Review Generation. In EMNLP-IJCNLP, 3228–3236.
  • Li, X.; Taheri, A.; Tu, L.; and Gimpel, K. 2016c. Commonsense knowledge base completion. In ACL, 1445–1455.
  • Lin, Z.; Madotto, A.; Shin, J.; Xu, P.; and Fung, P. 2019. MoEL: Mixture of Empathetic Listeners. In EMNLP-IJCNLP, 121–132.
  • Liu, S.; Chen, H.; Ren, Z.; Feng, Y.; Liu, Q.; and Yin, D. 2018. Knowledge diffusion for neural dialogue generation. In ACL, 1489–1498.
  • Madotto, A.; Wu, C.-S.; and Fung, P. 2018. Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems. In ACL, 1468–1478.
  • Mohammad, S. 2012. #Emotional Tweets. In SemEval, 246–255.
  • Mohammad, S. 2018. Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In ACL, 174–184.
  • Mohammad, S.; Bravo-Marquez, F.; Salameh, M.; and Kiritchenko, S. 2018. SemEval-2018 Task 1: Affect in Tweets. In SemEval, 1–17.
  • Mohammad, S. M.; and Turney, P. D. 2013. Crowdsourcing a Word-Emotion Association Lexicon. Computational Intelligence 29(3): 436–465.
  • Moon, S.; Shah, P.; Kumar, A.; and Subba, R. 2019. OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs. In ACL, 845–854.
  • Peng, Y.; Fang, Y.; Xie, Z.; and Zhou, G. 2019. Topic-enhanced emotional conversation generation with attention mechanism. Knowledge-Based Systems 163: 429–437.
  • Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In EMNLP, 1532–1543.
  • Petroni, F.; Rocktäschel, T.; Riedel, S.; Lewis, P.; Bakhtin, A.; Wu, Y.; and Miller, A. 2019. Language Models as Knowledge Bases? In EMNLP-IJCNLP, 2463–2473.
  • Pham, M. T. 2007. Emotion and rationality: A critical review and interpretation of empirical evidence. Review of General Psychology 11(2): 155–178.
  • Prendinger, H.; and Ishizuka, M. 2005. The empathic companion: A character-based interface that addresses users' affective states. Applied Artificial Intelligence 19(3-4): 267–285.
  • Rashkin, H.; Smith, E. M.; Li, M.; and Boureau, Y.-L. 2019. Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset. In ACL, 5370–5381.
  • Roller, S.; Dinan, E.; Goyal, N.; Ju, D.; Williamson, M.; Liu, Y.; Xu, J.; Ott, M.; Shuster, K.; Smith, E. M.; et al. 2020. Recipes for building an open-domain chatbot. arXiv preprint arXiv:2004.13637.
  • Ross, J. J. 1978. Rationality and common sense. Philosophy 53(205): 374–381.
  • Saito, I.; Nishida, K.; Asano, H.; and Tomita, J. 2018. Commonsense knowledge base completion and generation. In CoNLL, 141–150.
  • Song, Z.; Zheng, X.; Liu, L.; Xu, M.; and Huang, X.-J. 2019. Generating Responses with a Specific Emotion in Dialog. In ACL, 3685–3695.
  • Speer, R.; Chin, J.; and Havasi, C. 2017. ConceptNet 5.5: an open multilingual graph of general knowledge. In AAAI, 4444–4451.
  • Sun, H.; Dhingra, B.; Zaheer, M.; Mazaitis, K.; Salakhutdinov, R.; and Cohen, W. 2018. Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text. In EMNLP, 4231–4242.
  • Sun, Z.; Deng, Z.-H.; Nie, J.-Y.; and Tang, J. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In ICLR.
  • Tuan, Y.-L.; Chen, Y.-N.; and Lee, H.-y. 2019. DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs. In EMNLP-IJCNLP, 1855–1865.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In NIPS, 5998–6008.
  • Vinyals, O.; and Le, Q. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Wu, C.-S.; Socher, R.; and Xiong, C. 2019. Global-to-local Memory Pointer Networks for Task-Oriented Dialogue. In ICLR.
  • Xing, C.; Wu, W.; Wu, Y.; Liu, J.; Huang, Y.; Zhou, M.; and Ma, W.-Y. 2017. Topic aware neural response generation. In AAAI, 3351–3357.
  • Xu, C.; Wu, W.; Tao, C.; Hu, H.; Schuerman, M.; and Wang, Y. 2019. Neural Response Generation with Meta-words. In ACL, 5416–5426.
  • Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; and Le, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS.
  • Young, T.; Cambria, E.; Chaturvedi, I.; Zhou, H.; Biswas, S.; and Huang, M. 2018. Augmenting end-to-end dialogue systems with commonsense knowledge. In AAAI, 4970–4977.
  • Zhang, H.; Liu, Z.; Xiong, C.; and Liu, Z. 2020. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. In ACL.
  • Zhong, P.; Wang, D.; and Miao, C. 2019a. An Affect-Rich Neural Conversational Model with Biased Attention and Weighted Cross-Entropy Loss. In AAAI, 7492–7500.
  • Zhong, P.; Wang, D.; and Miao, C. 2019b. Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations. In EMNLP-IJCNLP, 165–176.
  • Zhou, H.; Huang, M.; Zhang, T.; Zhu, X.; and Liu, B. 2018a. Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory. In AAAI, 730–739.
  • Zhou, H.; Young, T.; Huang, M.; Zhao, H.; Xu, J.; and Zhu, X. 2018b. Commonsense Knowledge Aware Conversation Generation with Graph Attention. In IJCAI, 4623–4629.
  • Zhou, L.; Gao, J.; Li, D.; and Shum, H.-Y. 2018c. The design and implementation of XiaoIce, an empathetic social chatbot. Computational Linguistics 0: 1–62.
  • Zhou, X.; and Wang, W. Y. 2018. MojiTalk: Generating Emotional Responses at Scale. In ACL, 1128–1137.