EmoSen: Generating Sentiment and Emotion Controlled Responses in a Multimodal Dialogue System

IEEE Transactions on Affective Computing (2022)

Cited by: 23
Abstract
An essential skill for effective communication is the ability to express a specific sentiment and emotion in a conversation. Any robust dialogue system should handle the combined effect of both sentiment and emotion while generating responses. This is expected to provide a better experience and, in turn, increase user satisfaction. Prior research on either emotion- or sentiment-controlled dialogue generation has shown great promise for developing the next generation of conversational agents, but the simultaneous effect of both remains unexplored. Existing dialogue systems are predominantly based on a single modality, usually text, and therefore cannot exploit the information present in other sources such as video, audio, and images. In this article, we first present a large-scale benchmark, the Sentiment Emotion aware Multimodal Dialogue (SEMD) dataset, for the task of sentiment- and emotion-controlled dialogue generation. The SEMD dataset consists of 55k conversations from 10 TV shows with text, audio, and video information. To utilize the multimodal information, we propose a multimodal attention-based conditional variational autoencoder (M-CVAE) that outperforms several baselines. Quantitative and qualitative analyses show that multimodality, along with contextual information, plays an essential role in generating coherent and diverse responses for any given emotion and sentiment.
Keywords
Conversational AI, natural language generation, sentiment-aware NLG, emotion-aware NLG, multimodality
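The abstract names the proposed model (M-CVAE) but does not detail its internals. As a rough, hypothetical illustration of the conditional-VAE machinery such a model builds on, the NumPy sketch below shows the reparameterization trick and the diagonal-Gaussian KL term of the ELBO; all function names and dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps: sampling the latent code so gradients can
    # flow through mu and logvar (the standard (C)VAE reparameterization trick)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims;
    # this is the regularization term of the CVAE training objective
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Toy recognition-network output; in a conditional VAE, mu and logvar would be
# produced from the dialogue context plus the target emotion/sentiment labels
latent_dim = 16
mu = np.zeros(latent_dim)
logvar = np.zeros(latent_dim)

z = reparameterize(mu, logvar)          # latent sample fed to the decoder
kl = kl_to_standard_normal(mu, logvar)  # 0.0 when the posterior equals the prior
```

In a full model, the decoder would generate the response conditioned on `z`, the context, and the control labels, and the training loss would add a reconstruction term to this KL term.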