A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

2019 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019)

Abstract
People naturally understand, and often empathize with, the emotions of those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved a concordance correlation coefficient (CCC) of .29 and .32 on the Validation set for the Generalized and Personalized tracks, respectively, and achieved a CCC of .14 and .14 on the held-out Test set. We discuss the difficulties faced and the lessons learnt in tackling this challenge.
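The CCC scores reported above measure agreement between the predicted and annotated valence traces. As a rough illustration only (not code from the paper), a minimal NumPy sketch of the standard CCC formula might look like this:

import numpy as np

def ccc(y_true, y_pred):
    # Concordance correlation coefficient between two 1-D valence series:
    # CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Hypothetical example: compare a predicted valence trace against ground truth.
gt = np.array([0.1, 0.3, 0.2, -0.1, 0.0])
pred = np.array([0.15, 0.25, 0.1, -0.05, 0.05])
print(f"CCC = {ccc(gt, pred):.3f}")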
Keywords
empathic listener, OMG-Empathy Prediction Challenge, IEEE FG 2019, multimodal LSTM model, feature-level fusion, local attention, audio features, text features, visual features, concordance correlation coefficient, CCC, emotional valence prediction, listener empathic response prediction