Dual Focus Attention Network For Video Emotion Recognition

2020 IEEE International Conference on Multimedia and Expo (ICME)(2020)

引用 4|浏览24
暂无评分
摘要
Video emotion recognition is a challenging task due to complex scenes and various forms of emotion expression. Most existing works focus on fusing multiple features over the whole video clips. According to our observations, given a long video clip, the emotion is usually presented by only several actions/objects in a few short snippets, and the meaningful cues are buried in the noisy background. When human judging the emotion in videos, we first find the informative clips and then closely look for emotional cues in the frames. In this paper, we propose Dual Focus Attention Network to mimic this process. First, three kinds of features including action, object, and scene are extracted from videos. Second, Two attention modules are used to focus on the visual features of the videos from temporal and spatial dimensions respectively. With our dual focus attention network, we can effectively discover the most emotional frames along the time dimension and the most emotional visual cues in each frame. Our experiments conducted on two widely used datasets Ekman and VideoEmotion show that our proposed approach outperforms the existing approaches.
更多
查看译文
关键词
Video emotion recognition,attention for video,deep learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要