Learning Visual Emotion Representations From Web Data

CVPR (2020)

Citations: 43 | Views: 314
Abstract
We present a scalable approach for learning powerful visual features for emotion recognition. A critical bottleneck in emotion recognition is the lack of large-scale datasets for learning visual emotion features. To address this, we curate StockEmotion, a large-scale dataset derived from the web that contains more than a million images. StockEmotion uses 690 emotion-related tags as labels, which gives a fine-grained and diverse label set and circumvents the difficulty of obtaining emotion annotations manually. We use this dataset to train a feature extraction network, EmotionNet, which we further regularize using a joint text and visual embedding and text distillation. Our experimental results show that EmotionNet trained on the StockEmotion dataset outperforms state-of-the-art models on four different visual emotion tasks. An added benefit of our joint embedding training approach is that EmotionNet achieves zero-shot recognition performance competitive with fully supervised baselines on a challenging visual emotion dataset, EMOTIC, which further highlights the generalizability of the learned emotion features.
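
The abstract does not spell out how the joint embedding enables zero-shot recognition. One common way such a setup can work, shown here only as a minimal sketch and not as the authors' implementation, is to embed the image and each candidate label into the same space and pick the label whose text embedding is most similar to the image embedding. The function names (`cosine`, `zero_shot_predict`), the label set, and the 3-dimensional vectors below are all hypothetical and made up for illustration; in practice the vectors would come from trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the image embedding.

    No classifier is trained on the target labels: the prediction relies only
    on similarity in the shared (joint) embedding space.
    """
    scores = {
        label: cosine(image_embedding, vec)
        for label, vec in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy embeddings (assumption: real ones would be produced by
# the trained image and text encoders, with many more dimensions).
label_embeddings = {
    "joy": [0.9, 0.1, 0.0],
    "fear": [0.0, 0.2, 0.9],
    "calm": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_predict(image_embedding, label_embeddings)
print(predicted)
```

Because the label set enters only through its text embeddings, new emotion categories can be added at test time without retraining, which is the property that zero-shot evaluation on a dataset such as EMOTIC relies on.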
Keywords
visual emotion representations, web data, scalable approach, powerful visual features, emotion recognition, critical bottleneck, visual emotion features, learned emotion features, EMOTIC, challenging visual emotion dataset, recognition performance, joint embedding training approach, different visual emotion tasks, StockEmotion dataset, text distillation, visual embedding, EmotionNet, feature extraction network, emotion annotations, emotion labels, fine-grained set, 690 emotion related tags, webly derived large scale dataset