FERNIE-ViL: Facial Expression Enhanced Vision-and-Language Model

2021 IEEE 20th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)

Abstract
Visual cognition requires analyzing the actions, intentions, and emotions of persons in a given image. Visual Commonsense Reasoning (VCR) is a task in which a model selects answers to questions about a given image and rationales that justify those answers. In VCR, facial expressions are important nonverbal signals because they convey emotions and intentions in human interactions. However, ERNIE-ViL and UNITER, vision-and-language models that learn joint image and text representations, do not capture these signals. We find that ERNIE-ViL and UNITER struggle with questions that require identifying emotions. In this paper, we therefore propose FERNIE-ViL, which adapts a facial expression recognition module to an existing vision-and-language model. Experimental results (a 2.4 percentage-point improvement on VCR Q→A and a 0.3 percentage-point improvement on VCR QA→R) demonstrate that our method can enhance visual commonsense reasoning by better understanding human interactions.
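The abstract does not specify how the facial expression recognition module is attached to the vision-and-language backbone. As an illustration only, the following is a minimal sketch of one plausible fusion scheme, in which per-region expression features from an off-the-shelf FER model are projected and added to detector region features before they enter the vision-and-language encoder. The class and parameter names (FacialExpressionFusion, expr_dim, face_mask) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class FacialExpressionFusion(nn.Module):
    """Hypothetical sketch: project per-face expression features and add them
    to the corresponding region features before a vision-and-language encoder."""

    def __init__(self, expr_dim: int = 7, region_dim: int = 2048):
        super().__init__()
        # Map expression logits (e.g., 7 basic emotions) into the region feature space.
        self.expr_proj = nn.Linear(expr_dim, region_dim)

    def forward(self, region_feats, expr_logits, face_mask):
        # region_feats: (batch, num_regions, region_dim) from the object detector
        # expr_logits:  (batch, num_regions, expr_dim) from an FER model,
        #               zero-filled for regions that are not faces
        # face_mask:    (batch, num_regions, 1), 1.0 where the region is a face
        expr_emb = self.expr_proj(expr_logits)
        # Only face regions receive the expression signal.
        return region_feats + face_mask * expr_emb


# Usage sketch: enriched features replace the plain region features
# as visual input to the vision-and-language model.
fusion = FacialExpressionFusion()
regions = torch.randn(2, 36, 2048)           # detector region features
expr = torch.randn(2, 36, 7)                  # FER logits per region
mask = (torch.rand(2, 36, 1) > 0.8).float()   # which regions are faces
enriched = fusion(regions, expr, mask)
```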
Keywords
Artificial Intelligence, Machine Commonsense, Commonsense Reasoning, Multi-modal, Facial Expression, Natural Language Processing, Visual Recognition