Re-Attention For Visual Question Answering

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE(2021)

引用 81|浏览144
暂无评分
摘要
Visual Question Answering (VQA) requires a simultaneous understanding of images and questions. Existing methods achieve well performance by focusing on both key objects in images and key words in questions. However, the answer also contains rich information which can help to better describe the image and generate more accurate attention maps. In this paper, to utilize the information in answer, we propose a re-attention framework for the VQA task. We first associate image and question by calculating the similarity of each object-word pairs in the feature space. Then, based on the answer, the learned model re-attends the corresponding visual objects in images and reconstructs the initial attention map to produce consistent results. Benefiting from the re-attention procedure, the question can be better understood, and the satisfactory answer is generated. Extensive experiments on the benchmark dataset demonstrate the proposed method performs favorably against the state-of-the-art approaches.
更多
查看译文
关键词
Visualization, Tires, Task analysis, Feature extraction, Training, Knowledge discovery, Image reconstruction, Visual question answering, attention mechanism, re-attention, gating mechanism
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要