Intra-Modality Feature Interaction Using Self-attention for Visual Question Answering

ICONIP (5), 2019

Abstract
Better capturing the interactions between different modalities has recently become an active research topic in visual question answering (VQA). Inspired by human visual information processing, we propose a VQA method based on intra-modality feature interaction with a self-attention mechanism (IMFI-SA). We adopt object-level features obtained with bottom-up attention, rather than grid feature maps, to extract fine-grained information from images. The proposed IMFI-SA model then captures intra-modality interactions within the question modality and the image modality respectively. Finally, we combine the enhanced object-level feature interactions, weighted by top-down cross-attention, with the question feature interactions to predict the answer for a given question and image. Experimental results on the VQA 2.0 dataset show that the proposed method outperforms existing methods in answer reasoning, especially on counting questions.
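
The two mechanisms named in the abstract can be illustrated with a short sketch: self-attention applied separately within each modality (intra-modality interaction), followed by question-guided top-down attention over the enhanced object features. This is a minimal PyTorch sketch under assumed dimensions (36 objects from bottom-up attention, 512-d features, a 3129-way answer vocabulary typical of VQA 2.0 pipelines), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention over one modality's features."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, n_items, dim); items are image objects or question words
        attn = F.softmax(self.q(x) @ self.k(x).transpose(-2, -1) * self.scale, dim=-1)
        return attn @ self.v(x)  # features enhanced by intra-modality interaction


class TopDownCrossAttention(nn.Module):
    """Question-guided (top-down) attention over object features."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, objects, question):
        # objects: (batch, n_obj, dim); question: (batch, dim)
        joint = objects * question.unsqueeze(1)        # fuse question into each object
        weights = F.softmax(self.score(joint), dim=1)  # attention over objects
        return (weights * objects).sum(dim=1)          # attended image vector


# Hypothetical sizes: 36 objects, 14 question words, 512-d features
objs = torch.randn(2, 36, 512)
words = torch.randn(2, 14, 512)

obj_sa, word_sa = SelfAttention(512), SelfAttention(512)
enhanced_objs = obj_sa(objs)        # intra-modality interaction in the image
enhanced_words = word_sa(words)     # intra-modality interaction in the question

q_vec = enhanced_words.mean(dim=1)  # pooled question representation (an assumption)
fused = TopDownCrossAttention(512)(enhanced_objs, q_vec)
answer_logits = nn.Linear(512, 3129)(fused * q_vec)  # 3129-way answer classifier
```

The key design point the abstract emphasizes is that self-attention runs within each modality before any cross-modal fusion, so each object (or word) representation already encodes its relations to the other objects (or words); the top-down step then only has to weight these relation-aware features.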
Keywords
Cross-modality, Object interaction, Visual Question Answering, Self-attention