Intra-Modality Feature Interaction Using Self-attention for Visual Question Answering

ICONIP (5), 2019

Abstract
Better capturing the interactions between different modalities has recently become an active research topic in visual question answering (VQA). Inspired by human visual information processing, we propose a VQA method based on intra-modality feature interaction with a self-attention mechanism (IMFI-SA). We adopt object-level features obtained with bottom-up attention, rather than grid feature maps, to extract fine-grained information from images. The proposed IMFI-SA model then captures intra-modality interactions within the question modality and the image modality respectively. Finally, we combine the enhanced object-level feature interactions, weighted by top-down cross-attention, with the question feature interactions to predict the answer for a given question and image. Experimental results on the VQA 2.0 dataset show that the proposed method outperforms existing methods in answer reasoning, especially on counting questions.
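
The two mechanisms named in the abstract can be illustrated with a short sketch: self-attention applied separately within each modality (intra-modality interaction), followed by question-guided top-down attention over the enhanced object features. This is a minimal PyTorch sketch under assumed dimensions (36 objects from bottom-up attention, 512-d features, a 3129-way answer vocabulary typical of VQA 2.0 pipelines), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention over one modality's features."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, n_items, dim); items are image objects or question words
        attn = F.softmax(self.q(x) @ self.k(x).transpose(-2, -1) * self.scale, dim=-1)
        return attn @ self.v(x)  # features enhanced by intra-modality interaction


class TopDownCrossAttention(nn.Module):
    """Question-guided (top-down) attention over object features."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, objects, question):
        # objects: (batch, n_obj, dim); question: (batch, dim)
        joint = objects * question.unsqueeze(1)        # fuse question into each object
        weights = F.softmax(self.score(joint), dim=1)  # attention over objects
        return (weights * objects).sum(dim=1)          # attended image vector


# Hypothetical sizes: 36 objects, 14 question words, 512-d features
objs = torch.randn(2, 36, 512)
words = torch.randn(2, 14, 512)

obj_sa, word_sa = SelfAttention(512), SelfAttention(512)
enhanced_objs = obj_sa(objs)        # intra-modality interaction in the image
enhanced_words = word_sa(words)     # intra-modality interaction in the question

q_vec = enhanced_words.mean(dim=1)  # pooled question representation (an assumption)
fused = TopDownCrossAttention(512)(enhanced_objs, q_vec)
answer_logits = nn.Linear(512, 3129)(fused * q_vec)  # 3129-way answer classifier
```

The key design point the abstract emphasizes is that self-attention runs within each modality before any cross-modal fusion, so each object (or word) representation already encodes its relations to the other objects (or words); the top-down step then only has to weight these relation-aware features.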
Keywords
Cross-modality, Object interaction, Visual Question Answering, Self-attention