Multi-modality Latent Interaction Network for Visual Question Answering
In International Conference on Computer Vision (ICCV), pp. 5825-5835, 2019.
Abstract:
Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From hu...
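To make the idea of modeling interactions between summarized (rather than individual) visual and language features concrete, here is a minimal sketch assuming PyTorch. The class and parameter names (LatentInteraction, n_latents) are illustrative assumptions, not the authors' released MLIN code: each modality is pooled into a small set of latent vectors via attention, the latents from the two modalities interact, and the fused latents are propagated back to the region features.

```python
# Hypothetical sketch of cross-modality latent interaction (not the paper's code).
import torch
import torch.nn as nn

class LatentInteraction(nn.Module):
    def __init__(self, dim: int = 512, n_latents: int = 8):
        super().__init__()
        # Learnable queries that summarize each modality into n_latents vectors.
        self.vis_queries = nn.Parameter(torch.randn(n_latents, dim) * 0.02)
        self.txt_queries = nn.Parameter(torch.randn(n_latents, dim) * 0.02)
        self.cross = nn.Linear(2 * dim, dim)  # fuse the two latent summaries
        self.scale = dim ** -0.5

    def summarize(self, queries, feats):
        # Attention pooling: (L, D) queries over (B, N, D) features -> (B, L, D).
        attn = torch.softmax(queries @ feats.transpose(1, 2) * self.scale, dim=-1)
        return attn @ feats

    def forward(self, vis_feats, txt_feats):
        v = self.summarize(self.vis_queries, vis_feats)  # (B, L, D) visual summary
        t = self.summarize(self.txt_queries, txt_feats)  # (B, L, D) language summary
        # Latent-to-latent interaction between the two modality summaries.
        fused = self.cross(torch.cat([v, t], dim=-1))    # (B, L, D)
        # Propagate the fused latents back to the visual region features.
        back = torch.softmax(vis_feats @ fused.transpose(1, 2) * self.scale, dim=-1)
        return vis_feats + back @ fused                  # residual update of regions

# Example usage with typical VQA shapes (36 detected regions, 14 question words).
block = LatentInteraction(dim=512, n_latents=8)
regions = torch.randn(2, 36, 512)
words = torch.randn(2, 14, 512)
out = block(regions, words)  # (2, 36, 512) updated region features
```

Because the latents are few (here 8) compared to the number of regions and words, the cross-modal interaction happens in a compact summarized space rather than over all region-word pairs, which is the general motivation the abstract describes.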