Multi-modality Latent Interaction Network for Visual Question Answering

In: International Conference on Computer Vision (ICCV), pp. 5825–5835, 2019.


Abstract:

Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From hu...
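The abstract contrasts word-to-region relation modeling with interactions between latent modality summaries. As a rough illustration only, the sketch below (with random stand-in projections, not the paper's learned parameters; all names and shapes are assumptions) summarizes each modality into a few latent vectors via softmax attention and then lets the latent summaries interact:

```python
import numpy as np

def latent_interaction(regions, words, k=4, seed=0):
    """Hedged sketch of latent multi-modality interaction.

    regions: (R, d) visual-region features; words: (W, d) word features.
    Each modality is summarized into k latent vectors, and the
    visual latents then attend over the textual latents.
    """
    rng = np.random.default_rng(seed)
    d = regions.shape[1]
    # Hypothetical learned projections; random stand-ins here.
    Wr = rng.standard_normal((d, k))
    Ww = rng.standard_normal((d, k))

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Attention weights over items -> k latent summaries per modality.
    Ar = softmax(regions @ Wr, axis=0)            # (R, k)
    Aw = softmax(words @ Ww, axis=0)              # (W, k)
    Zr = Ar.T @ regions                           # (k, d) visual latents
    Zw = Aw.T @ words                             # (k, d) textual latents

    # Latent-to-latent interaction: each visual latent attends
    # to all textual latents (scaled dot-product attention).
    S = softmax(Zr @ Zw.T / np.sqrt(d), axis=1)   # (k, k)
    return Zr + S @ Zw                            # (k, d) fused latents
```

The point of the sketch is the complexity trade-off: attention is computed over k×k latent pairs rather than R×W region-word pairs.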
