Self-Adaptive Neural Module Transformer for Visual Question Answering

Zhong Huasong
Jingyuan Chen
Chen Shen
Xian-Sheng Hua

IEEE Transactions on Multimedia, pp. 1-1, 2020.

DOI: https://doi.org/10.1109/TMM.2020.2995278

Abstract:

Vision and language understanding is one of the most fundamental and difficult tasks in Multimedia Intelligence. Visual Question Answering (VQA) is even more challenging, since it requires complex reasoning steps to reach the correct answer. To achieve this, Neural Module Network (NMN) and its variants rely on parsing the natural ...
