Self-Adaptive Neural Module Transformer for Visual Question Answering
IEEE Transactions on Multimedia, pp. 1-1, 2020.
Vision and language understanding is one of the most fundamental and difficult tasks in Multimedia Intelligence. Visual Question Answering (VQA) is even more challenging, since it requires complex reasoning steps to reach the correct answer. To achieve this, the Neural Module Network (NMN) and its variants rely on parsing the natural ...