An Answer FeedBack Network for Visual Question Answering.

IJCNN (2023)

Abstract
Recent work has explored the power of the transformer architecture in Visual Question Answering (VQA). However, most models suffer from misaligned multimodal features and attend to unimportant image regions when answering a given question. To address this, we propose an Answer FeedBack Network (AFBN) that focuses on the image region features most useful for answering the question. The answers generated by the backbone network are fed back into the network as feedback information, and a FeedBack module (FB) controls this answer feedback. Additionally, we adopt a consistency loss function to reconstruct the image region features, which keeps the image region features related to the question consistent with those related to the answer. Extensive experiments on the VQA-v2 benchmark dataset show that our method achieves better performance than state-of-the-art methods.
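The feedback idea in the abstract can be illustrated with a small NumPy sketch. This is not the AFBN implementation: the abstract gives no equations, so the scaled dot-product attention, the scalar sigmoid gate, the additive fusion, and the mean-squared-error form of the consistency loss are all assumptions, and every function name below (`attend`, `feedback_gate`, `consistency_loss`) is hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(regions, query):
    # Scaled dot-product attention of one query vector over region features.
    # regions: (n_regions, d), query: (d,) -> attended feature (d,) and weights (n_regions,)
    d = regions.shape[1]
    weights = softmax(regions @ query / np.sqrt(d))
    return weights @ regions, weights

def feedback_gate(question, answer):
    # Assumed form: a scalar gate in (0, 1) that decides how much
    # answer feedback is mixed into the query.
    d = question.shape[0]
    return 1.0 / (1.0 + np.exp(-(question @ answer) / np.sqrt(d)))

def consistency_loss(feat_q, feat_a):
    # Assumed form: mean squared difference between the question-attended
    # and the answer-informed region features.
    return float(np.mean((feat_q - feat_a) ** 2))

# Toy inputs: 36 image regions with 8-dimensional features (illustrative sizes).
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 8))
question = rng.normal(size=8)
answer = rng.normal(size=8)  # stands in for an embedding of the backbone's first-pass answer

gate = feedback_gate(question, answer)
fused_query = question + gate * answer           # gated answer feedback (assumed additive)
feat_q, w_q = attend(regions, question)          # first pass: question only
feat_a, w_a = attend(regions, fused_query)       # second pass: question plus feedback
loss = consistency_loss(feat_q, feat_a)
```

In this sketch the loss is zero exactly when both passes attend to the same region features, which is one simple way to read the abstract's goal of keeping question-related and answer-related region features consistent.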
Keywords
Answer FeedBack Network, backbone network, feedback information, FeedBack module, given questions, image region features, multimodal features, unimportant image regions, Visual Question