Outside Knowledge Visual Question Answering Version 2.0

Benjamin Z. Reichman, Anirudh Sundar, Christopher Richardson, Tamara Zubatiy, Prithwijit Chowdhury, Aaryan Shah, Jack Truxal, Micah Grimes, Dristi Shah, Woo Ju Chee, Saif Punjwani, Atishay Jain, Larry Heck

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Visual question answering (VQA) lies at the intersection of language and vision research. It functions as a building block for multimodal conversational AI and serves as a testbed for assessing a model's capability for open-domain scene understanding. While progress in this area was initially accelerated by the 2015 release of the popular, large "VQA" dataset, new datasets are required to sustain this research momentum. For example, the 2019 Outside Knowledge VQA dataset "OK-VQA" extends VQA with more challenging questions that require complex, factual, and commonsense knowledge. However, in our analysis, we found that 41.4% of the dataset needed to be corrected and 10.6% needed to be removed. This paper describes the analysis, corrections, and removals completed and presents a new dataset: OK-VQA Version 2.0. To gain insight into the impact of these changes on OK-VQA research, the paper presents results for state-of-the-art models retrained on the new dataset. The side-by-side comparisons show that one method in particular, the Knowledge Augmented Transformer for Vision-and-Language, extends its relative lead over competing methods. The dataset is available online.
Keywords
Visual Question Answering, Outside Knowledge VQA, Datasets