Embedding Spatial Relations in Visual Question Answering for Remote Sensing

Maxime Faure, Sylvain Lobry, Camille Kurtz, Laurent Wendling

2022 26th International Conference on Pattern Recognition (ICPR) (2022)

Abstract

Remote sensing images carry a wealth of information that is not easily accessible to end-users, as extracting it requires strong technical skills and knowledge. Visual Question Answering (VQA), a task that aims at answering an open-ended natural-language question about an image, can provide easier access to this information. Given the geographical information contained in remote sensing images, questions often embed an important spatial aspect, for instance regarding the relative position of two objects. Our objective is to better model spatial relations in the construction of a ground-truth database of image/question/answer triplets and to assess the capacity of a VQA model to answer these questions. In this article, we propose to use histograms of forces to model the directional spatial relations between geo-localized objects. This allows a finer modeling of ambiguous relationships between objects and provides different levels of assessment of a relation (e.g. object A is slightly/strictly to the west of object B). Using this new dataset, we evaluate the performance of a classical VQA model and propose a curriculum learning strategy to better take into account the varying difficulty of questions embedding spatial relations. With this approach, we show an improvement in the performance of our model, highlighting the interest of embedding spatial relations in VQA for remote sensing applications.

Keywords

spatial relations, remote sensing, visual question
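
The abstract does not detail how the histogram of forces is computed. As a rough illustration of how a directional relation can be graded rather than treated as a yes/no property, the sketch below uses a plain angle histogram over point pairs, a simpler technique from the same family and not the authors' method. The function names, the 8-bin layout, and the coordinate convention (x pointing east, y pointing north) are assumptions made for this example.

```python
import math

# Assumed bin layout: 8 sectors of 45 degrees, each centered on a compass direction.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def angle_histogram(points_a, points_b, bins=8):
    """Normalized histogram of the directions in which A lies as seen from B.

    points_a, points_b: lists of (x, y) tuples, x pointing east, y pointing north.
    This is an angle histogram, not a histogram of forces.
    """
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for ax, ay in points_a:
        for bx, by in points_b:
            if (ax, ay) == (bx, by):
                continue  # coincident points carry no direction
            theta = math.atan2(ay - by, ax - bx) % (2 * math.pi)
            # Shift by half a bin so that bin 0 is centered on east.
            idx = int(((theta + width / 2) % (2 * math.pi)) // width) % bins
            hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def degree_of(direction, points_a, points_b):
    """Share of point pairs for which A lies in `direction` relative to B."""
    hist = angle_histogram(points_a, points_b, bins=len(DIRECTIONS))
    return hist[DIRECTIONS.index(direction)]


if __name__ == "__main__":
    a = [(0, 0), (0, 1)]    # object A
    b = [(10, 0), (10, 1)]  # object B, located east of A
    print(degree_of("W", a, b))
```

A value close to 1 for "W" would correspond to a relation such as "A is strictly to the west of B", while intermediate values would correspond to weaker statements; where to place such thresholds is left open here.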
