Embedding Spatial Relations in Visual Question Answering for Remote Sensing

2022 26th International Conference on Pattern Recognition (ICPR), 2022

Abstract
Remote sensing images carry a wealth of information that is not easily accessible to end-users, as extracting it requires strong technical skills and knowledge. Visual Question Answering (VQA), a task that aims at answering an open-ended natural-language question about an image, can provide easier access to this information. Given the geographical information contained in remote sensing images, questions often embed an important spatial aspect, for instance regarding the relative position of two objects. Our objective is to better model spatial relations when constructing a ground-truth database of image/question/answer triplets, and to assess the capacity of a VQA model to answer these questions. In this article, we propose to use histograms of forces to model the directional spatial relations between geo-localized objects. This allows a finer modeling of ambiguous relationships between objects and provides different levels of assessment of a relation (e.g. object A is slightly/strictly to the west of object B). Using this new dataset, we evaluate the performance of a classical VQA model and propose a curriculum learning strategy to better take into account the varying difficulty of questions embedding spatial relations. With this approach, we show an improvement in the performance of our model, highlighting the value of embedding spatial relations in VQA for remote sensing applications.
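To make the idea of graded directional relations concrete, here is a minimal sketch. It does not implement the histogram of forces used in the paper. It uses a simpler stand-in, a histogram of angles between point pairs, and every name in it (`angle_histogram`, `west_degree`, `describe`) as well as the thresholds are assumptions chosen for illustration only.

```python
import math

def angle_histogram(points_a, points_b, bins=8):
    """Normalized histogram of directions from each point of B to each point of A.

    Simplified stand-in for a histogram of forces: objects are point sets,
    and every point pair contributes equally.
    """
    counts = [0] * bins
    width = 2 * math.pi / bins
    for ax, ay in points_a:
        for bx, by in points_b:
            # Direction of A as seen from B; x grows east, y grows north.
            theta = math.atan2(ay - by, ax - bx) % (2 * math.pi)
            # Bins are centred on 0 (east), pi/4 (north-east), and so on.
            counts[int((theta + width / 2) // width) % bins] += 1
    total = sum(counts)
    return [c / total for c in counts]

def west_degree(hist):
    """Share of point pairs whose direction falls in the bin centred on west."""
    return hist[len(hist) // 2]

def describe(degree, strict=0.9, slight=0.3):
    """Map a degree in [0, 1] to a label. Thresholds are illustrative assumptions."""
    if degree >= strict:
        return "strictly west"
    if degree >= slight:
        return "slightly west"
    return "not west"

# A lies entirely to the west of B.
clear = angle_histogram([(0, 0), (0, 1)], [(5, 0), (5, 1)])
# A is elongated northwards, so part of it lies north-west of B.
ambiguous = angle_histogram([(0, 0), (0, 10)], [(5, 0)])
print(describe(west_degree(clear)), "/", describe(west_degree(ambiguous)))
```

A per-question score of this kind could also serve as a difficulty signal for ordering training samples in a curriculum, although the abstract does not state how the authors define difficulty.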
Keywords
spatial relations, remote sensing, visual question