A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering

IEEE Transactions on Geoscience and Remote Sensing (2023)

Abstract
For visual question answering on remote sensing imagery (RSVQA), current methods rarely account for two typical properties of geospatial objects: large differences in scale and sensitivity to position. In addition, modeling of and reasoning about the relationships between entities have rarely been explored, which leads to one-sided and inaccurate answer predictions. In this article, a novel method called the spatial hierarchical reasoning network (SHRNet) is proposed, which endows a remote sensing (RS) visual question answering (VQA) system with enhanced visual–spatial reasoning capability. Specifically, a hash-based spatial multiscale visual representation module is first designed to encode multiscale visual features embedded with spatial positional information. Then, spatial hierarchical reasoning is conducted under the guidance of linguistic cues to learn high-order intra-group object relations across multiple scales. Finally, a visual-question (VQ) interaction module learns an effective image–text joint embedding for final answer prediction. Experimental results on three public RSVQA datasets confirm the effectiveness and superiority of SHRNet.
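The abstract names three stages (position-aware multiscale visual features, question-guided reasoning, and image-text fusion) without specifying them. The following is only a minimal, generic sketch of that kind of pipeline, assuming average-pooled grid regions at several scales, normalized cell-center coordinates as the positional signal, scaled dot-product attention with the question vector as the single query, and element-wise (Hadamard) fusion. It does not implement SHRNet's hash-based encoding or its hierarchical relation reasoning, and every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # H x W x C nested lists


def multiscale_regions(fmap: FeatureMap, scales: Sequence[int]) -> List[Vector]:
    """Hypothetical helper: average-pool an H x W x C map into s x s grids for
    each scale s and append each cell's normalized (row, col) center, so every
    region vector carries both content and position. Assumes s <= H and s <= W."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    regions: List[Vector] = []
    for s in scales:
        for gi in range(s):
            for gj in range(s):
                # Pixel range covered by grid cell (gi, gj) at this scale.
                r0, r1 = gi * h // s, (gi + 1) * h // s
                c0, c1 = gj * w // s, (gj + 1) * w // s
                cells = [fmap[r][q] for r in range(r0, r1) for q in range(c0, c1)]
                pooled = [sum(v[k] for v in cells) / len(cells) for k in range(c)]
                center = [(gi + 0.5) / s, (gj + 0.5) / s]
                regions.append(pooled + center)
    return regions


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def question_guided_pool(regions: Sequence[Vector], question: Vector) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention with the question as the single query;
    returns the attention-pooled visual vector and the attention weights."""
    d = len(question)
    scores = [sum(a * b for a, b in zip(r, question)) / math.sqrt(d) for r in regions]
    weights = softmax(scores)
    pooled = [sum(wt * r[k] for wt, r in zip(weights, regions)) for k in range(d)]
    return pooled, weights


def fuse(visual: Vector, question: Vector) -> Vector:
    """Element-wise (Hadamard) product, one simple image-text fusion choice."""
    return [v * q for v, q in zip(visual, question)]


if __name__ == "__main__":
    # Toy 4 x 4 x 2 map: channel 0 is active only in the top-left quadrant.
    fmap = [[[1.0 if (r < 2 and q < 2) else 0.0, 0.5] for q in range(4)] for r in range(4)]
    regions = multiscale_regions(fmap, scales=(1, 2))  # 1 + 4 = 5 regions
    question = [2.0, 0.0, 0.0, 0.0]  # toy query that attends to channel 0
    pooled, weights = question_guided_pool(regions, question)
    joint = fuse(pooled, question)
    print(len(regions), [round(wt, 3) for wt in weights], joint)
```

In this toy run the fine-scale top-left cell receives the largest attention weight, because it is the only region where the queried channel is fully active; the coarse whole-image region also scores above the remaining cells. A real model would learn the projections that produce the region and question vectors rather than using raw features.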
Keywords
Attention mechanism, multiscale representation, relational reasoning, visual question answering on remote sensing (RSVQA)