Robust image captioning with post-generation ensemble method

IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium (2023)

Abstract
Remote sensing image captioning is a research domain that aims to automatically generate natural language descriptions of the contents of remotely sensed images. Accurate descriptions of image contents are important for downstream applications such as image retrieval and image understanding. Although there is a pressing need for reliable results, current research predominantly focuses on single captioning algorithms and on improving their performance on specific target datasets. This research direction is undoubtedly important. However, we believe that relying solely on the output of a single captioner may introduce a vulnerability from a robustness standpoint. This concern is particularly relevant in remote sensing, where the scarcity of large-scale datasets can limit the robustness and reliability of the resulting algorithms. In this paper, we propose an approach that harnesses the advantages of ensembles to enhance accuracy and reliability in image captioning. Our method introduces a novel technique that uses an ensemble of diverse captioning algorithms and automatically selects the most suitable caption from the set of predictions. By decoupling the description generation and selection phases, the approach allows architecturally different captioning algorithms to be integrated flexibly into the pipeline.
Keywords
Image captioning, ensemble of captioners, CLIP model
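
The abstract does not state how the most suitable caption is selected. Since the keywords mention a CLIP model, one plausible reading is that each candidate caption is scored by its image-text similarity and the highest-scoring one is kept. The sketch below illustrates only that post-generation selection step, under that assumption. The names `select_caption`, `cosine_similarity`, and `embed_text` are hypothetical, and the embeddings are toy vectors standing in for the output of a real image-text encoder.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_caption(
    image_embedding: Vector,
    captions: Sequence[str],
    embed_text: Callable[[str], Vector],
) -> Tuple[str, float]:
    """Return the candidate caption most similar to the image, with its score.

    The captions can come from any set of captioners: selection depends only
    on the generated text, which is what decouples it from generation.
    `embed_text` is an assumed stand-in for a text encoder such as CLIP's.
    """
    if not captions:
        raise ValueError("at least one candidate caption is required")
    scores = [cosine_similarity(image_embedding, embed_text(c)) for c in captions]
    best = max(range(len(captions)), key=scores.__getitem__)
    return captions[best], scores[best]


# Toy usage: hand-written 2-D embeddings replace a real encoder (assumption).
toy_text_embeddings = {
    "an airport with two runways": [0.9, 0.1],
    "a dense forest": [0.0, 1.0],
    "a river crossing farmland": [0.5, 0.5],
}
toy_image_embedding = [1.0, 0.0]

best_caption, best_score = select_caption(
    toy_image_embedding,
    list(toy_text_embeddings),
    toy_text_embeddings.__getitem__,
)
print(best_caption, round(best_score, 3))
```

Because the selector sees only text and an image embedding, adding or swapping a captioner in the ensemble requires no change to the selection code, which matches the flexibility the abstract attributes to separating generation from selection.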