One for All: Toward Unified Foundation Models for Earth Vision
CoRR (2024)
Abstract
Foundation models with large numbers of parameters, trained on large-scale datasets, have demonstrated remarkable efficacy across various downstream tasks on remote sensing data. Current remote sensing foundation models typically specialize in a single modality or a specific range of spatial resolutions, which limits their versatility on downstream datasets. While there have been attempts to develop multi-modal remote sensing foundation models, they typically employ a separate vision encoder for each modality or spatial resolution, so the backbone must be switched depending on the input data. To address this issue, we introduce a simple yet effective method, termed OFA-Net (One-For-All Network), which employs a single, shared Transformer backbone for multiple data modalities with different spatial resolutions. With this simple design, we use the masked image modeling mechanism to pre-train the single Transformer backbone on a curated multi-modal dataset. The backbone can then be used in different downstream tasks, forging a path toward a unified foundation backbone model in Earth vision. The proposed method is evaluated on 12 distinct downstream tasks and demonstrates promising performance.
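The core idea, one set of backbone weights serving several modalities and resolutions and pre-trained by masked image modeling, can be illustrated with a small NumPy sketch of the data flow. Everything concrete here is an assumption for illustration, not OFA-Net's published configuration: the modality names and channel counts, the hypothetical per-modality patch-embedding and decoder matrices (`embed`, `decode`), the patch size, the 75% mask ratio (borrowed from MAE), and the single linear+tanh layer (`backbone`) standing in for the Transformer. Images of different sizes simply yield token sequences of different lengths, which the same shared weights process.

```python
import numpy as np

PATCH = 4          # hypothetical patch size in pixels
EMBED_DIM = 16     # hypothetical shared token width
MASK_RATIO = 0.75  # MAE-style ratio; an assumption, not taken from the paper

def patchify(img, p=PATCH):
    """(C, H, W) image -> (N, C*p*p) patch vectors; H and W must be divisible by p."""
    c, h, w = img.shape
    x = img.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)          # (H/p, W/p, C, p, p)
    return x.reshape(-1, c * p * p)

def random_masking(n, ratio, rng):
    """Randomly split n token indices into (visible, masked), both sorted."""
    perm = rng.permutation(n)
    n_keep = int(round(n * (1 - ratio)))
    return np.sort(perm[:n_keep]), np.sort(perm[n_keep:])

rng = np.random.default_rng(0)

# Hypothetical modalities: each has its own input projection to EMBED_DIM
# and its own pixel decoder, because channel counts differ.
CHANNELS = {"rgb": 3, "multispectral": 9, "sar": 2}
embed = {m: rng.normal(0, 0.02, (c * PATCH * PATCH, EMBED_DIM)) for m, c in CHANNELS.items()}
decode = {m: rng.normal(0, 0.02, (EMBED_DIM, c * PATCH * PATCH)) for m, c in CHANNELS.items()}

# ONE set of backbone weights shared by every modality. This linear+tanh layer
# is only a stand-in for the Transformer: it has no attention or positional
# embeddings, so the sketch shows the data flow, not a model that learns well.
shared_w = rng.normal(0, 0.02, (EMBED_DIM, EMBED_DIM))
mask_token = np.zeros(EMBED_DIM)

def backbone(tokens):
    return np.tanh(tokens @ shared_w)

def mim_loss(img, modality):
    patches = patchify(img)                               # (N, C*p*p)
    vis, msk = random_masking(len(patches), MASK_RATIO, rng)
    latent = backbone(patches[vis] @ embed[modality])     # encode visible tokens only
    full = np.tile(mask_token, (len(patches), 1))         # every slot starts as a mask token
    full[vis] = latent                                    # put encoded tokens back in place
    recon = full @ decode[modality]                       # project back to pixel space
    return float(np.mean((recon[msk] - patches[msk]) ** 2))  # loss on masked patches only

# Different modalities and image sizes pass through the same shared weights.
for modality, size in [("rgb", 32), ("multispectral", 16), ("sar", 24)]:
    img = rng.normal(size=(CHANNELS[modality], size, size))
    print(modality, patchify(img).shape, mim_loss(img, modality))
```

With a 4-pixel patch, the 32×32 RGB, 16×16 multispectral and 24×24 SAR inputs give 64, 16 and 36 tokens respectively; only the thin input and output projections depend on the modality, while `shared_w` is reused for all of them.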
Keywords
Foundation models, remote sensing, Earth observation, self-supervised learning