Self-Supervised Pre-Training with Monocular Height Estimation for Semantic Segmentation
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2024)
Abstract
Monocular height estimation (MHE) is key to generating 3-D city models, which are essential for swift disaster response. Moving beyond the traditional focus on performance enhancement, our study probes the interpretability of MHE networks. We discover that neurons within MHE models exhibit selectivity for both height and semantic classes. This insight sheds light on the inner workings of MHE models and suggests new strategies for leveraging elevation data more effectively. Building on it, we propose a framework that employs MHE as a self-supervised pretraining method for remote sensing (RS) imagery, significantly enhancing the performance of semantic segmentation tasks. Furthermore, we develop a disentangled latent transformer (DLT) module that leverages explainable deep representations from pretrained MHE networks for unsupervised semantic segmentation. Our method demonstrates the significant potential of MHE tasks for developing foundation models for sophisticated pixel-level semantic analysis. Additionally, we present a new dataset designed to benchmark both semantic segmentation and height estimation. The dataset and code will be publicly available at https://github.com/zhu-xlab/DLT-MHE.pytorch.
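The core idea of the abstract, pretraining a feature encoder on height regression and then reusing it for semantic segmentation, can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the data is synthetic, the "encoder" is a single nonlinear layer over per-pixel features, and all names and shapes are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N pixels with F spectral features each, a per-pixel height
# target, and a binary semantic label (all synthetic, for illustration).
N, F, H = 256, 8, 4
X = rng.normal(size=(N, F))
true_W = rng.normal(size=(F, H))
heights = np.tanh(X @ true_W).sum(axis=1, keepdims=True)
labels = (heights.ravel() > 0).astype(int)

# Stage 1 (pretraining): learn encoder weights W_enc by regressing
# per-pixel height with plain gradient descent on an MSE loss.
W_enc = rng.normal(size=(F, H)) * 0.1
w_reg = rng.normal(size=(H, 1)) * 0.1
lr = 0.05
for _ in range(300):
    Z = np.tanh(X @ W_enc)          # encoder features
    err = Z @ w_reg - heights       # height-regression residual
    W_enc -= lr * X.T @ ((err @ w_reg.T) * (1 - Z**2)) / N
    w_reg -= lr * Z.T @ err / N

# Stage 2 (transfer): freeze the pretrained encoder and fit a small
# logistic-regression head on its features for segmentation labels.
Z = np.tanh(X @ W_enc)
w_cls = np.zeros((H, 1))
for _ in range(300):
    p = 1 / (1 + np.exp(-(Z @ w_cls)))
    w_cls -= lr * Z.T @ (p - labels[:, None]) / N

acc = (((Z @ w_cls).ravel() > 0) == labels).mean()
print(f"segmentation accuracy with height-pretrained features: {acc:.2f}")
```

The point of the sketch is the two-stage structure: the encoder never sees semantic labels during pretraining, yet its height-oriented features transfer to the downstream pixel-labeling task, mirroring the self-supervised pretraining strategy the abstract describes.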
Key words
Semantics, Task analysis, Estimation, Neurons, Semantic segmentation, Data models, Buildings, Foundation models, interpretable deep learning, monocular height estimation (MHE), self-supervised pretraining