DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
ICLR 2024
Abstract
We present DFormer, a novel RGB-D pretraining framework to learn transferable
representations for RGB-D segmentation tasks. DFormer has two key
innovations: 1) Unlike previous works that encode RGB-D information with an RGB-
pretrained backbone, we pretrain the backbone using image-depth pairs from
ImageNet-1K, and hence DFormer is endowed with the capacity to encode RGB-D
representations; 2) DFormer comprises a sequence of RGB-D blocks, which are
tailored for encoding both RGB and depth information through a novel building
block design. DFormer avoids the mismatched encoding of 3D geometric
relationships in depth maps by RGB-pretrained backbones, a mismatch that is
widespread in existing methods but has not been resolved. We finetune the pretrained DFormer
on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient
object detection, with a lightweight decoder head. Experimental results show
that our DFormer achieves new state-of-the-art performance on these two tasks
with less than half of the computational cost of the current best methods on
two RGB-D semantic segmentation datasets and five RGB-D salient object
detection datasets. Our code is available at:
https://github.com/VCIP-RGBD/DFormer.
Keywords
RGB-D Semantic Segmentation