Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
CoRR(2024)
Abstract
Data diversity and abundance are essential for improving the performance and
generalization of models in natural language processing and 2D vision. However,
3D vision domain suffers from the lack of 3D data, and simply combining
multiple 3D datasets for pretraining a 3D backbone does not yield significant
improvement, due to the domain discrepancies among different 3D datasets that
impede effective feature learning. In this work, we identify the main sources
of the domain discrepancies between 3D indoor scene datasets, and propose
Swin3D++, an enhanced architecture based on Swin3D for efficient pretraining on
multi-source 3D point clouds. Swin3D++ introduces domain-specific mechanisms to
Swin3D's modules to address domain discrepancies and enhance the network
capability on multi-source pretraining. Moreover, we devise a simple
source-augmentation strategy to increase the pretraining data scale and
facilitate supervised pretraining. We validate the effectiveness of our design,
and demonstrate that Swin3D++ surpasses the state-of-the-art 3D pretraining
methods on typical indoor scene understanding tasks. Our code and models will
be released at https://github.com/microsoft/Swin3D
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined