Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation.

AAAI (2023)

Abstract
Domain adaptation for 3D point clouds has attracted considerable interest since it can partly avoid the time-consuming labeling of 3D data. A recent work, xMUDA, leveraged multi-modal data for the domain adaptation task of 3D semantic segmentation by mimicking predictions between the 2D and 3D modalities, and outperformed previous single-modality methods that use only point clouds. Building on it, in this paper we propose a novel cross-modal contrastive learning scheme to further improve adaptation. By employing constraints from the correspondences between 2D pixel features and 3D point features, our method not only facilitates interaction between the two modalities, but also boosts feature representations in both the labeled source domain and the unlabeled target domain. Meanwhile, to fully exploit 2D context information for domain adaptation through cross-modal learning, we introduce a neighborhood feature aggregation module to enhance pixel features. The module employs neighborhood attention to aggregate nearby pixels in the 2D image, which relieves the mismatch between the two modalities that arises from projecting a relatively sparse point cloud onto dense image pixels. We evaluate our method on three unsupervised domain adaptation scenarios: country-to-country, day-to-night, and dataset-to-dataset. Experimental results show that our approach outperforms existing methods, demonstrating its effectiveness.
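
To make the cross-modal contrastive idea concrete, below is a minimal sketch of an InfoNCE-style loss between paired 2D pixel features and 3D point features. The function name, tensor shapes, and temperature value are illustrative assumptions for exposition, not the paper's exact formulation or code.

```python
# Minimal sketch (assumed formulation): symmetric InfoNCE loss over N matched
# 2D-pixel / 3D-point feature pairs. Corresponding pairs act as positives;
# all other pairs in the batch act as negatives.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(feat_2d: torch.Tensor,
                                 feat_3d: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """feat_2d, feat_3d: (N, D) features for N matched pixel/point pairs."""
    # L2-normalize so the dot product is a cosine similarity.
    feat_2d = F.normalize(feat_2d, dim=1)
    feat_3d = F.normalize(feat_3d, dim=1)

    # (N, N) similarity matrix; diagonal entries are the positive pairs.
    logits = feat_2d @ feat_3d.t() / temperature
    targets = torch.arange(feat_2d.size(0), device=feat_2d.device)

    # Symmetric loss: 2D->3D and 3D->2D directions.
    loss_2d_to_3d = F.cross_entropy(logits, targets)
    loss_3d_to_2d = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_2d_to_3d + loss_3d_to_2d)
```

In this sketch, the 2D features would be sampled at the pixel locations obtained by projecting the (relatively sparse) point cloud into the image, which is where the neighborhood feature aggregation module described above would come in to enrich each sampled pixel feature with its 2D context.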