# ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

Abstract:

Due to its robust and precise distance measurements, LiDAR plays an important role in scene understanding for autonomous driving. Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations, which are time-consuming and expensive to obtain. Instead, simulation-to-real domain adaptation (SRDA) trains a model on automatically labeled simulation data and adapts it to unlabeled real data.

Introduction

- Many types of multimedia data, such as images captured by cameras and point clouds collected by LiDAR (Light Detection And Ranging) and RaDAR (Radio Detection And Ranging), can help in understanding the semantics of complex scenes for autonomous driving.
- Among these sensors, LiDAR is essential because of its robust and precise distance measurements (Wu et al 2018a).
- Domain adaptation (DA) aims to learn a transferable model to minimize the impact of domain shift between the source and target domains

Highlights

- Many types of multimedia data, such as images captured by cameras and point clouds collected by LiDAR (Light Detection And Ranging) and RaDAR (Radio Detection And Ranging), can help in understanding the semantics of complex scenes for autonomous driving
- Domain adaptation (DA) aims to learn a transferable model to minimize the impact of domain shift between the source and target domains
- We compare to baselines of three types: (1) source-only, directly transferring the model trained on the simulation domain; (2) SqueezeSegV2 (Wu et al 2019), one state-of-the-art simulation-to-real domain adaptation (SRDA) method for LiDAR point cloud segmentation; (3) state-of-the-art DA methods for RGB image classification and segmentation tasks: DAN (Long et al 2015), Deep CORAL (Sun and Saenko 2016), adversarial discriminative domain adaptation (ADDA) (Tzeng et al 2017), CyCADA (Hoffman et al 2018), and HoMM (Chen et al 2020)
- We explore the differences caused by various normalization schemes, including batch normalization (BN) (Ioffe and Szegedy 2015), instance normalization (IN) (Ulyanov, Vedaldi, and Lempitsky 2016), layer normalization (LN) (Ba, Kiros, and Hinton 2016), and group normalization (GN) (Wu and He 2018)
- We proposed an end-to-end simulation-to-real domain adaptation (SRDA) framework, named ePointDA, for LiDAR point cloud segmentation
- The extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrated that ePointDA significantly outperforms the state-of-the-art SRDA methods
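The four normalization schemes compared in the ablation differ only in which axes the statistics are computed over. A minimal NumPy sketch of that distinction (the tensor shape, group count, and eps value below are illustrative choices, not the paper's configuration, and the learnable affine parameters are omitted):

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance over the given axes (no affine params)."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Input x has shape (N, C, H, W): batch, channels, height, width.
def batch_norm(x):            # BN: statistics over (N, H, W), per channel
    return normalize(x, (0, 2, 3))

def instance_norm(x):         # IN: statistics over (H, W), per sample and channel
    return normalize(x, (2, 3))

def layer_norm(x):            # LN: statistics over (C, H, W), per sample
    return normalize(x, (1, 2, 3))

def group_norm(x, groups):    # GN: statistics over each channel group, per sample
    n, c, h, w = x.shape
    g = normalize(x.reshape(n, groups, c // groups, h, w), (2, 3, 4))
    return g.reshape(n, c, h, w)
```

Note that IN (and LN/GN) compute statistics per sample, so they do not mix batch-level statistics across domains the way BN does; this may be why the "+IN" settings appear in the paper's ablation tables.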

Methods

- The quantitative tables report precision (Pre), recall (Rec), and IoU for each compared method: Source-only, DAN (Long et al 2015), SqueezeSegV2 (Wu et al 2019), and ePointDA (Ours)

Results

- Similar to (Wu et al 2018a), the authors employ precision, recall, and intersection-over-union (IoU) to evaluate the class-level segmentation results by comparing the predicted results with the ground-truth labels point-wisely: $\mathrm{Pre}_l = \frac{|P_l \cap G_l|}{|P_l|}$, $\mathrm{Rec}_l = \frac{|P_l \cap G_l|}{|G_l|}$, $\mathrm{IoU}_l = \frac{|P_l \cap G_l|}{|P_l \cup G_l|}$, where $P_l$ and $G_l$ denote the predicted and ground-truth point sets belonging to class $l$, respectively.
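These three point-wise metrics reduce to boolean-mask operations. A minimal sketch (the function name and the `max(.., 1)` guards against empty sets are our own choices):

```python
import numpy as np

def segmentation_metrics(pred, gt, label):
    """Point-wise precision, recall, and IoU for a single class label.
    pred and gt are integer label arrays of the same shape."""
    p = (pred == label)                 # P_l: points predicted as class l
    g = (gt == label)                   # G_l: ground-truth points of class l
    inter = np.logical_and(p, g).sum()  # |P_l intersect G_l|
    union = np.logical_or(p, g).sum()   # |P_l union G_l|
    prec = inter / max(p.sum(), 1)      # guard avoids division by empty set
    rec = inter / max(g.sum(), 1)
    iou = inter / max(union, 1)
    return prec, rec, iou

# Toy example: 4 points, evaluating class label 1
prec, rec, iou = segmentation_metrics(np.array([1, 1, 0, 1]),
                                      np.array([1, 0, 0, 1]), label=1)
# prec = 2/3, rec = 1.0, iou = 2/3
```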

Conclusion

- The authors proposed an end-to-end simulation-to-real domain adaptation (SRDA) framework, named ePointDA, for LiDAR point cloud segmentation.
- The extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrated that ePointDA significantly outperforms the state-of-the-art SRDA methods.
- The authors plan to construct a large-scale synthetic dataset for LiDAR point cloud segmentation containing more compatible categories with SemanticKITTI and extend the framework to corresponding SRDA tasks.
- The authors will explore multi-modal domain adaptation by jointly modeling multiple modalities, such as image and LiDAR.

Summary

## Objectives:

Given labeled synthetic LiDAR and unlabeled real LiDAR, the goal is to learn a transferable segmentation model by aligning the source simulation domain and the target real domain.
- On the basis of covariate shift and concept drift (Patel et al 2015), the authors aim to learn a segmentation model that can correctly predict the label of each pixel of a real sample, trained on {(Xs, Ys)} and {Xr}

- Table1: Comparison with the state-of-the-art DA methods for LiDAR point cloud segmentation from GTA-LiDAR to KITTI, where +ASAC denotes using the spatial feature aligned SAC module, and +HHead denotes replacing the CRF layer with a conv layer. The best IoU of each category trained on the simulation domain is emphasized in bold
- Table2: Comparison with the state-of-the-art DA methods from GTA-LiDAR to SemanticKITTI
- Table3: Ablation study on different components, where Baseline denotes a simplified SqueezeSegV2 model (Wu et al 2019) for fair comparison, taking the Cartesian coordinates as input and using batch normalization, frequency-based DNR, and geodesic correlation alignment
- Table4: Ablation study on different normalization schemes using both frequency-based DNR and our learned DNR without feature alignment. ‘BN’, ‘IN’, ‘LN’, ‘GN’ are short for batch normalization, instance normalization, layer normalization, and group normalization, respectively
- Table5: Comparison between ordinary SAC (Xu et al 2020) and our aligned SAC (ASAC). Baseline corresponds to the “+SDNR+IN+HoMM” setting in Table 3
- Table6: Ablation study on the number of convolution layers (#Conv) that are appended to the last deconvolution layer. This experiment is conducted after dropout noise rendering and feature alignment, i.e. +SDNR+IN+HoMM+ASAC

Related work

- Point Cloud Segmentation. Recent efforts on point cloud segmentation are typically based on DNNs. One straightforward way is to feed the raw, unordered point clouds directly into a DNN. To handle the lack of ordering among points, symmetric operators are usually applied, as in PointNet (Qi et al 2017a), PointNet++ (Qi et al 2017b), and their improvements on hierarchical architecture (Klokov and Lempitsky 2017), sampling (Dovrat, Lang, and Avidan 2019), reordering (Li et al 2018a), grouping (Li, Chen, and Hee Lee 2018), and efficiency (Liu et al 2019b,a; Zhang, Hua, and Yeung 2019). There are also methods converting point clouds to regular 3D voxel grids (Wang et al 2017; Huang, Wang, and Neumann 2018; Le and Duan 2018; Lei, Akhtar, and Mian 2019; Mao, Wang, and Li 2019; Meng et al 2019) or constructing graphs from point clouds for network processing (Te et al 2018; Jiang et al 2019; Xu et al 2018; Landrieu and Simonovsky 2018; Wang et al 2019b,c). However, these methods suffer from limitations such as inefficiency and point collision (Lyu, Huang, and Zhang 2020). To address the efficiency problem and enable real-time inference, one popular approach is to project 3D point clouds to 2D images, including sphere mapping (Wu et al 2018a, 2019; Milioto et al 2019; Behley et al 2019; Xu et al 2020), 2D grid sampling (Caltagirone et al 2017), and graph drawing (Lyu, Huang, and Zhang 2020). In this paper, we follow the spherical projection method of SqueezeSeg (Wu et al 2018a, 2019).
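The spherical projection mentioned above maps each 3D point to a pixel of a 2D range image via its azimuth and elevation angles. A minimal NumPy sketch (the image size and vertical field-of-view bounds follow a common Velodyne HDL-64E convention but are illustrative; SqueezeSeg's actual input also stacks x, y, z, intensity, and range channels, while this sketch keeps only range):

```python
import numpy as np

def spherical_project(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.
    fov_up / fov_down: vertical field of view of the sensor, in degrees."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8   # range; eps avoids div by zero
    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                    # elevation
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = (1.0 - (pitch - fd) / (fu - fd)) * H    # row: top of image = fov_up
    v = 0.5 * (1.0 - yaw / np.pi) * W           # col: azimuth wrapped to [0, W)
    u = np.clip(np.floor(u), 0, H - 1).astype(int)
    v = np.clip(np.floor(v), 0, W - 1).astype(int)
    img = np.zeros((H, W), dtype=np.float32)
    img[u, v] = r                               # later points overwrite earlier
    return img
```

Points falling outside the vertical field of view are clipped to the image border here; a production pipeline would typically discard them instead.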

Funding

- (3) We conduct extensive SRDA experiments from synthetic GTA-LiDAR (Wu et al 2019) to real KITTI (Geiger, Lenz, and Urtasun 2012) and SemanticKITTI (Behley et al 2019), and respectively achieve 8.8% and 7.5% better IoU scores (on the “car” class) than the best DA baseline
- The ablation results show that: (1) all the proposed components contribute to the SRDA task; (2) among all these components, SDNR provides the highest performance improvement (5.6%), which demonstrates the important role that dropout noise plays in the domain gap between the simulation and real domains and the necessity of an effective dropout noise rendering (DNR) model
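Frequency-based dropout noise rendering, the SqueezeSegV2 baseline that the learned SDNR improves on, can be approximated by randomly zeroing pixels of the synthetic range image. A minimal sketch (the uniform per-pixel drop probability is an illustrative constant; the frequency-based variant instead estimates a per-pixel probability map from real scans, and ePointDA learns the noise pattern end-to-end):

```python
import numpy as np

def render_dropout_noise(range_img, drop_prob=0.1, seed=None):
    """Zero out pixels of a synthetic range image to mimic the missing
    returns (dropout noise) observed in real LiDAR scans.

    drop_prob is an illustrative constant; a frequency-based renderer
    would use a per-pixel probability map estimated from real data.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(range_img.shape) >= drop_prob  # True = keep pixel
    return range_img * mask
```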

References

- Achituve, I.; Maron, H.; and Chechik, G. 2020. Self-Supervised Learning for Domain Adaptation on Point-Clouds. arXiv:2003.12641.
- Ba, J. L.; Kiros, J. R.; and Hinton, G. E. 2016. Layer normalization. arXiv:1607.06450.
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; and Gall, J. 2019. SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In ICCV, 9297– 9307.
- Caltagirone, L.; Scheidegger, S.; Svensson, L.; and Wahde, M. 2017. Fast LIDAR-based road detection using fully convolutional neural networks. In IV, 1019–1024.
- Carlucci, F. M.; D’Innocente, A.; Bucci, S.; Caputo, B.; and Tommasi, T. 2019. Domain generalization by solving jigsaw puzzles. In CVPR, 2229–2238.
- Chen, C.; Fu, Z.; Chen, Z.; Jin, S.; Cheng, Z.; Jin, X.; and Hua, X.-S. 2020. HoMM: Higher-order Moment Matching for Unsupervised Domain Adaptation. In AAAI.
- Dai, J.; Li, Y.; He, K.; and Sun, J. 2016. R-fcn: Object detection via region-based fully convolutional networks. In NeurIPS, 379–387.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; and Wei, Y. 2017. Deformable convolutional networks. In ICCV, 764–773.
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and Koltun, V. 2017. CARLA: An open urban driving simulator. arXiv:1711.03938.
- Dovrat, O.; Lang, I.; and Avidan, S. 2019. Learning to sample. In CVPR, 2760–2769.
- Feng, Z.; Xu, C.; and Tao, D. 2019. Self-Supervised Representation Learning From Multi-Domain Data. In ICCV, 3245–3255.
- Geiger, A.; Lenz, P.; and Urtasun, R. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 3354–3361.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In NeurIPS, 2672–2680.
- Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.-Y.; Isola, P.; Saenko, K.; Efros, A. A.; and Darrell, T. 2018. CyCADA: Cycle-Consistent Adversarial Domain Adaptation. In ICML, 1994–2003.
- Huang, Q.; Wang, W.; and Neumann, U. 2018. Recurrent slice networks for 3d segmentation of point clouds. In CVPR, 2626– 2635.
- Ioffe, S.; and Szegedy, C. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML, 448–456.
- Jiang, L.; Zhao, H.; Liu, S.; Shen, X.; Fu, C.-W.; and Jia, J. 2019. Hierarchical point-edge interaction network for point cloud semantic segmentation. In ICCV, 10433–10441.
- Jiang, P.; and Saripalli, S. 2020. LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation. arXiv:2003.01174.
- Klokov, R.; and Lempitsky, V. 2017. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In ICCV, 863–872.
- Krahenbuhl, P. 2018. Free supervision from video games. In CVPR, 2955–2964.
- Landrieu, L.; and Simonovsky, M. 2018. Large-scale point cloud semantic segmentation with superpoint graphs. In CVPR, 4558– 4567.
- Le, T.; and Duan, Y. 2018. Pointgrid: A deep network for 3d shape understanding. In CVPR, 9204–9214.
- Lee, S.; Kim, D.; Kim, N.; and Jeong, S.-G. 2019. Drop to adapt: Learning discriminative features for unsupervised domain adaptation. In ICCV, 91–100.
- Lei, H.; Akhtar, N.; and Mian, A. 2019. Octree guided CNN with spherical kernels for 3D point clouds. In CVPR, 9631–9640.
- Li, J.; Chen, B. M.; and Hee Lee, G. 2018. So-net: Self-organizing network for point cloud analysis. In CVPR, 9397–9406.
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; and Chen, B. 2018a. Pointcnn: Convolution on x-transformed points. In NeurIPS, 820– 830.
- Li, Y.; Chen, Y.; Wang, N.; and Zhang, Z. 2019. Scale-aware trident networks for object detection. In ICCV, 6054–6063.
- Li, Y.; Wang, N.; Shi, J.; Hou, X.; and Liu, J. 2018b. Adaptive Batch Normalization for practical domain adaptation. PR 80: 109– 117.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; and Dollar, P. 2017. Focal loss for dense object detection. In ICCV, 2980–2988.
- Liu, Y.; Fan, B.; Meng, G.; Lu, J.; Xiang, S.; and Pan, C. 2019a. DensePoint: Learning densely contextual representation for efficient point cloud processing. In ICCV, 5239–5248.
- Liu, Z.; Tang, H.; Lin, Y.; and Han, S. 2019b. Point-Voxel CNN for efficient 3D deep learning. In NeurIPS, 963–973.
- Long, M.; Cao, Y.; Wang, J.; and Jordan, M. 2015. Learning transferable features with deep adaptation networks. In ICML, 97–105.
- Lyu, Y.; Huang, X.; and Zhang, Z. 2020. Learning to Segment 3D Point Clouds in 2D Image Space. In CVPR, 12252–12261.
- Mao, J.; Wang, X.; and Li, H. 2019. Interpolated convolutional networks for 3d point cloud understanding. In ICCV, 1578–1587.
- Meng, H.-Y.; Gao, L.; Lai, Y.-K.; and Manocha, D. 2019. VV-Net: Voxel vae net with group convolutions for point cloud segmentation. In ICCV, 8500–8508.
- Milioto, A.; Vizzo, I.; Behley, J.; and Stachniss, C. 2019. Rangenet++: Fast and accurate lidar semantic segmentation. In IROS.
- Morerio, P.; Cavazza, J.; and Murino, V. 2018. Minimal-Entropy Correlation Alignment for Unsupervised Deep Domain Adaptation. In ICLR.
- Patel, V. M.; Gopalan, R.; Li, R.; and Chellappa, R. 2015. Visual domain adaptation: A survey of recent advances. IEEE SPM 32(3): 53–69.
- Peng, X.; Bai, Q.; Xia, X.; Huang, Z.; Saenko, K.; and Wang, B. 2019. Moment matching for multi-source domain adaptation. In ICCV, 1406–1415.
- Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2017a. Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, 652–660.
- Qi, C. R.; Yi, L.; Su, H.; and Guibas, L. J. 2017b. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS, 5099–5108.
- Qin, C.; You, H.; Wang, L.; Kuo, C.-C. J.; and Fu, Y. 2019. PointDAN: A multi-scale 3D domain adaption network for point cloud representation. In NeurIPS, 7190–7201.
- Richter, S. R.; Vineet, V.; Roth, S.; and Koltun, V. 2016. Playing for data: Ground truth from computer games. In ECCV, 102–118.
- Rist, C. B.; Enzweiler, M.; and Gavrila, D. M. 2019. Cross-Sensor Deep Domain Adaptation for LiDAR Detection and Segmentation. In IV, 1535–1542.
- Russo, P.; Carlucci, F. M.; Tommasi, T.; and Caputo, B. 2018. From source to target and back: symmetric bi-directional adaptive gan. In CVPR, 8099–8108.
- Saleh, K.; Abobakr, A.; Attia, M.; Iskander, J.; Nahavandi, D.; Hossny, M.; and Nahvandi, S. 2019. Domain Adaptation for Vehicle Detection from Bird’s Eye View LiDAR Point Cloud Data. In ICCVW, 1–8.
- Sankaranarayanan, S.; Balaji, Y.; Castillo, C. D.; and Chellappa, R. 2018. Generate to adapt: Aligning domains using generative adversarial networks. In CVPR, 8503–8512.
- Shrivastava, A.; Pfister, T.; Tuzel, O.; Susskind, J.; Wang, W.; and Webb, R. 2017. Learning from simulated and unsupervised images through adversarial training. In CVPR, 2107–2116.
- Sun, B.; Feng, J.; and Saenko, K. 2016. Return of frustratingly easy domain adaptation. In AAAI, 2058–2065.
- Sun, B.; and Saenko, K. 2016. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In ECCVW, 443–450.
- Sun, Y.; Tzeng, E.; Darrell, T.; and Efros, A. A. 2019. Unsupervised Domain Adaptation through Self-Supervision. arXiv:1909.11825.
- Te, G.; Hu, W.; Zheng, A.; and Guo, Z. 2018. Rgcnn: Regularized graph cnn for point cloud segmentation. In ACM MM, 746–754.
- Tripathi, S.; Chandra, S.; Agrawal, A.; Tyagi, A.; Rehg, J. M.; and Chari, V. 2019. Learning to generate synthetic data via compositing. In CVPR, 461–470.
- Tzeng, E.; Hoffman, J.; Saenko, K.; and Darrell, T. 2017. Adversarial discriminative domain adaptation. In CVPR, 2962–2971.
- Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. 2016. Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022.
- Wang, B.; Wu, V.; Wu, B.; and Keutzer, K. 2019a. LATTE: accelerating lidar point cloud annotation via sensor fusion, one-click annotation, and tracking. In ITSC, 265–272.
- Wang, L.; Huang, Y.; Hou, Y.; Zhang, S.; and Shan, J. 2019b. Graph attention convolution for point cloud semantic segmentation. In CVPR, 10296–10305.
- Wang, P.-S.; Liu, Y.; Guo, Y.-X.; Sun, C.-Y.; and Tong, X. 2017. O-cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM TOG 36(4): 1–11.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S. E.; Bronstein, M. M.; and Solomon, J. M. 2019c. Dynamic graph cnn for learning on point clouds. ACM TOG 38(5): 1–12.
- Wu, B.; Wan, A.; Yue, X.; and Keutzer, K. 2018a. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. In ICRA, 1887–1893.
- Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; and Keutzer, K. 2019. Squeezesegv2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud. In ICRA, 4376–4382.
- Wu, Y.; and He, K. 2018. Group normalization. In ECCV, 3–19.
- Wu, Z.; Han, X.; Lin, Y.-L.; Gokhan Uzunbas, M.; Goldstein, T.; Nam Lim, S.; and Davis, L. S. 2018b. Dcan: Dual channel-wise alignment networks for unsupervised scene adaptation. In ECCV, 518–534.
- Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; and Tomizuka, M. 2020. SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation. In ECCV.
- Xu, Y.; Fan, T.; Xu, M.; Zeng, L.; and Qiao, Y. 2018. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In ECCV, 87–102.
- Yu, F.; and Koltun, V. 2016. Multi-scale context aggregation by dilated convolutions. In ICLR.
- Yue, X.; Wu, B.; Seshia, S. A.; Keutzer, K.; and Sangiovanni-Vincentelli, A. L. 2018. A lidar point cloud generator: from a virtual world to autonomous driving. In ICMR, 458–464.
- Zhang, Z.; Hua, B.-S.; and Yeung, S.-K. 2019. Shellnet: Efficient point cloud convolutional neural networks using concentric shells statistics. In ICCV, 1607–1616.
- Zhao, H.; Zhang, S.; Wu, G.; Moura, J. M.; Costeira, J. P.; and Gordon, G. J. 2018. Adversarial multiple source domain adaptation. In NeurIPS, 8559–8570.
- Zhao, S.; Li, B.; Yue, X.; Gu, Y.; Xu, P.; Hu, R.; Chai, H.; and Keutzer, K. 2019a. Multi-source Domain Adaptation for Semantic Segmentation. In NeurIPS, 7285–7298.
- Zhao, S.; Lin, C.; Xu, P.; Zhao, S.; Guo, Y.; Krishna, R.; Ding, G.; and Keutzer, K. 2019b. CycleEmotionGAN: Emotional Semantic Consistency Preserved CycleGAN for Adapting Image Emotions. In AAAI, 2620–2627.
- Zhao, S.; Wang, G.; Zhang, S.; Gu, Y.; Li, Y.; Song, Z.; Xu, P.; Hu, R.; Chai, H.; and Keutzer, K. 2020. Multi-source Distilling Domain Adaptation. In AAAI, 12975–12983.
- Zhu, J.-Y.; Park, T.; Isola, P.; and Efros, A. A. 2017. Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV, 2223–2232.
- Zhuo, J.; Wang, S.; Zhang, W.; and Huang, Q. 2017. Deep Unsupervised Convolutional Domain Adaptation. In ACM MM, 261– 269.
