Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds

ICRA, pp. 9463-9469, 2020.

We analyzed and compared different network architectures based on the receptive field size, which we showed to be directly related to the performance of point convolutional networks.

Abstract:

In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representatio...

Introduction
  • The past years have witnessed tremendous development in 3D scene understanding methods for several tasks, including semantic segmentation [18], object detection [32], and instance segmentation [5].
  • Recent advancements such as point convolutional layers [25], [26], [27], which can operate directly on 3D point clouds, have further boosted the field.
  • Large receptive fields are important since they enable reasoning on a larger input context.
Highlights
  • We propose to visualize the receptive fields to analyze different network architectures and we present a thorough ablation study comparing several strategies which increase the receptive field of point convolutions
  • We propose Dilated Point Convolutions as a means to significantly increase the receptive field size of point convolutions
  • We reviewed several mechanisms to increase the receptive field size of 3D point convolutions
  • We observe that existing point convolutional networks have inherently small receptive field sizes. Assisted by this observation, we compare different network architectures and propose our dilation mechanism as a simple yet elegant solution to significantly increase the receptive field size of point convolutions and improve their performance on multiple point cloud processing tasks
Methods
  • Compared methods (rows of Table I): PointNet [18], KWYND [6], PointCNN [14], SPG [12], PCNN [25], DPC (Ours).
  • ScanNet entries: DPC (Val-set), DPC (Test-set).
  • Reported metrics: mIoU, mAcc, oAcc.
Results
  • The authors report scores of the best performing models on the ScanNet v2 dataset [4] and the S3DIS dataset [1] in Table I.
  • The authors' dilated point convolutional model outperforms other recent KNN-based point convolutional networks by a significant margin on S3DIS.
  • Table I: 3D semantic segmentation on S3DIS (Area 5) and ScanNet v2.
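The three metrics named for Table I (mIoU, mAcc, oAcc) are all derived from a per-class confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `segmentation_scores` and the choice to skip classes that never occur are assumptions made for illustration, and benchmark toolkits may handle absent classes differently.

```python
def segmentation_scores(confusion):
    """Compute mIoU, mAcc and oAcc from a square confusion matrix.

    confusion[t][p] is the number of points whose true class is t and
    whose predicted class is p (hypothetical layout for this sketch).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        gt = sum(confusion[c])                          # points truly of class c
        pred = sum(confusion[r][c] for r in range(n))   # points predicted as c
        union = gt + pred - tp
        if union > 0:      # assumption: skip classes absent from both gt and prediction
            ious.append(tp / union)
        if gt > 0:         # assumption: skip classes with no ground-truth points
            accs.append(tp / gt)

    return {
        "mIoU": sum(ious) / len(ious),   # mean intersection-over-union over classes
        "mAcc": sum(accs) / len(accs),   # mean per-class accuracy
        "oAcc": correct / total,         # overall point-wise accuracy
    }


if __name__ == "__main__":
    # Toy two-class example, not data from the paper.
    print(segmentation_scores([[3, 1], [1, 5]]))
```

Because mIoU and mAcc average over classes while oAcc averages over points, a model can score well on oAcc and poorly on mIoU when rare classes are mislabeled.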
Conclusion
  • The authors reviewed several mechanisms to increase the receptive field size of 3D point convolutions.
  • The authors analyzed and compared different network architectures based on the receptive field size, which they showed to be directly related to the performance of point convolutional networks.
  • The authors' dilation mechanism can be integrated into most existing point convolutional networks.
  • The authors hope these insights enable the research community to develop better performing models.
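This summary does not spell out the dilation mechanism itself. One common way to dilate a KNN-based point convolution, assumed here purely for illustration, is to gather the k·d nearest neighbors of a point and keep every d-th of them, so the layer still aggregates k neighbors but reaches farther. The sketch below uses a brute-force search; the name `dilated_knn`, its parameters, and the choice to start the subsampling at the closest neighbor are hypothetical and not taken from the paper.

```python
import math


def dilated_knn(points, query_index, k, dilation):
    """Return indices of k neighbors of points[query_index], chosen with dilation.

    Assumed scheme: rank all other points by Euclidean distance, take the
    k * dilation nearest, then keep every dilation-th one.
    dilation == 1 reduces to plain k-nearest-neighbor selection.
    """
    query = points[query_index]
    ranked = sorted(
        (i for i in range(len(points)) if i != query_index),
        key=lambda i: math.dist(points[i], query),
    )
    return ranked[: k * dilation][::dilation]


if __name__ == "__main__":
    # Toy point cloud: seven points spaced one unit apart along the x-axis.
    cloud = [(float(x), 0.0, 0.0) for x in range(7)]
    print(dilated_knn(cloud, 0, k=3, dilation=1))  # nearest three neighbors
    print(dilated_knn(cloud, 0, k=3, dilation=2))  # same count, wider reach
```

In this toy setting both calls return three neighbors, so a convolution over them would need the same number of weights, while the dilated neighborhood extends farther from the query point. Since only the neighbor-selection step changes, such a scheme could in principle be dropped into any network that builds neighborhoods with KNN.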
Summary
  • Objectives:

    The authors' goal is to increase the size of the receptive field.
Tables
  • Table 1: OBJECT CLASSIFICATION SCORES ON MODELNET40
  • Table 2: ABLATION STUDY: DILATED POINT CONVOLUTIONS. VARYING
  • Table 3: ABLATION STUDY: STACKING POINT CONVOLUTIONS AND VARYING
Related work
  • Receptive Field Analysis. Few works systematically study the influence of receptive fields on 2D image CNNs [15], [17]. In general, deeper networks which stack multiple layers of 2D convolutions have proven to work better [22], [23]. Dilated convolutions [30], previously introduced as atrous convolutions [3] and used in 2D image semantic segmentation, make it possible to efficiently enlarge the receptive field of filters to incorporate larger context without increasing the number of model parameters. In this work, we propose a simple yet effective dilation mechanism for 3D point convolutions.
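The claim that dilation enlarges the receptive field without adding parameters can be checked with simple arithmetic for grid convolutions. The sketch below assumes stride-1 convolutions and measures the receptive field along a single axis; the helper names `receptive_field` and `weights_per_axis` are hypothetical and are not part of any library or of the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field
    by (k - 1) * d positions.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


def weights_per_axis(kernel_size, dilations):
    """Kernel taps per axis across all layers; dilation does not change this."""
    return kernel_size * len(dilations)


if __name__ == "__main__":
    for dilations in ([1, 1, 1], [1, 2, 4]):
        print(
            dilations,
            receptive_field(3, dilations),
            weights_per_axis(3, dilations),
        )
```

Three stacked 3-tap layers cover 7 positions without dilation and 15 positions with dilations 1, 2, 4, while the number of kernel taps stays at 9 in both cases.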
Funding
  • This work was supported by the ERC Consolidator Grant DeeViSe (ERC-2017-COG-773161).
Reference
  • Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Alexandre Boulch, Joris Guerry, Bertrand Le Saux, and Nicolas Audebert. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. In Computers & Graphics, 2017.
  • Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In International Conference on Learning Representations (ICLR), 2015.
  • Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, and Bastian Leibe. 3D-BEVIS: Birds-Eye-View Instance Segmentation. In German Conference on Pattern Recognition (GCPR), 2019.
  • Francis Engelmann, Theodora Kontogianni, Jonas Schult, and Bastian Leibe. Know What Your Neighbors Do: 3D Semantic Segmentation of Point Clouds. In European Conference on Computer Vision Workshop (ECCV’W), 2018.
  • Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Binh-Son Hua, Minh-Khoi Tran, and Sai-Kit Yeung. Pointwise Convolutional Neural Network. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Roman Klokov and Victor Lempitsky. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In International Conference on Computer Vision (ICCV), 2017.
  • Loïc Landrieu and Mohamed Boussaha. Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Loïc Landrieu and Martin Simonovsky. Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Jiaxin Li, Ben M Chen, and Gim Hee Lee. SO-Net: Self-Organizing Network for Point Cloud Analysis. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. PointCNN: Convolution On X-Transformed Points. In Neural Information Processing Systems (NIPS), 2018.
  • Wenjie Luo, Yujia Li, Raquel Urtasun, and Richard Zemel. Understanding the effective receptive field in deep convolutional neural networks. In Neural Information Processing Systems (NIPS), 2016.
  • Daniel Maturana and Sebastian Scherer. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In International Conference on Intelligent Robots and Systems (IROS), 2015.
  • Dmytro Mishkin, Nikolay Sergievskiy, and Jiri Matas. Systematic Evaluation of Convolution Neural Network Advances on the ImageNet. Computer Vision and Image Understanding (CVIU), 2017.
  • Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Charles R. Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. Volumetric and Multi-View CNNs for Object Classification on 3D Data. Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Charles R. Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Neural Information Processing Systems (NIPS), 2017.
  • Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. OctNet: Learning Deep 3D Representations at High Resolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations (ICLR), 2015.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going Deeper with Convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, and Silvio Savarese. SEGCloud: Semantic Segmentation of 3D Point Clouds. In International Conference on 3D Vision (3DV), 2017.
  • S. Wang, S. Suo, W.C. Ma, A. Pokrovsky, and R. Urtasun. Deep Parametric Continuous Convolutional Neural Networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic Graph CNN for Learning on Point Clouds. In ACM Transactions on Graphics (TOG), 2018.
  • Wenxuan Wu, Zhongang Qi, and Fuxin Li. PointConv: Deep Convolutional Networks on 3D Point Clouds. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters. In European Conference on Computer Vision (ECCV), 2018.
  • Fisher Yu and Vladlen Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. In International Conference on Learning Representations (ICLR), 2016.
  • Matthew D. Zeiler and Rob Fergus. Visualizing and Understanding Convolutional Networks. In European Conference on Computer Vision (ECCV), 2014.
  • Yin Zhou and Oncel Tuzel. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.