Group Contextual Encoding for 3D Point Clouds

NeurIPS 2020 (2020)


Abstract

Global context is crucial for 3D point cloud scene understanding tasks. In this work, we extend the contextual encoding layer that was originally designed for 2D tasks to 3D point cloud scenarios. The encoding layer learns a set of code words in the feature space of the 3D point cloud to characterize the global semantic context, and then base…

Introduction
  • Object detection in 3D point clouds is a challenging problem because it requires localizing and classifying objects from sparse and irregularly-distributed points
  • Conventional methods such as PointNet++ [16] and ASIS-PointNet++ [18] were proposed to solve this problem by learning local features hierarchically.
  • For 2D semantic segmentation, Zhang et al. [24] proposed a contextual encoding layer that learns a descriptor to model the global context by encoding features against a dictionary of only a few code words and aggregating the encoded information (a minimal sketch follows this list).
  • Compared to 2D scenarios, data sparsity becomes a major issue in 3D point clouds, and the performance of global contextual encoding quickly saturates as the number of code words increases.
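
To make the encoding step concrete, here is a minimal PyTorch sketch of such a contextual encoding layer. This is not the authors' code: the class name, the (B, N, C) tensor layout, and the final sum-aggregation are assumptions based on the description of [24] above (learnable code words, soft assignment by residual distance, aggregation of residuals).

```python
# Minimal sketch of a contextual encoding layer in the spirit of [24].
# Assumptions: point features come as (batch, num_points, channels); the
# global descriptor is obtained by summing the per-code-word aggregates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodingLayer(nn.Module):
    def __init__(self, channels: int, num_codewords: int):
        super().__init__()
        # K learnable code words in the C-dimensional feature space,
        # plus one learnable smoothing factor per code word.
        self.codewords = nn.Parameter(torch.randn(num_codewords, channels))
        self.scale = nn.Parameter(torch.ones(num_codewords))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) point features -> (B, C) global context descriptor.
        # Residual of every feature w.r.t. every code word: (B, N, K, C).
        residual = x.unsqueeze(2) - self.codewords
        # Soft-assignment weights over the code words: (B, N, K).
        assign = F.softmax(-self.scale * residual.pow(2).sum(-1), dim=2)
        # Aggregate residuals per code word, then fuse over code words.
        encoded = (assign.unsqueeze(-1) * residual).sum(dim=1)  # (B, K, C)
        return encoded.sum(dim=1)  # (B, C)
```

In [24], the resulting descriptor is typically fed through a fully-connected layer with a sigmoid to re-weight the feature channels; the work summarized here keeps this encoding machinery but applies it to 3D point features.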
Highlights
  • Object detection in 3D point clouds is a challenging problem because it requires localizing and classifying objects from sparse and irregularly-distributed points
  • We further propose a Group Contextual Encoding (GCE) method, which divides the channels into groups and performs encoding on the group-divided feature vectors, to facilitate effective learning of global context in grouped subspaces for 3D point clouds (a sketch of this grouped encoding follows this list).
  • We extend the contextual encoding layer to 3D point cloud scenarios to model global contextual information more efficiently.
  • We have presented Group Contextual Encoding as an effective method to capture global context in 3D point clouds, and evaluated it on several popular 3D point cloud benchmarks.
  • Experimental results show that the proposed method outperforms the non-grouping baselines significantly across the board and achieves state-of-the-art performance on these benchmarks, indicating that our method is a compelling alternative to the original "encoding layer" for global context in 3D point clouds.
  • The results show that our method outperforms the PointNet++ [16] baseline by 1.5% in accuracy.
  • We will investigate the generalizability of our method to other tasks and frameworks, e.g., graph convolutional networks and 3D sparse CNNs, where global context plays a crucial role.
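
As a hypothetical illustration of the grouping idea above, the sketch below splits the C channels into G groups, runs an independent encoding layer (reusing the EncodingLayer sketch from the Introduction) on each C/G-dimensional subspace, and concatenates the per-group descriptors. The module name and the concatenation step are assumptions, not the authors' implementation.

```python
# Sketch of the group contextual encoding idea: encode each channel group
# in its own lower-dimensional subspace, then concatenate the descriptors.
import torch
import torch.nn as nn

class GroupContextualEncoding(nn.Module):
    def __init__(self, channels: int, num_codewords: int, groups: int):
        super().__init__()
        assert channels % groups == 0, "C must be divisible by G"
        self.groups = groups
        # One independent encoding layer per group of C/G adjacent channels.
        self.encoders = nn.ModuleList(
            EncodingLayer(channels // groups, num_codewords)
            for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C); adjacent channels fall into the same group ("locality").
        chunks = x.chunk(self.groups, dim=-1)  # G tensors of shape (B, N, C/G)
        # Encode each subspace and concatenate the G descriptors: (B, C).
        return torch.cat([enc(c) for enc, c in zip(self.encoders, chunks)], dim=-1)
```

With G = 1 this reduces to the non-grouping baseline ("w/o group division" in Table 2), which is exactly the comparison the ablations report.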
Methods
  • The "Grouping" method follows the rule of "locality": for the feature of each individual point, vectors with adjacent channels are grouped together.
  • The authors compare with the "channel shuffle" method [26], denoted "w/ shuffle", which weakens this locality constraint (a sketch of the shuffle operation follows this list).
  • The results in Table 2 show that no significant performance improvement is gained when "channel shuffle" is incorporated.
  • Table 2 also shows the improvement of these methods over the original SA (set abstraction) layer of PointNet++ [16].
  • For the C × 3 setting, the method outperforms the original SA layer by 2.9 mAP.
  • More ablation experiments on different seed layers are given in the supplementary material.
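
For reference, "channel shuffle" [26] is a fixed permutation that interleaves channels across groups; a standard sketch is shown below (the (B, N, C) layout is an assumption). Applying it before group division breaks the adjacent-channel locality, and, as Table 2 reports, it brings no significant gain here.

```python
# Standard channel-shuffle operation in the style of ShuffleNet [26].
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # x: (B, N, C) -> reshape to (B, N, G, C/G), swap the group and
    # sub-channel axes, and flatten back so that channels from different
    # groups are interleaved.
    b, n, c = x.shape
    return x.view(b, n, groups, c // groups).transpose(2, 3).reshape(b, n, c)
```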
Results
  • The authors compare their results with previous state-of-the-art methods, including VoteNet [13] and F-PointNet [14]; their method outperforms these methods on the SUN-RGBD [17] and ScanNet [4] benchmarks by a large margin.
  • The results on the SUN-RGBD benchmark can be found in Table 3.
  • [Per-class comparison against the "Geometry Only" baselines on (a) SUN-RGBD and (b) ScanNet, over the ScanNet categories cab, bed, chair, sofa, table, door, wind, bkshf, pic, cntr, desk, curt, fridg, showr, toil, sink, bath, ofurn.]
Conclusion
  • The authors have presented Group Contextual Encoding as an effective method to capture global context in 3D point clouds, and evaluated it on several popular 3D point cloud benchmarks.
  • Experimental results show that the proposed method outperforms the non-grouping baselines significantly across the board and achieves state-of-the-art performance on these benchmarks, indicating that the method is a compelling alternative to the original "encoding layer" for modeling global context in 3D point clouds.
  • This issue should be taken seriously, and preparatory measures should be taken.
Tables
  • Table 1: Ablation study of the code word number K on the SA2 feature of Group Contextual Encoding PointNet++, with G set to 1. Evaluated with mAP@0.25.
  • Table 2: Ablation studies of channel number, encoding layer, and grouping method on the SUN-RGBD benchmark. "w/o encoding" refers to the cases without encoding; the result with G = 1 (denoted "w/o group division"), the result of our method (denoted "w/ group division"), and the "channel shuffle" variant (denoted "w/ shuffle") are listed for comparison.
  • Table 3: Comparison with state-of-the-art algorithms on the SUN RGB-D V1 benchmark.
  • Table 4: Comparison of our method with state-of-the-art methods on ScanNetV2, evaluated with mAP@0.25.
  • Table 5: Comparison of our method with state-of-the-art methods on ScanNetV2, evaluated with mAP@0.5.
  • Table 6: ScanNet voxel labeling performance.
Related work
  • For BEV-based methods [3; 9; 12], the data are first projected onto the ground plane in bird's-eye view, and then conventional convolutional networks are applied to generate features and predict bounding boxes. For voxel-based methods such as VoxelNet [28; 22; 10], the point clouds are first allocated to a regular grid in 3D Cartesian space, and then conventional 2D or 3D convolutional neural networks are applied to extract features and predict bounding boxes. However, these methods inevitably introduce information loss during the initial pre-processing step, making them inadequate for scenes with cluttered points (a toy voxelization sketch follows this section).

    Recently, quantization-free PointNet-based detectors such as VoteNet [13], PointRCNN [11] and STD [23] have been proposed. They model the point cloud directly from the raw input with a PointNet/PointNet++ backbone. Since errors in the quantization/projection process are avoided, these methods have achieved promising results on 3D object detection benchmarks such as [17; 4].
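
For intuition about the quantization loss mentioned above, here is a toy voxelization sketch (illustrative only, not taken from any cited implementation): points are snapped to integer grid indices, so points falling in the same cell become indistinguishable.

```python
# Toy voxelization: map continuous xyz coordinates to integer voxel indices.
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.05) -> np.ndarray:
    # points: (N, 3) xyz coordinates -> (N, 3) integer voxel indices.
    # All points inside one cell collapse to the same index, which is the
    # information loss that quantization-free PointNet-based detectors avoid.
    return np.floor(points / voxel_size).astype(np.int64)
```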
Funding
  • This research was supported by the GCL program of The Univ. of Tokyo by MEXT, and in part by the National Natural Science Foundation of China under Grant No. 61872012, the National Key R&D Program of China (2019YFF0302902), and the Beijing Academy of Artificial Intelligence (BAAI).
References
  • Angelina, M., Lee, G.H.: Pointnetvlad: Deep point cloud based retrieval for large-scale place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4470–4479 (2018)
  • Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: Netvlad: Cnn architecture for weakly supervised place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 5297–5307 (2016)
  • Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1907–1915 (2017)
  • Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: Scannet: Richly-annotated 3d reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5828–5839 (2017)
  • Hou, J., Dai, A., Nießner, M.: 3d-sis: 3d semantic instance segmentation of rgb-d scans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4421–4430 (2019)
  • Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7132–7141 (2018)
  • Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
  • Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
  • Ku, J., Mozifian, M., Lee, J., Harakeh, A., Waslander, S.L.: Joint 3d proposal generation and object detection from view aggregation. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1–8. IEEE (2018)
  • Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: Fast encoders for object detection from point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 12697–12705 (2019)
  • Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: Convolution on x-transformed points. In: Advances in Neural Information Processing Systems. pp. 820–830 (2018)
  • Liang, M., Yang, B., Wang, S., Urtasun, R.: Deep continuous fusion for multi-sensor 3d object detection. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 641–656 (2018)
  • Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3d object detection in point clouds. arXiv preprint arXiv:1904.09664 (2019)
  • Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 918–927 (2018)
  • Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 652–660 (2017)
  • Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: Advances in neural information processing systems. pp. 5099–5108 (2017)
  • Song, S., Lichtenberg, S.P., Xiao, J.: Sun rgb-d: A rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 567–576 (2015)
  • Wang, X., Liu, S., Shen, X., Shen, C., Jia, J.: Associatively segmenting instances and semantics in point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4096–4105 (2019)
  • Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2018)
  • Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1492–1500 (2017)
  • Wang, X., He, J., Ma, L.: Exploiting local and global structure for point cloud semantic segmentation with contextual point representations. In: NeurIPS (2019)
  • Yan, Y., Mao, Y., Li, B.: Second: Sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
  • Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: Std: Sparse-to-dense 3d object detector for point cloud. arXiv preprint arXiv:1907.10471 (2019)
  • Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., Agrawal, A.: Context encoding for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7151–7160 (2018)
  • Zhang, H., Xue, J., Dana, K.: Deep ten: Texture encoding network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 708–717 (2017)
  • Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6848–6856 (2018)
  • Zhao, H., Jiang, L., Fu, C.W., Jia, J.: Pointweb: Enhancing local neighborhood features for point cloud processing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5565–5573 (2019)
  • Zhou, Y., Tuzel, O.: Voxelnet: End-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4490–4499 (2018)
Authors
Xu Liu
Chengtao Li
Jingbo Wang