A Hierarchical Graph Network for 3D Object Detection on Point Clouds

Jintai Chen
Biwen Lei
Qingyu Song

CVPR, pp. 389-398, 2020.

We propose a new graph convolution based hierarchical graph network for 3D object detection, which processes raw point clouds directly to predict 3D bounding boxes

Abstract:

3D object detection on point clouds finds many applications. However, most known point cloud object detection methods did not adequately accommodate the characteristics (e.g., sparsity) of point clouds, and thus some key semantic information (e.g., shape information) is not well captured. In this paper, we propose a new graph convolution ...

Introduction
  • Qi et al. proposed VoteNet [29], in which points vote for object centers based on features learned by PointNet++ [32].
  • The authors propose a novel Hierarchical Graph Network (HGNet) for 3D object detection on point clouds, based on graph convolutions (GConvs).
  • Shape-attentive GConv (SA-GConv) captures the object shape information by modelling the relative geometric positions of points.
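
To make the idea of aggregating features with relative geometric positions more concrete, here is a minimal NumPy sketch. It is not the authors' SA-GConv definition: the function names, the k-nearest-neighbour graph, the ReLU gate built from a single weight matrix `w_pos`, and the max-pooling step are all assumptions chosen for illustration.

```python
import numpy as np

def knn_indices(xyz, k):
    """Indices (n, k) of the k nearest neighbours of each point, excluding itself."""
    diff = xyz[:, None, :] - xyz[None, :, :]      # (n, n, 3) pairwise offsets
    dist2 = np.sum(diff ** 2, axis=-1)            # (n, n) squared distances
    np.fill_diagonal(dist2, np.inf)               # a point is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

def shape_attentive_aggregate(xyz, feats, w_pos, k=4):
    """Hypothetical aggregation: neighbour features gated by relative positions."""
    idx = knn_indices(xyz, k)                     # (n, k)
    rel = xyz[idx] - xyz[:, None, :]              # (n, k, 3) neighbour minus centre
    gate = np.maximum(rel @ w_pos, 0.0)           # (n, k, F) position-dependent weights
    neigh = feats[idx]                            # (n, k, F) neighbour features
    return np.max(gate * neigh, axis=1)           # (n, F) max-pool over neighbours

rng = np.random.default_rng(0)
xyz = rng.normal(size=(16, 3))                    # toy point cloud
feats = rng.normal(size=(16, 8))                  # toy per-point features
w_pos = rng.normal(size=(3, 8))                   # stand-in for learned weights
out = shape_attentive_aggregate(xyz, feats, w_pos, k=4)
```

Because the gate depends only on offsets between neighbouring points, the output of this sketch is unchanged when the whole cloud is translated, which is one reason relative positions are a natural carrier of local shape information.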
Highlights
  • Our main contributions in this work are as follows: (A) We develop a novel Hierarchical Graph Network (HGNet) for 3D object detection on point clouds, which outperforms the state-of-the-art methods by a clear margin. (B) We propose a novel Shape-attentive (De)GConv (SA-(De)GConv), which is effective at aggregating features and capturing shape information of objects in point clouds. (C) We build a new GConv-based U-shape network (GU-net) for generating multi-level features, which are vital for 3D object detection. (D) Leveraging global information, we propose the ProRe Module to promote performance by reasoning on proposals.
  • This is likely due to the proposed feature pyramid and our hierarchical graph modelling (SA-GConv, GU-net, and the ProRe Module).
  • HGNet often detects some objects in the scenes that are not annotated in the ground truth.
  • For 3D object detection on point clouds, we proposed a novel framework, HGNet, which learns the semantics via hierarchical graph modelling.
  • We built GU-net based on SA-GConv and SA-DeGConv, generating a feature pyramid containing multi-level semantics.
Results
  • The SA-GConv based GU-net takes a point cloud as input and captures multi-level semantics, which the Proposal Generator (containing an improved voting module) further aggregates to generate proposals.
  • The authors' main contributions in this work are as follows: (A) The authors develop a novel Hierarchical Graph Network (HGNet) for 3D object detection on point clouds, which outperforms the state-of-the-art methods by a clear margin.
  • VoteNet [29] proposed a new voting method that predicts object centers from the learned features, which helps aggregate distant semantic information.
  • The authors develop an end-to-end hierarchical graph network (HGNet) for 3D object detection on point clouds, as shown in Fig. 2.
  • To capture the multi-level semantics, the authors propose a new U-shape network called GU-net, based on SA-(De)GConv.
  • Three point feature maps are generated by GU-net, containing the multi-level semantics.
  • Different from previous GConvs, ProRe considers the relative geometric positions among proposals in feature aggregation using γ, which transforms P into size n × F′ for the Hadamard product.
  • This likely is due to the proposed feature pyramid and the hierarchical graph modelling (SA-GConv, GU-net, and ProRe Module).
  • To answer question Q2, the authors evaluate the contributions of SA-GConv, GU-net, and the ProRe Module via ablation experiments on the SUN RGB-D dataset.
  • For 3D object detection on point clouds, the authors proposed a novel framework HGNet, learning the semantics via hierarchical graph modelling.
  • The authors proposed the novel, lightweight Shape-attentive (De)GConv to capture local shape semantics; it aggregates features while considering the relative geometric positions of points.
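
The ProRe bullet above says only that γ maps the proposal positions P to size n × F′ so that they can enter a Hadamard product with features of the same size. The sketch below illustrates that shape logic and nothing more. The exact ProRe formula is not given here, so the centroid-relative positions, the `tanh` form of γ, the mean over proposals, the residual connection, and the names `prore_sketch`, `w_gamma` and `w_feat` are all assumptions.

```python
import numpy as np

def prore_sketch(P, H, w_gamma, w_feat):
    """Toy proposal-reasoning step (hypothetical, not the paper's exact formula).

    P: (n, 3) proposal centers, H: (n, F) proposal features,
    w_gamma: (3, F_out) and w_feat: (F, F_out) stand in for learned weights.
    """
    rel = P - P.mean(axis=0, keepdims=True)        # positions relative to the proposal centroid
    gamma_p = np.tanh(rel @ w_gamma)               # gamma(P): (n, 3) -> (n, F_out)
    h_proj = H @ w_feat                            # project features to (n, F_out)
    gated = gamma_p * h_proj                       # Hadamard (element-wise) product
    context = gated.mean(axis=0, keepdims=True)    # (1, F_out) summary over all proposals
    return h_proj + context                        # every proposal receives the global context

rng = np.random.default_rng(0)
n, F, F_out = 10, 16, 8
P = rng.normal(size=(n, 3))
H = rng.normal(size=(n, F))
refined = prore_sketch(P, H, rng.normal(size=(3, F_out)), rng.normal(size=(F, F_out)))
```

The only constraint taken from the text is that γ(P) and the projected features share the shape n × F′, which is what makes the element-wise product well defined.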
Conclusion
  • The authors built GU-net based on SA-GConv and SA-DeGConv, generating a feature pyramid containing multi-level semantics.
  • The points on the feature pyramid vote for the corresponding object centers, and the multi-level semantics are further aggregated to generate proposals.
  • Different from previous methods, HGNet attains better performance by carefully considering shape information and aggregating multi-level semantics.
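
The voting step described above can be sketched as follows: each point on a pyramid level adds a predicted offset to its own position to obtain a vote, and the votes from all levels are pooled before proposals are formed. This is only an illustration. The three level sizes, the synthetic offsets that stand in for network predictions, and the use of a plain mean in place of a learned proposal generator are assumptions.

```python
import numpy as np

def vote(xyz, offsets):
    """Each point votes for an object center: its position plus a predicted offset."""
    return xyz + offsets

rng = np.random.default_rng(0)
center = np.array([1.0, 2.0, 0.5])       # ground-truth center of a toy object
level_sizes = (64, 32, 16)               # hypothetical point counts of three pyramid levels

all_votes = []
for n in level_sizes:
    xyz = center + rng.normal(scale=0.3, size=(n, 3))        # points scattered on the object
    # Stand-in for the network output: the true offset plus small prediction noise.
    offsets = (center - xyz) + rng.normal(scale=0.01, size=(n, 3))
    all_votes.append(vote(xyz, offsets))

votes = np.concatenate(all_votes, axis=0)    # (112, 3) votes pooled across levels
proposal_center = votes.mean(axis=0)         # crude aggregation of the votes
```

With accurate offsets, votes from every level concentrate near the object center even though the points themselves lie on the object surface, which is why voting helps gather distant evidence into one place.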
Tables
  • Table 3: Quantitative ablation experiments on SUN RGB-D. “FP”
  • Table 4: Comparison of voting results between SA-GConv and SA module in HGNet on SUN RGB-D dataset
Related work
  • 2.1. 3D Object Detection on Point Clouds

    Point clouds have some special characteristics (e.g., sparsity and irregularity) that make them poorly suited to processing by convolutional neural networks. Many methods [2, 38, 20, 44, 9, 23] have been proposed for 3D object detection on point clouds, such as projection methods (e.g., Complex-YOLO [35], BirdNet [4]), volumetric convolution based methods (e.g., 3DFCN [19], Vote3Deep [8]), and PointNet based methods (e.g., F-PointNet [30], STD [46]). PointNet [31] pioneered the use of raw points as input and obtained good performance, and was followed by many frameworks [31, 32, 14, 29, 42]. Lang et al. [17] introduced the Pillar Feature Network, which encodes point clouds into pseudo-images that are then processed by a 2D CNN. Although the framework [17] is novel and fast, it did not preserve localization information well. PointNet based methods showed good performance, as they dealt with raw points directly. However, PointNet did not consider the dependence among points when aggregating information. Yang et al. [46] proposed STD, a two-stage fusion method that combines PointNet based methods and volumetric convolution based methods. However, the two-stage process might learn some unmatched features for object detection. VoteNet [29] proposed a new voting method that predicts object centers from the learned features, which helps aggregate distant semantic information. However, local shape information was not well accounted for in VoteNet. Since objects vary widely, the features needed for detecting different objects may not follow an identical distribution. In other words, multi-level semantics may be needed to identify different objects.
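
The remark that PointNet does not consider the dependence among points can be seen in a stripped-down version of its aggregation: a shared layer is applied to every point independently, and a symmetric max-pool then collapses the set into one vector. The sketch below uses a single random layer in place of the real multi-layer network; the function name and the layer sizes are assumptions.

```python
import numpy as np

def pointnet_global_feature(points, W, b):
    """Shared per-point layer followed by a symmetric max-pool (PointNet-style)."""
    per_point = np.maximum(points @ W + b, 0.0)   # (n, F): each point is processed on its own
    return per_point.max(axis=0)                  # (F,): order-independent pooling

rng = np.random.default_rng(0)
points = rng.normal(size=(32, 3))                 # toy point cloud
W = rng.normal(size=(3, 16))                      # stand-in for learned weights
b = rng.normal(size=(16,))

feature = pointnet_global_feature(points, W, b)
perm = rng.permutation(len(points))
feature_shuffled = pointnet_global_feature(points[perm], W, b)
```

The result is identical for any ordering of the points, but no step ever looks at a pair of points together, so local geometric relations between neighbours are not modelled explicitly.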
Funding
  • The research of Real Doctor AI Research Centre was partially supported by the Zhejiang University Education Foundation under grants No.K18-511120-004, No.K17-511120-017, and No.K17-518051-021, the National Natural Science Foundation of China under grant No.61672453, the National key R&D program sub project “large scale cross-modality medical knowledge management” under grant No.2018AAA0102100, the Zhejiang public welfare technology research project under grant No.LGF20F020013, the National Key R&D Program Project of “Software Testing Evaluation Method Research and its Database Development on Artificial Intelligence Medical Information System” under the Fifth Electronics Research Institute of the Ministry of Industry and Information Technology (No.2019YFC0118802), the National Key R&D Program Project of “Full Life Cycle Detection Platform and Application Demonstration of Medical Artificial Intelligence Product” under the National Institutes for Food and Drug Control (No.2019YFB1404802), and the Key Laboratory of Medical Neurobiology of Zhejiang Province.
  • Chen’s research was supported in part by NSF Grant CCF-1617735.
Reference
  • Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, and Alexander A Alemi. Watch Your Step: Learning Node Embeddings via Graph Attention. In NeurIPS, 2018.
  • Eduardo Arnold, Omar Y Al-Jarrah, Mehrdad Dianati, Saber Fallah, David Oxtoby, and Alex Mouzakitis. A Survey on 3D Object Detection Methods for Autonomous Driving Applications. T-ITS, 2019.
  • James Atwood and Don Towsley. Diffusion-Convolutional Neural Networks. In NeurIPS, 2016.
  • Jorge Beltran, Carlos Guindel, Francisco Miguel Moreno, Daniel Cruzado, Fernando Garcia, and Arturo De La Escalera. BirdNet: A 3D Object Detection Framework from LiDAR Information. In ITSC, 2018.
  • Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In CVPR, 2017.
  • Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NeurIPS, 2016.
  • Yuval Eldar, Michael Lindenbaum, Moshe Porat, and Yehoshua Y Zeevi. The Farthest Point Strategy for Progressive Image Sampling. IEEE Transactions on Image Processing, 1997.
  • Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, and Ingmar Posner. Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks. In ICRA, 2017.
  • Di Feng, Lars Rosenbaum, and Klaus Dietmayer. Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network for Lidar 3D Vehicle Detection. In ITSC, 2018.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive Representation Learning on Large Graphs. In NeurIPS, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
  • Mikael Henaff, Joan Bruna, and Yann LeCun. Deep Convolutional Networks on Graph-structured Data. arXiv preprint arXiv:1506.05163, 2015.
  • Ji Hou, Angela Dai, and Matthias Nießner. 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. In CVPR, 2019.
  • Qiangui Huang, Weiyue Wang, and Ulrich Neumann. Recurrent Slice Networks for 3D Segmentation of Point Clouds. In CVPR, 2018.
  • Thomas N Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR, 2017.
  • Jean Lahoud and Bernard Ghanem. 2D-Driven 3D Object Detection in RGB-D Images. In ICCV, 2017.
  • Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast Encoders for Object Detection From Point Clouds. In CVPR, 2019.
  • John Boaz Lee, Ryan Rossi, and Xiangnan Kong. Graph Classification Using Structural Attention. In KDD, 2018.
  • Bo Li. 3D Fully Convolutional Network for Vehicle Detection in Point Cloud. In IROS, 2017.
  • Bo Li, Tianlei Zhang, and Tian Xia. Vehicle Detection from 3D Lidar Using Fully Convolutional Network. arXiv preprint arXiv:1608.07916, 2016.
  • Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. DeepGCNs: Can GCNs Go as Deep as CNNs? In ICCV, 2019.
  • Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
  • Or Litany et al. ASIST: Automatic Semantically Invariant Scene Transformation. Computer Vision and Image Understanding, 2017.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single Shot Multibox Detector. In ECCV, 2016.
  • Ziqi Liu, Chaochao Chen, Longfei Li, Jun Zhou, Xiaolong Li, Le Song, and Yuan Qi. Geniepath: Graph Neural Networks with Adaptive Receptive Paths. In AAAI, 2019.
  • Alessio Micheli. Neural Network for Graphs: A Contextual Constructive Approach. IEEE Transactions on Neural Networks, 2009.
  • Carsten Moenning and Neil A Dodgson. Fast Marching Farthest Point Sampling. Technical report, University of Cambridge, Computer Laboratory, 2003.
  • Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning Convolutional Neural Networks for Graphs. In ICML, 2016.
  • Charles R. Qi, Or Litany, Kaiming He, and Leonidas J Guibas. Deep Hough Voting for 3D Object Detection in Point Clouds. In ICCV, 2019.
  • Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J Guibas. Frustum PointNets for 3D Object Detection from RGB-D Data. In CVPR, 2018.
  • Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In CVPR, 2017.
  • Charles R. Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In NeurIPS, 2017.
  • Zhile Ren and Erik B Sudderth. Three-Dimensional Object Detection and Layout Prediction Using Clouds of Oriented Gradients. In CVPR, 2016.
  • O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 2015.
  • Martin Simon, Stefan Milz, Karl Amende, and Horst-Michael Gross. Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds. In ECCV, 2018.
  • Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite. In CVPR, 2015.
  • Shuran Song and Jianxiong Xiao. Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images. In CVPR, 2016.
  • Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In ICCV, 2015.
  • Zhi Tian et al. Fcos: Fully Convolutional One-stage Object Detection. In ICCV, 2019.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph Attention Networks. In ICLR, 2017.
  • Nitika Verma et al. Feastnet: Feature-steered Graph Convolutions for 3D Shape Analysis. In CVPR, 2018.
  • Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic Graph CNN for Learning on Point Clouds. TOG, 2019.
  • Kun Wei et al. Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition. In ICCV, 2019.
  • Bichen Wu, Alvin Wan, Xiangyu Yue, and Kurt Keutzer. Squeeze-Seg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D Lidar Point Cloud. In ICRA, 2018.
  • Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In ICLR, 2019.
  • Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, and Jiaya Jia. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. In ICCV, 2019.
  • Li Yi, Wang Zhao, He Wang, Minhyuk Sung, and Leonidas J Guibas. GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud. In CVPR, 2019.