FPConv: Learning Local Flattening for Point Convolution

Lin Yiqun
Yan Zizheng
Cui Shuguang

CVPR, pp. 4292-4301, 2020.


Abstract:

We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis. Unlike previous methods, FPConv does not require transforming the input to an intermediate representation such as a 3D grid or graph, and works directly on the surface geometry of the point cloud. More specifically, for each point, FPConv performs a local flattening by learning projection weights that map surrounding points onto a 2D grid, where regular 2D convolutions can be applied.

Introduction
  • With the rapid development of 3D scanning devices, it is increasingly easy to generate and access 3D data in the form of point clouds.
  • With the explosive growth of machine learning and deep learning techniques, deep neural network based methods have been introduced into this task [37, 38] and show promising improvements
  • Neither PointNet [37] nor PointNet++ [38] supports the convolution operation, a key contributing factor in convolutional neural networks (CNNs) for efficient local processing and handling of large-scale data
Highlights
  • With the rapid development of 3D scanning devices, it is increasingly easy to generate and access 3D data in the form of point clouds
  • We report the mean of class-wise intersection over union, overall point-wise accuracy and the mean of class-wise accuracy
  • We propose FPConv, a novel surface-style convolution operator on 3D point clouds
  • Our experiments demonstrate that FPConv significantly improves the performance of surface-style convolution methods
  • We find that surface-style convolution can be complementary to volumetric-style convolution, and that joint training boosts performance to state-of-the-art
Methods
  • Excerpt of Table 1 (mIoU; type V = volumetric-style, S = surface-style; columns: sampling, ScanNet, S3DIS Area-5, S3DIS 6-fold):
  • PointNet [37]      V   FPS    —      41.1   47.6
  • PointNet++ [38]    V   FPS    33.9   —      —
  • PointCNN [26]      V   FPS    45.8   57.3   —
  • PointConv [49]     V   FPS    55.6   58.3†  —
  • TangentConv [44]   S   Grid   43.8   52.6   —
  • SurfaceConv [34]   S   —      44.2   —      —
  • The graph-style entries SPGraph [22], ResGCN [24] and HPEIN [20], and the entry for KPConv [45], also appear in the full Table 1.
Results
  • Following [37], the authors report results on two settings for S3DIS: the first is evaluation on Area 5, and the second is 6-fold cross-validation.
  • The authors report the mean of class-wise intersection over union (mIoU), overall point-wise accuracy and the mean of class-wise accuracy.
  • For ScanNet [9], the authors report the mIoU score on the ScanNet benchmark
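The three reported metrics can all be derived from a point-wise confusion matrix. A minimal NumPy sketch (toy labels, not the paper's evaluation code):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Compute mIoU, overall accuracy (OA) and mean class accuracy (mAcc)
    from per-point predictions and ground-truth labels."""
    # Confusion matrix: rows = ground truth, cols = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)

    tp = np.diag(cm).astype(np.float64)
    gt_count = cm.sum(axis=1)      # points per ground-truth class
    pred_count = cm.sum(axis=0)    # points per predicted class

    iou = tp / np.maximum(gt_count + pred_count - tp, 1)
    class_acc = tp / np.maximum(gt_count, 1)

    miou = iou[gt_count > 0].mean()   # average over classes present in gt
    oa = tp.sum() / cm.sum()
    macc = class_acc[gt_count > 0].mean()
    return miou, oa, macc

# Toy example: 6 points, 2 classes.
pred = np.array([0, 0, 1, 1, 1, 0])
gt   = np.array([0, 0, 0, 1, 1, 1])
miou, oa, macc = segmentation_metrics(pred, gt, num_classes=2)
# Here each class has IoU 2/4, so mIoU = 0.5; OA = mAcc = 4/6.
```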
Conclusion
  • The authors propose FPConv, a novel surface-style convolution operator on 3D point clouds.
  • FPConv takes a local region of a point cloud as input and flattens it onto a 2D grid plane by predicting projection weights, followed by regular 2D convolutions.
  • The authors' experiments demonstrate that FPConv significantly improves the performance of surface-style convolution methods.
  • The authors believe that surface-style convolutions can play an important role in feature learning on 3D data and are a promising direction to explore.
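The flatten-then-convolve step described above can be sketched in NumPy. In the paper the projection weights are predicted by a learned network; here any row-normalized weight matrix stands in for them, and a single-channel kernel stands in for the learned 2D convolution:

```python
import numpy as np

rng = np.random.default_rng(0)

def flatten_and_convolve(neighbor_feats, proj_weights, kernel):
    """Sketch of one surface-style step: project N neighbor features onto
    an M x M grid plane with predicted weights, then run a 2D convolution.

    neighbor_feats: (N, C) features of points in the local region
    proj_weights:   (N, M*M) flattening weights (learned in the paper;
                    any column-normalized matrix works for this sketch)
    kernel:         (K, K) 2D convolution kernel, shared across channels
    """
    n, c = neighbor_feats.shape
    m = int(np.sqrt(proj_weights.shape[1]))
    # Flatten: each grid cell is a weighted sum of neighbor features.
    grid = (proj_weights.T @ neighbor_feats).reshape(m, m, c)
    # Valid 2D convolution over the grid, per feature channel.
    k = kernel.shape[0]
    out = np.zeros((m - k + 1, m - k + 1, c))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = grid[i:i + k, j:j + k]  # (K, K, C)
            out[i, j] = np.tensordot(kernel, patch, axes=([0, 1], [0, 1]))
    return out

# Toy local region: 16 neighbors, 4 channels, 6x6 plane, 3x3 kernel.
feats = rng.standard_normal((16, 4))
weights = rng.random((16, 36))
weights /= weights.sum(axis=0, keepdims=True)  # normalize per grid cell
kernel = rng.standard_normal((3, 3))
out = flatten_and_convolve(feats, weights, kernel)  # shape (4, 4, 4)
```

The 6x6 plane size matches one of the configurations ablated in Table 4; the real operator learns the weights end-to-end and uses multi-channel convolutions.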
Tables
  • Table1: Mean IoU of large scene segmentation results. The second column is the convolution type (graph, surface or volumetric-style) and the third column indicates the sampling strategy. S3DIS-6 represents 6-fold cross-validation. ⊕ denotes fusion at the final feature level, while ⊗ denotes fusion at the convolutional feature level via a parallel block. † indicates our implementation
  • Table2: Classification accuracy on ModelNet40 shape classification. Two large-scale datasets, Stanford Large-Scale 3D Indoor Spaces (S3DIS) [1] and ScanNet [9], are used for 3D point cloud segmentation. We implement FPConv in PyTorch [35]. A momentum gradient-descent optimizer is used to minimize a point-wise cross-entropy loss, with a momentum of 0.98 and an initial learning rate of 0.01 scheduled by a cosine LR scheduler [28]. Leaky ReLU and batch normalization are applied after each layer except the last fully connected layer. We train our models for 100 epochs on S3DIS and 300 epochs on ScanNet
  • Table3: Detailed semantic segmentation scores on S3DIS Area-5. ⊕ represents fusion in final feature level while ⊗ represents fusion in convolutional feature level. Note that PointConv† indicates our implementation on S3DIS
  • Table4: Different normalization results on S3DIS area 5. 6x6 and 5x5 represent different plane sizes
  • Table5: Quantitative results of the segmentation task on the evaluation dataset of ScanNet. PointConv† indicates our reimplementation of PointConv [49]
  • Table6: Comparison of trainable parameters between different convolution operators on the ScanNet evaluation dataset. † indicates our implementation. "+ mid ch /2" denotes halving the middle channel size of the bottleneck in the residual block
  • Table7: Fusion results on S3DIS area 5. ⊕ indicates fusing in final feature level
  • Table8: Detailed semantic segmentation scores on S3DIS 6-fold cross validation
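The training setup quoted above (initial learning rate 0.01, cosine LR scheduler [28]) can be sketched as a schedule function. This is an illustrative stdlib-only sketch of cosine annealing without restarts, not the paper's training code:

```python
import math

def cosine_lr(epoch, total_epochs, lr_init=0.01, lr_min=0.0):
    """Cosine-annealed learning rate in the style of SGDR [28], without
    restarts: decays from lr_init at epoch 0 to lr_min at the last epoch."""
    t = epoch / max(total_epochs - 1, 1)
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * t))

# 100-epoch schedule as used for S3DIS in the quoted setup.
schedule = [cosine_lr(e, 100) for e in range(100)]
```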
Related work
  • Deep learning based 3D data analysis has been a very active research topic in recent years. In this section, we mainly focus on point cloud analysis and briefly review previous works according to their underlying methodologies.

    Volumetric-style point convolution Since a point cloud is distributed disorderly in 3D space without any regular structure, pioneering works sample points into grids so that conventional 3D convolutions can be applied, but are limited by high computational load and low representation efficiency [31, 50, 40, 42]. PointNet [37] applies a shared MLP to every point individually, followed by a global max-pooling, to extract a global feature of the input point cloud. [38] extends it with nested partitionings of the point set to learn more local features hierarchically, and many works follow this line, approximating point convolutions with MLPs [25, 26, 16, 47]. However, such a representation cannot capture local features very well. Recent works define explicit convolution kernels for points, whose weights are directly learned like image convolutions [17, 51, 12, 2, 45]. Among them, KPConv [45] proposes a spatially deformable point convolution with any number of kernel points, which alleviates both varying densities and computational cost, outperforming the associated methods on point analysis tasks. However, these volumetric-style approaches may not capture uniform areas very well.
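The PointNet idea above, a shared per-point MLP followed by a symmetric global max-pool, can be sketched in a few lines of NumPy; the symmetry of the pooling is what makes the output invariant to the point ordering (random weights here stand in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(1)

def pointnet_global_feature(points, w1, w2):
    """Sketch of the PointNet [37] idea: a shared per-point MLP followed
    by a symmetric global max-pool, giving an order-invariant feature."""
    h = np.maximum(points @ w1, 0.0)  # shared MLP layer 1 + ReLU
    h = np.maximum(h @ w2, 0.0)       # shared MLP layer 2 + ReLU
    return h.max(axis=0)              # global max-pool over points

pts = rng.standard_normal((128, 3))   # a toy cloud of 128 xyz points
w1 = rng.standard_normal((3, 32))
w2 = rng.standard_normal((32, 64))
feat = pointnet_global_feature(pts, w1, w2)

# Permuting the points leaves the global feature unchanged.
perm = rng.permutation(128)
feat_perm = pointnet_global_feature(pts[perm], w1, w2)
```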
Funding
  • This work was supported in part by grants No.2018YFB1800800, No.ZDSYS201707251409055, NSFC-61902334, NSFC-61629101, No.2018B030338001, and No.2017ZT07X152
References
  • Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2016. 3, 6, 11
  • Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091, 2018. 2
  • Aseem Behl, Omid Hosseini Jafari, Siva Karthik Mustikovela, Hassan Abu Alhaija, Carsten Rother, and Andreas Geiger. Bounding boxes, segmentations and object coordinates: How important is recognition for 3d scene flow estimation in autonomous driving scenarios? In Proceedings of the IEEE International Conference on Computer Vision, pages 2574–2583, 2017. 1
  • Alexandre Boulch, Bertrand Le Saux, and Nicolas Audebert. Unstructured point cloud semantic labeling using deep segmentation networks. 3DOR, 2:7, 2017. 3
  • Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017. 2
  • Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013. 2
  • Amin P Charaniya, Roberto Manduchi, and Suresh K Lodha. Supervised parametric classification of aerial lidar data. In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pages 30–30. IEEE, 2004. 1
  • Nesrine Chehata, Li Guo, and Clement Mallet. Contribution of airborne full-waveform lidar and image data for urban scene classification. In 2009 16th IEEE International Conference on Image Processing (ICIP), pages 1669–1672. IEEE, 2009. 1
  • Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5828–5839, 2017. 1, 3, 6, 7, 11
  • Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016. 2
  • Aleksey Golovinskiy, Vladimir G Kim, and Thomas Funkhouser. Shape-based recognition of 3d point clouds in urban environments. In 2009 IEEE 12th International Conference on Computer Vision, pages 2154–2161. IEEE, 2009. 1
  • Fabian Groh, Patrick Wieschollek, and Hendrik PA Lensch. Flex-convolution. In Asian Conference on Computer Vision, pages 105–122. Springer, 2018. 2
  • Saurabh Gupta, Pablo Arbelaez, Ross Girshick, and Jitendra Malik. Indoor scene understanding with rgb-d images: Bottom-up segmentation, object detection and semantic segmentation. International Journal of Computer Vision, 112(2):133–149, 2015. 3
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015. 5
Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
  • Pedro Hermosilla, Tobias Ritschel, Pere-Pau Vazquez, Alvar Vinacua, and Timo Ropinski. Monte carlo convolution for learning on non-uniformly sampled point clouds. In SIGGRAPH Asia 2018 Technical Papers, page 235. ACM, 2018. 2
  • Binh-Son Hua, Minh-Khoi Tran, and Sai-Kit Yeung. Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 984–993, 2018. 2
  • Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, and Leonidas J Guibas. Texturenet: Consistent local parametrizations for learning from highresolution signals on meshes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4440–4449, 2019. 2, 3, 6
  • Qiangui Huang, Weiyue Wang, and Ulrich Neumann. Recurrent slice networks for 3d segmentation of point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2626–2635, 2018. 12
  • Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, ChiWing Fu, and Jiaya Jia. Hierarchical point-edge interaction network for point cloud semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 10433–10441, 2019. 6, 12
  • Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016. 2
  • Loic Landrieu and Martin Simonovsky. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4558–4567, 2018. 6, 12
  • Felix Jaremo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. Deep projective 3d semantic segmentation. In International Conference on Computer Analysis of Images and Patterns, pages 95–107. Springer, 2017. 3
  • Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. Deepgcns: Can gcns go as deep as cnns? In Proceedings of the IEEE International Conference on Computer Vision, pages 9267–9276, 2019. 6
  • Jiaxin Li, Ben M Chen, and Gim Hee Lee. So-net: Selforganizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9397–9406, 2018. 2
  • Yangyan Li, Rui Bu, Mingchao Sun, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. arXiv preprint arXiv:1801.07791, 2018. 2, 6, 12
  • Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, and Liang Lin. Lstm-cf: Unifying context modeling and fusion with lstms for rgb-d scene labeling. In European conference on computer vision, pages 541–557. Springer, 2016. 3
  • Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016. 6
  • Andelo Martinovic, Jan Knopp, Hayko Riemenschneider, and Luc Van Gool. 3d all the way: Semantic segmentation of urban scenes from start to end in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2015. 1
  • Jonathan Masci, Davide Boscaini, Michael Bronstein, and Pierre Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pages 37–45, 2015. 2
  • Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015. 1, 2
  • John McCormac, Ankur Handa, Andrew Davison, and Stefan Leutenegger. Semanticfusion: Dense 3d semantic mapping with convolutional neural networks. In 2017 IEEE International Conference on Robotics and automation (ICRA), pages 4628–4635. IEEE, 2017. 3
  • Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5115–5124, 2017. 2
  • Hao Pan, Shilin Liu, Yang Liu, and Xin Tong. Convolutional neural networks on 3d surfaces using parallel frames. arXiv preprint arXiv:1808.04952, 2018. 2, 3, 6
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W, 2017. 6
  • Antonio Pomares, Jorge L Martınez, Anthony Mandow, Marıa A Martınez, Mariano Moran, and Jesus Morales. Ground extraction from 3d lidar point clouds with the classification learner app. In 2018 26th Mediterranean Conference on Control and Automation (MED), pages 1–9. IEEE, 2018. 1
  • Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017. 1, 2, 4, 6, 7, 12
  • Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pages 5099–5108, 2017. 1, 2, 5, 6
  • Jason Rambach, Alain Pagani, and Didier Stricker. [poster] augmented things: Enhancing ar applications leveraging the internet of things and universal 3d object tracking. In 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), pages 103–108. IEEE, 2017. 1
  • Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3577–3586, 2017. 1, 2
  • Martin Simonovsky and Nikos Komodakis. Dynamic edgeconditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3693–3702, 2017. 2
  • Shuran Song, Fisher Yu, Andy Zeng, Angel X Chang, Manolis Savva, and Thomas Funkhouser. Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1746–1754, 2017. 2
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015. 6
  • Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, and QianYi Zhou. Tangent convolutions for dense prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3887–3896, 2018. 2, 3, 6
  • Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Francois Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. arXiv preprint arXiv:1904.08889, 2019. 1, 2, 5, 6, 7, 8, 11, 12
  • Nitika Verma, Edmond Boyer, and Jakob Verbeek. Feastnet: Feature-steered graph convolutions for 3d shape analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2598–2606, 2018. 2
  • Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, and Raquel Urtasun. Deep parametric continuous convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2589–2597, 2018. 2
  • Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):146, 2019. 2
  • Wenxuan Wu, Zhongang Qi, and Li Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9621–9630, 2019. 1, 6, 7, 11
  • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920, 2015. 1, 2, 6
  • Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pages 87–102, 2018. 2
  • Li Yi, Hao Su, Xingwen Guo, and Leonidas J Guibas. Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2282–2290, 2017. 2
  • Xiangyu Yue, Bichen Wu, Sanjit A Seshia, Kurt Keutzer, and Alberto L Sangiovanni-Vincentelli. A lidar point cloud generator: from a virtual world to autonomous driving. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pages 458–464. ACM, 2018. 1