# Neural Star Domain as Primitive Representation

NeurIPS 2020


Abstract

Reconstructing 3D objects from 2D images is a fundamental task in computer vision. Accurate structured reconstruction by parsimonious and semantic primitive representation further broadens its application. When reconstructing a target shape with multiple primitives, it is preferable that one can instantly access the union of basic prope...


Introduction

- Understanding 3D objects by decomposing them into simpler shapes, called primitives, has been widely studied in computer vision [1,2,3].
- The use of implicit representations allows the set of primitives to be represented as a single collective shape by considering a union [5, 6, 12].
- This property contributes to improving reconstruction accuracy during training.
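As a minimal illustration of this union property, the collective occupancy of a set of implicit primitives can be composed as the maximum over per-primitive occupancies. This is a sketch, not the paper's implementation; the sphere primitives and the max-composition rule here are assumptions for illustration:

```python
# Union of implicit primitives: a point belongs to the collective shape
# iff at least one primitive occupies it, i.e. a max over occupancies.
def union_occupancy(x, primitives):
    return max(o(x) for o in primitives)

# Two hypothetical sphere primitives as binary occupancy functions.
def sphere(center, r):
    return lambda x: 1.0 if sum((xi - ci) ** 2 for xi, ci in zip(x, center)) <= r * r else 0.0

prims = [sphere((0.0, 0.0, 0.0), 1.0), sphere((1.5, 0.0, 0.0), 1.0)]
print(union_occupancy((-0.5, 0.0, 0.0), prims))  # inside the first sphere only -> 1.0
print(union_occupancy((3.0, 0.0, 0.0), prims))   # outside both -> 0.0
```

Because the union is a single scalar field, losses on the collective shape can be backpropagated through all primitives jointly during training.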

Highlights

- Understanding 3D objects by decomposing them into simpler shapes, called primitives, has been widely studied in computer vision [1,2,3]
- We propose a novel primitive representation named neural star domain (NSD) that learns shapes in the star domain by neural networks
- We demonstrate that the complexity of shapes the neural star domain can represent is equivalent to the approximation ability of the neural network
- We propose NSD as a novel primitive representation
- We show that our primitive based approach achieves significantly better semantic capability of reconstructed primitives
- A potential risk of NSD is that it could be misused to plagiarize 3D objects, such as furniture and appliance designs

Methods

- The authors begin by formulating the problem setting in Section 3.1 and define the star domain in Section 3.2.
- To realize implicit and explicit shape representation simultaneously, the authors further require O and P to be related as O(P(s)) = τ_o, where s ∈ S² and τ_o ∈ [0, 1] is a constant representing the isosurface.
- The authors ensure, through training losses, that both the composite indicator function and the surface points are well approximated, i.e., Ô ≈ O and P̂ ≈ P.
- Since BSP-Net uses different train and test splits, the authors evaluated it on the intersection of the test splits from [25] and [5]
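The implicit–explicit relation above can be made concrete with a minimal sketch of one star-domain primitive. This is a hypothetical stand-in, not the authors' architecture: a tiny random MLP plays the role of the learned radius function r(s) over unit directions, the explicit surface is P(s) = c + r(s)·s, and the implicit occupancy O(x) compares |x − c| against the radius along x's direction, so O(P(s)) = τ_o holds by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned radius network: a tiny random
# 2-layer MLP mapping a unit direction s in S^2 to a radius r(s) > 0.
W1, b1 = rng.normal(size=(16, 3)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def radius(s):
    h = np.tanh(W1 @ s + b1)
    z = float(W2 @ h + b2)
    return 0.5 + 1.0 / (1.0 + np.exp(-z))  # squash into (0.5, 1.5): always positive

center = np.zeros(3)  # star center c of this primitive
TAU = 0.5             # isosurface constant tau_o

def P(s):
    """Explicit surface point in direction s: P(s) = c + r(s) * s."""
    return center + radius(s) * s

def O(x):
    """Implicit occupancy: x is inside iff |x - c| < r along x's direction.
    Returns the hard indicator, with TAU exactly on the surface."""
    d = np.linalg.norm(x - center)
    if d == 0.0:
        return 1.0
    r = radius((x - center) / d)
    if np.isclose(d, r):
        return TAU
    return 1.0 if d < r else 0.0

s = np.array([0.0, 0.0, 1.0])
print(O(P(s)))  # tau_o = 0.5 by construction
```

The point of the construction is that the same learned radius function yields both a mesh-ready explicit surface (by pushing template directions through P) and a queryable implicit field (through O), which is what lets the training combine occupancy and surface-point losses.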

Results

- The authors show that the primitive-based approach achieves significantly better semantic capability of the reconstructed primitives.

Conclusion

- The authors propose NSD as a novel primitive representation. They show that the method consistently outperforms previous primitive-based approaches and that it is the only primitive-based approach that performs better than the leading reconstruction technique (OccNet [25]) on the single-view reconstruction task.
- As the NSDN consists solely of simple fully connected layers and mesh extraction processes, it is fast and cost-effective; one might be able to run the model on mobile devices with proper hardware optimization.
- This opens up more democratized 3D reconstruction, but it comes with the possible risk of being used to plagiarize the designs of real-world products in combination with 3D printers.

Summary

## Objectives:

The authors' goal is to parametrize the 3D shape by a composite indicator function O and surface points P, which can be decomposed into a collection of N primitives.

- Table 1: Overview of shape representation in previous works. SQ denotes superquadrics [8]. We regard a primitive as having an explicit representation if it has access to the explicit surface in both the inference and the training. Moreover, we say that a primitive representation is semantic if it can reconstruct semantic shapes in addition to part correspondence
- Table 2: Reconstruction performance on ShapeNet [32]. In the far right column of the table, denoted as time, we report the per-object average duration (in seconds) of mesh sampling to show the time cost to produce an evaluated mesh. Since we do not perform data augmentation as opposed to the original implementation of OccNet [25], we also report the results of pretrained OccNet trained without data augmentation, denoted as OccNet*
- Table 3: Effects of different losses on the F-score. In the table, check marks under the implicit and explicit columns denote whether the loss uses the corresponding shape representation. In NSDN, O denotes using only the occupancy loss, C using only the Chamfer loss without surface point extraction, and S using only the surface point loss
- Table 4: Effects of the overlap regularization
- Table 5: Mesh sampling speed for given mesh properties. #V and #F denote the numbers of mesh vertices (×100) and mesh faces (×100), respectively. Ico# denotes the number of icosphere subdivisions used as the mesh template of the primitive. Up# denotes the number of upsampling steps in MISE [25]. Up0 equals 32³ voxel sampling and Up2 equals 128³
- Table 6: Mean and standard deviation of discrete Gaussian curvature [38]
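For context on the Ico# column of Table 5: each icosphere subdivision step splits every triangle into four, so the primitive's mesh-template size grows geometrically. A quick sketch of the standard counting arithmetic (not code from the paper):

```python
def icosphere_counts(n):
    """Vertex and face counts of an icosahedron after n subdivision steps."""
    faces = 20 * 4 ** n           # each step splits a triangle into 4
    edges = 3 * faces // 2        # every edge is shared by two triangles
    vertices = 2 - faces + edges  # Euler's formula: V - E + F = 2
    return vertices, faces

for n in range(3):
    v, f = icosphere_counts(n)
    print(f"Ico{n}: V={v}, F={f}")
# Ico0: V=12, F=20
# Ico1: V=42, F=80
# Ico2: V=162, F=320
```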

Related work

- Methods to decompose shapes to primitives have been studied extensively in computer vision [1]. Some of the classical primitives used in computer vision are generalized cylinders [2] and geons [3]. For deep generative models, cuboids [11, 10] and superquadrics [8, 9] are used to realize consistent parsing across shapes. However, these methods have poor reconstruction accuracies due to limitations in the parameter spaces of the primitives. Thus, their application is limited to shape abstraction. Using parametrized convexes to improve the reconstruction accuracy has been recently proposed in [5, 6]. However, since the shapes of the primitives are constrained to be convex, the interpretability of shapes is limited to part parsing. In this work, we study the star domain as a primitive representation that has more expressive power than that of previously proposed primitive representations.

Funding

- This work was partially supported by JST AIP Acceleration Research Grant Number JPMJCR20U3, and partially supported by JSPS KAKENHI Grant Number JP19H01115

References

- L. G. Roberts, Machine perception of three-dimensional solids. PhD thesis, Massachusetts Institute of Technology, 1963.
- T. O. Binford, "Visual perception by computer," in Proceedings IEEE Conference of Systems and Control, 1971.
- I. Biederman, “Recognition-by-components: a theory of human image understanding.,” Psychological review, vol. 94, no. 2, p. 115, 1987.
- D. H. Laidlaw, W. B. Trumbore, and J. F. Hughes, “Constructive solid geometry for polyhedral objects,” in Proceedings International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1986.
- Z. Chen, A. Tagliasacchi, and H. Zhang, “Bsp-net: Generating compact meshes via binary space partitioning,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- B. Deng, K. Genova, S. Yazdani, S. Bouaziz, G. Hinton, and A. Tagliasacchi, “Cvxnets: Learnable convex decomposition,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- T. Deprelle, T. Groueix, M. Fisher, V. Kim, B. Russell, and M. Aubry, “Learning elementary structures for 3d shape generation and matching,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
- D. Paschalidou, A. O. Ulusoy, and A. Geiger, “Superquadrics revisited: Learning 3d shape parsing beyond cuboids,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- D. Paschalidou, L. van Gool, and A. Geiger, “Learning unsupervised hierarchical part decomposition of 3d objects from a single rgb image,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- C. Niu, J. Li, and K. Xu, “Im2struct: Recovering 3d shape structure from a single rgb image,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- S. Tulsiani, H. Su, L. J. Guibas, A. A. Efros, and J. Malik, “Learning shape abstractions by assembling volumetric primitives,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- K. Genova, F. Cole, D. Vlasic, A. Sarna, W. T. Freeman, and T. Funkhouser, “Learning shape templates with structured implicit functions,” in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
- T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry, “A papier-mâché approach to learning 3d surface generation,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Y. Liao, S. Donne, and A. Geiger, “Deep marching cubes: Learning explicit surface representations,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- V. Chvatal, “A combinatorial theorem in plane geometry,” Journal of Combinatorial Theory, Series B, vol. 18, no. 1, pp. 39–41, 1975.
- J. M. Keil, “Decomposing a polygon into simpler components,” SIAM Journal on Computing, vol. 14, no. 4, pp. 799–817, 1985.
- X. Liu, R. Su, S. B. Kang, and H.-Y. Shum, "Directional histogram model for three-dimensional shape similarity," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
- A. Poulenard, M.-J. Rakotosaona, Y. Ponty, and M. Ovsjanikov, “Effective rotation-invariant point cnn with spherical harmonics kernels,” in Proceedings International Conference on 3D Vision (3DV), 2019.
- T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, “Spherical cnns,” in Proceedings International Conference on Learning Representations (ICLR), 2018.
- R. Kondor, Z. Lin, and S. Trivedi, “Clebsch–gordan nets: a fully fourier space spherical convolutional neural network,” in Advances in Neural Information Processing Systems (NeurIPS), 2018.
- A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, “End-to-end recovery of human shape and pose,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- A. Ranjan, T. Bolkart, S. Sanyal, and M. J. Black, “Generating 3d faces using convolutional mesh autoencoders,” in Proceedings European Conference on Computer Vision (ECCV), 2018.
- N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, “Pixel2mesh: Generating 3d mesh models from single rgb images,” in Proceedings European Conference on Computer Vision (ECCV), 2018.
- L. Gao, J. Yang, T. Wu, Y.-J. Yuan, H. Fu, Y.-K. Lai, and H. Zhang, “Sdm-net: Deep generative network for structured deformable mesh,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–15, 2019.
- L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, “Deepsdf: Learning continuous signed distance functions for shape representation,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- D. A. Varshalovich, A. N. Moskalev, and V. K. Khersonskii, Quantum theory of angular momentum. World Scientific, 1988.
- C. E. Burkhardt and J. J. Leventhal, Foundations of quantum physics. Springer Science & Business Media, 2008.
- K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
- R. Arora, A. Basu, P. Mianjy, and A. Mukherjee, “Understanding deep neural networks with rectified linear units,” in Proceedings International Conference on Learning Representations (ICLR), 2016.
- T. Nguyen-Phuoc, C. Li, L. Theis, C. Richardt, and Y.-L. Yang, “Hologan: Unsupervised learning of 3d representations from natural images,” in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
- A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al., “Shapenet: An information-rich 3d model repository,” arXiv preprint arXiv:1512.03012, 2015.
- C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, “3d-r2n2: A unified approach for single and multi-view 3d object reconstruction,” in Proceedings European Conference on Computer Vision (ECCV), 2016.
- K. Mo, S. Zhu, A. X. Chang, L. Yi, S. Tripathi, L. J. Guibas, and H. Su, “Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- Z. Chen, K. Yin, M. Fisher, S. Chaudhuri, and H. Zhang, “Bae-net: branched autoencoder for shape co-segmentation,” in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
- M. Tatarchenko, S. R. Richter, R. Ranftl, Z. Li, V. Koltun, and T. Brox, “What do single-view 3d reconstruction networks learn?,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- J. Bednarik, S. Parashar, E. Gundogdu, M. Salzmann, and P. Fua, "Shape reconstruction by learning differentiable surface representations," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- D. Cohen-Steiner and J.-M. Morvan, “Restricted delaunay triangulations and normal cycle,” in Proceedings Annual Symposium on Computational Geometry (SoCG), 2003.
