Neural Star Domain as Primitive Representation

NeurIPS 2020

Abstract

Reconstructing 3D objects from 2D images is a fundamental task in computer vision. Accurate structured reconstruction by parsimonious and semantic primitive representation further broadens its application. When reconstructing a target shape with multiple primitives, it is preferable that one can instantly access the union of basic properties...

Introduction
  • Understanding 3D objects by decomposing them into simpler shapes, called primitives, has been widely studied in computer vision [1, 2, 3].
  • The use of implicit representations allows the set of primitives to be represented as a single collective shape by taking their union [5, 6, 12] (a minimal sketch of this union follows this list).
  • This property contributes to improving the reconstruction accuracy during training.
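As a minimal illustration of this union (not the authors' implementation; the function name and toy values below are mine), the union of implicitly represented primitives can be taken as a pointwise maximum of the per-primitive occupancies:

```python
import numpy as np

def union_occupancy(primitive_occupancies):
    """Union of implicitly represented primitives: the collective occupancy at each
    query point is the pointwise maximum of the per-primitive occupancies.

    primitive_occupancies: (N, M) array with the occupancy (in [0, 1]) of each of
    the N primitives at M query points.
    """
    return np.max(primitive_occupancies, axis=0)

# Toy example: two overlapping "primitives" evaluated at three query points.
occ = np.array([[1.0, 0.2, 0.0],   # primitive 1
                [0.1, 0.9, 0.0]])  # primitive 2
print(union_occupancy(occ))        # -> [1.  0.9 0. ]
```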
Highlights
  • Understanding 3D objects by decomposing them into simpler shapes, called primitives, has been widely studied in computer vision [1, 2, 3]
  • We propose a novel primitive representation named neural star domain (NSD) that learns shapes in the star domain by neural networks (the standard star-domain definition is recalled after this list)
  • We demonstrate that the complexity of shapes the neural star domain can represent is equivalent to the approximation ability of the neural network
  • We propose NSD as a novel primitive representation
  • We show that our primitive-based approach achieves significantly better semantic capability of the reconstructed primitives
  • A potential risk involved with NSD is that it could be used to plagiarize 3D objects, such as furniture and appliance designs
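For reference, the geometric notion behind the name (this is the textbook definition, not a quotation from the paper): a star domain is a set in which some center point sees every other point along a straight segment, so the surface of a solid star-shaped primitive can be described by a single radial function over directions on the unit sphere.

```latex
% Star domain: some center c sees every point of S along a straight segment.
\exists\, c \in S \;\; \forall x \in S : \quad \{\, c + t\,(x - c) \mid t \in [0, 1] \,\} \subseteq S .
% For a solid star-shaped primitive, the surface can then be parameterized by a
% radial function r : S^2 \to \mathbb{R}_{\ge 0} over unit directions s:
\partial S = \{\, c + r(s)\, s \mid s \in S^2 \,\} .
```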
Methods
  • The authors begin by formulating the problem setting in Section 3.1 and define the star domain in Section 3.2.
  • Their goal is to parametrize the 3D shape by a composite indicator function O and surface points P, which can be decomposed into a collection of N primitives.
  • To realize implicit and explicit shape representation simultaneously, the authors further require O and P to be related as O(P(s)) = τ_o, where s ∈ S² and τ_o ∈ [0, 1] is a constant representing the isosurface (a hedged sketch of this parameterization and the training losses follows this list).
  • The authors ensure that both the composite indicator function and the surface points are approximated as Ô ≈ O and P̂ ≈ P, respectively, through training losses.
  • Since BSP-Net uses different train and test splits, the authors evaluated it on the intersection of the test splits from [25] and [5].
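To make the relation above concrete, here is a hedged sketch in PyTorch. It is not the authors' network: StarDomainPrimitive, radius_net, the sigmoid-based occupancy, and the loss terms below are hypothetical stand-ins, assuming only that each primitive is a star domain whose surface point in direction s is c + r(s)·s, and that the primitives are merged into the composite occupancy O by a union.

```python
import torch
import torch.nn as nn

class StarDomainPrimitive(nn.Module):
    """Hypothetical stand-in for one star-domain primitive: a learnable center c and a
    radius function r(s) over unit directions s, so a surface point is P(s) = c + r(s) * s."""

    def __init__(self, hidden=64):
        super().__init__()
        self.center = nn.Parameter(torch.zeros(3))
        self.radius_net = nn.Sequential(            # small MLP as a stand-in for the
            nn.Linear(3, hidden), nn.ReLU(),        # paper's radius parameterization
            nn.Linear(hidden, 1), nn.Softplus(),    # Softplus keeps radii positive
        )

    def surface_points(self, s):                    # s: (M, 3) unit directions
        return self.center + self.radius_net(s) * s

    def occupancy(self, x, sharpness=10.0):
        """Soft indicator: ~1 deep inside, ~0 far outside, exactly 0.5 on the surface,
        so O(P(s)) = tau_o holds with tau_o = 0.5."""
        d = x - self.center
        dist = d.norm(dim=-1, keepdim=True)
        s = d / dist.clamp_min(1e-8)
        return torch.sigmoid(sharpness * (self.radius_net(s) - dist))


def composite_occupancy(primitives, x):
    """Union of N primitives: pointwise maximum of their occupancies."""
    return torch.stack([p.occupancy(x) for p in primitives], dim=0).max(dim=0).values


def training_losses(primitives, query_x, query_occ, gt_surface, directions):
    """Sketch of the two loss families mentioned above: an implicit (occupancy) loss on
    query points and an explicit (surface-point) loss against ground-truth surface samples."""
    occ_pred = composite_occupancy(primitives, query_x).squeeze(-1)
    occupancy_loss = nn.functional.binary_cross_entropy(occ_pred, query_occ)

    pred_surface = torch.cat([p.surface_points(directions) for p in primitives], dim=0)
    # One-sided Chamfer term: every ground-truth surface point should be close to some
    # predicted surface point (the paper's exact surface losses may differ).
    dists = torch.cdist(gt_surface, pred_surface)
    surface_loss = dists.min(dim=1).values.mean()
    return occupancy_loss + surface_loss
```

Under this construction the predicted surface points lie exactly on the 0.5 level set of the soft occupancy, which plays the role of the constant τ_o above.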
Results
  • The authors show that the primitive-based approach achieves significantly better semantic capability of the reconstructed primitives.
Conclusion
  • The authors propose NSD as a novel primitive representation. They show that the method consistently outperforms previous primitive-based approaches and that it is the only primitive-based approach performing relatively better than the leading reconstruction technique (OccNet [25]) on the single-view reconstruction task.
  • As the NSDN consists solely of very simple fully connected layers and a mesh extraction process, it is very fast and cost effective; one might be able to run the model on mobile devices with proper hardware optimization (a hedged sketch of such a mesh extraction follows this list).
  • This opens up more democratized 3D reconstruction, but it comes with the possible risk of being used to plagiarize the designs of real-world products in combination with 3D printers.
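Mesh extraction can be cheap for star-domain primitives because the topology is fixed to that of a sphere. The sketch below reuses the hypothetical StarDomainPrimitive from the Methods sketch and assumes trimesh only for the icosphere template (cf. the Ico# column of Table 5) and the mesh container; it displaces every template vertex along its own direction by the predicted radius and reuses the template faces. It is an illustration, not the authors' pipeline.

```python
import torch
import trimesh  # assumed available; used for the icosphere template and mesh container

def extract_primitive_mesh(primitive, subdivisions=2):
    """Turn one star-domain primitive into a triangle mesh: push each vertex of a unit
    icosphere template outward along its own direction by the predicted radius."""
    template = trimesh.creation.icosphere(subdivisions=subdivisions, radius=1.0)
    directions = torch.as_tensor(template.vertices, dtype=torch.float32)  # unit vectors
    with torch.no_grad():
        vertices = primitive.surface_points(directions)                  # c + r(s) * s
    return trimesh.Trimesh(vertices=vertices.numpy(), faces=template.faces)

# Usage sketch: one mesh per primitive, concatenated into the full reconstruction.
# meshes = [extract_primitive_mesh(p, subdivisions=2) for p in primitives]
# reconstruction = trimesh.util.concatenate(meshes)
```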
Tables
  • Table 1: Overview of shape representation in previous works. SQ denotes superquadrics [8]. We regard a primitive as having an explicit representation if it has access to the explicit surface in both the inference and the training. Moreover, we say that a primitive representation is semantic if it can reconstruct semantic shapes in addition to part correspondence.
  • Table 2: Reconstruction performance on ShapeNet [32]. In the far right column of the table, denoted as time, we report the per-object average duration (in seconds) of mesh sampling to show the time cost to produce an evaluated mesh. Since we do not perform data augmentation, as opposed to the original implementation of OccNet [25], we also report the results of pretrained OccNet trained without data augmentation, denoted as OccNet*.
  • Table 3: Effects of different losses on the F-score (a sketch of how such point-cloud F-scores are typically computed follows this list). In the table, check marks under the implicit and explicit columns denote whether the loss uses the corresponding shape representation. In NSDN, O denotes using only the occupancy loss, C using only the Chamfer loss without surface point extraction, and S using only the surface point loss.
  • Table 4: Effects of the overlap regularization.
  • Table 5: Mesh sampling speed for given mesh properties. #V and #F denote the numbers of mesh vertices (×100) and mesh faces (×100), respectively. Ico# denotes the number of icosphere subdivisions used as the mesh template of the primitive. Up# denotes the number of upsampling steps in MISE [25]. Up0 corresponds to 32³ voxel sampling and Up2 to 128³.
  • Table 6: Mean and standard deviation of the discrete Gaussian curvature [38].
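For context, point-cloud F-scores like those in Table 3 are commonly computed as in [36]: precision is the fraction of predicted surface points within a distance threshold of the ground truth, recall is the converse, and the F-score is their harmonic mean. The snippet below is a generic sketch of that metric (function and parameter names are mine), not the paper's evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def f_score(pred_points, gt_points, threshold=0.01):
    """Generic point-cloud F-score at a distance threshold (names are illustrative).

    precision: fraction of predicted points within `threshold` of the ground truth
    recall:    fraction of ground-truth points within `threshold` of the prediction
    """
    d_pred_to_gt, _ = cKDTree(gt_points).query(pred_points)   # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(pred_points).query(gt_points)   # nearest prediction per GT point
    precision = float(np.mean(d_pred_to_gt < threshold))
    recall = float(np.mean(d_gt_to_pred < threshold))
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```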
Related work
  • Methods to decompose shapes into primitives have been studied extensively in computer vision [1]. Some of the classical primitives used in computer vision are generalized cylinders [2] and geons [3]. For deep generative models, cuboids [11, 10] and superquadrics [8, 9] are used to realize consistent parsing across shapes. However, these methods have poor reconstruction accuracy due to limitations in the parameter spaces of the primitives; thus, their application is limited to shape abstraction. Using parametrized convexes to improve the reconstruction accuracy has recently been proposed in [5, 6]. However, since the shapes of the primitives are constrained to be convex, the interpretability of shapes is limited to part parsing. In this work, we study the star domain as a primitive representation that has more expressive power than that of previously proposed primitive representations.
Funding
  • This work was partially supported by JST AIP Acceleration Research Grant Number JPMJCR20U3, and partially supported by JSPS KAKENHI Grant Number JP19H01115.
References
  • [1] L. G. Roberts, "Machine perception of three-dimensional solids," PhD thesis, Massachusetts Institute of Technology, 1963.
  • [2] T. O. Binford, "Visual perception by computer," in Proceedings IEEE Conference on Systems and Control, 1971.
  • [3] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, p. 115, 1987.
  • [4] D. H. Laidlaw, W. B. Trumbore, and J. F. Hughes, "Constructive solid geometry for polyhedral objects," in Proceedings International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1986.
  • [5] Z. Chen, A. Tagliasacchi, and H. Zhang, "BSP-Net: Generating compact meshes via binary space partitioning," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • [6] B. Deng, K. Genova, S. Yazdani, S. Bouaziz, G. Hinton, and A. Tagliasacchi, "CvxNet: Learnable convex decomposition," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • [7] T. Deprelle, T. Groueix, M. Fisher, V. Kim, B. Russell, and M. Aubry, "Learning elementary structures for 3D shape generation and matching," in Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • [8] D. Paschalidou, A. O. Ulusoy, and A. Geiger, "Superquadrics revisited: Learning 3D shape parsing beyond cuboids," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [9] D. Paschalidou, L. van Gool, and A. Geiger, "Learning unsupervised hierarchical part decomposition of 3D objects from a single RGB image," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • [10] C. Niu, J. Li, and K. Xu, "Im2Struct: Recovering 3D shape structure from a single RGB image," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [11] S. Tulsiani, H. Su, L. J. Guibas, A. A. Efros, and J. Malik, "Learning shape abstractions by assembling volumetric primitives," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [12] K. Genova, F. Cole, D. Vlasic, A. Sarna, W. T. Freeman, and T. Funkhouser, "Learning shape templates with structured implicit functions," in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
  • [13] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry, "A papier-mâché approach to learning 3D surface generation," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [14] Y. Liao, S. Donné, and A. Geiger, "Deep marching cubes: Learning explicit surface representations," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [15] V. Chvátal, "A combinatorial theorem in plane geometry," Journal of Combinatorial Theory, Series B, vol. 18, no. 1, pp. 39–41, 1975.
  • [16] J. M. Keil, "Decomposing a polygon into simpler components," SIAM Journal on Computing, vol. 14, no. 4, pp. 799–817, 1985.
  • [17] X. Liu, R. Su, S. B. Kang, and H.-Y. Shum, "Directional histogram model for three-dimensional shape similarity," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
  • [18] A. Poulenard, M.-J. Rakotosaona, Y. Ponty, and M. Ovsjanikov, "Effective rotation-invariant point CNN with spherical harmonics kernels," in Proceedings International Conference on 3D Vision (3DV), 2019.
  • [19] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, "Spherical CNNs," in Proceedings International Conference on Learning Representations (ICLR), 2018.
  • [20] R. Kondor, Z. Lin, and S. Trivedi, "Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network," in Advances in Neural Information Processing Systems (NeurIPS), 2018.
  • [21] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, "End-to-end recovery of human shape and pose," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [22] A. Ranjan, T. Bolkart, S. Sanyal, and M. J. Black, "Generating 3D faces using convolutional mesh autoencoders," in Proceedings European Conference on Computer Vision (ECCV), 2018.
  • [23] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, "Pixel2Mesh: Generating 3D mesh models from single RGB images," in Proceedings European Conference on Computer Vision (ECCV), 2018.
  • [24] L. Gao, J. Yang, T. Wu, Y.-J. Yuan, H. Fu, Y.-K. Lai, and H. Zhang, "SDM-NET: Deep generative network for structured deformable mesh," ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–15, 2019.
  • [25] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, "Occupancy networks: Learning 3D reconstruction in function space," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [26] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, "DeepSDF: Learning continuous signed distance functions for shape representation," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [27] D. A. Varshalovich, A. N. Moskalev, and V. K. Khersonskii, Quantum Theory of Angular Momentum. World Scientific, 1988.
  • [28] C. E. Burkhardt and J. J. Leventhal, Foundations of Quantum Physics. Springer Science & Business Media, 2008.
  • [29] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
  • [30] R. Arora, A. Basu, P. Mianjy, and A. Mukherjee, "Understanding deep neural networks with rectified linear units," in Proceedings International Conference on Learning Representations (ICLR), 2016.
  • [31] T. Nguyen-Phuoc, C. Li, L. Theis, C. Richardt, and Y.-L. Yang, "HoloGAN: Unsupervised learning of 3D representations from natural images," in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
  • [32] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al., "ShapeNet: An information-rich 3D model repository," arXiv preprint arXiv:1512.03012, 2015.
  • [33] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, "3D-R2N2: A unified approach for single and multi-view 3D object reconstruction," in Proceedings European Conference on Computer Vision (ECCV), 2016.
  • [34] K. Mo, S. Zhu, A. X. Chang, L. Yi, S. Tripathi, L. J. Guibas, and H. Su, "PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [35] Z. Chen, K. Yin, M. Fisher, S. Chaudhuri, and H. Zhang, "BAE-NET: Branched autoencoder for shape co-segmentation," in Proceedings IEEE International Conference on Computer Vision (ICCV), 2019.
  • [36] M. Tatarchenko, S. R. Richter, R. Ranftl, Z. Li, V. Koltun, and T. Brox, "What do single-view 3D reconstruction networks learn?," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [37] J. Bednarik, S. Parashar, E. Gundogdu, M. Salzmann, and P. Fua, "Shape reconstruction by learning differentiable surface representations," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • [38] D. Cohen-Steiner and J.-M. Morvan, "Restricted Delaunay triangulations and normal cycle," in Proceedings Annual Symposium on Computational Geometry (SoCG), 2003.