ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning

ACM Transactions on Graphics (TOG), Volume 38, Issue 1, 2019. arXiv:1804.08497.

DOI: https://doi.org/10.1145/3267347

Abstract:

The process of aligning a pair of shapes is a fundamental operation in computer graphics. Traditional approaches rely heavily on matching corresponding points or features to guide the alignment, a paradigm that falters when significant shape portions are missing. These techniques generally do not incorporate prior knowledge about expected…

Introduction
  • Shape registration is a fundamental problem in computer graphics and computer vision, with diverse applications ranging from object recognition and scene understanding to texture or attribute transfer and synthesis.
  • While successful in straightforward scenarios, this paradigm reveals its limitations when the authors seek to match increasingly distinct shapes that differ in geometry and topology.
  • Given such differences, restricting the pool of allowed transformations is often insufficient, requiring higher-order deformations.
  • Traditional methods locally optimize the alignment between a pair of shapes without incorporating prior knowledge about their geometric and semantic identity.
  • This information may provide invaluable cues as to the expected outcome, and guide the alignment both in general and, in particular, in and around the missing regions.
Highlights
  • Shape registration is a fundamental problem in computer graphics and computer vision, with diverse applications ranging from object recognition and scene understanding to texture or attribute transfer and synthesis
  • We introduce a total variation penalty that is key to producing smooth deformations (see the sketch after this list)
  • All networks are trained on strictly binary silhouettes, but we overlay a checkerboard texture on the source shape to clearly visualize the smoothness of the estimated mappings and display the texture transfer abilities facilitated by our system
  • Aligning one shape to another with a high order deformation is a fundamentally ill-posed problem, but when missing parts are thrown into the mix, the ambiguity only deepens
  • The system learns the space of plausible deformations associated with the dataset, and effectively becomes agnostic to missing parts, without memorization or over-fitting
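A minimal sketch of what such a total variation penalty on the predicted displacement field could look like, assuming the field is stored as a PyTorch tensor of shape (B, 2, H, W); the function name tv_penalty and the L1 flavor are illustrative assumptions, not necessarily the paper's exact formulation:

    import torch

    def tv_penalty(disp):
        """Total-variation-style smoothness term on a displacement field.

        disp: (B, 2, H, W) tensor holding the x/y displacements of the
        deformation grid. Penalizing differences between neighboring grid
        points encourages a smooth warp.
        """
        # Finite differences along the horizontal and vertical grid directions.
        dx = disp[:, :, :, 1:] - disp[:, :, :, :-1]
        dy = disp[:, :, 1:, :] - disp[:, :, :-1, :]
        # L1 variant of TV; the paper's exact norm and weighting may differ.
        return dx.abs().mean() + dy.abs().mean()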
Methods
  • The input to the system is made up of pairs of source and target shapes, where a target shape may be partial at various locations and to varying extents.
  • Similar to Jaderberg et al. (2015), the authors train the network from scratch, setting the convolutional layer weights randomly and initializing the last fully connected (FC) layer to yield the identity displacement field
  • This is followed by the differentiable free-form deformation (FFD) sampling layer, which warps the source toward the target by upsampling the low-resolution FFD grid (see the sketch after this list)
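A minimal PyTorch sketch of these two steps for 2D silhouettes: a final FC layer initialized to output zero (identity) displacements over a coarse grid, and a differentiable warp that upsamples the coarse field to image resolution and resamples the source. The names LowResFFDHead and warp_with_ffd, the 8x8 grid, and the use of bilinear upsampling in place of a true FFD/B-spline basis are assumptions for illustration, not the authors' implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LowResFFDHead(nn.Module):
        """Last FC layer predicting a coarse (m x n) grid of 2D displacements."""
        def __init__(self, feat_dim, m=8, n=8):
            super().__init__()
            self.m, self.n = m, n
            self.fc = nn.Linear(feat_dim, 2 * m * n)
            # Initialize to the identity mapping: zero displacements everywhere.
            nn.init.zeros_(self.fc.weight)
            nn.init.zeros_(self.fc.bias)

        def forward(self, feat):                       # feat: (B, feat_dim)
            return self.fc(feat).view(-1, 2, self.m, self.n)

    def warp_with_ffd(source, disp):
        """Differentiable warp: upsample the coarse field and resample the source."""
        B, _, H, W = source.shape
        # Bilinear upsampling stands in for the paper's FFD basis here.
        disp_hr = F.interpolate(disp, size=(H, W), mode='bilinear', align_corners=True)
        # Identity sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing='ij')
        identity = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
        grid = identity + disp_hr.permute(0, 2, 3, 1)  # add predicted offsets
        return F.grid_sample(source, grid, align_corners=True)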
Results
  • The authors examine and test ALIGNet through various qualitative and quantitative evaluations.
  • All experiments are performed on the reserved test set data, meaning that these shapes were never seen during training.
  • All networks are trained on strictly binary silhouettes, but the authors overlay a checkerboard texture on the source shape to clearly visualize the smoothness of the estimated mappings and display the texture transfer abilities facilitated by the system (see the sketch after this list).
  • 5.1 Training Data. In the experiments, the authors train ALIGNet on a class of shapes, where pairs of source and target instances are randomly drawn from the pre-defined training set.
  • The authors have experimented with sets containing renders of lower- and upper-case letters from 330 fonts, as well as 2D projections of 3D objects from ShapeNet (Chang et al. 2015) and COSEG (Wang et al. 2012)
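A minimal sketch of the checkerboard visualization mentioned above, assuming binary silhouettes stored as (B, 1, H, W) tensors; the function name checkerboard_overlay and the tile size are illustrative. The textured source can then be passed through the same estimated warp as the binary input:

    import torch

    def checkerboard_overlay(silhouette, tile=8):
        """Overlay a checkerboard texture on a binary source silhouette.

        silhouette: (B, 1, H, W) binary tensor. Warping the returned image
        with the same estimated mapping visualizes the smoothness of the
        warp and its texture-transfer behavior.
        """
        B, _, H, W = silhouette.shape
        ys = (torch.arange(H) // tile).view(1, 1, H, 1)
        xs = (torch.arange(W) // tile).view(1, 1, 1, W)
        checker = ((ys + xs) % 2).float()          # 0/1 checkerboard pattern
        return silhouette * (0.5 + 0.5 * checker)  # texture only inside the shape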
Conclusion
  • Aligning one shape to another with a high order deformation is a fundamentally ill-posed problem, but when missing parts are thrown into the mix, the ambiguity only deepens.
  • Instead of synthesizing completely new shapes from scratch, the authors learn to align one shape to another, resulting in a tool that enables them to deform a high-quality source shape to a low-quality target, thereby forming novel instances.
  • By taking this route, the authors simplify the learning process, requiring a smaller network that is less sensitive and converges faster.
Tables
  • Table 1: ALIGNet Architecture and Parameters Used for All Results (Except Where Otherwise Specified), Where m and n Represent the Resolution of the Grid
  • Table 2: Quantitative Results (IOU) for 3D Point Cloud Registration on the Airplane Class
  • Table 3: Quantitative Results (Average IOU) on the Entire Test Set of the Vase, Airplane, and Vessel Classes
  • Table 4: Quantitative Results on Complete (Not Partial) Shapes
Related work
  • Common shape registration methods estimate a transformation from a pre-determined class of transformations by evaluating the overall shape-to-shape alignment. The classic Iterative Closest Point (ICP) method (Besl and McKay 1992) uses nearest-neighbor correspondences to refine the transformation that minimizes the mismatch between the source and target points (a minimal sketch follows this paragraph). Follow-up works propose numerous variants of ICP that modify the constraints used to compute the transformation (Rusinkiewicz and Levoy 2001). Other methods assume a global probabilistic approach (Jian and Vemuri 2011; Myronenko and Song 2010; Tsin and Kanade 2004), or more recently incorporate local structure to further improve results (Ma et al. 2014, 2016). To handle partial shapes, these approaches must explicitly incorporate outlier rejection and intelligently select a subset of appropriate points, a challenging task in and of itself.
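For reference, a minimal NumPy/SciPy sketch of the classic rigid ICP loop described here: alternate nearest-neighbor correspondences with a closed-form (SVD) update of rotation and translation. This is a generic textbook variant, not any of the specific implementations compared against in the paper:

    import numpy as np
    from scipy.spatial import cKDTree

    def icp(source, target, iters=50):
        """Minimal rigid ICP: alternate nearest-neighbor matching and a
        closed-form (SVD) update of the rotation R and translation t."""
        src = source.copy()
        tree = cKDTree(target)
        R, t = np.eye(source.shape[1]), np.zeros(source.shape[1])
        for _ in range(iters):
            # 1. Correspondences: nearest target point for every source point.
            _, idx = tree.query(src)
            matched = target[idx]
            # 2. Best rigid transform between the matched sets (Procrustes/SVD).
            src_c, tgt_c = src - src.mean(0), matched - matched.mean(0)
            U, _, Vt = np.linalg.svd(src_c.T @ tgt_c)
            R_step = Vt.T @ U.T
            if np.linalg.det(R_step) < 0:            # avoid reflections
                Vt[-1] *= -1
                R_step = Vt.T @ U.T
            t_step = matched.mean(0) - src.mean(0) @ R_step.T
            # 3. Apply the update and accumulate the total transformation.
            src = src @ R_step.T + t_step
            R, t = R_step @ R, R_step @ t + t_step
        return R, t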

    To handle partial shapes, approaches such as RANSAC (Fischler and Bolles 1981) sample a small number of points to form candidate transformation models, and determine the best one by voting. This approach works well when the two shapes share similar…
Funding
  • This research was supported by the Israel Science Foundation as part of the ISF-NSFC joint program (grant numbers 2217/15 and 2472/17), and partially supported by ISF grant 2366/16.
Study subjects and analysis
test pairs: 870
For each class dataset, we reserve 30 models for testing and use the rest for training. For example, a class with a set of 330 shapes yields ∼90k distinct training pairs and 870 test pairs. We augment the number of training set examples by generating an arbitrarily large amount of partial data, and by applying small vertical and horizontal scaling on-the-fly during training.
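A quick check of the pair counts quoted above, assuming the stated split of 330 shapes into 300 for training and 30 for testing, and counting ordered pairs with distinct source and target:

    # 330 shapes per class: 300 for training, 30 reserved for testing.
    train, test = 300, 30
    train_pairs = train * (train - 1)   # 89,700 ordered pairs, i.e. ~90k
    test_pairs = test * (test - 1)      # 870 ordered pairs
    print(train_pairs, test_pairs)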

point samples: 3000
ALIGNet, trained on complete surfaces with segments removed at random, generalizes to sparse surfaces at test time. We compute the average IOU between the deformed source and the full target, given the target in point cloud form, for both 500 and 3,000 point samples. Note that the IOU is computed on the full voxelized representation of the shapes generated from their mesh forms, where the applied warp field is computed using the point cloud representation of the target.
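A minimal sketch of the IOU measure on binary voxel grids, as used above to score the deformed source against the full target; the function name voxel_iou is an assumption, and the grids are assumed to be equal-sized boolean NumPy arrays:

    import numpy as np

    def voxel_iou(a, b):
        """Intersection over union between two binary voxel grids of equal shape."""
        a, b = a.astype(bool), b.astype(bool)
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union > 0 else 1.0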

reserved test set pairs: 870
5.6 Comparisons. In this section, we compare the results of applying ALIGNet versus existing approaches on the 870 reserved test set pairs for each class (as described in Section 5.1). The quantitative results are summarized in Table 3, with visual examples shown in Figure 22.

Reference
  • Dror Aiger, Niloy J. Mitra, and Daniel Cohen-Or. 2008. 4-points congruent sets for robust pairwise surface registration. In ACM SIGGRAPH 2008 Papers (SIGGRAPH'08). ACM, 85:1–85:10.
  • S. Belongie, J. Malik, and J. Puzicha. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 4 (Apr. 2002), 509–522.
  • P. J. Besl and N. D. McKay. 1992. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14, 2 (Feb. 1992), 239–256.
  • Fred L. Bookstein. 1989. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11, 6 (1989), 567–585.
  • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. 2015. ShapeNet: An Information-Rich 3D Model Repository. Technical Report. Stanford University / Princeton University / Toyota Technological Institute at Chicago.
  • Christopher B. Choy, JunYoung Gwak, Silvio Savarese, and Manmohan Chandraker. 2016. Universal correspondence network. In Advances in Neural Information Processing Systems. 2414–2422.
  • Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. 2015. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV'15). IEEE Computer Society, 2758–2766.
  • Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6 (1981), 381–395.
  • Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, and Ian Reid. 2016. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In European Conference on Computer Vision. Springer, 740–756.
  • Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2015. Spatial transformer networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). MIT Press, Cambridge, MA, 2017–2025.
  • B. Jian and B. C. Vemuri. 2011. Robust point set registration using Gaussian mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 33, 8 (Aug. 2011), 1633–1645.
  • Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision. Springer, 694–711.
  • Angjoo Kanazawa, David W. Jacobs, and Manmohan Chandraker. 2016. WarpNet: Weakly supervised matching for single-view reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3253–3261.
  • Longin Jan Latecki and Rolf Lakamper. 2000. Shape similarity measure based on correspondence of visual parts. IEEE Trans. Pattern Anal. Mach. Intell. 22, 10 (2000), 1185–1190.
  • Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, and Shuiwang Ji. 2017. Dense transformer networks. arXiv preprint arXiv:1705.08881 (2017).
  • Haibin Ling and David W. Jacobs. 2007. Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2 (2007), 286–299.
  • Or Litany, Emanuele Rodolà, Alex M. Bronstein, and Michael M. Bronstein. 2017. Fully spectral partial shape matching. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 247–258.
  • Jiayi Ma, Ji Zhao, Jinwen Tian, Alan L. Yuille, and Zhuowen Tu. 2014. Robust point matching via vector field consensus. IEEE Trans. Image Process. 23, 4 (2014), 1706–1721.
  • Jiayi Ma, Ji Zhao, and Alan L. Yuille. 2016. Non-rigid point set registration by preserving global and local structures. IEEE Trans. Image Process. 25, 1 (2016), 53–64.
  • G. Mori and J. Malik. 2003. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1. I-134–I-141.
  • A. Myronenko and X. Song. 2010. Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32, 12 (Dec. 2010), 2262–2275.
  • S. Rusinkiewicz and M. Levoy. 2001. Efficient variants of the ICP algorithm. In Proceedings of the 3rd International Conference on 3-D Digital Imaging and Modeling. 145–152.
  • Thomas W. Sederberg and Scott R. Parry. 1986. Free-form deformation of solid geometric models. ACM SIGGRAPH Comput. Graph. 20, 4 (Aug. 1986), 151–160.
  • Amit Shaked and Lior Wolf. 2016. Improved stereo matching with constant highway networks and reflective confidence learning. arXiv preprint arXiv:1701.00165 (2016).
  • Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, and Francesc Moreno-Noguer. 2015. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the IEEE International Conference on Computer Vision. 118–126.
  • Ayush Tewari, Michael Zollhöfer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Perez, and Christian Theobalt. 2017. MoFA: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'17), Vol. 2.
  • A. Thayananthan, B. Stenger, P. H. S. Torr, and R. Cipolla. 2003. Shape context and chamfer matching in cluttered scenes. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03). IEEE Computer Society, Los Alamitos, CA, 127–133.
  • Yurun Tian, Bin Fan, and Fuchao Wu. 2017. L2-Net: Deep learning of discriminative patch descriptor in Euclidean space. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR'17).
  • Yanghai Tsin and Takeo Kanade. 2004. A correlation-based approach to robust point set registration. Springer, 558–569.
  • Yunhai Wang, Shmulik Asafi, Oliver van Kaick, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen. 2012. Active co-analysis of a set of shapes. ACM Trans. Graph. 31, 6 (2012), 165.
  • M. Ersin Yumer and Niloy J. Mitra. 2016. Learning semantic deformation flows with 3D convolutional networks. In European Conference on Computer Vision. Springer, 294–311.
  • Jure Zbontar and Yann LeCun. 2016. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17 (2016), 1–32.
  • Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 2017. 3DMatch: Learning local geometric descriptors from RGB-D reconstructions. In CVPR.
  • Yefeng Zheng and D. Doermann. 2006. Robust point matching for nonrigid shapes by preserving local neighborhood structures. IEEE Trans. Pattern Anal. Mach. Intell. 28, 4 (Apr. 2006), 643–649.
  • Received September 2017; revised July 2018; accepted July 2018