Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

CVPR, pp.1-10, (2020)


Abstract

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categor...

Introduction
  • Understanding the 3D structure of images is key in many computer vision applications.
  • Furthermore, while many deep networks appear to understand images as 2D textures [16], 3D modelling can explain away much of the variability of natural images and potentially improve image understanding in general.
  • Motivated by these facts, the authors consider the problem of learning 3D models for deformable object categories.
  • The authors' learning algorithm ingests a number of single-view images of a deformable object category and produces as output a deep network that can estimate the 3D shape of any instance given a single image of it (Fig. 1)
Highlights
  • Understanding the 3D structure of images is key in many computer vision applications
  • Furthermore, while many deep networks appear to understand images as 2D textures [16], 3D modelling can explain away much of the variability of natural images and potentially improve image understanding in general
  • Our learning algorithm ingests a number of single-view images of a deformable object category and produces as output a deep network that can estimate the 3D shape of any instance given a single image of it (Fig. 1)
  • We have presented a method that can learn a 3D model of a deformable object category from an unconstrained collection of single-view images of the object category
  • We have shown that symmetry and illumination are strong cues for shape and help the model to converge to a meaningful reconstruction
  • As for future work, the model currently represents 3D shape from a canonical viewpoint using a depth map, which is sufficient for objects such as faces that have a roughly convex shape and a natural canonical viewpoint
Method
  • As the authors have only raw images to learn from, the learning objective is reconstructive: the model is trained so that recombining the four factors (depth, albedo, viewpoint and illumination) gives back the input image.
  • This results in an autoencoding pipeline where the factors have, due to the way they are recomposed, an explicit photo-geometric meaning.
  • The authors' model estimates, for each pixel in the input image, a confidence score that explains the probability of the pixel having a symmetric counterpart in the image
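The role of the per-pixel confidence score can be illustrated with a small sketch. It assumes that the confidence enters the reconstruction loss as the scale of a Laplacian negative log-likelihood; the summary above does not state the exact form, so the formula, the function names and the toy values are all assumptions made for illustration. The horizontally mirrored image stands in for a reconstruction obtained under the symmetry assumption.

```python
import math

SQRT2 = math.sqrt(2.0)


def flip_horizontal(image):
    """Mirror a 2D map (a list of rows) about its vertical axis."""
    return [list(reversed(row)) for row in image]


def confidence_weighted_loss(recon, target, sigma):
    """Mean Laplacian negative log-likelihood over all pixels.

    sigma is a per-pixel positive uncertainty map (assumed form):
    a large sigma down-weights the error at that pixel, at the
    price of the log(sigma) penalty term.
    """
    total = 0.0
    count = 0
    for recon_row, target_row, sigma_row in zip(recon, target, sigma):
        for r, t, s in zip(recon_row, target_row, sigma_row):
            total += SQRT2 * abs(r - t) / s + math.log(SQRT2 * s)
            count += 1
    return total / count


# Toy 1x4 "image": symmetric except for the outer pixels (0.2 vs 0.9).
target = [[0.2, 0.5, 0.5, 0.9]]
mirrored = flip_horizontal(target)  # stand-in for the symmetric reconstruction

uniform_sigma = [[0.2, 0.2, 0.2, 0.2]]   # equally confident everywhere
adaptive_sigma = [[1.0, 0.2, 0.2, 1.0]]  # low confidence on the asymmetric pixels

loss_uniform = confidence_weighted_loss(mirrored, target, uniform_sigma)
loss_adaptive = confidence_weighted_loss(mirrored, target, adaptive_sigma)
print(loss_uniform, loss_adaptive)
```

In this toy case the adaptive map yields the lower loss, which is the mechanism by which a learned confidence map could let a model tolerate asymmetric regions instead of being forced to explain them.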
Results
  • Table 2 uses the BFM dataset to compare the depth reconstruction quality obtained by the method with that of a fully-supervised baseline and two further baselines.
  • The supervised baseline is a version of the model trained to regress the ground-truth depth maps using an L1 loss.
  • The trivial baseline predicts a constant uniform depth map, which provides a performance lower-bound.
  • The authors show reconstruction results on face paintings and drawings collected from [9] and the Internet in Figs.
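The comparison described above can be sketched in a few lines. The sketch assumes that SIDE, named in the caption of Table 2, denotes the scale-invariant depth error of Eigen et al. [10], computed as the standard deviation of per-pixel log-depth differences; the function names and toy depth values are hypothetical. MAD is not covered here.

```python
import math


def flatten(depth_map):
    """Flatten a 2D depth map (a list of rows) into a single list."""
    return [value for row in depth_map for value in row]


def side_error(pred, truth):
    """Scale-invariant depth error (assumed definition): the standard
    deviation of log(pred) - log(truth) over all pixels."""
    deltas = [math.log(p) - math.log(t)
              for p, t in zip(flatten(pred), flatten(truth))]
    mean = sum(deltas) / len(deltas)
    variance = sum((d - mean) ** 2 for d in deltas) / len(deltas)
    return math.sqrt(variance)


def l1_loss(pred, truth):
    """Mean absolute error, as used to train the supervised baseline."""
    diffs = [abs(p - t) for p, t in zip(flatten(pred), flatten(truth))]
    return sum(diffs) / len(diffs)


def constant_depth(height, width, value=1.0):
    """Trivial baseline: the same depth value at every pixel."""
    return [[value] * width for _ in range(height)]


truth = [[1.0, 1.2], [1.5, 2.0]]                       # toy ground truth
scaled = [[2.0 * d for d in row] for row in truth]     # same shape, twice as far
trivial = constant_depth(2, 2)

print(side_error(scaled, truth))   # close to zero: global scale is ignored
print(side_error(trivial, truth))  # positive: a flat map misses the shape
print(l1_loss(scaled, truth))      # L1 does penalise the scale change
```

The sketch shows why a scale-invariant metric suits this setting: a reconstruction that is correct up to a global scale scores near zero, while the constant map gives a non-trivial reference error.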
Conclusion
  • The effect of lighting could be incorporated in the albedo a by interpreting the latter as a texture rather than as the object’s albedo.
  • The model is able to obtain high-fidelity monocular 3D reconstructions of individual object instances.
  • This is trained based on a reconstruction loss without any supervision, resembling an autoencoder.
  • The authors have shown that symmetry and illumination are strong cues for shape and help the model to converge to a meaningful reconstruction.
  • It may be possible to extend the model to use either multiple canonical views or a different 3D representation, such as a mesh or a voxel map
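Why illumination acts as a cue for shape when the shape is stored as a depth map can be illustrated with a minimal shading sketch. It assumes a Lambertian model with an ambient and a diffuse term and treats the depth map as a height field facing the viewer; neither the shading formula nor the coefficients are given in this summary, so all names and values below are assumptions.

```python
import math


def normals_from_depth(depth):
    """Unit surface normals from finite differences of a depth map.

    The map is treated as a height field z(x, y) facing the viewer
    (an assumed sign convention), so the normal is proportional to
    (-dz/dx, -dz/dy, 1).
    """
    height, width = len(depth), len(depth[0])
    normals = []
    for y in range(height):
        row = []
        for x in range(width):
            x0, x1 = max(x - 1, 0), min(x + 1, width - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, height - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def lambertian_shade(albedo, normals, light, ambient=0.5, diffuse=0.5):
    """Shade each pixel as albedo * (ambient + diffuse * max(0, n . l))."""
    length = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / length for c in light)
    return [
        [a * (ambient + diffuse * max(0.0, nx * lx + ny * ly + nz * lz))
         for a, (nx, ny, nz) in zip(albedo_row, normal_row)]
        for albedo_row, normal_row in zip(albedo, normals)
    ]


albedo = [[0.8, 0.8, 0.8]]
flat_depth = [[1.0, 1.0, 1.0]]
slanted_depth = [[1.0, 2.0, 3.0]]
frontal_light = (0.0, 0.0, 1.0)

flat_image = lambertian_shade(albedo, normals_from_depth(flat_depth), frontal_light)
slanted_image = lambertian_shade(albedo, normals_from_depth(slanted_depth), frontal_light)
print(flat_image, slanted_image)
```

With identical albedo and lighting, the slanted surface renders darker than the flat one, so the rendered intensities carry information about the depth gradient. This is the sense in which shading can constrain shape.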
Tables
  • Table 1: Comparison with selected prior work: supervision, goals, and data. I: image, 3DMM: 3D morphable model, 2DKP: 2D keypoints, 2DS: 2D silhouette, 3DP: 3D points, VP: viewpoint, E: expression, 3DM: 3D mesh, 3DV: 3D volume, D: depth, N: normals, A: albedo, T: texture, L: light. † can also recover A and L in post-processing
  • Table 2: Comparison with baselines. SIDE and MAD errors of our reconstructions on the BFM dataset compared against fully-supervised and trivial baselines
  • Table 3: Ablation study. Refer to Section 4.2 for details
  • Table 4: Asymmetric perturbation. We add asymmetric perturbations to BFM and show that confidence maps allow the model to reject such noise, while the vanilla model without confidence maps breaks
  • Table 5
  • Table 6: Training details and hyper-parameter settings
  • Table 7: Network architecture for viewpoint and lighting
  • Table 8: Network architecture for depth and albedo. The output channel size c_out is 1 for depth and 3 for albedo
  • Table 9: Network architecture for confidence maps. The network outputs two pairs of confidence maps at different spatial resolutions for photometric and perceptual losses
Related work
  • In order to assess our contribution in relation to the vast literature on image-based 3D reconstruction, it is important to consider three aspects of each approach: which information is used, which assumptions are made, and what the output is. Below and in Table 1 we compare our contribution to prior works based on these factors.

    Paper | Supervision | Goals | Data
    [47] | 3D scans | … | …
    [66] | 3DV, I | Prior on 3DV, predict from I | ShapeNet, Ikea
    [1] | 3DP | Prior on 3DP | ShapeNet
    [48] | 3DM | Prior on 3DM | Face
    [17] | 3DMM, 2DKP, I | Refine 3DMM fit to I | …
    [15] | 3DMM, 2DKP, I | Fit 3DMM to I+2DKP | …
    [18] | 3DMM | Fit 3DMM to 3D scans | Face
    [28] | 3DMM, 2DKP | Pred. 3DMM from I | Humans
    [51] | 3DMM, 2DS+KP | Pred. N, A, L from I | …
    [64] | 3DMM, I | … | …
Funding
  • This work is jointly supported by Facebook Research and ERC Horizon 2020 research and innovation programme IDIU 638009
References
  • [1] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3D point clouds. In Proc. ICML, 2018.
  • [2] Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In Proc. ICCV, 2015.
  • [3] Peter N. Belhumeur, David J. Kriegman, and Alan L. Yuille. The bas-relief ambiguity. IJCV, 1999.
  • [4] Christoph Bregler, Aaron Hertzmann, and Henning Biermann. Recovering non-rigid 3D shape from image streams. In Proc. CVPR, 2000.
  • [5] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
  • [6] Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, and James M. Rehg. Unsupervised 3d pose estimation with geometric self-supervision. In Proc. CVPR, 2019.
  • [7] Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, and Sanja Fidler. Learning to predict 3d objects with an interpolation-based differentiable renderer. In NeurIPS, 2019.
  • [8] Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. VoxCeleb2: Deep speaker recognition. In INTERSPEECH, 2018.
  • [9] Elliot J. Crowley, Omkar M. Parkhi, and Andrew Zisserman. Face painting: querying art with photos. In Proc. BMVC, 2015.
  • [10] David Eigen, Christian Puhrsch, and Rob Fergus. Depth map prediction from a single image using a multi-scale deep network. In NeurIPS, 2014.
  • [11] Olivier Faugeras and Quang-Tuan Luong. The Geometry of Multiple Images. MIT Press, 2001.
  • [12] Alexandre R. J. Francois, Gerard G. Medioni, and Roman Waupotitsch. Mirror symmetry ⇒ 2-view stereo geometry. Image and Vision Computing, 2003.
  • [13] Matheus Gadelha, Subhransu Maji, and Rui Wang. 3D shape induction from 2D views of multiple objects. In 3DV, 2017.
  • [14] Yuan Gao and Alan L. Yuille. Exploiting symmetry and/or manhattan properties for 3d object structure estimation from single and multiple images. In Proc. CVPR, 2017.
  • [15] Baris Gecer, Stylianos Ploumpis, Irene Kotsia, and Stefanos Zafeiriou. GANFIT: Generative adversarial network fitting for high fidelity 3D face reconstruction. In Proc. CVPR, 2019.
  • [16] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proc. ICLR, 2019.
  • [17] Zhenglin Geng, Chen Cao, and Sergey Tulyakov. 3D guided fine-grained face manipulation. In Proc. CVPR, 2019.
  • [18] Thomas Gerig, Andreas Morel-Forster, Clemens Blumer, Bernhard Egger, Marcel Luthi, Sandro Schonborn, and Thomas Vetter. Morphable face models - an open framework. In Proc. Int. Conf. Autom. Face and Gesture Recog., 2018.
  • [19] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In Proc. ICLR, 2018.
  • [20] Clement Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In Proc. CVPR, 2017.
  • [21] Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. Multi-pie. Image and Vision Computing, 2010.
  • [22] Paul Henderson and Vittorio Ferrari. Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. IJCV, 2019.
  • [23] Philipp Henzler, Niloy Mitra, and Tobias Ritschel. Escaping plato’s cave using adversarial training: 3d shape from unstructured 2d image collections. In Proc. ICCV, 2019.
  • [24] Berthold Horn. Obtaining shape from shading information. In The Psychology of Computer Vision, 1975.
  • [25] Berthold K. P. Horn and Michael J. Brooks. Shape from Shading. MIT Press, Cambridge Massachusetts, 1989.
  • [26] Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. In NeurIPS, 2015.
  • [27] Laszlo A. Jeni, Jeffrey F. Cohn, and Takeo Kanade. Dense 3d face alignment from 2d videos in real-time. In Proc. Int. Conf. Autom. Face and Gesture Recog., 2015.
  • [28] Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In Proc. CVPR, 2018.
  • [29] Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik. Learning category-specific mesh reconstruction from image collections. In Proc. ECCV, 2018.
  • [30] Hiroharu Kato and Tatsuya Harada. Learning view priors for single-view 3d reconstruction. In Proc. CVPR, 2019.
  • [31] Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. Neural 3d mesh renderer. In Proc. CVPR, 2018.
  • [32] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In NeurIPS, 2017.
  • [33] Jan J Koenderink. What does the occluding contour tell us about solid shape? Perception, 1984.
  • [34] Shichen Liu, Tianye Li, Weikai Chen, and Hao Li. Soft rasterizer: A differentiable renderer for image-based 3d reasoning. In Proc. ICCV, 2019.
  • [35] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proc. ICCV, 2015.
  • [36] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. SMPL: A skinned multi-person linear model. ACM TOG, 34(6):248, 2015.
  • [37] Matthew M. Loper and Michael J. Black. OpenDR: An approximate differentiable renderer. In Proc. ECCV, 2014.
  • [38] Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, and Liang Lin. Single view stereo matching. In Proc. CVPR, 2018.
  • [39] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, 2013.
  • [40] Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, and Christopher Pal. Unsupervised depth estimation, 3d face rotation and replacement. In NeurIPS, 2018.
  • [41] Dipti P. Mukherjee, Andrew Zisserman, and J. Michael Brady. Shape from symmetry – detecting and exploiting symmetry in affine images. Philosophical Transactions of the Royal Society of London, 351:77–106, 1995.
  • [42] Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, and Yong-Liang Yang. Hologan: Unsupervised learning of 3d representations from natural images. In Proc. ICCV, 2019.
  • [43] David Novotny, Diane Larlus, and Andrea Vedaldi. Learning 3d object categories by looking around them. In Proc. ICCV, 2017.
  • [44] David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, and Andrea Vedaldi. C3DPO: Canonical 3d pose networks for non-rigid structure from motion. In Proc. ICCV, 2019.
  • [45] Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
  • [46] Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Cats and dogs. In Proc. CVPR, 2012.
  • [47] Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas Vetter. A 3D face model for pose and illumination invariant face recognition. In Advanced video and signal based surveillance, 2009.
  • [48] Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, and Michael J. Black. Generating 3D faces using convolutional mesh autoencoders. In Proc. ECCV, 2018.
  • [49] Mihir Sahasrabudhe, Zhixin Shu, Edward Bartrum, Riza Alp Guler, Dimitris Samaras, and Iasonas Kokkinos. Lifting autoencoders: Unsupervised learning of a fully-disentangled 3d morphable model using deep non-rigid structure from motion. In Proc. ICCV Workshops, 2019.
  • [50] Soubhik Sanyal, Timo Bolkart, Haiwen Feng, and Michael J. Black. Learning to regress 3D face shape and expression from an image without 3D supervision. In Proc. CVPR, 2019.
  • [51] Soumyadip Sengupta, Angjoo Kanazawa, Carlos D. Castillo, and David Jacobs. SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In Proc. CVPR, 2018.
  • [52] Zhixin Shu, Mihir Sahasrabudhe, Alp Guler, Dimitris Samaras, Nikos Paragios, and Iasonas Kokkinos. Deforming autoencoders: Unsupervised disentangling of shape and appearance. In Proc. ECCV, 2018.
  • [53] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. ICLR, 2015.
  • [54] Sudipta N. Sinha, Krishnan Ramnath, and Richard Szeliski. Detecting and reconstructing 3d mirror symmetric objects. In Proc. ECCV, 2012.
  • [55] Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, and Mohammad Norouzi. Discovery of latent 3d keypoints via end-to-end geometric reasoning. In NeurIPS, 2018.
  • [56] Attila Szabo, Givi Meishvili, and Paolo Favaro. Unsupervised generative 3d shape learning from natural images. arXiv preprint arXiv:1910.00287, 2019.
  • [57] Ayush Tewari, Michael Zollhofer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Perez, and Christian Theobalt. MoFA: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In Proc. ICCV, 2017.
  • [58] James Thewlis, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object frames by dense equivariant image labelling. In NeurIPS, 2017.
  • [59] James Thewlis, Hakan Bilen, and Andrea Vedaldi. Modelling and unsupervised learning of symmetric deformable object categories. In NeurIPS, 2018.
  • [60] Sebastian Thrun and Ben Wegbreit. Shape from symmetry. In Proc. ICCV, 2005.
  • [61] Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, and Katerina Fragkiadaki. Adversarial inverse graphics networks: Learning 2d-to-3d lifting and image-to-image translation from unpaired supervision. In Proc. ICCV, 2017.
  • [62] Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, and Thomas Brox. Demon: Depth and motion network for learning monocular stereo. In Proc. CVPR, 2017.
  • [63] Chaoyang Wang, Jose Miguel Buenaposada, Rui Zhu, and Simon Lucey. Learning depth from monocular videos using direct methods. In Proc. CVPR, 2018.
  • [64] Mengjiao Wang, Zhixin Shu, Shiyang Cheng, Yannis Panagakis, Dimitris Samaras, and Stefanos Zafeiriou. An adversarial neuro-tensorial approach for learning disentangled representations. IJCV, 2019.
  • [65] Andrew P. Witkin. Recovering surface shape and orientation from texture. Artificial Intelligence, 1981.
  • [66] Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, and Joshua B. Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In NeurIPS, 2016.
  • [67] Yuxin Wu and Kaiming He. Group normalization. In Proc. ECCV, 2018.
  • [68] Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Proc. CVPR, 2010.
  • [69] Lijun Yin, Xiaochen Chen, Yi Sun, Tony Worm, and Michael Reale. A high-resolution 3d dynamic facial expression database. In Proc. Int. Conf. Autom. Face and Gesture Recog., 2008.
  • [70] Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus. Adaptive deconvolutional networks for mid and high level feature learning. In Proc. ICCV, 2011.
  • [71] Ruo Zhang, Ping-Sing Tsai, James Edwin Cryer, and Mubarak Shah. Shape-from-shading: a survey. IEEE PAMI, 1999.
  • [72] Weiwei Zhang, Jian Sun, and Xiaoou Tang. Cat head detection - how to effectively exploit shape and texture features. In Proc. ECCV, 2008.
  • [73] Xing Zhang, Lijun Yin, Jeffrey F. Cohn, Shaun Canavan, Michael Reale, Andy Horowitz, Peng Liu, and Jeffrey M. Girard. Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database. Image and Vision Computing, 32(10):692–706, 2014.
  • [74] Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. Unsupervised learning of depth and ego-motion from video. In Proc. CVPR, 2017.
  • [75] Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, and William T. Freeman. Visual object networks: Image generation with disentangled 3D representations. In NeurIPS, 2018.
Best paper
Received the CVPR Best Paper Award in 2020