SynSin: End-to-end View Synthesis from a Single Image

CVPR, pp. 7465-7475, 2020.

DOI: https://doi.org/10.1109/CVPR42600.2020.00749
Project page: www.robots.ox.ac.uk/~ow/synsin.html

Abstract:

Single image view synthesis allows for the generation of new views of a scene given a single input image. This is challenging, as it requires comprehensively understanding the 3D scene from a single image. As a result, current methods typically use multiple images, train on ground-truth depth, or are limited to synthetic data. We propose a novel end-to-end model for this task that uses a single image at test time and is trained on real images without any ground-truth 3D information: a differentiable point cloud renderer transforms a latent 3D point cloud of features into the target view, and a refinement network in-paints missing regions to produce the final image.

Introduction
  • Given an image of a scene, as in Fig. 1, what would one see when turning left or walking forward? We can reason that the window and the wall will extend to the left and more chairs will appear to the right.
  • The task of novel view synthesis addresses these questions: given a view of a scene, the aim is to generate images of the scene from new viewpoints.
  • This task has wide applications in image editing, animating still photographs or viewing RGB images in 3D.
  • Understanding semantics is necessary for synthesising plausible completions of partially visible objects, e.g. the chair in Fig. 1
Highlights
  • Given an image of a scene, as in Fig. 1, what would one see when turning left or walking forward? We can reason that the window and the wall will extend to the left and more chairs will appear to the right
  • In this paper we introduce SynSin, a model for view synthesis from a single image in complex real-world scenes
  • To represent the 3D scene structure, we project the image into a latent feature space which is in turn transformed using a differentiable point cloud renderer (a minimal sketch of such a renderer follows this list)
  • We report metrics on the final prediction (Both) and on the regions of the target image that are visible (Vis) and not visible (InVis) in the input image. (Vis) evaluates the quality of the learned 3D scene structure, as it can be largely solved by accurate depth prediction. (InVis) evaluates the quality of a model’s understanding of scene semantics; it requires a holistic understanding of semantic and geometric properties to reasonably in-paint missing regions
  • We introduced SynSin, an end-to-end model for performing single image view synthesis
  • While we have introduced SynSin in the context of view synthesis, we note that using a neural point cloud renderer within a generative model has applications in other tasks
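Below is a minimal sketch of the differentiable point cloud renderer idea referenced above, written in PyTorch. It uses a simplified soft-splatting scheme (each projected point spreads its feature over nearby pixels with a weight that decays over a splat radius and favours nearer points) rather than the paper's exact compositing scheme; the function name render_point_features and the splat_radius and gamma parameters are illustrative assumptions, not the paper's implementation.

```python
# A minimal, assumed sketch of a soft, differentiable point-cloud renderer.
import torch

def render_point_features(xyz_cam, feats, K, H, W, splat_radius=2.0, gamma=1.0):
    """Softly splat per-point features into a (C, H, W) feature image.

    xyz_cam: (P, 3) points in the target camera frame (z > 0 in front of the camera).
    feats:   (P, C) per-point feature vectors.
    K:       (3, 3) pinhole intrinsics.
    """
    # Pinhole projection of every point to pixel coordinates.
    uvw = xyz_cam @ K.T                                    # (P, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)          # (P, 2)

    # Pixel-centre grid, flattened to (H*W, 2) in (x, y) order.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()

    # Soft splat weight: influence decays with distance to the splat centre and
    # vanishes outside splat_radius (dense O(P*H*W) version, fine for a sketch).
    d2 = ((grid[None, :, :] - uv[:, None, :]) ** 2).sum(-1)            # (P, H*W)
    w_spatial = (1.0 - d2 / splat_radius ** 2).clamp(min=0.0) ** gamma

    # Favour nearer points, so occlusion is resolved softly instead of with a
    # hard (non-differentiable) z-buffer.
    w = w_spatial / xyz_cam[:, 2:3].clamp(min=1e-6)                    # (P, H*W)

    # Normalised weighted sum of the point features at every pixel.
    img = (feats.T @ w) / w.sum(dim=0, keepdim=True).clamp(min=1e-6)   # (C, H*W)
    return img.reshape(-1, H, W)
```

Because every step (projection, weighting, normalised accumulation) is differentiable, gradients can flow back to both the per-point features and the predicted 3D positions, which is what allows the model to be trained end to end without ground-truth 3D information.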
Methods
  • The authors introduce SynSin (Fig. 2) and describe how they overcome the two main challenges of the task: representing the 3D scene structure and capturing scene semantics.
  • To represent the 3D scene structure, the authors project the image into a latent feature space, which is in turn transformed using a differentiable point cloud renderer.
  • This renderer injects a 3D prior into the network, as the predicted 3D structure must obey geometric principles.
  • The refinement network (g) refines the rendered features to give the final generated image I_G.
  • The authors enforce that I_G matches the target image (an end-to-end sketch follows this list)
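Below is a minimal sketch of how these components could fit together end to end, reusing the render_point_features function sketched under Highlights. The feature, depth and refinement networks are single-layer stand-ins, and the class name, helper names, depth scale and loss shown are illustrative assumptions rather than the paper's architecture or full training objective.

```python
# An assumed end-to-end sketch: features f, depth d, soft point rendering, refinement g.
import torch
import torch.nn as nn

def backproject(depth, K_inv):
    """Lift every pixel of an (H, W) depth map to a 3D point in the camera frame."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3).float()
    return (pix @ K_inv.T) * depth.reshape(-1, 1)            # (H*W, 3)

class SynSinSketch(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.f = nn.Conv2d(3, feat_dim, 3, padding=1)                       # feature predictor
        self.d = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())  # depth predictor
        self.g = nn.Conv2d(feat_dim, 3, 3, padding=1)                       # refinement network

    def forward(self, img, K, K_inv, T_rel):
        """img: (B, 3, H, W); K, K_inv: (3, 3); T_rel: (B, 4, 4) input-to-target pose."""
        B, _, H, W = img.shape
        feats = self.f(img)                      # latent feature image, (B, C, H, W)
        depth = self.d(img) * 10.0               # predicted depth, arbitrary scale

        rendered = []
        for b in range(B):
            xyz = backproject(depth[b, 0], K_inv)                 # input-view point cloud
            xyz_t = xyz @ T_rel[b, :3, :3].T + T_rel[b, :3, 3]    # move into target frame
            rendered.append(render_point_features(                # soft splatting (above)
                xyz_t, feats[b].reshape(feats.shape[1], -1).T, K, H, W))
        rendered = torch.stack(rendered)         # features seen from the target view

        return self.g(rendered)                  # refined / in-painted image I_G

# Training enforces that I_G matches the target view, e.g. (simplified to an L1 term only):
# loss = torch.nn.functional.l1_loss(model(img, K, K_inv, T_rel), target)
```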
Results
  • To determine the (Vis) and (InVis) regions, the authors use the GT depth in the input view to obtain a binary mask of which pixels are visible in the target image (a sketch of this masking follows these bullets).
  • This is only possible on Matterport3D (RealEstate10K does not have GT depth).
  • As shown in Fig. 8, this causes severe artefacts for backward motion
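Below is a minimal sketch of how such a visibility mask could be derived from the ground-truth depth of the input view, reusing the backproject helper above. It simply marks target pixels hit by a re-projected input-view point and ignores occlusion within the target view, so it illustrates the idea rather than the paper's exact procedure; the commented lines show how an error could then be averaged separately over the Vis and InVis regions.

```python
# An assumed sketch of building a Vis/InVis mask from GT depth in the input view.
import torch

def visibility_mask(gt_depth_in, K, K_inv, T_in_to_tgt, H, W):
    """Mark target-view pixels hit by a re-projected input-view point as Vis (True)."""
    xyz = backproject(gt_depth_in, K_inv)                       # input-view points, (H*W, 3)
    xyz_t = xyz @ T_in_to_tgt[:3, :3].T + T_in_to_tgt[:3, 3]    # into the target frame

    uvw = xyz_t @ K.T
    z = uvw[:, 2]
    uv = (uvw[:, :2] / z.clamp(min=1e-6).unsqueeze(1)).round().long()

    # Keep points in front of the camera and inside the image bounds; pixels they
    # hit are visible in both views, everything else must be in-painted (InVis).
    valid = (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    mask = torch.zeros(H, W, dtype=torch.bool)
    mask[uv[valid, 1], uv[valid, 0]] = True
    return mask

# Per-region error of a generated image against the target, e.g.:
# vis_err   = (generated - target).abs()[..., mask].mean()
# invis_err = (generated - target).abs()[..., ~mask].mean()
```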
Conclusion
  • The authors introduced SynSin, an end-to-end model for performing single image view synthesis.
  • At the heart of the system are two key components: first a differentiable neural point cloud renderer, and second a generative refinement module.
  • The authors verified that the approach can be learned end-to-end on multiple realistic datasets, generalises to unseen scenes, can be applied directly to higher image resolutions, and can be used to generate reasonable videos along a given trajectory.
  • While the authors have introduced SynSin in the context of view synthesis, the authors note that using a neural point cloud renderer within a generative model has applications in other tasks
Tables
  • Table 1: Results on Matterport3D [4], RealEstate10K [74], and Replica [58]. ↑ denotes higher is better, ↓ lower is better; XX_YY denotes a value XX with standard deviation YY. The ablations demonstrate the utility of each aspect of our model. We outperform all baselines for both datasets and are nearly as good as a model supervised with depth (SynSin (sup. by GT)). We also perform best when considering regions visible (Vis) and not visible (InVis) in the input view
  • Table 2: SynSin performs better than a system trained with GT depth (3DView) and approaches the performance of [74], which uses 2 input views at test time
  • Table 3: Results when applying models trained on 256 × 256 images to 512 × 512 images
  • Table 4
  • Table 5: Comparison on KITTI to [8]. ↑ denotes higher is better, ↓ lower is better
Related work
  • Research into new view synthesis has a long history in computer vision. These works differ based on whether they use multiple images or a single image at test time and on whether they require annotated 3D or semantic information.

    View synthesis from multiple images. If multiple images of a scene can be obtained, inferred 3D geometry can be used to reconstruct the scene and then generate new views. Traditionally, this was done using depth maps [5, 47] or multi-view geometry [11, 12, 15, 30, 51, 76].

    In the learning era, DNNs can be used to learn depth. [1, 9, 23, 36, 38, 41] use a DNN to improve view synthesis from a set of noisy, incomplete, or inconsistent depth maps. Given two or more images of a scene within a small baseline, [16, 56, 57, 63, 68, 74] show impressive results at synthesising views within this narrow baseline. [35, 42, 54] learn an implicit voxel representation of one object given many training views and generate new views of that object at test time. [14] use no implicit 3D representation. Unlike these methods, we assume only one image at test time.
Contributions
  • Proposes a novel end-to-end model for this task using a single image at test time; it is trained on real images without any ground-truth 3D information
  • Introduces a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view
  • Introduces SynSin, a model for view synthesis from a single image in complex real-world scenes
  • Evaluates our approach on three complex real-world datasets: Matterport3D, RealEstate10K, and Replica
  • Demonstrates that our approach generates high-quality images and outperforms baseline methods that use voxel-based 3D representations
References
  • [1] Kara-Ali Aliev, Dmitry Ulyanov, and Victor Lempitsky. Neural point-based graphics. arXiv preprint arXiv:1906.08240, 2019.
  • [2] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In Proc. ACM SIGGRAPH, 2000.
  • [3] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In Proc. ICLR, 2019.
  • [4] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV), 2017. Matterport3D dataset available at https://niessner.github.io/Matterport/.
  • [5] Gaurav Chaurasia, Sylvain Duchene, Olga Sorkine-Hornung, and George Drettakis. Depth synthesis and local warps for plausible image-based navigation. ACM Transactions on Graphics (TOG), 2013.
  • [6] Weifeng Chen, Zhao Fu, Dawei Yang, and Jia Deng. Single-image depth perception in the wild. In NeurIPS, 2016.
  • [7] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NeurIPS, 2016.
  • [8] Xu Chen, Jie Song, and Otmar Hilliges. Monocular neural image based rendering with continuous view control. In Proc. ICCV, 2019.
  • [9] Inchang Choi, Orazio Gallo, Alejandro Troccoli, Min H. Kim, and Jan Kautz. Extreme view synthesis. In Proc. ICCV, 2019.
  • [10] A. Criminisi, P. Perez, and T. Kentaro. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 2004.
  • [11] Paul Debevec, Yizhou Yu, and George Borshukov. Efficient view-dependent image-based rendering with projective texture-mapping. In Rendering Techniques, 1998.
  • [12] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proc. ACM SIGGRAPH, pages 11–20, 1996.
  • [13] David Eigen, Christian Puhrsch, and Rob Fergus. Depth map prediction from a single image using a multi-scale deep network. In NeurIPS, 2014.
  • [14] S. M. Ali Eslami, Danilo Jimenez Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, and Karol Gregor. Neural scene representation and rendering. Science, 360(6394), 2018.
  • [15] A. W. Fitzgibbon, Y. Wexler, and A. Zisserman. Image-based rendering using image-based priors. IJCV, 63(2):141–151, 2005.
  • [16] John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, and Richard Tucker. DeepView: View synthesis with learned gradient descent. In Proc. CVPR, 2019.
  • [17] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), 2013.
  • [18] Georgia Gkioxari, Jitendra Malik, and Justin Johnson. Mesh R-CNN. In Proc. ICCV, 2019.
  • [19] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, pages 2672–2680, 2014.
  • [20] Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. AtlasNet: A papier-mâché approach to learning 3D surface generation. In Proc. CVPR, 2018.
  • [21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. CVPR, 2016.
  • [22] Peter Hedman and Johannes Kopf. Instant 3D photography. ACM Transactions on Graphics (TOG), 2018.
  • [23] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (TOG), 2018.
  • [24] Eldar Insafutdinov and Alexey Dosovitskiy. Unsupervised learning of shape and pose with differentiable point clouds. In NeurIPS, 2018.
  • [25] Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, and Kwang Moo Yi. Linearized multi-sampling for differentiable image transformation. In Proc. ICCV, 2019.
  • [26] Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik. Learning category-specific mesh reconstruction from image collections. In Proc. ECCV, 2018.
  • [27] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proc. CVPR, 2019.
  • [28] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (TOG), 2017.
  • [29] Leif Kobbelt and Mario Botsch. A survey of point-based techniques in computer graphics. Computers & Graphics, 2004.
  • [30] Johannes Kopf, Fabian Langguth, Daniel Scharstein, Richard Szeliski, and Michael Goesele. Image-based rendering in the gradient domain. ACM Transactions on Graphics (TOG), 2013.
  • [31] Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In NeurIPS, 2015.
  • [32] Samuli Laine and Tero Karras. High-performance software rasterization on GPUs. In Proc. ACM SIGGRAPH Symposium on High Performance Graphics, 2011.
  • [33] Zhengqi Li and Noah Snavely. MegaDepth: Learning single-view depth prediction from internet photos. In Proc. CVPR, 2018.
  • [34] Shichen Liu, Weikai Chen, Tianye Li, and Hao Li. Soft Rasterizer: Differentiable rendering for unsupervised single-view mesh reconstruction. In Proc. ICCV, 2019.
  • [35] Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. Neural Volumes: Learning dynamic renderable volumes from images. ACM Transactions on Graphics (TOG), 2019.
  • [36] Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, et al. LookinGood: Enhancing performance capture with real-time neural re-rendering. ACM Transactions on Graphics (TOG), 2018.
  • [37] Kevin Matzen, Matthew Yu, Jonathan Lehman, Peizhao Zhang, Jan-Michael Frahm, Peter Vajda, Johannes Kopf, and Matt Uyttendaele. Powered by AI: Turning any 2D photo into 3D using convolutional neural nets. https://ai.facebook.com/blog/-powered-by-ai-turningany-2d-photo-into-3d-using-convolutional-neural-nets/, Feb 2020.
  • [38] Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, and Ricardo Martin-Brualla. Neural rerendering in the wild. In Proc. CVPR, 2019.
  • [39] Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, and Yong-Liang Yang. HoloGAN: Unsupervised learning of 3D representations from natural images. In Proc. ICCV, 2019.
  • [40] Simon Niklaus, Long Mai, Jimei Yang, and Feng Liu. 3D Ken Burns effect from a single image. ACM Transactions on Graphics (TOG), 2019.
  • [41] David Novotny, Ben Graham, and Jeremy Reizenstein. PerspectiveNet: A scene-consistent image generator for new view synthesis in real indoor environments. In NeurIPS, 2019.
  • [42] Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, and Linjie Luo. Transformable bottleneck networks. In Proc. ICCV, 2019.
  • [43] Eunbyung Park, Jimei Yang, Ersin Yumer, Duygu Ceylan, and Alexander C. Berg. Transformation-grounded image generation network for novel 3D view synthesis. In Proc. CVPR, 2017.
  • [44] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proc. CVPR, 2019.
  • [45] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
  • [46] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In Proc. CVPR, pages 2536–2544, 2016.
  • [47] Eric Penner and Li Zhang. Soft 3D reconstruction for view synthesis. ACM Transactions on Graphics (TOG), 2017.
  • [48] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proc. MICCAI, pages 234–241, 2015.
  • [49] Miguel Sainz and Renato Pajarola. Point-based rendering techniques. Computers & Graphics, 2004.
  • [50] Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, and Dhruv Batra. Habitat: A platform for embodied AI research. In Proc. ICCV, 2019.
  • [51] Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proc. CVPR, 2006.
  • [52] Daeyun Shin, Zhile Ren, Erik Sudderth, and Charless Fowlkes. 3D scene reconstruction with multi-layer depth and epipolar transformers. In Proc. ICCV, 2019.
  • [53] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In Proc. ECCV, 2012.
  • [54] Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, and Michael Zollhofer. DeepVoxels: Learning persistent 3D feature embeddings. In Proc. CVPR, 2019.
  • [55] Vincent Sitzmann, Michael Zollhofer, and Gordon Wetzstein. Scene representation networks: Continuous 3D-structure-aware neural scene representations. In NeurIPS, 2019.
  • [56] Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, and Noah Snavely. Pushing the boundaries of view extrapolation with multiplane images. In Proc. CVPR, 2019.
  • [57] Pratul P. Srinivasan, Tongzhou Wang, Ashwin Sreelal, Ravi Ramamoorthi, and Ren Ng. Learning to synthesize a 4D RGBD light field from a single image. In Proc. ICCV, 2017.
  • [58] Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, and Richard Newcombe. The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019.
  • [59] Shao-Hua Sun, Minyoung Huh, Yuan-Hong Liao, Ning Zhang, and Joseph J. Lim. Multi-view to novel view: Synthesizing novel views with self-learned confidence. In Proc. ECCV, 2018.
  • [60] Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Multi-view 3D models from single images with a convolutional network. In Proc. ECCV, 2016.
  • [61] Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, and William T. Freeman. Boundless: Generative adversarial networks for image extension. In Proc. ICCV, 2019.
  • [62] Shubham Tulsiani, Saurabh Gupta, David F. Fouhey, Alexei A. Efros, and Jitendra Malik. Factoring shape, pose, and layout from the 2D image of a 3D scene. In Proc. CVPR, 2018.
  • [63] Shubham Tulsiani, Richard Tucker, and Noah Snavely. Layer-structured 3D scene inference via view synthesis. In Proc. ECCV, 2018.
  • [64] Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, and Jitendra Malik. Multi-view supervision for single-view reconstruction via differentiable ray consistency. In Proc. CVPR, 2017.
  • [65] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proc. CVPR, 2018.
  • [66] Yi Wang, Xin Tao, Xiaoyong Shen, and Jiaya Jia. Wide-context semantic image extrapolation. In Proc. CVPR, 2019.
  • [67] Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Interpretable transformations with encoder-decoder networks. In Proc. ICCV, 2017.
  • [68] Zexiang Xu, Sai Bi, Kalyan Sunkavalli, Sunil Hadap, Hao Su, and Ravi Ramamoorthi. Deep view synthesis from sparse photometric images. ACM Transactions on Graphics (TOG), 2019.
  • [69] Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. In NeurIPS, 2016.
  • [70] Wang Yifan, Felice Serena, Shihao Wu, Cengiz Oztireli, and Olga Sorkine-Hornung. Differentiable surface splatting for point-based geometry processing. ACM Transactions on Graphics (TOG), 2019.
  • [71] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks, 2019.
  • [72] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. CVPR, 2018.
  • [73] Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. Unsupervised learning of depth and ego-motion from video. In Proc. CVPR, 2017.
  • [74] Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images. ACM Transactions on Graphics (TOG), 2018.
  • [75] Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A. Efros. View synthesis by appearance flow. In Proc. ECCV, 2016.
  • [76] C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics (TOG), 2004.
  • [77] Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. Surface splatting. In Proc. ACM SIGGRAPH, 2001.