Controllable and Progressive Image Extrapolation


Abstract:

Image extrapolation aims at expanding the narrow field of view of a given image patch. Existing models mainly deal with natural scene images of homogeneous regions and provide no control over the content generation process. In this work, we study conditional image extrapolation to synthesize new images guided by the input structured text. Th...

Introduction
  • Given an image patch with a narrow field of view, image extrapolation aims at expanding it by generating plausible visual content outside the image boundaries.
  • To the best of the authors' knowledge, only a few approaches [35, 55, 41] have been developed to address this topic, and all are designed for unconditional extrapolation, where the target image is generated solely based on the input patch.
  • This is often achieved by finding low-level cues of similar patterns from the given image or external databases.
  • An ideal model should directly take both the patch and the text into account to generate the target image. (Figure: (a) text-driven extrapolation; (b) progressive extrapolation.)
Highlights
  • Given an image patch with a narrow field of view, image extrapolation aims at expanding it by generating plausible visual content outside the image boundaries
  • We study conditional image extrapolation where the inputs are an image patch and a structured text that specifies desired properties to synthesize
  • We propose a progressive generative model that consists of three stages to extrapolate an image patch
  • Let $\theta_g^*$ be the optimal parameter of the generator $G$; it is found by minimizing the total loss $\mathcal{L}_{\text{total}}$ over all training pairs of scene graphs and image patches: $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{box}} + \mathcal{L}_{\text{seg}} + \mathcal{L}_{\text{img}}$ (3). A sketch of this combined objective follows this list.
  • We propose a generative network to extrapolate new content outside the image boundaries
  • We decompose the learning process into three stages and introduce two important sub-tasks, generating layouts from coarse to fine, to facilitate training. Based on this multi-stage model, we use a curriculum learning strategy for effective model training. Both qualitative and quantitative results show that the proposed model performs favorably against the evaluated methods and generates more controllable extrapolated results.
Methods
  • Since no existing extrapolation method can handle the multimodal input of an image patch and text, the authors compare with the following related work.
  • For the inpainting baseline, the authors adapt its original training objective from inpainting to outpainting, i.e., predicting pixels outside the patch boundary, and keep the rest unchanged (see the mask sketch after this list).
  • As it does not support control from text or scene graphs, the authors train the model only on the image patch using the publicly released code.
  • sg2im [17] is a closely related and prominent method for synthesizing images from scene graphs.
  • As it does not take the image patch as input, the authors copy and paste the input patch onto the generated image.
Conclusion
  • The authors propose a generative network to extrapolate new content outside the image boundaries.
  • The authors decompose the learning process into three stages and introduce two important sub-tasks, generating layouts from coarse to fine, to facilitate training.
  • Based on this multi-stage model, the authors use a curriculum learning strategy for effective model training (a minimal schedule sketch follows this list).
  • Both qualitative and quantitative results show that the proposed model performs favorably against the evaluated methods and generates more controllable extrapolated results.
Tables
  • Table 1: Quantitative evaluations on the Helen [21] and CCP [50] datasets
  • Table 2: User preference for different methods (%)
Related work
  • Image extrapolation. Early extrapolation algorithms generally follow a retrieve-and-compose strategy, where an external library of sample images depicting similar scenes is assumed to be available. For example, Efros and Freeman [9] expand a small texture patch with similar patches and compute a minimum-cost boundary for composition (a sketch of this boundary cut follows). Extending similar textured patches to images of the same scene category, Zhang et al. [55] extrapolate photos by exploiting the self-similarity of a reference image to generate a set of local transformations. To handle different viewpoints and appearance variations, a few methods [35, 41] search library images for good candidates and align them with the given input. However, these nonparametric methods are limited in inferring semantically new content and require suitable reference databases.
References
  • [1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
  • [2] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics, 28(3):24, 2009.
  • [3] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, 2009.
  • [4] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In SIGGRAPH, 2000.
  • [5] Haw-Shiuan Chang, Erik Learned-Miller, and Andrew McCallum. Active bias: Training more accurate neural networks by emphasizing high variance samples. In NIPS, 2017.
  • [6] Qifeng Chen and Vladlen Koltun. Photographic image synthesis with cascaded refinement networks. In ICCV, 2017.
  • [7] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018.
  • [8] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.
  • [9] Alexei A Efros and William T Freeman. Image quilting for texture synthesis and transfer. In SIGGRAPH, 2001.
  • [10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • [11] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, 2017.
  • [12] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. In ECCV, 2018.
  • [13] Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics, 36(4):107, 2017.
  • [14] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
  • [15] Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In ICML, 2018.
  • [16] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
  • [17] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In CVPR, 2018.
  • [18] Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, and Li Fei-Fei. Image retrieval using scene graphs. In CVPR, 2015.
  • [19] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
  • [20] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [21] Vuong Le, Jonathan Brandt, Zhe Lin, Lubomir Bourdev, and Thomas S Huang. Interactive facial feature localization. In ECCV, 2012.
  • [22] Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Diverse image-to-image translation via disentangled representations. In ECCV, 2018.
  • [23] Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu. LayoutGAN: Generating graphic layouts with wireframe discriminators. arXiv preprint arXiv:1901.06767, 2019.
  • [24] Yijun Li, Sifei Liu, Jimei Yang, and Ming-Hsuan Yang. Generative face completion. In CVPR, 2017.
  • [25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • [26] Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In ECCV, 2018.
  • [27] Guilin Liu, Fitsum A Reda, Kevin J Shih, Ting-Chun Wang, Andrew Tao, and Bryan Catanzaro. Image inpainting for irregular holes using partial convolutions. In ECCV, 2018.
  • [28] Cewu Lu, Ranjay Krishna, Michael Bernstein, and Li Fei-Fei. Visual relationship detection with language priors. In ECCV, 2016.
  • [29] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • [30] Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Qureshi, and Mehran Ebrahimi. EdgeConnect: Generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212, 2019.
  • [31] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In CVPR, 2019.
  • [32] Anastasia Pentina, Viktoriia Sharmanska, and Christoph H Lampert. Curriculum learning of multiple tasks. In CVPR, 2015.
  • [33] Tobias Pohlen, Alexander Hermans, Markus Mathias, and Bastian Leibe. Full-resolution residual networks for semantic segmentation in street scenes. In CVPR, 2017.
  • [34] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016.
  • [35] Qi Shan, Brian Curless, Yasutaka Furukawa, Carlos Hernandez, and Steven M Seitz. Photo uncrop. In ECCV, 2014.
  • [36] Sahil Sharma, Ashutosh Jha, Parikshit Hegde, and Balaraman Ravindran. Learning to multi-task by active sampling. arXiv preprint arXiv:1702.06053, 2017.
  • [37] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • [38] Brandon M Smith, Li Zhang, Jonathan Brandt, Zhe Lin, and Jianchao Yang. Exemplar-based face parsing. In CVPR, 2013.
  • [39] Jian Sun, Lu Yuan, Jiaya Jia, and Heung-Yeung Shum. Image completion with structure propagation. ACM Transactions on Graphics, 24(3):861–868, 2005.
  • [40] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. In CVPR, 2017.
  • [41] Miao Wang, Yukun Lai, Yuan Liang, Ralph Robert Martin, and Shi-Min Hu. BiggerPicture: Data-driven image extrapolation using graph matching. ACM Transactions on Graphics, 33(6), 2014.
  • [42] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In CVPR, 2018.
  • [43] Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, and Jiaya Jia. Image inpainting via generative multi-column convolutional neural networks. In NIPS, 2018.
  • [44] Yi Wang, Xin Tao, Xiaoyong Shen, and Jiaya Jia. Wide-context semantic image extrapolation. In CVPR, 2019.
  • [45] Sanghyun Woo, Dahun Kim, Donghyeon Cho, and In So Kweon. LinkNet: Relational embedding for scene graph. In NIPS, 2018.
  • [46] Wei Xiong, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, and Jiebo Luo. Foreground-aware image inpainting. arXiv preprint arXiv:1901.05945, 2019.
  • [47] Danfei Xu, Yuke Zhu, Christopher B Choy, and Li Fei-Fei. Scene graph generation by iterative message passing. In CVPR, 2017.
  • [48] Tianfan Xue, Jiajun Wu, Katherine Bouman, and William Freeman. Visual dynamics: Stochastic future generation via layered cross convolutional networks. In NIPS, 2016.
  • [49] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2Image: Conditional image generation from visual attributes. In ECCV, 2016.
  • [50] Wei Yang, Ping Luo, and Liang Lin. Clothing co-parsing by joint image segmentation and labeling. In CVPR, 2014.
  • [51] Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, and Shuicheng Yan. Very long natural scenery image prediction by outpainting. In ICCV, 2019.
  • [52] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Generative image inpainting with contextual attention. In CVPR, 2018.
  • [53] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916, 2017.
  • [54] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
  • [55] Yinda Zhang, Jianxiong Xiao, James Hays, and Ping Tan. FrameBreak: Dramatic image extrapolation by guided shift-maps. In CVPR, 2013.
  • [56] Zizhao Zhang, Yuanpu Xie, and Lin Yang. Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In CVPR, 2018.
  • [57] Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. Non-stationary texture synthesis by adversarial expansion. In SIGGRAPH, 2018.