
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251.

Citations: 11,881 | Views: 794 | EI-indexed

Abstract

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, and photo enhancement. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

Introduction
  • Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
  • The authors seek an algorithm that can learn to translate between domains without paired input-output examples (Figure 2, right).
  • The authors adopt an adversarial loss to learn the mapping such that the translated image cannot be distinguished from images in the target domain (formalized below).
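The adversarial loss used throughout, for the mapping G : X → Y with discriminator D_Y, follows the standard GAN objective of [14]; reconstructed here from the paper's definitions:

$$\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log(1 - D_Y(G(x)))]$$

G tries to minimize this objective against an adversary D_Y that tries to maximize it.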
Highlights
  • Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs
  • In practice, we have found it difficult to optimize the adversarial objective in isolation: standard procedures often lead to the well-known problem of mode collapse, where all input images map to the same output image and the optimization fails to make progress [13]
  • Our objective contains two types of terms: adversarial losses [14] for matching the distribution of generated images to the data distribution in the target domain, and a cycle consistency loss to prevent the learned mappings G and F from contradicting each other; a weight λ controls the relative importance of the two objectives (see the formulation after this list)
  • We introduce a similar adversarial loss for the mapping function F : Y → X and its discriminator D_X as well, i.e., L_GAN(F, D_X, Y, X)
  • We first compare our approach against recent methods for unpaired image-to-image translation on paired datasets where ground truth input-output pairs are available for evaluation
  • We study the importance of both the adversarial loss and the cycle consistency loss, and compare our full method against several variants
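For reference, the full objective, reconstructed from the paper's definitions, combines the two adversarial terms with the cycle consistency loss, with λ weighting the latter:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)$$

where the cycle consistency loss penalizes failures to reconstruct an image after a round trip through both mappings:

$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{\text{data}}(y)}[\|G(F(y)) - y\|_1]$$

and the mappings are obtained as $G^*, F^* = \arg\min_{G,F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$.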
Results
  • The authors' objective contains two types of terms: adversarial losses [14] for matching the distribution of generated images to the data distribution in the target domain, and a cycle consistency loss to prevent the learned mappings G and F from contradicting each other, with λ controlling the relative importance of the two objectives.
  • Adversarial training can, in theory, learn mappings G and F that produce outputs identically distributed as target domains Y and X respectively [13].
  • The authors first compare the approach against recent methods for unpaired image-to-image translation on paired datasets where ground truth input-output pairs are available for evaluation.
  • Pixel loss + GAN [42]: like the authors' method, Shrivastava et al. [42] use an adversarial loss to train a translation from X to Y.
  • The authors exclude pixel loss + GAN and feature loss + GAN in the figures, as both of the methods fail to produce results at all close to the target domain.
  • The authors evaluate the method with the cycle loss in only one direction: GAN + forward cycle loss $\mathbb{E}_{x \sim p_{\text{data}}(x)}[\|F(G(x)) - x\|_1]$, or GAN + backward cycle loss $\mathbb{E}_{y \sim p_{\text{data}}(y)}[\|G(F(y)) - y\|_1]$ (Equation 2), and find that it often incurs training instability and causes mode collapse, especially for the direction of the mapping that was removed.
  • Collection style transfer (Figure 8): the authors train the model on landscape photographs downloaded from Flickr and WikiArt. Note that unlike recent work on “neural style transfer” [11], the method learns to mimic the style of an entire set of artworks (e.g. Van Gogh), rather than transferring the style of a single selected piece of art (e.g. Starry Night).
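To make the two-direction cycle loss above concrete, here is a minimal PyTorch-style sketch; the function and parameter names are illustrative (not the authors' released code), and G and F stand for any pair of image-to-image networks with matching input/output shapes:

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    # Full two-direction cycle loss:
    #   lam * (E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1])
    # Dropping either term gives the GAN + forward / GAN + backward
    # ablations, which the paper reports are prone to training
    # instability and mode collapse; lam = 10 is the weight the paper uses.
    forward_cycle = l1(F(G(real_x)), real_x)    # x -> G(x) -> F(G(x)) ≈ x
    backward_cycle = l1(G(F(real_y)), real_y)   # y -> F(y) -> G(F(y)) ≈ y
    return lam * (forward_cycle + backward_cycle)
```

In training, this term is added to the two adversarial losses before backpropagating through both generators.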
Conclusion
  • Photo generation from paintings (Figure 9): for painting→photo, the authors find that it is helpful to introduce an additional loss to encourage the mapping to preserve color composition between the input and output (sketched after this list).
  • When learning the mapping between Monet’s paintings and Flickr photographs, the generator often maps paintings of daytime to photographs taken during sunset, because such a mapping may be valid under the adversarial loss and cycle consistency loss.
  • On the task of dog→cat transfiguration, the learned translation degenerates to making minimal changes to the input (Figure 12).
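The additional color-preserving term mentioned in the first conclusion item is the paper's identity loss, which regularizes each generator to act as an identity mapping when given real samples of its own output domain:

$$\mathcal{L}_{\text{identity}}(G, F) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[\|G(y) - y\|_1] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\|F(x) - x\|_1]$$

Without this term, the generators are free to change the overall tint of their inputs, which is how daytime paintings can end up mapped to sunset photographs while still satisfying the adversarial and cycle losses.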
Tables
  • Table 1: AMT “real vs. fake” test on maps↔aerial photos
  • Table 2: FCN scores for different methods, evaluated on Cityscapes labels→photos
  • Table 3: Classification performance of photo→labels for different methods on Cityscapes
Related Work
  • Generative Adversarial Networks (GANs) [14, 58] have achieved impressive results in image generation [5, 35], image editing [61], and representation learning [35, 39, 33]. Recent methods adopt the same idea for conditional image generation applications, such as text2image [36], image inpainting [34], and future prediction [32], as well as to other domains like videos [50] and 3D models [53]. The key to GANs’ success is the idea of an adversarial loss that forces the generated images to be, in principle, indistinguishable from real images. This is particularly powerful for image generation tasks, as this is exactly the objective that much of computer graphics aims to optimize. We adopt an adversarial loss to learn the mapping such that the translated image cannot be distinguished from images in the target domain.

    Image-to-Image Translation The idea of image-to-image translation goes back at least to Hertzmann et al.’s Image Analogies [17], who employ a nonparametric texture model [8] on a single input-output training image pair. More recent approaches use a dataset of input-output examples to learn a parametric translation function using CNNs.
Funding
  • This work was supported in part by NSF SMA-1514512, NSF IIS1633310, a Google Research Award, Intel Corp, and hardware donations from NVIDIA
  • JYZ is supported by the Facebook Graduate Fellowship and TP is supported by the Samsung Scholarship
References
  • Y. Aytar, L. Castrejon, C. Vondrick, H. Pirsiavash, and A. Torralba. Cross-modal scene networks. arXiv preprint arXiv:1610.09003, 2016.
  • K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. arXiv preprint arXiv:1612.05424, 2016.
  • R. W. Brislin. Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3):185–216, 1970.
  • M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.
  • E. L. Denton, S. Chintala, R. Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, pages 1486–1494, 2015.
  • J. Donahue, P. Krahenbuhl, and T. Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
  • V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
  • A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In ICCV, volume 2, pages 1033–1038. IEEE, 1999.
  • D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV, pages 2650–2658, 2015.
  • L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman. Preserving color in neural artistic style transfer. arXiv preprint arXiv:1606.05897, 2016.
  • L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In CVPR, 2016.
  • C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In CVPR, 2017.
  • I. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
  • D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T. Liu, and W.-Y. Ma. Dual learning for machine translation. In NIPS, pages 820–828, 2016.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In SIGGRAPH, pages 327–340. ACM, 2001.
  • G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
  • Q.-X. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Computer Graphics Forum, volume 32, pages 177–186. Wiley Online Library, 2013.
  • P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
  • J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, pages 694–711. Springer, 2016.
  • L. Karacan, Z. Akata, A. Erdem, and E. Erdem. Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215, 2016.
  • D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In ICLR, 2014.
  • P.-Y. Laffont, Z. Ren, X. Tao, C. Qian, and J. Hays. Transient attributes for high-level understanding and editing of outdoor scenes. ACM Transactions on Graphics (TOG), 33(4):149, 2014.
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
  • C. Li and M. Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV, 2016.
  • M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv preprint arXiv:1703.00848, 2017.
  • M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In NIPS, pages 469–477, 2016.
  • J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.
  • A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
  • X. Mao, Q. Li, H. Xie, R. Y. Lau, and Z. Wang. Multi-class generative adversarial networks with the L2 loss function. arXiv preprint arXiv:1611.04076, 2016.
  • M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction beyond mean square error. In ICLR, 2016.
  • M. F. Mathieu, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In NIPS, pages 5040–5048, 2016.
  • D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
  • R. Rosales, K. Achan, and B. J. Frey. Unsupervised image translation. In ICCV, pages 472–478, 2003.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
  • P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays. Scribbler: Controlling deep image synthesis with sketch and color. In CVPR, 2017.
  • Y. Shih, S. Paris, F. Durand, and W. T. Freeman. Data-driven hallucination of different times of day from a single outdoor photo. ACM Transactions on Graphics (TOG), 32(6):200, 2013.
  • A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. arXiv preprint arXiv:1612.07828, 2016.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • N. Sundaram, T. Brox, and K. Keutzer. Dense point trajectories by GPU-accelerated large displacement optical flow. In ECCV, pages 438–451. Springer, 2010.
  • Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200, 2016.
  • D. Turmukhambetov, N. D. Campbell, S. J. Prince, and J. Kautz. Modeling object appearance using context-conditioned component analysis. In CVPR, pages 4156–4164, 2015.
  • M. Twain. The Jumping Frog: in English, then in French, and then Clawed Back into a Civilized Language Once More by Patient, Unremunerated Toil. 1903.
  • D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML, 2016.
  • D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
  • C. Vondrick, H. Pirsiavash, and A. Torralba. Generating videos with scene dynamics. In NIPS, pages 613–621, 2016.
  • F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In ICCV, pages 849–856, 2013.
  • X. Wang and A. Gupta. Generative image modeling using style and structure adversarial networks. In ECCV, 2016.
  • J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS, pages 82–90, 2016.
  • S. Xie and Z. Tu. Holistically-nested edge detection. In ICCV, 2015.
  • Z. Yi, H. Zhang, P. Tan, and M. Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV, 2017.
  • C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE, 2010.
  • R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, 2016.
  • J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.
  • T. Zhou, Y. Jae Lee, S. X. Yu, and A. A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200, 2015.
  • T. Zhou, P. Krahenbuhl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In CVPR, pages 117–126, 2016.
  • J.-Y. Zhu, P. Krahenbuhl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In ECCV, 2016.