Self-Supervised GANs via Auxiliary Rotation Loss

CVPR, pp. 12154-12163, 2019.


Abstract:

Conditional GANs are at the forefront of natural image synthesis. The main drawback of such models is the necessity for labeled data. In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. In parti…

Introduction
  • Generative Adversarial Networks (GANs) are a class of unsupervised generative models [1].
  • GANs involve training a generator and discriminator model in an adversarial game, such that the generator learns to produce samples from a desired data distribution.
  • If the discriminator forgets previous classification boundaries, training may become unstable or cyclic.
  • This issue is usually addressed either by reusing old samples or by applying continual learning techniques [14, 15, 16, 17, 18, 19].
  • Even when labeled data is available, it is usually sparse and covers only a limited set of high-level abstractions.
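One of the remedies mentioned above, reusing old samples, amounts to keeping a bounded pool of previously generated images and mixing them into discriminator batches. A minimal sketch of such a buffer (the capacity and replacement policy here are illustrative choices, not values from the paper):

```python
import random

class ReplayBuffer:
    """Minimal sketch of the 'reuse old samples' remedy for discriminator
    forgetting: keep a bounded pool of past generated samples and mix
    them into each discriminator batch. Capacity and replacement policy
    are illustrative, not taken from the paper."""
    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.pool = []
        self.rng = random.Random(seed)

    def push(self, sample):
        if len(self.pool) < self.capacity:
            self.pool.append(sample)
        else:
            # overwrite a random slot so old samples linger in the pool
            self.pool[self.rng.randrange(self.capacity)] = sample

    def sample(self, k):
        # draw up to k past samples to mix into the current batch
        return self.rng.sample(self.pool, min(k, len(self.pool)))
```

In a GAN training loop one would `push` each minibatch of generated images and `sample` a fraction of the discriminator's fake inputs from the pool.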
Highlights
  • Generative Adversarial Networks (GANs) are a class of unsupervised generative models [1]
  • The unconditional GAN is unstable on IMAGENET and training often diverges
  • In terms of mean performance (Figure 4) the proposed approach matches the conditional Generative Adversarial Networks, and in terms of the best models selected across random seeds (Table 1), the performance gap is within 5%
  • Motivated by the desire to counter discriminator forgetting, we propose a deep generative model that combines adversarial and self-supervised learning
  • The resulting novel model, the self-supervised GAN, when combined with the recently introduced self-modulation, can match equivalent conditional GANs on the task of image synthesis without having access to labeled data. The authors show that this model can be scaled to attain a Fréchet Inception Distance (FID) of 23.4 on unconditional ImageNet generation, an extremely challenging task
  • The self-supervised GAN could be applied in a semi-supervised setting, where a small number of labels is used to fine-tune the model
Methods
  • FID obtained by running the code provided by the authors. The authors use 3k examples as the test set and the remaining examples as the training set.
  • Models The authors compare the self-supervised GAN (SS-GAN) to two well-performing baseline models: (1) the unconditional GAN with spectral normalization proposed by Miyato et al. [6], denoted Uncond-GAN, and (2) the conditional GAN using the label-conditioning strategy with the projection discriminator (Cond-GAN) [21].
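The auxiliary task that distinguishes SS-GAN is the RotNet-style rotation pretext [26]: each image is rotated by 0, 90, 180, and 270 degrees, and the discriminator carries an extra 4-way head that predicts the rotation. A minimal sketch of the data side of this task (the real model feeds these through a shared backbone; the batching here is illustrative):

```python
import numpy as np

def make_rotation_batch(images):
    """Given a batch of square HxW images, return the four rotated copies
    (0, 90, 180, 270 degrees) together with their rotation labels.
    This sketches the rotation pretext task of [26] that SS-GAN adds to
    the discriminator; an extra 4-way classification head is trained
    with cross-entropy on these labels."""
    rotated, labels = [], []
    for k in range(4):          # k quarter-turns counter-clockwise
        for img in images:
            rotated.append(np.rot90(img, k=k))
            labels.append(k)
    return np.stack(rotated), np.array(labels)
```

The discriminator's total loss is then its usual GAN loss plus a weighted cross-entropy on these rotation labels, so the backbone must retain features useful for classification, countering forgetting.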
Results
  • Figure 4 shows FID training curves on CIFAR10 and IMAGENET.
  • In terms of mean performance (Figure 4) the proposed approach matches the conditional GAN, and in terms of the best models selected across random seeds (Table 1), the performance gap is within 5%.
  • On CIFAR10 and LSUN-BEDROOM the authors observe a substantial improvement over the unconditional GAN, matching the performance of the conditional GAN.
  • Figure 6 shows the learning curves for representation quality of the final ResNet block on IMAGENET.
  • The authors observe similar results on CIFAR10, provided in Table 3.
Conclusion
  • Motivated by the desire to counter discriminator forgetting, the authors propose a deep generative model that combines adversarial and self-supervised learning.
  • The resulting novel model, namely self-supervised GAN when combined with the recently introduced self-modulation, can match equivalent conditional GANs on the task of image synthesis, without having access to labeled data.
  • The authors show that this model can be scaled to attain an FID of 23.4 on unconditional ImageNet generation which is an extremely challenging task.
  • This line of work opens several avenues for future research.
  • One may exploit several recently introduced techniques, such as self-attention, orthogonal normalization and regularization, and sampling truncation [9, 22], to yield even better performance in unconditional image synthesis.
Tables
  • Table1: Best FID attained across three random seeds. In this setting the proposed approach recovers most of the benefits of conditioning
  • Table2: FID for unconditional GANs under different hyperparameter settings. Mean and standard deviations are computed across three random seeds. Adding the self-supervision loss reduces the sensitivity of GAN training to hyperparameters
  • Table3: Top-1 accuracy on CIFAR10. Mean score across three training runs of the original model. All standard deviations are smaller than 0.01 and are reported in the appendix
  • Table4: Top-1 accuracy on IMAGENET. Mean score across three training runs of the original model. All standard deviations are smaller than 0.01, except for Uncond-GAN whose results exhibit high variance due to training instability. All standard deviations are reported in the appendix
  • Table5: Comparison with other self-supervised representation learning methods by top-1 accuracy on IMAGENET. For SS-GAN, the mean performance is presented
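The FID reported in Tables 1 and 2 is the Fréchet distance between two Gaussians fitted to Inception activations of real and generated images [28]. Assuming the feature moments have already been extracted, the distance itself can be sketched as:

```python
import numpy as np

def fid_gaussian(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
        ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
    In the actual metric [28] the moments come from Inception features
    of real vs. generated images; here they are plain inputs."""
    def sym_sqrt(m):
        # matrix square root of a symmetric PSD matrix via eigendecomposition
        w, v = np.linalg.eigh(m)
        return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

    s2h = sym_sqrt(sigma2)
    covmean = sym_sqrt(s2h @ sigma1 @ s2h)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))
```

Identical moments give a distance of zero; lower is better, which is why the tables report the smallest FID attained.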
Related work
  • GAN forgetting Catastrophic forgetting was previously considered as a major cause for GAN training instability. The main remedy suggested in the literature is to introduce temporal memory into the training algorithm in various ways. For example, Grnarova et al. [19] induce discriminator memory by replaying previously generated images. An alternative is to instead reuse previous models: Salimans et al. [2] introduce checkpoint averaging, where a running average of the parameters of each player is kept, and Grnarova et al. [19] maintain a queue of models that are used at each training iteration. Kim et al. [18] add memory to retain information about previous samples. Other papers frame GAN training as a continual learning task. Thanh-Tung et al. [14] study catastrophic forgetting in the discriminator and mode collapse, relating these to training instability. Liang et al. [15] counter discriminator forgetting by leveraging techniques from continual learning directly (Elastic Weight Sharing [11] and Intelligent Synapses [37]).
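The checkpoint-averaging remedy of Salimans et al. [2] keeps a running mean of each player's parameters across training. A minimal sketch (the flat parameter dictionary is an illustrative simplification; a real GAN would average per-tensor):

```python
class CheckpointAverage:
    """Running average of model parameters, sketching the checkpoint
    averaging of Salimans et al. [2]: maintain theta_bar as the mean of
    all parameter snapshots seen so far. The flat name->float dict is an
    illustrative stand-in for real weight tensors."""
    def __init__(self):
        self.n = 0
        self.avg = None

    def update(self, params):
        # incremental mean: avg += (x - avg) / n
        self.n += 1
        if self.avg is None:
            self.avg = dict(params)
        else:
            for k, v in params.items():
                self.avg[k] += (v - self.avg[k]) / self.n
        return self.avg
```

The averaged parameters are used for evaluation or as a stabilized opponent, so the discriminator plays against a smoothed history of the generator rather than only its latest iterate.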
Reference
  • [1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2014.
  • [2] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems (NIPS), 2016.
  • [3] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? In International Conference on Machine Learning (ICML), 2018.
  • [4] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In International Conference on Computer Vision (ICCV), 2017.
  • [5] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • [6] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations (ICLR), 2018.
  • [7] Ting Chen, Mario Lucic, Neil Houlsby, and Sylvain Gelly. On self modulation for generative adversarial networks. In International Conference on Learning Representations (ICLR), 2019.
  • [8] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR), 2016.
  • [9] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
  • [10] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations (ICLR), 2018.
  • [11] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 2017.
  • [12] Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, 1989.
  • [13] Robert M French. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 1999.
  • [14] Hoang Thanh-Tung, Truyen Tran, and Svetha Venkatesh. On catastrophic forgetting and mode collapse in generative adversarial networks. ICML Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.
  • [15] Kevin J Liang, Chunyuan Li, Guoyin Wang, and Lawrence Carin. Generative adversarial network training is a continual learning problem. arXiv preprint arXiv:1811.11083, 2018.
  • [16] Ari Seff, Alex Beatson, Daniel Suo, and Han Liu. Continual learning in generative adversarial nets. arXiv preprint arXiv:1705.08395, 2017.
  • [17] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from simulated and unsupervised images through adversarial training. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • [18] Youngjin Kim, Minjung Kim, and Gunhee Kim. Memorization precedes generation: Learning unsupervised GANs with memory networks. In International Conference on Learning Representations (ICLR), 2018.
  • [19] Paulina Grnarova, Kfir Y Levy, Aurelien Lucchi, Thomas Hofmann, and Andreas Krause. An online learning approach to generative adversarial networks. In International Conference on Learning Representations (ICLR), 2018.
  • [20] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning (ICML), 2017.
  • [21] Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International Conference on Learning Representations (ICLR), 2018.
  • [22] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • [23] Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2014.
  • [24] Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In International Conference on Computer Vision (ICCV), 2015.
  • [25] Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In European Conference on Computer Vision (ECCV), 2016.
  • [26] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations (ICLR), 2018.
  • [27] Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • [28] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • [29] Mehdi SM Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing generative models via precision and recall. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • [30] Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. Are GANs created equal? A large-scale study. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • [31] Shane Barratt and Rishi Sharma. A note on the inception score. arXiv preprint arXiv:1801.01973, 2018.
  • [32] Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, and Sylvain Gelly. The GAN landscape: Losses, architectures, regularization, and normalization. arXiv preprint arXiv:1807.04720, 2018.
  • [33] Zhiming Zhou, Yuxuan Song, Lantao Yu, and Yong Yu. Understanding the effectiveness of Lipschitz constraint in training of GANs via gradient analysis. arXiv preprint arXiv:1807.00751, 2018.
  • [34] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. In International Conference on Learning Representations (ICLR), 2017.
  • [35] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In European Conference on Computer Vision (ECCV), 2018.
  • [36] Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • [37] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In International Conference on Machine Learning (ICML), 2017.
  • [38] Harm De Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron C Courville. Modulating early visual processing by language. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • [39] Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In International Conference on Computer Vision (ICCV), 2015.
  • [40] Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Unsupervised representation learning by sorting sequences. In International Conference on Computer Vision (ICCV), 2017.
  • [41] Eric Jang, Coline Devin, Vincent Vanhoucke, and Sergey Levine. Grasp2Vec: Learning object representations from self-supervised grasping. In Conference on Robot Learning (CoRL), 2018.
  • [42] Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In International Conference on Robotics and Automation (ICRA), 2016.
  • [43] T Nathan Mundhenk, Daniel Ho, and Barry Y Chen. Improvements to context based self-supervised learning. In Computer Vision and Pattern Recognition (CVPR), 2018.
  • [44] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision (ECCV), 2016.
  • [45] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In Computer Vision and Pattern Recognition (CVPR), 2016.
  • [46] Alexander Kolesnikov, Xiaohua Zhai, and Lucas Beyer. Revisiting self-supervised visual representation learning. In Computer Vision and Pattern Recognition (CVPR), 2019.