On Self Modulation for Generative Adversarial Networks

International Conference on Learning Representations (ICLR), 2019.

Abstract:

Training Generative Adversarial Networks (GANs) is notoriously challenging. We propose and study an architectural modification, self-modulation, which improves GAN performance across different data sets, architectures, losses, regularizers, and hyperparameter settings. Intuitively, self-modulation allows the intermediate feature maps of a...

Introduction
Highlights
  • Generative Adversarial Networks (GANs) are a powerful class of generative models successfully applied to a variety of tasks such as image generation (Zhang et al, 2017; Miyato et al, 2018; Karras et al, 2017), learned compression (Tschannen et al, 2018), super-resolution (Ledig et al, 2017), inpainting (Pathak et al, 2016), and domain transfer (Isola et al, 2016; Zhu et al, 2017).

    Training Generative Adversarial Networks is a notoriously challenging task (Goodfellow et al, 2014; Arjovsky et al, 2017; Lucic et al, 2018) as one is searching in a high-dimensional parameter space for a Nash equilibrium of a non-convex game
  • In this work we show that Generative Adversarial Networks benefit from self-modulation layers in the generator
  • We provide a simple yet effective technique that can be added universally to yield better Generative Adversarial Networks
  • Two major approaches to conditioning on side information s have emerged: (1) directly concatenate the side information s with the noise vector z (Mirza & Osindero, 2014), i.e. form z′ = [s, z]; (2) condition the hidden layers directly on s, which is usually instantiated via conditional batch normalization (De Vries et al, 2017; Miyato & Koyama, 2018)
  • We present a generator modification that improves the performance of most Generative Adversarial Networks
  • This technique is simple to implement and can be applied to all popular Generative Adversarial Networks; we believe that self-modulation is a useful addition to the GAN toolbox
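The modulation itself is lightweight: in self-modulated batch normalization, the per-channel scale γ and shift β are not fixed learned parameters but functions of the generator input z, computed by a small MLP. A minimal NumPy sketch of this idea (layer sizes, initialization, and parameter names are illustrative, not the authors' exact configuration):

```python
import numpy as np

def self_modulated_bn(h, z, params, eps=1e-5):
    """Self-modulated batch norm: normalize h, then scale and shift it
    with gamma(z) and beta(z) produced by a small MLP on the noise z.

    h: (batch, channels) hidden activations
    z: (batch, z_dim) generator input noise
    params: dict holding the MLP weights
    """
    # Standard batch normalization, without a learned affine part.
    mu = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    h_norm = (h - mu) / np.sqrt(var + eps)

    # Per-sample modulation: one shared ReLU hidden layer, then linear
    # projections to per-channel gamma and beta.
    hidden = np.maximum(0.0, z @ params["U"])
    gamma = hidden @ params["W_gamma"] + params["b_gamma"]
    beta = hidden @ params["W_beta"] + params["b_beta"]
    return gamma * h_norm + beta

# Toy usage: 8 samples, 16 channels, 4-dim noise, 8 hidden units.
rng = np.random.default_rng(0)
z_dim, n_hidden, channels = 4, 8, 16
params = {
    "U": rng.normal(size=(z_dim, n_hidden)) * 0.1,
    "W_gamma": rng.normal(size=(n_hidden, channels)) * 0.1,
    "b_gamma": np.ones(channels),   # gamma initialized around 1
    "W_beta": rng.normal(size=(n_hidden, channels)) * 0.1,
    "b_beta": np.zeros(channels),
}
h = rng.normal(size=(8, channels))
z = rng.normal(size=(8, z_dim))
out = self_modulated_bn(h, z, params)
```

With the noise-dependent terms set to constants (γ ≡ 1, β ≡ 0) this reduces to plain batch normalization, which is why the method drops into existing architectures so easily.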
Methods
  • The authors run a Cartesian product of the parameters in Section 3.1 which results in 36 settings for each dataset (2 losses, 2 architectures, 3 hyperparameter settings for spectral normalization, and 6 for gradient penalty).
  • For each setting the authors run five random seeds for self-modulation and the baseline.
  • The authors compute the median score across the five random seeds; in total, the study comprises 1440 trained models (36 settings × 4 datasets × 2 methods × 5 seeds).
  • The authors compare the performance of self-modulation and baseline for each model after hyperparameter optimization.
  • The results of this study are reported in Table 2, and the relative improvements are in Table 3 and Figure 2
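The accounting above can be checked with a short enumeration; the concrete setting and dataset names below are placeholders standing in for the grid of Section 3.1:

```python
from itertools import product

losses = ["non-saturating", "hinge"]          # 2 losses (illustrative names)
architectures = ["SNDCGAN", "RESNET"]         # 2 architectures
# 3 hyperparameter settings for spectral normalization plus
# 6 for gradient penalty = 9 regularization configurations.
regularizers = [("spectral_norm", i) for i in range(3)] + \
               [("gradient_penalty", i) for i in range(6)]

settings = list(product(losses, architectures, regularizers))
assert len(settings) == 36  # settings per dataset

datasets = ["cifar10", "celeba-hq", "lsun-bedroom", "imagenet"]  # illustrative
methods = ["self-mod", "baseline"]
seeds = range(5)

runs = list(product(datasets, settings, methods, seeds))
print(len(runs))  # 4 datasets x 36 settings x 2 methods x 5 seeds = 1440
```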
Results
  • WHICH LAYER TO MODULATE? Figure 4 presents the performance when modulating different layers of the generator (all layers, or a single layer 0–10) for each dataset.
  • CONDITIONING AND PRECISION/RECALL (A.4): Figure 5 presents the generator Jacobian condition number and the precision/recall plot for each dataset.
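The condition-number diagnostic referenced here (Odena et al, 2018) measures how well-conditioned the generator's input/output Jacobian is at sampled z. A finite-difference sketch for a toy generator (the `generator` callable and sizes are illustrative; the actual metric is averaged over many samples of z and computed on the full network):

```python
import numpy as np

def jacobian_condition_number(generator, z, eps=1e-4):
    """Estimate cond(J) of `generator` at `z` via central differences."""
    z = np.asarray(z, dtype=float)
    out0 = np.asarray(generator(z)).ravel()
    J = np.zeros((out0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.asarray(generator(z + dz)).ravel()
                   - np.asarray(generator(z - dz)).ravel()) / (2 * eps)
    # Condition number = ratio of largest to smallest singular value.
    s = np.linalg.svd(J, compute_uv=False)
    return s.max() / s.min()

# Toy linear "generator": its Jacobian is the matrix itself, so the
# estimate should match the matrix condition number.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
cond = jacobian_condition_number(lambda z: A @ z, rng.normal(size=4))
```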
Conclusion
  • The authors discuss the effects of self-modulation in light of recently proposed diagnostic tools: generator conditioning (Odena et al, 2018) and precision/recall for generative models (Sajjadi et al, 2018).
  • Several recent works observe that conditioning the generative process on side information leads to improved models (Mirza & Osindero, 2014; Odena et al, 2017; Miyato & Koyama, 2018). State-of-the-art conditional GANs inject this side information via conditional batch normalization (CBN) layers (De Vries et al, 2017; Miyato & Koyama, 2018; Zhang et al, 2018).
  • While conditioning on side information does help, a major drawback is that it requires external information, such as labels or embeddings, which is not always available; self-modulation instead conditions each layer on the generator's own input z.
Tables
  • Table1: Techniques for generator conditioning and modulation
  • Table2: In the unpaired setting (as defined in Section 3.2), we compute the median score (across random seeds) and report the best attainable score across considered optimization hyperparameters. SELF-MOD is the method introduced in Section 2 and BASELINE refers to batch normalization. We observe that the proposed approach outperforms the baseline in 30 out of 32 settings. The relative improvement is detailed in Table 3. The standard error of the median is within 3% in the majority of the settings and is presented in Table 6 for clarity
  • Table3: Reduction in FID over a large class of hyperparameter settings, losses, regularization, and normalization schemes. We observe from 4.3% to 33% decrease in FID. When applied to the RESNET architecture, independently of the loss, regularization, and normalization, SELF-MOD always outperforms the baseline. For SNDCGAN we observe an improvement in 87.5% of the cases (all except two on LSUN-BEDROOM)
  • Table4: FID and IS scores in label conditional setting
  • Table5: In the unpaired setting (as defined in Section 3.2), we compute the median score (across random seeds) and report the best attainable score across considered optimization hyperparameters. SELF-MOD is the method introduced in Section 2 and BASELINE refers to batch normalization
  • Table6: Table 2 with the standard error of the median
  • Table7: SNDCGAN Generator with 32 × 32 × 3 resolution. sBN denotes BN with self-modulation as proposed
  • Table8: SNDCGAN Discriminator with 32 × 32 × 3 resolution
  • Table9: SNDCGAN Generator with 128 × 128 × 3 resolution. sBN denotes BN with self-modulation as proposed
  • Table10: SNDCGAN Discriminator with 128 × 128 × 3 resolution
  • Table11: ResNet Generator with 32 × 32 × 3 resolution. Each ResNet block has a skip-connection that uses upsampling of its input and a 1x1 convolution. sBN denotes BN with self-modulation as proposed
  • Table12: ResNet Discriminator with 32 × 32 × 3 resolution. Each ResNet block has a skip-connection that applies a 1x1 convolution with possible downsampling according to spatial dimension
  • Table13: ResNet Generator with 128×128×3 resolution. Each ResNet block has a skip-connection that uses upsampling of its input and a 1x1 convolution. sBN denotes BN with self-modulation as proposed
  • Table14: ResNet Discriminator with 128 × 128 × 3 resolution. Each ResNet block has a skip-connection that applies a 1x1 convolution with possible downsampling according to spatial dimension
Related work
  • Conditional GANs. Conditioning on side information, such as class labels, has been shown to improve the performance of GANs. Initial proposals concatenated this additional feature with the input vector (Mirza & Osindero, 2014; Radford et al, 2016; Odena et al, 2017). Recent approaches, such as the projection cGAN (Miyato & Koyama, 2018), inject label information into the generator architecture using conditional batch normalization layers (De Vries et al, 2017). Self-modulation is a simple yet effective complementary addition to this line of work that makes a significant difference when no side information is available. When side information is available, it can be readily incorporated as discussed in Section 2 and leads to further improvements.
References
  • Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning (ICML), 2017.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • Shane Barratt and Rishi Sharma. A note on the inception score. arXiv preprint arXiv:1801.01973, 2018.
  • Harm De Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron C Courville. Modulating early visual processing by language. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. In International Conference on Learning Representations (ICLR), 2017.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann Dauphin. Convolutional sequence to sequence learning. In International Conference on Machine Learning (ICML), 2017.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2014.
  • Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), 2016.
  • Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a Nash equilibrium. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Computer Vision and Pattern Recognition (CVPR), 2018.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.
  • Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
  • Taesup Kim, Inchul Song, and Yoshua Bengio. Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition. In INTERSPEECH, 2017.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, and Sylvain Gelly. The GAN landscape: Losses, architectures, regularization, and normalization. arXiv preprint arXiv:1807.04720, 2018.
  • Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. Are GANs created equal? A large-scale study. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In International Conference on Computer Vision (ICCV), 2017.
  • Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International Conference on Learning Representations (ICLR), 2018.
  • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations (ICLR), 2018.
  • Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning (ICML), 2017.
  • Augustus Odena, Jacob Buckman, Catherine Olsson, Tom B Brown, Christopher Olah, Colin Raffel, and Ian Goodfellow. Is generator conditioning causally related to GAN performance? arXiv preprint arXiv:1802.08768, 2018.
  • Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In Computer Vision and Pattern Recognition (CVPR), 2016.
  • Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
  • Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR), 2016.
  • Mehdi SM Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing generative models via precision and recall. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems (NIPS), 2016.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015.
  • Michael Tschannen, Eirikur Agustsson, and Mario Lucic. Deep generative models for distribution-preserving lossy compression. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
  • Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems (NIPS), 2016.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, and Dimitris Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In International Conference on Computer Vision (ICCV), 2017.
  • Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
  • Zhiming Zhou, Yuxuan Song, Lantao Yu, and Yong Yu. Understanding the effectiveness of Lipschitz constraint in training of GANs via gradient analysis. arXiv preprint arXiv:1807.00751, 2018.
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.