Energy-based Generative Adversarial Network

International Conference on Learning Representations (ICLR), 2017.


Abstract:

We introduce the Energy-based Generative Adversarial Network model (EBGAN), which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples.

Introduction
  • 1.1 ENERGY-BASED MODEL

    The essence of the energy-based model (LeCun et al., 2006) is to build a function that maps each point of an input space to a single scalar, which is called “energy”.
  • The discriminator is trained to distinguish real samples of a dataset from fake samples produced by the generator.
  • The generator can be viewed as a trainable parameterized function that produces samples in regions of the space to which the discriminator assigns low energy (a minimal sketch of this view follows the list).
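As a minimal illustration of the energy view, the sketch below defines a scalar-valued network playing the role of the discriminator/energy function and a generator whose samples that network scores. It is an assumption-laden toy in PyTorch; the shapes, sizes, and layers are illustrative only and do not reproduce the architectures used in the paper.

```python
# Toy sketch (PyTorch assumed): an energy function is just a network that
# maps each input to a single scalar; lower energy = closer to the data manifold.
import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    """Maps a 2-D point to a scalar 'energy'."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one scalar per sample

class ToyGenerator(nn.Module):
    """Maps noise z to a sample; trained to land in low-energy regions."""
    def __init__(self, z_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, z):
        return self.net(z)

energy = ToyEnergy()
gen = ToyGenerator()
z = torch.randn(16, 8)
fake = gen(z)           # generated ("contrastive") samples
e_fake = energy(fake)   # the energy function assigns each sample a scalar
```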
Highlights
  • 1.1 ENERGY-BASED MODEL

    The essence of the energy-based model (LeCun et al., 2006) is to build a function that maps each point of an input space to a single scalar, which is called “energy”
  • The term contrastive sample is often used to refer to a data point causing an energy pull-up, such as the incorrect Y's in supervised learning and points from low data density regions in unsupervised learning
  • The probabilistic binary discriminator in the original formulation of GANs can be seen as one way among many to define the contrast function and loss functional, as described in LeCun et al. (2006) for the supervised and weakly supervised settings, and Ranzato et al. (2007) for unsupervised learning
  • We argue that the energy function in the EBGAN framework can be seen as being regularized by having a generator produce the contrastive samples, to which the discriminator ought to give high reconstruction energies
  • We further argue that the EBGAN framework allows more flexibility from this perspective, because: (i) the regularizer is fully trainable instead of being handcrafted; (ii) the adversarial training paradigm enables a direct interaction between the duality of producing contrastive samples and learning the energy function
  • We postulate that within the scope of the EBGAN framework, iteratively feeding the adversarial contrastive samples produced by the generator to the energy function acts as an effective regularizer; the contrastive samples can be thought of as an extension to the dataset that provides more information to the classifier (the corresponding training objectives are sketched after this list)
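Concretely, the hinge-style objectives used in the paper pair a margin loss for the discriminator with an energy-minimization loss for the generator: L_D = D(x) + max(0, m − D(G(z))) and L_G = D(G(z)). The sketch below (PyTorch assumed; the margin value and the batch of dummy energies are illustrative, not the paper's settings) shows how these two losses are computed from the energies.

```python
import torch

def ebgan_losses(d_real, d_fake, margin=10.0):
    """Hinge-style EBGAN objectives.

    d_real: energies D(x) assigned to real samples
    d_fake: energies D(G(z)) assigned to generated samples
    The discriminator pushes real energies down and fake energies up,
    but only while the fake energies are below the margin; the generator
    pushes the energies of its own samples down.
    """
    loss_d = d_real.mean() + torch.clamp(margin - d_fake, min=0).mean()
    loss_g = d_fake.mean()
    return loss_d, loss_g

# Example with dummy energies (in practice these come from the discriminator):
d_real = torch.rand(32) * 2.0
d_fake = torch.rand(32) * 20.0
loss_d, loss_g = ebgan_losses(d_real, d_fake, margin=10.0)
```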
Results
  • The probabilistic binary discriminator in the original formulation of GAN can be seen as one way among many to define the contrast function and loss functional, as described in LeCun et al. (2006) for the supervised and weakly supervised settings, and Ranzato et al. (2007) for unsupervised learning.
  • A proof that under a simple hinge loss, when the system reaches convergence, the generator of EBGAN produces points that follow the underlying data distribution.
  • An EBGAN framework with the discriminator using an auto-encoder architecture in which the energy is the reconstruction error.
  • The diagram of the EBGAN model with an auto-encoder discriminator is depicted in figure 1.
  • When trained with some regularization terms, auto-encoders have the ability to learn an energy manifold without supervision or negative examples.
  • This means that even when an EBGAN auto-encoding model is trained to reconstruct a real sample, the discriminator contributes to discovering the data manifold by itself.
  • One common issue in training auto-encoders is that the model may learn little more than an identity function, meaning that it attributes zero energy to the whole space.
  • The authors argue that the energy function in the EBGAN framework can be seen as being regularized by having a generator produce the contrastive samples, to which the discriminator ought to give high reconstruction energies.
  • The authors further argue that the EBGAN framework allows more flexibility from this perspective, because: (i) the regularizer is fully trainable instead of being handcrafted; (ii) the adversarial training paradigm enables a direct interaction between the duality of producing contrastive samples and learning the energy function.
  • The authors use the notation “EBGAN-PT” to refer to the EBGAN auto-encoder model trained with the repelling “pulling-away term” (PT) regularizer; both the reconstruction energy and this term are sketched after this list.
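The sketch below shows the two ingredients: the per-sample reconstruction error used as the energy of the auto-encoder discriminator, and a repelling regularizer of the pulling-away-term form computed on a batch of encoder codes. PyTorch is assumed, the convolutional architecture itself is omitted, and mean-squared reconstruction error is used here for concreteness.

```python
import torch

def reconstruction_energy(x, x_rec):
    """Energy of the auto-encoder discriminator: per-sample reconstruction error.
    Mean-squared error is one common choice, used here for concreteness."""
    return ((x_rec - x) ** 2).flatten(1).mean(dim=1)

def pulling_away_term(s):
    """Repelling regularizer on a batch of encoder codes s with shape (N, d):
    penalizes pairs of codes pointing in similar directions, pushing the
    generator toward diverse samples."""
    n = s.size(0)
    s = s / s.norm(dim=1, keepdim=True).clamp_min(1e-8)   # unit-normalize each code
    cos = s @ s.t()                                        # pairwise cosine similarities
    off_diag = cos ** 2 - torch.eye(n, device=s.device)    # drop the diagonal terms (each equals 1)
    return off_diag.sum() / (n * (n - 1))
```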
Conclusion
  • The authors study the training stability of EBGANs over GANs on a simple task of MNIST digit generation with fully-connected networks.
  • The undesirability of a non-decaying dynamics when using the discriminator in the GAN or EBGAN framework is indicated by Theorem 2: at convergence, the discriminator reflects a flat energy surface.
  • The authors postulate that within the scope of the EBGAN framework, iteratively feeding the adversarial contrastive samples produced by the generator to the energy function acts as an effective regularizer; the contrastive samples can be thought of as an extension to the dataset that provides more information to the classifier.
Tables
  • Table 1: Grid search specifications
  • Table 2: Comparison of the Ladder Network (LN) bottom-layer-cost model and its EBGAN extension on permutation-invariant MNIST (PI-MNIST)
Related work
  • Our work primarily casts GANs into an energy-based model scope. In this direction, approaches studying contrastive samples are relevant to EBGAN, such as the use of noisy samples (Vincent et al., 2010) and noisy gradient descent methods like contrastive divergence (Carreira-Perpinan & Hinton, 2005). From the perspective of GANs, several papers have been presented to improve the stability of GAN training (Salimans et al., 2016; Denton et al., 2015; Radford et al., 2015; Im et al., 2016; Mathieu et al., 2015).

    Kim & Bengio (2016) propose a probabilistic GAN and cast it into an energy-based density estimator by using the Gibbs distribution. Quite unlike EBGAN, this proposed framework does not get rid of the computationally challenging partition function, so the choice of energy function is required to be integrable.
References
  • Carreira-Perpinan, Miguel A. and Hinton, Geoffrey E. On contrastive divergence learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS), pp. 33–40, 2005.
  • Denton, Emily L., Chintala, Soumith, Fergus, Rob, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems, pp. 1486–1494, 2015.
  • Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
  • Im, Daniel Jiwoong, Kim, Chris Dongjoo, Jiang, Hui, and Memisevic, Roland. Generating images with recurrent adversarial networks. arXiv preprint arXiv:1602.05110, 2016.
  • Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • Kavukcuoglu, Koray, Sermanet, Pierre, Boureau, Y-Lan, Gregor, Karol, Mathieu, Michael, and Cun, Yann L. Learning convolutional feature hierarchies for visual recognition. In Advances in Neural Information Processing Systems, pp. 1090–1098, 2010.
  • Kim, Taesup and Bengio, Yoshua. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016.
  • Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • LeCun, Yann, Chopra, Sumit, and Hadsell, Raia. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press, 2006.
  • Liu, Ziwei, Luo, Ping, Wang, Xiaogang, and Tang, Xiaoou. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738, 2015.
  • Mathieu, Michael, Couprie, Camille, and LeCun, Yann. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440, 2015.
  • Pezeshki, Mohammad, Fan, Linxi, Brakel, Philemon, Courville, Aaron, and Bengio, Yoshua. Deconstructing the ladder network architecture. arXiv preprint arXiv:1511.06430, 2015.
  • Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Ranzato, Marc’Aurelio, Boureau, Y-Lan, Chopra, Sumit, and LeCun, Yann. A unified energy-based framework for unsupervised learning. In Proc. Conference on AI and Statistics (AI-Stats), 2007.
  • Ranzato, Marc’Aurelio, Poultney, Christopher, Chopra, Sumit, and LeCun, Yann. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems, 2007.
  • Rasmus, Antti, Berglund, Mathias, Honkala, Mikko, Valpola, Harri, and Raiko, Tapani. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546–3554, 2015.
  • Rifai, Salah, Vincent, Pascal, Muller, Xavier, Glorot, Xavier, and Bengio, Yoshua. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 833–840, 2011.
  • Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang, Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, Berg, Alexander C., and Fei-Fei, Li. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.
  • Salimans, Tim, Goodfellow, Ian, Zaremba, Wojciech, Cheung, Vicki, Radford, Alec, and Chen, Xi. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
  • Vincent, Pascal, Larochelle, Hugo, Lajoie, Isabelle, Bengio, Yoshua, and Manzagol, Pierre-Antoine. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
  • Vinyals, Oriol, Blundell, Charles, Lillicrap, Timothy, Kavukcuoglu, Koray, and Wierstra, Daan. Matching networks for one shot learning. arXiv preprint arXiv:1606.04080, 2016.
  • Yu, Fisher, Seff, Ari, Zhang, Yinda, Song, Shuran, Funkhouser, Thomas, and Xiao, Jianxiong. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Zhao, Junbo, Mathieu, Michael, Goroshin, Ross, and Lecun, Yann. Stacked what-where auto-encoders. arXiv preprint arXiv:1506.02351, 2015.

Experiment details
  • We evaluate the models from the grid search by calculating a modified version of the inception score, I = E_x[KL(p(y) || p(y|x))], where x denotes a generated sample and y is the label predicted by an MNIST classifier trained off-line on the entire MNIST training set. Two main changes were made to its original form: (i) we swap the order of the distribution pair; (ii) we omit the exp(·) operation. The modified score condenses the histograms in figure 2 and figure 3. It is also worth noting that although we inherit the name “inception score” from Salimans et al. (2016), the evaluation is not related to the “inception” model trained on the ImageNet dataset. The classifier is a regular 3-layer ConvNet trained on MNIST.
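A sketch of this computation follows (NumPy assumed; `probs` is a hypothetical array holding the classifier outputs p(y|x) for the generated samples).

```python
import numpy as np

def modified_inception_score(probs, eps=1e-12):
    """probs: (N, K) array of classifier outputs p(y|x) for N generated samples.
    Computes I = E_x[ KL( p(y) || p(y|x) ) ]: the KL divergence with the marginal
    as the first argument (swapped relative to the usual inception score) and
    without the final exp."""
    p_y = probs.mean(axis=0)  # marginal label distribution p(y)
    kl = (p_y[None, :] * (np.log(p_y[None, :] + eps) - np.log(probs + eps))).sum(axis=1)
    return kl.mean()
```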
  • We use a deep convolutional generator analogous to DCGAN’s and a deep convolutional autoencoder for the discriminator. The auto-encoder is composed of strided convolution modules in the feedforward pathway and fractional-strided convolution modules in the feedback pathway. We leave the usage of upsampling or switches-unpooling (Zhao et al., 2015) to future research. We also followed the guidance suggested by Radford et al. (2015) for training EBGANs. The configuration of the deep auto-encoder is:
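The exact layer configuration is not reproduced in this extract. Purely as an illustration of strided convolutions in the feed-forward pathway and fractional-strided convolutions in the feedback pathway, a hypothetical auto-encoder discriminator for 64x64 RGB inputs might look like the sketch below; the filter counts, image size, and nonlinearities are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvAutoEncoderD(nn.Module):
    """Hypothetical auto-encoder discriminator: strided convolutions encode,
    fractional-strided (transposed) convolutions decode. Layer sizes are
    illustrative only."""
    def __init__(self, nc=3, nf=64):
        super().__init__()
        self.encoder = nn.Sequential(  # 64x64 -> 8x8
            nn.Conv2d(nc, nf, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(nf, nf * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 2, nf * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(  # 8x8 -> 64x64
            nn.ConvTranspose2d(nf * 4, nf * 2, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(nf * 2, nf, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(nf, nc, 4, stride=2, padding=1),
        )

    def forward(self, x):
        rec = self.decoder(self.encoder(x))
        return ((rec - x) ** 2).flatten(1).mean(dim=1)  # per-sample energy
```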
  • Note that we feed noise into every layer of the generator, where each noise component is initialized as a 4D tensor and concatenated with the current feature maps in the feature space. Such a strategy is also employed by Salimans et al. (2016).
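One way to realize this, sketched below with assumed channel counts and spatial sizes, is to draw a noise tensor matching the spatial size of the current feature maps and concatenate it along the channel dimension before the next layer.

```python
import torch

def concat_noise(feat, noise_channels=4):
    """Concatenate fresh noise with a feature map along the channel axis.
    feat: (N, C, H, W) tensor; returns (N, C + noise_channels, H, W).
    The number of noise channels is an assumption for illustration."""
    n, _, h, w = feat.shape
    z = torch.randn(n, noise_channels, h, w, device=feat.device)
    return torch.cat([feat, z], dim=1)

feat = torch.randn(8, 128, 16, 16)    # current feature maps at some generator layer
feat_with_noise = concat_noise(feat)  # shape: (8, 132, 16, 16)
```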