Learning Latent Space Energy-Based Prior Model

NeurIPS 2020.

Abstract:

The generator model assumes that the observed example is generated by a low-dimensional latent vector via a top-down network, and the latent vector follows a simple and known prior distribution, such as uniform or Gaussian white noise distribution. While we can learn an expressive top-down network to map the prior distribution to the da...
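
For readers who want the setup in symbols, here is a minimal sketch of the two-level model the abstract describes, written in assumed notation (g_θ for the top-down network, f_α for the latent-space energy function, p_0 for the N(0, I) base prior; the paper's own symbols may differ slightly):

\begin{aligned}
  z &\sim p_{\alpha}(z), \qquad x = g_{\theta}(z) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2} I_{D}), \\
  p_{\alpha}(z) &= \frac{1}{Z(\alpha)} \exp\!\big(f_{\alpha}(z)\big)\, p_{0}(z), \qquad p_{0}(z) = \mathcal{N}(0, I_{d}).
\end{aligned}

Setting f_α ≡ 0 recovers the ordinary generator model with a fixed Gaussian prior; the energy term f_α is the "refinement or correction" of the noise prior discussed below.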

Introduction
  • Deep generative models have achieved impressive successes in image and text generation.
  • The generator model was proposed in the contexts of variational auto-encoder (VAE) [29, 48] and generative adversarial networks (GAN) [18, 47].
  • In both frameworks, the generator model is jointly learned with a complementary model, such as the inference model in VAE and the discriminator model in GAN.
  • The authors adopt the framework of maximum likelihood estimation (MLE), instead of GAN or VAE, so that learning is simpler in the sense that no complementary network needs to be trained
Highlights
  • In recent years, deep generative models have achieved impressive successes in image and text generation
  • (1) We propose a generator model with a latent space energy-based prior model by following the empirical Bayes philosophy
  • This paper proposes a generalization of the generator model, where the latent vector follows a latent space energy-based model (EBM), which is a refinement or correction of the independent Gaussian or uniform noise prior in the original generator model
  • We adopt a simple maximum likelihood framework for learning, and develop a practical modification of the maximum likelihood learning algorithm based on short-run Markov chain Monte Carlo (MCMC) sampling from the prior and posterior distributions of the latent vector (the estimating equations are sketched after this list)
  • We provide a theoretical underpinning of the resulting algorithm as a perturbation of the maximum likelihood learning in terms of objective function and estimating equation
  • Although the EBM has many applications, its soundness and power are limited by the difficulty of MCMC sampling
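
A hedged summary of the estimating equations behind the short-run MCMC learning mentioned above (these are the standard maximum likelihood gradient identities for this class of models; the step sizes, chain lengths, and any regularization used in the paper are not reproduced here):

\begin{aligned}
  \nabla_{\alpha} \log p(x) &= \mathbb{E}_{p(z \mid x)}\big[\nabla_{\alpha} f_{\alpha}(z)\big] - \mathbb{E}_{p_{\alpha}(z)}\big[\nabla_{\alpha} f_{\alpha}(z)\big], \\
  \nabla_{\theta} \log p(x) &= \mathbb{E}_{p(z \mid x)}\big[\nabla_{\theta} \log p_{\theta}(x \mid z)\big],
\end{aligned}

where both expectations are approximated by short-run Langevin chains sampling the prior p_α(z) and the posterior p(z|x). Replacing exact samples with short-run chains is what turns exact MLE into the perturbed objective and estimating equation referenced in the highlights.
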
Methods
  • The authors present a set of experiments that highlight the effectiveness of the proposed model with (1) excellent synthesis for both visual and textual data, outperforming state-of-the-art baselines, (2) high expressiveness of the learned prior model for both data modalities, and (3) strong performance in anomaly detection.
  • If the model is well learned, the latent EBM πα(z) fits the generator posterior pθ(z|x), which in turn yields realistic generated samples as well as faithful reconstructions (a short-run Langevin sampling sketch follows this list).
  • The authors compare the model with VAE [29] and SRI [44] which assume a fixed Gaussian prior distribution for the latent vector and two recent strong VAE variants, 2sVAE [12] and RAE [16], whose prior distributions are learned with posterior samples in a second stage.
  • The model achieves superior generation performance compared to the listed baseline models
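
To make the short-run sampling concrete, below is a minimal PyTorch-style sketch under stated assumptions: f_alpha is a network mapping a latent code to a scalar (un-normalized log prior correction), generator maps a latent code to an observation, and the step sizes, step counts, and sigma are illustrative placeholders rather than the paper's settings.

import torch

def langevin_prior(f_alpha, z0, n_steps=40, step_size=0.4):
    # Short-run Langevin chain targeting pi_alpha(z) proportional to exp(f_alpha(z)) * N(z; 0, I).
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        log_p = f_alpha(z).sum() - 0.5 * (z ** 2).sum()  # un-normalized log prior density
        grad = torch.autograd.grad(log_p, z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z = z.detach().requires_grad_(True)
    return z.detach()

def langevin_posterior(f_alpha, generator, x, z0, sigma=0.3, n_steps=40, step_size=0.1):
    # Short-run Langevin chain targeting p(z | x) proportional to
    # exp(f_alpha(z)) * N(z; 0, I) * N(x; generator(z), sigma^2 I).
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        log_p = (f_alpha(z).sum()
                 - 0.5 * (z ** 2).sum()
                 - 0.5 * ((x - generator(z)) ** 2).sum() / sigma ** 2)
        grad = torch.autograd.grad(log_p, z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z = z.detach().requires_grad_(True)
    return z.detach()

A single learning step would then update α by the difference of the mean f_alpha values over the posterior and prior samples, and update the generator by reconstructing x from the posterior samples, matching the gradient identities sketched under the highlights.
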
Conclusion
  • This paper proposes a generalization of the generator model, where the latent vector follows a latent space EBM, which is a refinement or correction of the independent Gaussian or uniform noise prior in the original generator model.
  • The authors adopt a simple maximum likelihood framework for learning, and develop a practical modification of the maximum likelihood learning algorithm based on short-run MCMC sampling from the prior and posterior distributions of the latent vector.
  • Although the EBM has many applications, its soundness and power are limited by the difficulty of MCMC sampling.
  • By moving from data space to latent space, MCMC-based learning of EBM becomes sound and feasible, and the authors may release the power of EBM in the latent space for many applications
Tables
  • Table1: MSE of testing reconstructions and FID of generated samples for SVHN (32 × 32 × 3), CIFAR-10 (32 × 32 × 3), and CelebA (64 × 64 × 3) datasets
  • Table2: Forward Perplexity (FPPL), Reverse Perplexity (RPPL), and Negative Log-Likelihood (NLL) for our model and baselines on SNLI, PTB, and Yahoo datasets
  • Table3: Transition of a Markov chain initialized from p0(z) towards pα(z). Top: Trajectory in the PTB data-space. Each panel contains a sample for K0 ∈ {0, 40, 100}. Bottom: Energy profile
  • Table4: AUPRC scores for unsupervised anomaly detection on MNIST. Numbers are taken from [31] and results for our model are averaged over last 10 epochs to account for variance
  • Table5: Comparison of the models with a latent EBM prior versus a fixed Gaussian prior. The highlighted number is the reported FID for SVHN and compared to other baseline models in the main text
  • Table6: Influence of the number of prior and posterior short run steps K0 (left) and K1 (right). The highlighted number is the reported FID for SVHN and compared to other baseline models in the main text
  • Table7: Influence of prior and generator complexity. The highlighted number is the reported FID for SVHN and compared to other baseline models in the main text. nef indicates the number of hidden features of the prior EBM and ngf denotes the factor of the number of channels of the generator (also see Table 9)
  • Table8: Hyperparameters for short run dynamics
  • Table9: EBM model architectures for all image and text datasets and generator model architectures for SVHN (32 × 32 × 3), CIFAR-10 (32 × 32 × 3), and CelebA (64 × 64 × 3). convT(n) indicates a transposed convolutional operation with n output feature maps. LReLU indicates the Leaky-ReLU activation function. The leak factor for LReLU is 0.2 in EBM and 0.1 in Generator
  • Table10: The sizes of word embeddings and hidden units of the generators for SNLI, PTB, and Yahoo
Funding
  • The work is supported by DARPA XAI project N66001-17-2-4029; ARO project W911NF1810296; ONR MURI project N00014-16-1-2007; and XSEDE grant ASC170063
Reference
  • David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. A learning algorithm for boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
  • Martín Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 214–223, 2017.
  • Matthias Bauer and Andriy Mnih. Resampled priors for variational autoencoders. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 66–75, 2019.
  • Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, and Salah Rifai. Better mixing via deep representations. In International conference on machine learning, pages 552–560, 2013.
  • Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks. arXiv preprint arXiv:1707.05776, 2017.
  • Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal, Sept. 2015. Association for Computational Linguistics.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 10–21, Berlin, Germany, Aug. 2016. Association for Computational Linguistics.
  • Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
  • Ondrej Cífka, Aliaksei Severyn, Enrique Alfonseca, and Katja Filippova. Eval all, trust a few, do wrong to none: Comparing sentence generation models. arXiv preprint arXiv:1804.07972, 2018.
  • Thomas M. Cover and Joy A. Thomas. Elements of information theory (2. ed.). Wiley, 2006.
  • Bin Dai and David Wipf. Diagnosing and enhancing vae models. arXiv preprint arXiv:1903.05789, 2019.
  • Bin Dai and David Wipf. Diagnosing and enhancing vae models. In International Conference on Learning Representations, 2019.
  • Zihang Dai, Amjad Almahairi, Philip Bachman, Eduard H. Hovy, and Aaron C. Courville. Calibrating energy-based generative adversarial networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  • Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.
  • Yilun Du and Igor Mordatch. Implicit generation and generalization in energy-based models. CoRR, abs/1903.08689, 2019.
  • Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, and Bernhard Scholkopf. From variational to deterministic autoencoders. In International Conference on Learning Representations, 2020.
  • Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256, 2010.
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680, 2014.
  • Cheng-En Guo, Song-Chun Zhu, and Ying Nian Wu. Modeling visual patterns by integrating descriptive and generative methods. International Journal of Computer Vision, 53(1):5–29, 2003.
  • Tian Han, Yang Lu, Song-Chun Zhu, and Ying Nian Wu. Alternating back-propagation for generator network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA., pages 1976–1984, 2017.
  • Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, and Ying Nian Wu. Divergence triangle for joint training of generator model, energy-based model, and inferential model. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 8670–8679, 2019.
  • Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, and Ying Nian Wu. Joint training of variational auto-encoder and latent energy-based model. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
  • Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158–1161, 1995.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent component analysis. John Wiley & Sons, 2004.
  • Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. Semi-amortized variational autoencoders. In International Conference on Machine Learning, pages 2678–2687, 2018.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
  • Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10 (canadian institute for advanced research).
  • Rithesh Kumar, Anirudh Goyal, Aaron C. Courville, and Yoshua Bengio. Maximum entropy generators for energy-based models. CoRR, abs/1901.08508, 2019.
  • Paul Langevin. On the theory of Brownian motion. 1908.
  • Justin Lazarow, Long Jin, and Zhuowen Tu. Introspective neural networks for generative modeling. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2793–2802, 2017.
  • Daniel D Lee and H Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, pages 556–562, 2001.
  • Bohan Li, Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick, and Yiming Yang. A surprisingly effective fix for deep latent variable modeling of text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3603–3614, Hong Kong, China, Nov. 2019. Association for Computational Linguistics.
  • Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015.
  • Yang Lu, Song-Chun Zhu, and Ying Nian Wu. Learning FRAME models using CNN filters. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 1902–1910, 2016.
  • Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. Are gans created equal? a large-scale study. arXiv preprint arXiv:1711.10337, 2017.
  • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
  • Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: The penn treebank. Comput. Linguist., 19(2):313–330, June 1993.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
  • Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, and Ying Nian Wu. On the anatomy of MCMC-based maximum likelihood learning of energy-based models. Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
  • Erik Nijkamp, Mitch Hill, Song-Chun Zhu, and Ying Nian Wu. Learning non-convergent non-persistent short-run MCMC toward energy-based model. Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, Canada, 2019.
  • Erik Nijkamp, Bo Pang, Tian Han, Alex Zhou, Song-Chun Zhu, and Ying Nian Wu. Learning deep generative models with short run inference dynamics. arXiv preprint arXiv:1912.01909, 2019.
  • Bruno A Olshausen and David J Field. Sparse coding with an overcomplete basis set: A strategy employed by v1? Vision research, 37(23):3311–3325, 1997.
  • Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
  • Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1278– 1286, 2014.
  • Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951.
  • Donald B. Rubin and Dorothy T. Thayer. Em algorithms for ml factor analysis. Psychometrika, 47(1):69–76, Mar 1982.
  • Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
  • Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057–1063, 2000.
  • Tijmen Tieleman. Training restricted boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th international conference on Machine learning, pages 1064–1071, 2008.
  • Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein autoencoders. arXiv preprint arXiv:1711.01558, 2017.
  • Jakub Tomczak and Max Welling. Vae with a vampprior. In International Conference on Artificial Intelligence and Statistics, pages 1214–1223, 2018.
  • Jakub M. Tomczak and Max Welling. VAE with a vampprior. In Amos J. Storkey and Fernando Pérez-Cruz, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, volume 84 of Proceedings of Machine Learning Research, pages 1214–1223. PMLR, 2018.
  • Ryan D. Turner, Jane Hung, Eric Frank, Yunus Saatchi, and Jason Yosinski. Metropolis-hastings generative adversarial networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 6345–6353. PMLR, 2019.
  • Ying Nian Wu, Ruiqi Gao, Tian Han, and Song-Chun Zhu. A tale of three probabilistic families: Discriminative, descriptive, and generative models. Quarterly of Applied Mathematics, 77(2):423–465, 2019.
  • Jianwen Xie, Yang Lu, Song-Chun Zhu, and Ying Nian Wu. A theory of generative convnet. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 2635–2644, 2016.
  • Jianwen Xie, Song-Chun Zhu, and Ying Nian Wu. Synthesizing dynamic patterns by spatialtemporal generative convnet. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1061–1069, 2017.
  • Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, and Taylor Berg-Kirkpatrick. Improved variational autoencoders for text modeling using dilated convolutions. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 3881–3890, 2017.
  • Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, and Vijay Ramaseshan Chandrasekhar. Efficient gan-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
  • Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. Adversarially regularized autoencoders. In International Conference on Machine Learning, pages 5902–5911, 2018.
  • Song Chun Zhu. Statistical modeling and conceptualization of visual patterns. IEEE Trans. Pattern Anal. Mach. Intell., 25(6):691–712, 2003.
  • Song Chun Zhu and David Mumford. Grade: Gibbs reaction and diffusion equations. In Computer Vision, 1998. Sixth International Conference on, pages 847–854, 1998.
  • Song Chun Zhu, Ying Nian Wu, and David Mumford. Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. International Journal of Computer Vision, 27(2):107–126, 1998.