Fixing a Broken ELBO.

ICML, pp. 159-168, 2018

Cited by: 290 | Views: 315

Abstract

Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether...

Introduction
  • Learning a “useful” representation of data in an unsupervised way is one of the “holy grails” of current machine learning research.
  • A common approach to this problem is to fit a latent variable model of the form p(x, z|θ) = p(z|θ)p(x|z, θ) to the data, where x are the observed variables, z are the hidden variables, and θ are the parameters.
  • Such models are usually fit by minimizing L(θ) = KL[p∗(x) || p(x|θ)], where p∗(x) is the true data distribution; this is equivalent to maximum likelihood training.
  • Obtaining a good ELBO is not enough for good representation learning (see the rate-distortion decomposition sketched below).
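To make the bullets above concrete, the rate-distortion decomposition that the rest of this page refers to can be sketched as follows, in the notation of the bullets (p∗(x) for the data distribution, q(z|x) for the approximate posterior). This is a sketch of the standard decomposition, not a verbatim copy of the paper's equations:

```latex
% Distortion D: expected negative reconstruction log-likelihood.
% Rate R: expected KL from the encoder to the prior over z.
\begin{align*}
D &= \mathbb{E}_{p^{*}(x)}\,\mathbb{E}_{q(z\mid x)}\left[-\log p(x\mid z,\theta)\right] \quad\text{(distortion)}\\
R &= \mathbb{E}_{p^{*}(x)}\left[\mathrm{KL}\left(q(z\mid x)\,\|\,p(z\mid\theta)\right)\right] \quad\text{(rate)}\\
-\mathrm{ELBO} &= D + R, \qquad \mathcal{L}_{\beta} = D + \beta R \quad\text{($\beta$-VAE objective; $\beta=1$ recovers the ELBO)}
\end{align*}
```

Because many different (R, D) pairs give the same sum, models with identical ELBO can make very different use of the latent code, which is why a good ELBO alone does not guarantee a useful representation.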
Highlights
  • Learning a “useful” representation of data in an unsupervised way is one of the “holy grails” of current machine learning research
  • We may instead maximize a lower bound on this quantity, such as the evidence lower bound (ELBO), as is done when fitting variational autoencoder (VAE) models (Kingma & Welling, 2014)
  • We show that VAEs with powerful autoregressive decoders can be trained to not ignore their latent code by targeting certain points on this curve
  • In section 4, we show how to use this framework to study the properties of various recently-proposed VAE model variants
  • We examine several VAE model architectures that have been proposed in the literature
  • We have presented a theoretical framework for understanding representation learning using latent variable models in terms of the rate-distortion tradeoff
Methods
  • In a toy model, the authors empirically show a case where the usual ELBO objective learns a model that perfectly captures the true data distribution p∗(x) but fails to learn a useful latent representation.
  • By instead training the same model to minimize the distortion subject to achieving a desired target rate R∗, they recover a latent representation that closely matches the true generative process while still perfectly capturing the true data distribution (see the sketch after this list).
  • See Appendix E for more detail on the data generation and the model
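A minimal sketch of how such target-rate training could be implemented, assuming a PyTorch-style VAE in which encoder(x), decoder(z), and prior return torch.distributions objects. These names, and the simple |R − R∗| penalty, are illustrative assumptions rather than the authors' exact procedure:

```python
import torch
from torch.distributions import kl_divergence

def rate_distortion_terms(x, encoder, decoder, prior):
    """Monte-Carlo estimates of the rate R and distortion D for one batch.

    Assumes (illustrative, not the paper's code): encoder(x) returns a
    torch.distributions object q(z|x), decoder(z) returns p(x|z), and
    prior is p(z).
    """
    q_zx = encoder(x)                         # q(z|x)
    z = q_zx.rsample()                        # reparameterized sample of z
    distortion = -decoder(z).log_prob(x).mean()       # D = E[-log p(x|z)]
    rate = kl_divergence(q_zx, prior).mean()          # R = E[KL(q(z|x) || p(z))]
    return rate, distortion

def target_rate_loss(x, encoder, decoder, prior, target_rate, penalty=10.0):
    """Minimize distortion while keeping the rate near a target R*.

    The absolute-value penalty is one simple way to approximate the
    constrained problem min D subject to R = R*; the paper's exact
    optimization procedure may differ.
    """
    rate, distortion = rate_distortion_terms(x, encoder, decoder, prior)
    return distortion + penalty * (rate - target_rate).abs()
```

The penalty weight controls how strictly the rate constraint is enforced; a Lagrangian formulation with a learned multiplier would be a natural alternative to the fixed penalty used in this sketch.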
Results
  • The standard VAE fails to learn a useful representation, yielding a rate of only R = 0.0002 nats, whereas the Target Rate model achieves R = 0.4999 nats. The Target Rate model nearly perfectly reproduces the true generative process, as can be seen by comparing the yellow and purple regions in the z-space plots (2aii, 2cii): both the optimal model and the Target Rate model have two clusters, one with about 70% of the probability mass corresponding to class 0 and the other with about 30% of the mass corresponding to class 1.
Conclusion
  • The authors have presented a theoretical framework for understanding representation learning using latent variable models in terms of the rate-distortion tradeoff
  • This constrained optimization problem allows the authors to fit models by targeting a specific point on the RD curve, which they cannot do using the β-VAE framework alone (see the sketch after this list).
  • Perhaps the most surprising finding is that all the current approaches seem to have a hard time achieving high rates at low distortion.
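Written out, the constrained problem mentioned in the conclusion is the following (a sketch in the notation used above; φ is introduced here for the encoder parameters):

```latex
% Target a specific point on the rate-distortion (RD) curve by constraining the
% rate, instead of fixing the trade-off weight beta in advance.
\begin{align*}
\min_{\theta,\,\phi}\; D(\theta,\phi) \quad \text{subject to} \quad R(\theta,\phi) = R^{*}
\end{align*}
```

The β-VAE objective D + βR is the Lagrangian relaxation of this problem: a fixed β selects the point where the RD curve has slope −β, so it cannot directly pin down an arbitrary target rate R∗, whereas the constrained form can.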
Related work
  • Improving VAE representations. Many recent papers have introduced mechanisms for alleviating the problem of unused latent variables in VAEs. Bowman et al. (2016) proposed annealing the weight of the KL term of the ELBO from 0 to 1 over the course of training, but did not consider ending weights that differ from 1. Higgins et al. (2017) proposed the β-VAE for unsupervised learning, a generalization of the original VAE in which the KL term is scaled by β, similar to this paper; however, their focus was on disentangling, and they did not discuss rate-distortion tradeoffs across model families. Recent work has used the β-VAE objective to trade off reconstruction quality for sampling accuracy (Ha & Eck, 2018). Chen et al. (2017) present a bits-back interpretation of VAEs (Hinton & Van Camp, 1993). Modifying the variational families (Kingma et al., 2016), priors (Papamakarios et al., 2017; Tomczak & Welling, 2017), and decoder structure (Chen et al., 2017) has also been proposed as a mechanism for learning better representations.
References
  • Achille, A. and Soatto, S. Information Dropout: Learning Optimal Representations Through Noisy Computation. In Information Control and Learning, September 2016. URL http://arxiv.org/abs/1611.01353.
  • Achille, A. and Soatto, S. Emergence of Invariance and Disentangling in Deep Representations. Proceedings of the ICML Workshop on Principled Approaches to Deep Learning, 2017.
  • Agakov, F. V. Variational Information Maximization in Stochastic Environments. PhD thesis, University of Edinburgh, 2006.
  • Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K. Deep Variational Information Bottleneck. In ICLR, 2017.
  • Balle, J., Laparra, V., and Simoncelli, E. P. End-to-end Optimized Image Compression. In ICLR, 2017.
  • Barber, D. and Agakov, F. V. Information maximization in noisy channels: A variational approach. In NIPS, 2003.
  • Bell, A. J. and Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995.
  • Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating sentences from a continuous space. In CoNLL, 2016.
  • Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. arXiv preprint arXiv:1606.03657, 2016.
  • Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. Variational Lossy Autoencoder. In ICLR, 2017.
  • Germain, M., Gregor, K., Murray, I., and Larochelle, H. MADE: Masked autoencoder for distribution estimation. In ICML, 2015.
  • Gregor, K., Besse, F., Rezende, D. J., Danihelka, I., and Wierstra, D. Towards conceptual compression. In NIPS, pp. 3549–3557, 2016.
  • Ha, D. and Eck, D. A neural representation of sketch drawings. In ICLR, 2018. URL https://openreview.net/forum?id=Hy6GHpkCW.
  • Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In ICLR, 2017.
  • Hinton, G. E. and Van Camp, D. Keeping the neural networks simple by minimizing the description length of the weights. In Proc. of the Workshop on Computational Learning Theory, 1993.
  • Hoffman, M. D. and Johnson, M. J. ELBO surgery: yet another way to carve up the variational evidence lower bound. In NIPS Workshop on Advances in Approximate Bayesian Inference, 2016.
  • Huszar, F. Is maximum likelihood useful for representation learning?, 2017. URL http://www.inference.vc/maximum-likelihood-for-representation-learning-2/.
  • Johnston, N., Vincent, D., Minnen, D., Covell, M., Singh, S., Chinen, T., Hwang, S. J., Shor, J., and Toderici, G. Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks. arXiv e-prints, 2017.
  • Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. In ICLR, 2014.
  • Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. Improved variational inference with inverse autoregressive flow. In NIPS, 2016.
  • Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • Larochelle, H. and Murray, I. The neural autoregressive distribution estimator. In AISTATS, 2011.
  • Makhzani, A., Shlens, J., Jaitly, N., and Goodfellow, I. Adversarial autoencoders. In ICLR, 2016.
  • Papamakarios, G., Murray, I., and Pavlakou, T. Masked autoregressive flow for density estimation. In NIPS, 2017.
  • Phuong, M., Welling, M., Kushman, N., Tomioka, R., and Nowozin, S. The mutual autoencoder: Controlling information in latent code representations, 2018. URL https://openreview.net/forum?id=HkbmWqxCZ.
  • Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.
  • Salimans, T., Karpathy, A., Chen, X., and Kingma, D. P. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. In ICLR, 2017.
  • Shamir, O., Sabato, S., and Tishby, N. Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29):2696–2711, 2010.
  • Slonim, N., Atwal, G. S., Tkacik, G., and Bialek, W. Information-based clustering. PNAS, 102(51):18297–18302, 2005.
  • Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), 2015.
  • Tishby, N., Pereira, F., and Bialek, W. The information bottleneck method. In The 37th Annual Allerton Conf. on Communication, Control, and Computing, pp. 368–377, 1999. URL https://arxiv.org/abs/physics/0004057.
  • Tomczak, J. M. and Welling, M. VAE with a VampPrior. arXiv e-prints, 2017.
  • van den Oord, A., Vinyals, O., and Kavukcuoglu, K. Neural discrete representation learning. In NIPS, 2017.
  • Zhao, S., Song, J., and Ermon, S. InfoVAE: Information maximizing variational autoencoders. arXiv preprint arXiv:1706.02262, 2017.
  • Zhao, S., Song, J., and Ermon, S. The information autoencoding family: A Lagrangian perspective on latent variable generative modeling, 2018. URL https://openreview.net/forum?id=ryZERzWCZ.
Author
Joshua V. Dillon
Rif A. Saurous