Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

NeurIPS 2020


Abstract

Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method ...
Introduction
  • The Variational Autoencoder (VAE) framework has formed the basis for a number of recent advances in unsupervised representation learning [17, 35, 41].
  • The VAE framework introduces an inference network, which seeks to approximate the true posterior over latent variables.
  • The authors build upon the recent Thermodynamic Variational Objective (TVO), which frames log-likelihood estimation as a one-dimensional integral over the unit interval [26].
  • The integral is estimated using a Riemann sum approximation, as visualized in Figure 1, yielding a natural family of variational inference objectives which generalize and tighten the ELBO (restated in the equation below).
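For reference, the objective being scheduled can be stated as follows, restating the construction of [26]: with importance weight w = p_θ(x, z) / q_φ(z | x), intermediate distributions π_β(z) ∝ q_φ(z | x)^{1−β} p_θ(x, z)^β, and a schedule 0 = β_0 < β_1 < ⋯ < β_K = 1,

```latex
% TVO identity and left-Riemann-sum lower bound (restated from [26]).
\log p_\theta(x)
  = \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[\log w\right] d\beta
  \;\ge\;
  \mathrm{TVO}(\theta, \phi; x)
  = \sum_{k=0}^{K-1} \left(\beta_{k+1} - \beta_k\right)
      \mathbb{E}_{\pi_{\beta_k}}\!\left[\log w\right].
```

With K = 1 the sum reduces to E_{q_φ}[log w], i.e. the standard ELBO; finer schedules tighten the bound, and the placement of the discretization points β_k is exactly what the bandit procedure described in this paper optimizes.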
Highlights
  • The Variational Autoencoder (VAE) framework has formed the basis for a number of recent advances in unsupervised representation learning [17, 35, 41]
  • In Appendix D, we explore learning and inference in a discrete probabilistic context-free grammar [23], showing that the Thermodynamic Variational Objective (TVO) and our bandit optimization can translate to other learning settings
  • We have presented a new approach for automated selection of the integration schedule for the Thermodynamic Variational Objective
  • Our bandit framework optimizes a reward function that is directly linked to improvements in the generative model evidence over the course of training the model parameters (a sketch of this loop appears after this list)
  • We show theoretically that this procedure asymptotically minimizes the regret as a function of the choice of schedule
  • Our algorithm, as well as all other existing schedules, still relies on the number of partitions d as a hyperparameter that is fixed over the course of training
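A minimal sketch of such a bandit loop follows, under stated assumptions rather than as the authors' exact implementation: the schedule is represented by its d − 1 sorted interior points in (0, 1), the reward is the improvement in an evidence estimate over a window of w training epochs, GP-UCB is used as the acquisition function, and model.train_epochs / model.evidence_estimate are hypothetical helpers standing in for the actual TVO training and evaluation code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def propose_schedule(gp, candidates, ucb_beta=2.0):
    """Pick the candidate schedule that maximizes the GP-UCB acquisition."""
    mean, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mean + ucb_beta * std)]


def gp_bandit_tvo(model, d=5, rounds=100, window=10, n_candidates=256, seed=0):
    """Alternate between training with a TVO schedule and updating a GP over schedules."""
    rng = np.random.default_rng(seed)
    X, y = [], []                                   # observed (schedule, reward) pairs
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    prev_evidence = model.evidence_estimate()       # e.g. an IWAE estimate of log p(x)
    for _ in range(rounds):
        # Candidate schedules: d - 1 interior points in (0, 1), sorted so that the
        # GP input does not depend on the order in which the points were drawn.
        candidates = np.sort(rng.uniform(0.0, 1.0, size=(n_candidates, d - 1)), axis=1)
        if X:
            gp.fit(np.array(X), np.array(y))
            schedule = propose_schedule(gp, candidates)
        else:
            schedule = candidates[0]                # no observations yet: explore
        model.train_epochs(schedule, num_epochs=window)   # train with this schedule
        evidence = model.evidence_estimate()
        y.append(evidence - prev_evidence)          # reward = improvement in evidence
        X.append(schedule)
        prev_evidence = evidence
    return model
```

Informally, the regret controlled by the analysis is the cumulative gap between the reward of the schedules chosen by such a loop and that of the best schedule at each round; because the reward function drifts as the generative model trains, a time-varying GP model in the spirit of [5] is used rather than a static one.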
Methods
  • The authors demonstrate the effectiveness of the method for training VAEs [17] on MNIST and Fashion MNIST, and a Sigmoid Belief Network [27] on binarized MNIST and binarized Omniglot, using the TVO objective.
  • In Appendix D, the authors explore learning and inference in a discrete probabilistic context-free grammar [23], showing that the TVO objective and the bandit optimization can translate to other learning settings.
  • The authors' code is available at http://github.com/ntienvu/tvo_gp_bandit
Conclusion
  • The authors have presented a new approach for automated selection of the integration schedule for the Thermodynamic Variational Objective.
  • The authors' bandit framework optimizes a reward function that is directly linked to improvements in the generative model evidence over the course of training the model parameters.
  • The authors demonstrated that the proposed approach empirically outperforms existing schedules in both model learning and inference for discrete and continuous generative models.
  • The authors' GP bandit optimization offers a general solution to choosing the integration schedule in the TVO.
  • Incorporating the adaptive selection of d into the bandit optimization remains an interesting direction for future work.
Tables
  • Table 1: Supporting notation in the regret analysis. We use notation similar to Appendix C of Bogunovic et al. [5] when possible
  • Table 2: Wall-clock time of the GP-bandit schedule compared to the grid search of [26] for the log schedule. The GP-bandit approach achieves a competitive test log likelihood and a lower KL divergence compared with the grid-searched log schedule, while requiring significantly lower cumulative run-time
  • Table 3: Comparison between permutation-invariant and non-permutation-invariant GP inputs on the MNIST dataset using S=10 (top) and S=50 (bottom). The best scores are in bold. Given T training epochs, the number of bandit updates, and hence the number of samples available to fit the GP, is T/w, where w = 10 is the update frequency. The permutation-invariant representation is more favorable when fewer samples are available for fitting the GP, as seen for smaller numbers of training epochs (T = 1000, 2000); performance is comparable once sufficiently many samples have been collected, e.g., when T/w = 1000. One way to impose permutation invariance is sketched below
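One simple way to obtain a permutation-invariant representation of the kind compared in Table 3 is to sort the schedule's interior points before passing them to the GP kernel, so that any ordering of the same set of β values maps to the same input. This is an illustrative choice, not necessarily the paper's exact construction:

```python
import numpy as np

def to_gp_input(betas):
    """Permutation-invariant GP input: two orderings of the same beta values
    map to the identical (sorted) vector."""
    return np.sort(np.asarray(betas, dtype=float))
```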
Funding
  • VM acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) under award number PGSD3-535575-2019 and the British Columbia Graduate Scholarship, award number 6768
  • VM/FW acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada CIFAR AI Chairs Program, and the Intel Parallel Computing Centers program
  • RB acknowledges support from the Defense Advanced Research Projects Agency (DARPA) under award FA8750-17-C-0106. This material is based upon work supported by the United States Air Force Research Laboratory (AFRL) under the Defense Advanced Research Projects Agency (DARPA) Data Driven Discovery Models (D3M) program (Contract No. FA8750-19-2-0222) and Learning with Less Labels (LwLL) program (Contract No. FA875019C0515)
  • Additional support was provided by UBC’s Composites Research Network (CRN), Data Science Institute (DSI) and Support for Teams to Advance Interdisciplinary Research (STAIR) Grants
Study subjects and analysis
samples: 5000
Continuous VAE: We present results of training a continuous VAE on the MNIST and Fashion MNIST datasets in Figure 4. We measure model learning performance using the test log evidence, as estimated by the IWAE bound [7] with 5000 samples per data point. We also compare inference performance using DKL[qφ(z | x) || pθ(z | x)], which we calculate by subtracting the test ELBO from our estimate of log pθ(x).
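In code, both evaluation quantities reduce to simple operations on per-data-point log importance weights log w_s = log p_θ(x, z_s) − log q_φ(z_s | x). A minimal sketch, assuming such a weight array is already available (its computation is model-specific):

```python
import numpy as np
from scipy.special import logsumexp


def iwae_bound(log_w):
    """IWAE estimate of log p(x): log((1/S) * sum_s w_s), computed stably
    from an array of S log importance weights."""
    S = log_w.shape[-1]
    return logsumexp(log_w, axis=-1) - np.log(S)


def elbo(log_w):
    """Monte Carlo ELBO: the average of the log weights, E_q[log w]."""
    return np.mean(log_w, axis=-1)


def kl_gap(log_w):
    """KL[q(z|x) || p(z|x)] estimated as (IWAE estimate of log p(x)) - ELBO."""
    return iwae_bound(log_w) - elbo(log_w)
```

With S = 5000 samples per data point, iwae_bound gives the reported test log evidence and kl_gap the reported inference gap described above.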

References
  • [1] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
  • [2] Atilim Gunes Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Saeid Naderiparizi, Andreas Munk, Jialin Liu, Bradley Gram-Hansen, Gilles Louppe, Lawrence Meadows, et al. Efficient probabilistic inference in the quest for physics beyond the standard model. In Advances in Neural Information Processing Systems, pages 5460–5473, 2019.
  • [3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
  • [4] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
  • [5] Ilija Bogunovic, Jonathan Scarlett, and Volkan Cevher. Time-varying Gaussian process bandit optimization. In Artificial Intelligence and Statistics, pages 314–323, 2016.
  • [6] Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, and Aram Galstyan. All in the exponential family: Bregman duality in thermodynamic variational inference. In International Conference on Machine Learning, 2020.
  • [7] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. In International Conference on Learning Representations, 2016.
  • [8] Chris Cremer, Xuechen Li, and David Duvenaud. Inference suboptimality in variational autoencoders. In International Conference on Machine Learning, pages 1078–1086, 2018.
  • [9] Roger Fletcher. Practical Methods of Optimization. John Wiley & Sons, 2013.
  • [10] Daan Frenkel and Berend Smit. Understanding Molecular Simulation: From Algorithms to Applications, volume 1.
  • [11] Andrew Gelman and Xiao-Li Meng. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, pages 163–185, 1998.
  • [12] Junxian He, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. Lagging inference networks and posterior collapse in variational autoencoders. In International Conference on Learning Representations, 2019.
  • [13] Philipp Hennig and Christian J. Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13:1809–1837, 2012.
  • [14] José Miguel Hernández-Lobato, Matthew W. Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pages 918–926, 2014.
  • [15] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
  • [16] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2014.
  • [17] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
  • [18] Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751, 2016.
  • [19] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics, pages 528–536, 2017.
  • [20] Andreas Krause and Cheng S. Ong. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems, pages 2447–2455, 2011.
  • [21] Brenden M. Lake, Russ R. Salakhutdinov, and Josh Tenenbaum. One-shot learning by inverting a compositional causal process. In Advances in Neural Information Processing Systems, pages 2526–2534, 2013.
  • [22] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • [23] Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, and Frank Wood. Revisiting reweighted wake-sleep for models with stochastic control flow. In Uncertainty in Artificial Intelligence, pages 1039–1049. PMLR, 2020.
  • [24] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [25] Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, and Ricky T. Q. Chen. SUMO: Unbiased estimation of log marginal probability for latent variable models. arXiv preprint arXiv:2004.00353, 2020.
  • [26] Vaden Masrani, Tuan Anh Le, and Frank Wood. The thermodynamic variational objective. In Advances in Neural Information Processing Systems, pages 11521–11530, 2019.
  • [27] Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. In International Conference on Machine Learning, pages 1791–1799, 2014.
  • [28] Andriy Mnih and Danilo J. Rezende. Variational inference for Monte Carlo objectives. In International Conference on Machine Learning, pages 2188–2196, 2016.
  • [29] Vu Nguyen, Sebastian Schulze, and Michael A. Osborne. Bayesian optimization for iterative learning. In Advances in Neural Information Processing Systems, 2020.
  • [30] Sebastian Nowozin. Debiasing evidence approximations: On importance-weighted autoencoders and jackknife variational inference. In International Conference on Learning Representations, 2018.
  • [31] Favour Mandanji Nyikosa. Adaptive Bayesian Optimization for Dynamic Problems. PhD thesis, University of Oxford, 2018.
  • [32] Yosihiko Ogata. A Monte Carlo method for high dimensional integration. Numerische Mathematik, 55(2):137–157, 1989.
  • [33] Carl Edward Rasmussen. Gaussian Processes for Machine Learning. MIT Press, 2006.
  • [34] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538, 2015.
  • [35] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, pages 1278–1286, 2014.
  • [36] Ruslan Salakhutdinov and Iain Murray. On the quantitative analysis of deep belief networks. In International Conference on Machine Learning, pages 872–879, 2008.
  • [37] Jasper Snoek, Kevin Swersky, Rich Zemel, and Ryan Adams. Input warping for Bayesian optimization of non-stationary functions. In International Conference on Machine Learning, pages 1674–1682, 2014.
  • [38] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. Ladder variational autoencoders. In Advances in Neural Information Processing Systems, pages 3738–3746, 2016.
  • [39] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning, pages 1015–1022, 2010.
  • [40] Kevin Swersky, Jasper Snoek, and Ryan P. Adams. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems, pages 2004–2012, 2013.
  • [41] Michael Tschannen, Olivier Bachem, and Mario Lucic. Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069, 2018.
  • [42] Mark van der Wilk, Matthias Bauer, ST John, and James Hensman. Learning invariances using the marginal likelihood. In Advances in Neural Information Processing Systems, pages 9938–9948, 2018.
  • [43] Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
  • [44] Frank Wood, Andrew Warrington, Saeid Naderiparizi, Christian Weilbach, Vaden Masrani, William Harvey, Adam Scibior, Boyan Beronov, and Ali Nasseri. Planning as inference in epidemiological models. arXiv preprint arXiv:2003.13221, 2020.
  • [45] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
Regret analysis (proof sketch, from the appendix): The argument adapts Appendix C of Bogunovic et al. [5], with time kernel k_T(i, j) = (1 − ω)^{|i−j|/2}. At a high level, the proof proceeds by partitioning the T random functions into blocks of length N and bounding each block using Mirsky's theorem; referring to Table 1 for notation, this yields a bound on the maximum mutual information γ_N. Combining this with Eq. (58) and Eq. (60) of [5] (the latter using N^{5/2} ≤ N^3, obtained via a simple constrained optimization argument) and Eq. (26), Theorem 1 follows by arguments identical to those in [5]; see also Appendix G.1 of [26].