Probabilistic Circuits for Variational Inference in Discrete Graphical Models
NeurIPS 2020.
Abstract:
Inference in discrete graphical models with variational methods is difficult because of the inability to re-parameterize gradients of the Evidence Lower Bound (ELBO). Many sampling-based methods have been proposed for estimating these gradients, but they suffer from high bias or variance. In this paper, we propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs), to compute the ELBO and its gradients exactly for graphical models with polynomial log-density.
Introduction
- Variational methods for inference have seen a rise in popularity due to advancements in Monte Carlo methods for gradient estimation and optimization [11].
- The advent of black box variational inference [35] and the re-parameterization trick [18, 36, 39] have enabled the training of complex models with automatic differentiation tools [21], and provided a low-variance gradient estimate of the ELBO for continuous variables.
- Exploring richer variational families beyond fully factored mean-field would enable a tighter bound on the log partition function and, in turn, on the probability of evidence (see the bound below).
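To make the bound explicit, here is the standard derivation behind the ELBO in this setting; the notation below (unnormalized density p̃ with partition function Z) is ours, not quoted from the paper:

    \log Z \;=\; \log \sum_{x} \tilde p(x)
           \;=\; \log \sum_{x} q(x)\,\frac{\tilde p(x)}{q(x)}
           \;\ge\; \mathbb{E}_{q(x)}\!\big[\log \tilde p(x)\big] + \mathrm{H}(q)
           \;=\; \mathrm{ELBO}(q)

by Jensen's inequality, so maximizing the ELBO over a richer variational family q tightens the lower bound on log Z.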
Highlights
- Variational methods for inference have seen a rise in popularity due to advancements in Monte Carlo methods for gradient estimation and optimization [11]
- We study the problem of variational inference for discrete graphical models
- We show that we can obtain exact Evidence Lower Bound (ELBO) calculations for graphical models with polynomial log-density by leveraging the properties of probabilistic circuit models
- We demonstrate the use of selective Sum-Product Networks (selective-SPNs) as the variational distribution, which leads to a tractable ELBO computation while being more expressive than mean-field or structured mean-field
- Our experiments on computing a lower bound of the log partition function of various graphical models show a significant improvement over other variational methods, and the resulting bounds are competitive with approximation techniques based on belief propagation
- These findings suggest that probabilistic circuits can be useful tools for inference in discrete graphical models due to their combination of tractability and expressivity
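As a concrete illustration of the exact-ELBO highlight (the notation c_t, S_t below is ours): if the log-density is a polynomial over binary variables,

    \log \tilde p(x) \;=\; \sum_{t=1}^{T} c_t \prod_{i \in S_t} x_i, \qquad x \in \{0,1\}^n,

then by linearity of expectation

    \mathbb{E}_{q}\big[\log \tilde p(x)\big] \;=\; \sum_{t=1}^{T} c_t\, \mathbb{E}_{q}\Big[\prod_{i \in S_t} x_i\Big],

so the ELBO reduces to a sum of moments of q plus its entropy, both of which can be computed exactly when q is a selective-SPN.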
Methods
- The authors experimentally validate the use of selective-SPNs in variational inference of discrete graphical models.
- The authors use Algorithm 1 to construct selective-SPNs, compute the exact ELBO gradient, and optimize the parameters of the selective-SPN with gradient descent.
- Since the selective-SPNs have size O(kn), each optimization iteration takes O(tkn) steps, where t is the number of terms when expressing the log-density of the graphical model as a polynomial, k is a hyperparameter denoting the size budget, and n is the number of binary variables; a toy sketch of this optimization loop is given below.
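Below is a minimal sketch (not the authors' implementation) of this optimization loop for a toy selective mixture over binary variables. The polynomial log-density, the two-component split on x0, and all variable names are illustrative assumptions; the paper instead builds a full selective-SPN with Algorithm 1.

    import torch

    n = 4                                                     # number of binary variables
    # hypothetical unnormalized log-density: log p~(x) = 2*x0*x1 - 1.5*x2 + 0.7*x1*x3
    poly_terms = [(2.0, [0, 1]), (-1.5, [2]), (0.7, [1, 3])]  # (coefficient, variable subset)

    # Selective mixture with 2 components: component k deterministically sets x0 = k, so the
    # components have disjoint supports (selectivity); the remaining variables are modeled as
    # independent Bernoullis within each component.
    logits_w = torch.zeros(2, requires_grad=True)             # mixture weights (pre-softmax)
    logits_mu = torch.zeros(2, n - 1, requires_grad=True)     # Bernoulli means for x1..x3

    def exact_elbo():
        log_w = torch.log_softmax(logits_w, dim=0)
        w = log_w.exp()
        mu_free = torch.sigmoid(logits_mu).clamp(1e-6, 1 - 1e-6)
        det = torch.tensor([[0.0], [1.0]])                    # x0 = k in component k
        mu = torch.cat([det, mu_free], dim=1)                 # per-component means, shape (2, n)
        # E_q[log p~(x)] = sum_k w_k * sum_t c_t * prod_{i in S_t} mu[k, i]
        exp_logp = sum(c * (w * mu[:, idx].prod(dim=1)).sum() for c, idx in poly_terms)
        # entropy of a selective mixture: H(w) + sum_k w_k * sum_i H(Bernoulli(mu_free[k, i]))
        h_w = -(w * log_w).sum()
        h_bern = -(mu_free * mu_free.log() + (1 - mu_free) * (1 - mu_free).log()).sum(dim=1)
        return exp_logp + h_w + (w * h_bern).sum()

    opt = torch.optim.Adam([logits_w, logits_mu], lr=0.1)
    for step in range(200):
        opt.zero_grad()
        loss = -exact_elbo()          # maximize the ELBO, an exact lower bound on log Z
        loss.backward()
        opt.step()
    print(f"lower bound on log Z: {exact_elbo().item():.4f}")

For the selective-SPNs used in the paper, the same exact computation of the expected log-density and the entropy runs over the circuit produced by Algorithm 1 rather than over a two-component mixture.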
Conclusion
- The authors study the problem of variational inference for discrete graphical models. While many Monte Carlo techniques have been proposed for estimating the ELBO, they can suffer from high bias or variance due to the inability to backpropagate through samples of discrete variables.
- The authors' experiments on computing a lower bound of the log partition function of various graphical models show a significant improvement over other variational methods, and the resulting bounds are competitive with approximation techniques based on belief propagation
- These findings suggest that probabilistic circuits can be useful tools for inference in discrete graphical models due to their combination of tractability and expressivity
Tables
- Table 1: Computing the log partition function of factor graphs from the 2014 UAI Inference Competition. The estimate closest to the ground truth is bolded, and the strongest lower bound is underlined.
Related work
- There is a large body of work on estimating variational objectives for discrete settings. Most existing approaches are Monte Carlo methods [27, 26, 13, 41, 42], typically relying on continuous relaxations, which introduce bias, or score function estimators [45], which have high variance. Also of interest are neural variational inference approaches [22] that estimate an upper bound of the partition function, and transformer networks for estimating marginal likelihoods in discrete settings [46].
The approach of computing the variational lower bounds analytically has generally been restricted to mean-field or structured mean-field [37]. One line of work has studied the use of mean-field for exact ELBO computation of Bayesian neural networks [15, 10]. More relevant works also consider graphical models with polynomial log-density, but are still restricted to fully factored variational distributions [12]. Our result shows that computing the ELBO can be tractable for the more expressive variational family of selective-SPNs.
Funding
- Research supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and FLI
Study subjects and analysis
documents: 100
Since the relevant log-density (Eq. 15 in [1]) only has linear terms in φ, the authors use a shallow mixture model with 16 selective mixture components – a special case of selective-SPNs with only one sum node at the root – as the variational distribution. Figure 3 shows the improvement of the selective mixture model over mean-field variational inference across 100 documents.
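In that special case (our notation, assuming a log-density with only linear terms c_i over binary variables and a selective mixture whose components have disjoint supports), the ELBO has a simple closed form:

    \mathrm{ELBO}(q) \;=\; \sum_{k} w_k \sum_{i} c_i\, \mu_{k,i} \;+\; \mathrm{H}(w) \;+\; \sum_{k} w_k \sum_{i} \mathrm{H}\!\big(\mathrm{Bern}(\mu_{k,i})\big)

where w_k are the mixture weights and μ_{k,i} the per-component Bernoulli means; the entropy decomposes this way only because selectivity makes the component supports disjoint.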
Reference
- David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
- David M Blei, Alp Kucukelbir, and Jon D McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
- Beate Bollig and Matthias Buttkus. On the relative succinctness of sentential decision diagrams. Theory of Computing Systems, 63(6):1250–1277, 2019.
- Arthur Choi, Yujia Shen, and Adnan Darwiche. Tractability in structured probability spaces. In Advances in Neural Information Processing Systems, pages 3477–3485, 2017.
- YooJung Choi, Antonio Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. Sep 2020. URL http://starai.cs.ucla.edu/papers/ProbCirc20.pdf.
- Adnan Darwiche. A differential approach to inference in Bayesian networks. In J. ACM, 2000.
- Adnan Darwiche and Pierre Marquis. A knowledge compilation map. Journal of Artificial Intelligence Research, 17:229–264, 2002.
- Abram L. Friesen and Pedro M. Domingos. The sum-product theorem: A foundation for learning tractable models. In International Conference on Machine Learning, 2016.
- Robert Gens and Pedro M. Domingos. Learning the structure of sum-product networks. In International Conference on Machine Learning, 2013.
- Manuel Haußmann, Fred A Hamprecht, and Melih Kandemir. Sampling-free variational inference of Bayesian neural networks by variance backpropagation. In Uncertainty in Artificial Intelligence, pages 563–573. PMLR, 2020.
- Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1):1303–1347, 2013.
- Matthew D Hoffman, Matthew J Johnson, and Dustin Tran. Autoconj: recognizing and exploiting conjugacy without a domain-specific language. In Advances in Neural Information Processing Systems, pages 10716–10726, 2018.
- Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. In International Conference on Learning Representations, 2017.
- Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul. An introduction to variational methods for graphical models. Machine learning, 37(2):183–233, 1999.
- Melih Kandemir. Variational closed-form deep neural net inference. Pattern Recognition Letters, 112:145–151, 2018.
- Pasha Khosravi, YooJung Choi, Yitao Liang, Antonio Vergari, and Guy Van den Broeck. On tractable computation of expected predictions. In Advances in Neural Information Processing Systems, dec 2019.
- Pasha Khosravi, Yitao Liang, YooJung Choi, and Guy Van den Broeck. What to expect of classifiers? reasoning about logistic regression with missing features. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.
- Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014.
- Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In Principles of Knowledge Representation and Reasoning, 2014.
- Daphne Koller and Nir Friedman. Probabilistic Graphical Models - Principles and Techniques. MIT Press, 2009.
- Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei. Automatic differentiation variational inference. The Journal of Machine Learning Research, 18(1): 430–474, 2017.
- Volodymyr Kuleshov and Stefano Ermon. Neural variational inference and learning in undirected graphical models. In Advances in Neural Information Processing Systems, pages 6734– 6743, 2017.
- Yann LeCun, Sumit Chopra, Raia Hadsell, M Ranzato, and F Huang. A tutorial on energy-based learning. 2006.
- Yitao Liang, Jessa Bekker, and Guy Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In Uncertainty in Artificial Intelligence, 2017.
- Daniel Lowd and Pedro Domingos. Approximate inference by compilation to arithmetic circuits. In Advances in Neural Information Processing Systems, pages 1477–1485, 2010.
- Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. In International Conference on Learning Representations, 2016.
- Andriy Mnih and Danilo J Rezende. Variational inference for monte carlo objectives. In International Conference on Machine Learning, 2017.
- Joris M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. Journal of Machine Learning Research, 11:2169–2173, August 2010. URL http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf.
- Radford M Neal. Annealed importance sampling. Statistics and computing, 11(2):125–139, 2001.
- Ryan O'Donnell. Analysis of Boolean functions. Cambridge University Press, 2014.
- Robert Peharz, Robert Gens, and Pedro M. Domingos. Learning selective sum-product networks. In International Conference on Machine Learning Workshop on Learning Tractable Probabilistic Models, 2014.
- Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro M. Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2030–2044, 2017.
- Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, and Zoubin Ghahramani. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Uncertainty in Artificial Intelligence, 2019.
- Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. 2011 IEEE International Conference on Computer Vision Workshops, pages 689–690, 2011.
- Rajesh Ranganath, Sean Gerrish, and David M Blei. Black box variational inference. arXiv preprint arXiv:1401.0118, 2013.
- Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, 2014.
- Lawrence K Saul and Michael I Jordan. Exploiting tractable substructures in intractable networks. In Advances in neural information processing systems, pages 486–492, 1996.
- Yujia Shen, Arthur Choi, and Adnan Darwiche. Tractable operations for arithmetic circuits of probabilistic models. In Advances in Neural Information Processing Systems, 2016.
- Michalis Titsias and Miguel Lázaro-Gredilla. Doubly stochastic variational Bayes for non-conjugate inference. In International Conference on Machine Learning, pages 1971–1979, 2014.
- Martin Trapp, Robert Peharz, Franz Pernkopf, and Carl E Rasmussen. Deep structured mixtures of gaussian processes. arXiv preprint arXiv:1910.04536, 2019.
- George Tucker, Andriy Mnih, Chris J Maddison, John Lawson, and Jascha Sohl-Dickstein. Rebar: Low-variance, unbiased gradient estimates for discrete latent variable models. In Advances in Neural Information Processing Systems, pages 2627–2636, 2017.
- Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, and Evgeny Andriyash. Dvae++: Discrete variational autoencoders with overlapping transformations. In International Conference on Machine Learning, 2018.
- Antonio Vergari, Nicola Di Mauro, and Floriana Esposito. Visualizing and understanding sumproduct networks. Machine Learning, 108(4):551–573, 2019.
- Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2008.
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- Sam Wiseman and Yoon Kim. Amortized Bethe free energy minimization for learning MRFs. In Advances in Neural Information Processing Systems, pages 15546–15557, 2019.