Probabilistic Circuits for Variational Inference in Discrete Graphical Models

Andy Shih

NeurIPS 2020.


Abstract:

Inference in discrete graphical models with variational methods is difficult because of the inability to re-parameterize gradients of the Evidence Lower Bound (ELBO). Many sampling-based methods have been proposed for estimating these gradients, but they suffer from high bias or variance. In this paper, we propose a new approach that leverages the properties of probabilistic circuit models to compute the ELBO exactly for graphical models with polynomial log-density.
Introduction
  • Variational methods for inference have seen a rise in popularity due to advancements in Monte Carlo methods for gradient estimation and optimization [11].
  • The advent of black box variational inference [35] and the re-parameterization trick [18, 36, 39] have enabled the training of complex models with automatic differentiation tools [21], and provided a low-variance gradient estimate of the ELBO for continuous variables.
  • Exploring richer variational families beyond the fully factored mean-field family would enable a tighter bound on the log partition function, and in turn a tighter bound on the probability of evidence; the standard identity below makes the bound precise.
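The ELBO lower-bounds the log partition function Z of an unnormalized model p̃ via Jensen's inequality, for any variational distribution q (a textbook identity, not specific to this paper):

```latex
\log Z
  = \log \sum_{x} \tilde{p}(x)
  = \log \sum_{x} q(x)\,\frac{\tilde{p}(x)}{q(x)}
  \;\geq\; \mathbb{E}_{x \sim q}\!\left[\log \tilde{p}(x)\right] + H(q)
  \;=\; \mathrm{ELBO}(q).
```

The bound tightens as q approaches p̃/Z, which is why richer variational families help; the catch is that both the expected log-density and the entropy H(q) must remain exactly computable.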
Highlights
  • Variational methods for inference have seen a rise in popularity due to advancements in Monte Carlo methods for gradient estimation and optimization [11]
  • We study the problem of variational inference for discrete graphical models
  • We show that we can obtain exact Evidence Lower Bound (ELBO) calculations for graphical models with polynomial log-density by leveraging the properties of probabilistic circuit models.
  • We demonstrate the use of selective Sum-Product Networks (selective-SPNs) as the variational distribution, which leads to a tractable ELBO computation while being more expressive than mean-field or structured mean-field.
  • Our experiments on computing lower bounds of the log partition function of various graphical models show significant improvement over other variational methods, and our approach is competitive with approximation techniques based on belief propagation.
  • These findings suggest that probabilistic circuits can be useful tools for inference in discrete graphical models due to their combination of tractability and expressivity.
Methods
  • The authors experimentally validate the use of selective-SPNs in variational inference of discrete graphical models.
  • The authors use Algorithm 1 to construct selective-SPNs, compute the exact ELBO gradient, and optimize the parameters of the selective-SPN with gradient descent.
  • Since the selective-SPNs have size O(kn), each optimization iteration takes O(tkn) steps, where t is the number of terms when expressing the log-density of the graphical model as a polynomial, k is a hyperparameter denoting the size budget, and n is the number of binary variables. A toy sketch of the exact ELBO computation follows below.
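To make the tractability claim concrete, here is a minimal sketch (illustrative, not the authors' code) of an exact ELBO computation for the simplest selective case: a two-component mixture of fully factored Bernoulli distributions whose components branch on x_0 and therefore have disjoint support. The toy polynomial log-density and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # number of binary variables x_0, ..., x_4

# Toy polynomial log-density (illustrative): log p~(x) = sum_t c_t * prod_{i in S_t} x_i
terms = [(1.5, (0,)), (-2.0, (1, 2)), (0.7, (3, 4)), (1.1, (0, 2, 4))]

# Selective mixture: component k forces x_0 = k and models the remaining
# variables as independent Bernoullis, so the two components have disjoint
# support (i.e., the sum node is deterministic).
w = np.array([0.4, 0.6])                    # mixture weights
theta = rng.uniform(0.2, 0.8, size=(2, n))  # Bernoulli means; theta[k, 0] is unused

def component_means(k):
    mu = theta[k].copy()
    mu[0] = float(k)  # x_0 is deterministic inside component k
    return mu

def expected_log_density():
    # E_q[log p~(x)] = sum_k w_k * sum_t c_t * prod_{i in S_t} E_{q_k}[x_i];
    # monomial expectations factorize because each component is fully factored.
    val = 0.0
    for k in range(2):
        mu = component_means(k)
        val += w[k] * sum(c * np.prod([mu[i] for i in S]) for c, S in terms)
    return val

def entropy():
    # Disjoint supports make the mixture entropy decompose exactly:
    # H(q) = H(w) + sum_k w_k * H(q_k), the key selectivity property.
    h = -(w * np.log(w)).sum()
    for k in range(2):
        p = theta[k, 1:]  # x_0 contributes no entropy within a component
        h += w[k] * (-(p * np.log(p) + (1 - p) * np.log(1 - p)).sum())
    return h

elbo = expected_log_density() + entropy()  # exact lower bound on log Z
print(f"exact ELBO = {elbo:.4f}")
```

Because every quantity above is a closed-form function of (w, theta), the exact ELBO gradient is available by ordinary differentiation (e.g., autodiff), with no sampling and hence no bias or variance.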
Conclusion
  • The authors study the problem of variational inference for discrete graphical models. While many Monte Carlo techniques have been proposed for estimating the ELBO, they can suffer from high bias or variance due to the inability to backpropagate through samples of discrete variables.
  • The authors' experiments on computing lower bounds of the log partition function of various graphical models show significant improvement over other variational methods, and the approach is competitive with approximation techniques based on belief propagation.
  • These findings suggest that probabilistic circuits can be useful tools for inference in discrete graphical models due to their combination of tractability and expressivity.
Tables
  • Table 1: Computing the log partition function of factor graphs from the 2014 UAI Inference Competition. The estimate closest to the ground truth is bolded, and the strongest lower bound is underlined.
Related work
  • There is a large body of work on estimating variational objectives for discrete settings. Most existing approaches are Monte Carlo methods [27, 26, 13, 41, 42], typically relying on continuous relaxations, which introduce bias, or score function estimators [45], which have high variance. Also of interest are neural variational inference approaches [22] that estimate an upper bound of the partition function, and transformer networks for estimating marginal likelihoods in discrete settings [46].

    The approach of computing the variational lower bounds analytically has generally been restricted to mean-field or structured mean-field [37]. One line of work has studied the use of mean-field for exact ELBO computation of Bayesian neural networks [15, 10]. More relevant works also consider graphical models with polynomial log-density, but are still restricted to fully factored variational distributions [12]. Our result shows that computing the ELBO can be tractable for the more expressive variational family of selective-SPNs.
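For concreteness, a standard example (not taken from the paper) of a discrete model with polynomial log-density is a pairwise binary factor graph, i.e., an Ising-style MRF, whose unnormalized log-density is a degree-2 polynomial in x:

```latex
\log \tilde{p}(x) \;=\; \sum_{i=1}^{n} h_i x_i \;+\; \sum_{(i,j) \in E} J_{ij}\, x_i x_j,
\qquad x \in \{0,1\}^n .
```

Here the number of monomials is t = n + |E|, matching the per-iteration cost O(tkn) quoted in the Methods section.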
Funding
  • Research supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and FLI.
Study subjects and analysis
  • Documents: 100. Since the log-density of the model (Eq. 15 in [1]) has only linear terms in φ, the authors used a shallow mixture model with 16 selective mixtures, a special case of selective-SPNs with only one sum node at the root, as the variational distribution. Figure 3 shows the improvement of the selective mixture model over mean-field variational inference across the 100 documents.
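For intuition, here is a hypothetical sketch (sizes and names are illustrative, not from the paper) of how a 16-component selective mixture can be constructed by branching on the first four binary variables, so that every complete assignment activates exactly one component:

```python
from itertools import product

n, n_sel = 20, 4  # illustrative sizes; 2**n_sel = 16 components

# Each component pins the n_sel selector bits to a distinct pattern and models
# the remaining n - n_sel variables with its own independent Bernoulli
# parameters. Distinct pinned patterns give the components disjoint support,
# which keeps the mixture entropy (and hence the ELBO) exactly computable.
components = [
    {"pinned": dict(zip(range(n_sel), bits)), "free": list(range(n_sel, n))}
    for bits in product((0, 1), repeat=n_sel)
]
assert len(components) == 16
```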

References
  • David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
  • David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
  • Beate Bollig and Matthias Buttkus. On the relative succinctness of sentential decision diagrams. Theory of Computing Systems, 63(6):1250–1277, 2019.
  • Arthur Choi, Yujia Shen, and Adnan Darwiche. Tractability in structured probability spaces. In Advances in Neural Information Processing Systems, pages 3477–3485, 2017.
  • YooJung Choi, Antonio Vergari, and Guy Van den Broeck. Probabilistic circuits: A unifying framework for tractable probabilistic models. September 2020. URL http://starai.cs.ucla.edu/papers/ProbCirc20.pdf.
  • Adnan Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM, 2000.
  • Adnan Darwiche and Pierre Marquis. A knowledge compilation map. Journal of Artificial Intelligence Research, 17:229–264, 2002.
  • Abram L. Friesen and Pedro M. Domingos. The sum-product theorem: A foundation for learning tractable models. In International Conference on Machine Learning, 2016.
  • Robert Gens and Pedro M. Domingos. Learning the structure of sum-product networks. In International Conference on Machine Learning, 2013.
  • Manuel Haußmann, Fred A. Hamprecht, and Melih Kandemir. Sampling-free variational inference of Bayesian neural networks by variance backpropagation. In Uncertainty in Artificial Intelligence, pages 563–573. PMLR, 2020.
  • Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1):1303–1347, 2013.
  • Matthew D. Hoffman, Matthew J. Johnson, and Dustin Tran. Autoconj: Recognizing and exploiting conjugacy without a domain-specific language. In Advances in Neural Information Processing Systems, pages 10716–10726, 2018.
  • Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. In International Conference on Learning Representations, 2017.
  • Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
  • Melih Kandemir. Variational closed-form deep neural net inference. Pattern Recognition Letters, 112:145–151, 2018.
  • Pasha Khosravi, YooJung Choi, Yitao Liang, Antonio Vergari, and Guy Van den Broeck. On tractable computation of expected predictions. In Advances in Neural Information Processing Systems, December 2019.
  • Pasha Khosravi, Yitao Liang, YooJung Choi, and Guy Van den Broeck. What to expect of classifiers? Reasoning about logistic regression with missing features. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
  • Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In Principles of Knowledge Representation and Reasoning, 2014.
  • Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. Automatic differentiation variational inference. The Journal of Machine Learning Research, 18(1):430–474, 2017.
  • Volodymyr Kuleshov and Stefano Ermon. Neural variational inference and learning in undirected graphical models. In Advances in Neural Information Processing Systems, pages 6734–6743, 2017.
  • Yann LeCun, Sumit Chopra, Raia Hadsell, M. Ranzato, and F. Huang. A tutorial on energy-based learning. 2006.
  • Yitao Liang, Jessa Bekker, and Guy Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In Uncertainty in Artificial Intelligence, 2017.
  • Daniel Lowd and Pedro Domingos. Approximate inference by compilation to arithmetic circuits. In Advances in Neural Information Processing Systems, pages 1477–1485, 2010.
  • Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The Concrete distribution: A continuous relaxation of discrete random variables. In International Conference on Learning Representations, 2017.
  • Andriy Mnih and Danilo J. Rezende. Variational inference for Monte Carlo objectives. In International Conference on Machine Learning, 2016.
  • Joris M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. Journal of Machine Learning Research, 11:2169–2173, August 2010. URL http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf.
  • Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
  • Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • Robert Peharz, Robert Gens, and Pedro M. Domingos. Learning selective sum-product networks. In International Conference on Machine Learning Workshop on Learning Tractable Probabilistic Models, 2014.
  • Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro M. Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2030–2044, 2017.
  • Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, and Zoubin Ghahramani. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Uncertainty in Artificial Intelligence, 2019.
  • Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops, pages 689–690, 2011.
  • Rajesh Ranganath, Sean Gerrish, and David M. Blei. Black box variational inference. arXiv preprint arXiv:1401.0118, 2013.
  • Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, 2014.
  • Lawrence K. Saul and Michael I. Jordan. Exploiting tractable substructures in intractable networks. In Advances in Neural Information Processing Systems, pages 486–492, 1996.
  • Yujia Shen, Arthur Choi, and Adnan Darwiche. Tractable operations for arithmetic circuits of probabilistic models. In Advances in Neural Information Processing Systems, 2016.
  • Michalis Titsias and Miguel Lázaro-Gredilla. Doubly stochastic variational Bayes for non-conjugate inference. In International Conference on Machine Learning, pages 1971–1979, 2014.
  • Martin Trapp, Robert Peharz, Franz Pernkopf, and Carl E. Rasmussen. Deep structured mixtures of Gaussian processes. arXiv preprint arXiv:1910.04536, 2019.
  • George Tucker, Andriy Mnih, Chris J. Maddison, John Lawson, and Jascha Sohl-Dickstein. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. In Advances in Neural Information Processing Systems, pages 2627–2636, 2017.
  • Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, and Evgeny Andriyash. DVAE++: Discrete variational autoencoders with overlapping transformations. In International Conference on Machine Learning, 2018.
  • Antonio Vergari, Nicola Di Mauro, and Floriana Esposito. Visualizing and understanding sum-product networks. Machine Learning, 108(4):551–573, 2019.
  • Martin J. Wainwright, Michael I. Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
  • Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4):229–256, 1992.
  • Sam Wiseman and Yoon Kim. Amortized Bethe free energy minimization for learning MRFs. In Advances in Neural Information Processing Systems, pages 15546–15557, 2019.