Belief Propagation Neural Networks

NeurIPS, 2020.


Abstract:

Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that ...

Introduction
  • Probabilistic inference problems arise in many domains, from statistical physics to machine learning.
  • The authors introduce belief propagation neural networks (BPNNs), a flexible neural architecture designed to estimate the partition function of a factor graph.
  • Like BP, BPNN-D is guaranteed to converge on tree-structured factor graphs and to return the exact partition function.
  • BPNN-B performs regression from the trajectory of beliefs to the partition function of the input factor graph.
  • While this sacrifices some guarantees, the additional flexibility introduced by BPNN-B generally improves estimation performance.
  • Factor nodes and variable nodes are connected if and only if the variable is in the scope of the factor (a minimal factor-graph and sum-product sketch follows this list).
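For concreteness, the following is a minimal NumPy sketch (illustrative only, not the authors' implementation) of the objects described above: a factor graph over binary variables and the standard sum-product message updates that the BPNN-D layers generalize. The names FactorGraph and run_bp are assumptions introduced here.

    import numpy as np

    class FactorGraph:
        """Bipartite factor graph over binary variables: factor a is connected to
        variable i iff i is in factor a's scope."""
        def __init__(self, num_vars, factors):
            # factors: list of (scope, table); table has one axis per variable in scope.
            self.num_vars = num_vars
            self.factors = factors

    def run_bp(fg, iters=50):
        """Synchronous sum-product BP. Returns variable-to-factor messages,
        factor-to-variable messages, and normalized variable beliefs."""
        keys = [(a, i) for a, (scope, _) in enumerate(fg.factors) for i in range(len(scope))]
        v2f = {k: np.ones(2) for k in keys}   # message from variable scope[i] into factor a
        f2v = {k: np.ones(2) for k in keys}   # message from factor a to variable scope[i]
        for _ in range(iters):
            # factor -> variable: multiply in the other variables' messages, then sum them out
            for a, (scope, table) in enumerate(fg.factors):
                for i in range(len(scope)):
                    msg = table.copy()
                    for j in range(len(scope)):
                        if j != i:
                            shape = [1] * len(scope)
                            shape[j] = 2
                            msg = msg * v2f[(a, j)].reshape(shape)
                    out = msg.sum(axis=tuple(j for j in range(len(scope)) if j != i))
                    f2v[(a, i)] = out / out.sum()
            # variable -> factor: product of messages from all OTHER factors touching the
            # variable; the message previously received from the destination factor is
            # excluded, i.e. no "double counting" (the property BPNN preserves)
            for a, (scope, _) in enumerate(fg.factors):
                for i, v in enumerate(scope):
                    msg = np.ones(2)
                    for b, (scope_b, _) in enumerate(fg.factors):
                        for j, u in enumerate(scope_b):
                            if u == v and (b, j) != (a, i):
                                msg = msg * f2v[(b, j)]
                    v2f[(a, i)] = msg / msg.sum()
        # variable beliefs: product of all incoming factor-to-variable messages
        beliefs = np.ones((fg.num_vars, 2))
        for a, (scope, _) in enumerate(fg.factors):
            for i, v in enumerate(scope):
                beliefs[v] *= f2v[(a, i)]
        return v2f, f2v, beliefs / beliefs.sum(axis=1, keepdims=True)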
Highlights
  • Probabilistic inference problems arise in many domains, from statistical physics to machine learning.
  • Belief propagation neural networks (BPNNs) are composed of iterative layers (BPNN-D) and an optional Bethe free energy layer (BPNN-B), both of which maintain the symmetries of belief propagation (BP) under factor graph isomorphisms (a sketch of the Bethe free energy computation follows this list).
  • Our BPNN significantly outperforms loopy belief propagation, both for test data drawn from the training distribution and for out-of-distribution data.
  • As shown in Table 1, BPNN provides the best estimates for the partition function.
  • We introduced belief propagation neural networks, a strict generalization of BP that learns to find better fixed points faster.
  • We empirically demonstrated that BPNNs can learn from tiny data sets containing only tens of training points and generalize to test data drawn from a different distribution than seen during training.
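Continuing the illustrative sketch above (and again an assumption-laden sketch, not the paper's code): the Bethe free energy evaluated at a set of beliefs gives the estimate ln Z ≈ -F_Bethe, which is exact on trees at a BP fixed point. BPNN-B instead learns a regression from the trajectory of beliefs to ln Z; the direct computation below only shows what that layer generalizes.

    import numpy as np

    def bethe_free_energy(fg, v2f, var_beliefs, eps=1e-12):
        """F_Bethe = U_Bethe - H_Bethe from factor and variable beliefs.
        Assumes fg, v2f, var_beliefs as produced by the run_bp sketch above."""
        degree = np.zeros(fg.num_vars)   # number of factors touching each variable
        F = 0.0
        for a, (scope, table) in enumerate(fg.factors):
            # factor belief: b_a(x_a) proportional to f_a(x_a) * prod_i m_{i->a}(x_i)
            b = table.copy()
            for i in range(len(scope)):
                shape = [1] * len(scope)
                shape[i] = 2
                b = b * v2f[(a, i)].reshape(shape)
            b = b / b.sum()
            # sum_a KL(b_a || f_a) term of the Bethe free energy
            F += np.sum(b * (np.log(b + eps) - np.log(table + eps)))
            for v in scope:
                degree[v] += 1
        for v in range(fg.num_vars):
            bv = var_beliefs[v]
            # correct the (d_v - 1)-fold over-counting of each variable's entropy
            F -= (degree[v] - 1) * np.sum(bv * np.log(bv + eps))
        return F   # estimate: ln Z is approximately -F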
Methods
  • In the experiments the authors trained BPNN to estimate the partition function of factor graphs from a variety of domains.
  • Experiments on synthetic Ising models show that BPNN-D can learn to find better fixed points than BP and to converge faster (a sketch of how such synthetic instances and exact training targets can be generated follows this list).
  • BPNN generalizes to Ising models with nearly twice as many variables as those seen during training, sampled from a different distribution.
  • The authors refer the reader to Appendix B.2 for details on the GNN.
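As a rough illustration of how small synthetic Ising instances and exact ln Z training targets could be produced (the coupling and field distributions below are assumptions for the sketch, not the paper's settings; the FactorGraph class is reused from the earlier sketch):

    import itertools
    import numpy as np

    def random_ising_factor_graph(n_side=3, field_scale=0.1, coupling_scale=0.5, seed=0):
        """Spin-glass Ising model on an n_side x n_side grid with spins {-1, +1}."""
        rng = np.random.default_rng(seed)
        n = n_side * n_side
        idx = lambda r, c: r * n_side + c
        s = np.array([-1.0, 1.0])
        factors = []
        for r in range(n_side):
            for c in range(n_side):
                f = rng.normal(0, field_scale)
                factors.append(([idx(r, c)], np.exp(f * s)))          # local field
                for dr, dc in ((0, 1), (1, 0)):                        # right and down couplings
                    if r + dr < n_side and c + dc < n_side:
                        J = rng.normal(0, coupling_scale)
                        factors.append(([idx(r, c), idx(r + dr, c + dc)],
                                        np.exp(J * np.outer(s, s))))   # exp(J * x_i * x_j)
        return FactorGraph(n, factors)

    def exact_ln_z(fg):
        """Brute-force ln Z by enumerating all states (feasible only for tiny models)."""
        ln_w = []
        for x in itertools.product([0, 1], repeat=fg.num_vars):
            ln_w.append(sum(np.log(table[tuple(x[v] for v in scope)])
                            for scope, table in fg.factors))
        m = max(ln_w)
        return m + np.log(np.sum(np.exp(np.array(ln_w) - m)))          # log-sum-exp

For example, fg = random_ising_factor_graph() followed by exact_ln_z(fg) yields one (factor graph, ln Z) training pair in this sketch.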
Results
  • As shown in Table 1, BPNN provides the best estimates for the partition function. Critically, the authors see that not ‘double counting’ messages and preserving the symmetries of BP are key improvements of BPNN over GNN (the message-exclusion rule is sketched after this list).
  • The computational complexity of exact model counting has led to a significant body of work on approximate model counting [46, 27, 28, 8, 20, 18, 24, 3, 5, 44], with the goal of estimating the number of satisfying solutions at a lower computational cost.
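A minimal contrast of the ‘double counting’ distinction (assumed names, not the authors' code): a GNN-style update aggregates every incoming message, whereas the BP-style update that BPNN retains excludes the message previously received from the destination factor, a subtraction in the log domain.

    import numpy as np

    def variable_to_factor_log_msg(log_in_msgs, dest):
        """log_in_msgs: dict mapping factor id -> log message (length-2 array) into one
        variable; dest: the factor the outgoing message is addressed to."""
        total = sum(log_in_msgs.values())        # aggregate all incoming log messages once
        gnn_style = total                        # includes dest's own message: double counting
        bp_style = total - log_in_msgs[dest]     # exclude the message from the destination factor
        return gnn_style, bp_style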
Conclusion
  • The authors introduced belief propagation neural networks, a strict generalization of BP that learns to find better fixed points faster.
  • The authors empirically demonstrated that BPNNs can learn from tiny data sets containing only tens of training points and generalize to test data drawn from a different distribution than seen during training.
  • BPNNs significantly outperform loopy belief propagation and standard graph neural networks in terms of accuracy.
  • BPNNs provide excellent computational efficiency, running orders of magnitude faster than state-of-the-art randomized hashing algorithms while maintaining comparable accuracy.
Tables
  • Table 1: RMSE of SBM ln(Z) estimates. BPNN outperforms BP, GNN, and ablated versions of BPNN (the RMSE metric is sketched after this list).
  • Table 2: RMSE of BPNN for each training/validation set, along with ablation results. BPNN corresponds to a model with 5 BPNN-D layers followed by a Bethe layer that is invariant to the factor graph representation. BPNN-NI corresponds to removing invariance from the Bethe layer. BPNN-DC corresponds to performing ‘double counting’ as is standard for GNNs, rather than subtracting previously sent messages as is standard for BP. ‘Random Split’ rows show that BPNNs are capable of learning a distribution from a tiny dataset of only tens of training problems. ‘Easy / Hard’ rows additionally show that BPNNs are able to generalize from simple training problems to significantly more complex validation problems.
  • Table 3: Root mean squared error (RMSE) of estimates of the natural logarithm of the number of satisfying solutions. The fraction of benchmarks within each category that each approximate counter was able to complete within the time limit of 5k seconds is shown in parentheses.
  • Table 4: Runtime percentiles (in seconds) for DSharp, ApproxMC3, and F2, computed separately for each category's training dataset. In comparison, BPNN sequential runtime is nearly constant and BPNN parallel runtime is limited by GPU memory.
  • Table 5: RMSE of ln(Z) for BPNN against BP and GNN on SBMs generated from different distributions and on larger graphs than the training or validation set. BPNN outperforms both methods across different edge probabilities, class probabilities, and graph sizes, and it generalizes better than GNN in all these settings.
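For clarity, the evaluation metric reported in the tables, sketched in NumPy (illustrative helper name):

    import numpy as np

    def rmse_ln_z(ln_z_est, ln_z_true):
        """Root mean squared error between estimated and true log partition functions."""
        ln_z_est, ln_z_true = np.asarray(ln_z_est), np.asarray(ln_z_true)
        return float(np.sqrt(np.mean((ln_z_est - ln_z_true) ** 2)))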
Funding
  • Research supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and FLI.
References
  • Emmanuel Abbe. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018. URL http://jmlr.org/papers/v18/16-480.html.
  • Ralph Abboud, Ismail Ilkan Ceylan, and Thomas Lukasiewicz. Learning to reason: Leveraging neural networks for approximate DNF counting. In AAAI, 2020.
  • Dimitris Achlioptas and Pei Jiang. Stochastic integration via error-correcting codes. In Proc. Uncertainty in Artificial Intelligence, 2015.
  • Dimitris Achlioptas and Panos Theodoropoulos. Probabilistic model counting with short XORs. In SAT, 2017.
  • Dimitris Achlioptas, Zayd Hammoudeh, and Panos Theodoropoulos. Fast and flexible probabilistic model counting. In SAT, pages 148–164, 2018.
  • Rodney J Baxter. Exactly solved models in statistical mechanics. Elsevier, 2016.
  • Jens Behrmann, Will Grathwohl, Ricky TQ Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pages 573–582, 2019.
  • Mihir Bellare, Oded Goldreich, and Erez Petrank. Uniform generation of NP-witnesses using an NP-oracle. Electronic Colloquium on Computational Complexity (ECCC), 5, 1998.
  • Vaishak Belle, Guy Van den Broeck, and Andrea Passerini. Hashing-based approximate probabilistic inference in hybrid domains. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015.
  • Hans A Bethe. Statistical theory of superlattices. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, 150(871):552–575, 1935.
  • Fabrizio Biondi, Michael A. Enescu, Annelie Heuser, Axel Legay, Kuldeep S. Meel, and Jean Quilbeuf. Scalable approximation of quantitative information flow in programs. In VMCAI, 2018.
  • Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi. Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls. In IJCAI, 2016.
  • David Chandler. Introduction to Modern Statistical Mechanics. Oxford University Press, Oxford, UK, 1987.
  • Zhengdao Chen, Lisha Li, and Joan Bruna. Supervised community detection with line graph neural networks. In ICLR, 2019.
  • Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84:066106, 2011. doi: 10.1103/PhysRevE.84.066106. URL https://link.aps.org/doi/10.1103/PhysRevE.84.066106.
  • Leonardo Dueñas-Osorio, Kuldeep S. Meel, Roger Paredes, and Moshe Y. Vardi. Counting-based reliability estimation for power-transmission grids. In AAAI, 2017.
  • Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In ICML, pages 334–342, 2013.
  • Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Low-density parity constraints for hashing-based discrete integration. In ICML, pages 271–279, 2014.
  • Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
  • Carla P. Gomes, A. Sabharwal, and B. Selman. Model counting: A new strategy for obtaining good bounds. In AAAI, pages 54–61, 2006.
  • Tamir Hazan and Tommi S. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In ICML, pages 991–998. ACM, 2012.
  • Nicolas Heess, Daniel Tarlow, and John Winn. Learning to pass expectation propagation messages. In NeurIPS, pages 3219–3227, 2013.
  • Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella, and Stefano Ermon. Learning neural PDE solvers with convergence guarantees. In ICLR, 2019.
  • Alexander Ivrii, Sharad Malik, Kuldeep S Meel, and Moshe Y Vardi. On computing minimal independent support and its applications to sampling and counting. Constraints, pages 1–18, 2015.
  • Alexander Ivrii, Sharad Malik, Kuldeep S Meel, and Moshe Y Vardi. On computing minimal independent support and its applications to sampling and counting. Constraints, 21(1):41–58, 2016.
  • Tommi S Jaakkola and Michael I Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10:291–322, 1999.
  • Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169–188, 1986.
  • Richard M. Karp, Michael Luby, and Neal Madras. Monte-Carlo approximation algorithms for enumeration problems. J. Algorithms, 10:429–448, 1989.
  • D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • Frederic Koehler. Fast convergence of belief propagation to global optima: Beyond correlation decay. In NeurIPS, 2019.
  • Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Frank R Kschischang, Brendan J Frey, and H-A Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.
  • Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50(2):157–194, 1988.
  • Marc Mézard, Giorgio Parisi, and Riccardo Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
  • Joris M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. JMLR, 11:2169–2173, 2010. URL http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf.
  • Ryuhei Mori. New understanding of the Bethe approximation and the replica method. arXiv preprint arXiv:1303.2168, 2013.
  • Christian Muise, Sheila A. McIlraith, J. Christopher Beck, and Eric Hsu. DSHARP: Fast d-DNNF compilation with sharpSAT. In Canadian Conference on Artificial Intelligence, 2012.
  • Art B. Owen. Monte Carlo Theory, Methods and Examples, 2013.
  • Marcelo Prates, Pedro HC Avelar, Henrique Lemos, Luis C Lamb, and Moshe Y Vardi. Learning to solve NP-complete problems: A graph neural network for decision TSP. In AAAI, volume 33, pages 4731–4738, 2019.
  • Dan Roth. On the hardness of approximate reasoning. In IJCAI, 1993.
  • Nicholas Ruozzi. The Bethe partition function of log-supermodular graphical models. In NeurIPS, 2012.
  • Victor Garcia Satorras and Max Welling. Neural enhanced belief propagation on factor graphs. arXiv preprint arXiv:2003.01998, 2020.
  • Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L Dill. Learning a SAT solver from single-bit supervision. In ICLR, 2018.
  • Mate Soos and Kuldeep S. Meel. BIRD: Engineering an efficient CNF-XOR SAT solver and its applications to approximate model counting. In AAAI, 2019.
  • Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending SAT solvers to cryptographic problems. In SAT, 2009.
  • Larry J. Stockmeyer. The complexity of approximate counting. In STOC, 1983.
  • L.G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.
  • Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
  • Boris Weisfeiler and Andrei A Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9):12–16, 1968.
  • Sam Wiseman and Yoon Kim. Amortized Bethe free energy minimization for learning MRFs. In NeurIPS, pages 15520–15531, 2019.
  • Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2018.
  • Jonathan S Yedidia, William T Freeman, and Yair Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.
  • KiJung Yoon, Renjie Liao, Yuwen Xiong, Lisa Zhang, Ethan Fetaya, Raquel Urtasun, Richard S. Zemel, and Xaq Pitkow. Inference in probabilistic graphical models by graph neural networks. arXiv preprint arXiv:1803.07710, 2018.