# Belief Propagation Neural Networks

NeurIPS, 2020.

Abstract:

Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that ...

Introduction

- Probabilistic inference problems arise in many domains, from statistical physics to machine learning.
- The authors introduce belief propagation neural networks (BPNNs), a flexible neural architecture designed to estimate the partition function of a factor graph.
- Like BP, BPNN-D is guaranteed to converge on tree structured factor graphs and return the exact partition function.
- BPNN-B performs regression from the trajectory of beliefs to the partition function of the input factor graph.
- While this sacrifices some guarantees, the additional flexibility introduced by BPNN-B generally improves estimation performance.
- Factor nodes and variable nodes are connected if and only if the variable is in the scope of the factor.
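The factor-graph setup above can be illustrated with a minimal sum-product sketch. This is not the authors' BPNN code; the chain graph, factor tables, and function names below are invented for illustration. On a tree, unnormalized BP messages recover the partition function exactly, which is the guarantee BPNN-D inherits:

```python
import itertools

# Minimal unnormalized sum-product on a tree-structured factor graph.
# Binary variables; factors are plain tables. Illustrative sketch only --
# the graph and values below are made up, not from the paper.

# Chain: x0 -- f01 -- x1 -- f12 -- x2, plus a unary factor g1 on x1.
f01 = [[1.0, 2.0], [3.0, 0.5]]   # f01[x0][x1]
f12 = [[2.0, 1.0], [0.5, 4.0]]   # f12[x1][x2]
g1 = [1.5, 0.7]                  # g1[x1]

def brute_force_Z():
    """Exact partition function by enumerating all 2^3 assignments."""
    Z = 0.0
    for x0, x1, x2 in itertools.product([0, 1], repeat=3):
        Z += f01[x0][x1] * f12[x1][x2] * g1[x1]
    return Z

def tree_bp_Z():
    """Unnormalized sum-product rooted at x1; exact on tree graphs."""
    # Leaf variables x0 and x2 send all-ones messages to their factors;
    # each factor marginalizes its leaf variable out.
    m_f01_to_x1 = [sum(f01[x0][x1] for x0 in (0, 1)) for x1 in (0, 1)]
    m_f12_to_x1 = [sum(f12[x1][x2] for x2 in (0, 1)) for x1 in (0, 1)]
    # Root belief = product of incoming factor messages; its sum is Z.
    return sum(m_f01_to_x1[x1] * m_f12_to_x1[x1] * g1[x1] for x1 in (0, 1))
```

Both routines return the same value (approximately 25.875 for these tables), illustrating why BP, and hence BPNN-D, is exact on tree-structured factor graphs.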

Highlights

- Probabilistic inference problems arise in many domains, from statistical physics to machine learning
- Belief propagation neural networks (BPNNs) are composed of iterative layers (BPNN-D) and an optional Bethe free energy layer (BPNN-B), both of which maintain the symmetries of Belief Propagation (BP) under factor graph isomorphisms.
- Our BPNN significantly outperforms loopy belief propagation, both for test data drawn from the training distribution and for out-of-distribution data.
- As shown in Table 1, BPNN provides the best estimates for the partition function
- We introduced belief propagation neural networks, a strict generalization of BP that learns to find better fixed points faster
- We empirically demonstrated that BPNNs can learn from tiny data sets containing only 10s of training points and generalize to test data drawn from a different distribution than seen during training

Methods

- In the experiments the authors trained BPNN to estimate the partition function of factor graphs from a variety of domains.
- Experiments on synthetic Ising models show that BPNN-D can learn to find better fixed points than BP and converge faster.
- BPNN generalizes to Ising models with nearly twice as many variables as those seen during training, and to models sampled from a different distribution.
- The authors refer the reader to Appendix B.2 for details on the GNN
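As a concrete picture of the training target, the exact log partition function of a tiny Ising model can be computed by enumeration. This is a hedged sketch: the 2x2 grid, coupling, and field values below are illustrative, not the paper's benchmark distributions.

```python
import itertools
import math

# Exact ln(Z) for a tiny Ising model by brute-force enumeration -- the
# kind of regression target BPNN is trained on for synthetic Ising
# models. The grid, J, and h below are made-up example values.

def ising_ln_Z(n, edges, J, h):
    """Spins s_i in {-1,+1}; E(s) = -J * sum_edges s_i s_j - h * sum_i s_i."""
    Z = 0.0
    for s in itertools.product([-1, 1], repeat=n):
        energy = -sum(J * s[i] * s[j] for i, j in edges)
        energy -= sum(h * si for si in s)
        Z += math.exp(-energy)
    return math.log(Z)

# 2x2 grid: 4 spins, 4 edges, uniform coupling and field.
grid_edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
ln_Z = ising_ln_Z(4, grid_edges, J=0.5, h=0.1)
```

Enumeration is exponential in the number of spins, which is exactly why learned estimators like BPNN (and approximations like BP) are needed at scale.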

Results

- As shown in Table 1, BPNN provides the best estimates for the partition function. Critically, the authors see that not 'double counting' messages and preserving the symmetries of BP are key improvements of BPNN over GNN.
- The computational complexity of exact model counting has led to a significant body of work on approximate model counting [46, 27, 28, 8, 20, 18, 24, 3, 5, 44], with the goal of estimating the number of satisfying solutions at a lower computational cost
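The connection to model counting can be made concrete: treating each clause as a 0/1 factor, the number of satisfying assignments is exactly the partition function Z. Below is a brute-force sketch on a made-up 3-variable CNF (not one of the cited benchmarks):

```python
import itertools
import math

# Model counting as a partition function: with each clause as a 0/1
# factor, Z equals the number of satisfying assignments. The CNF below
# is invented for illustration.

def count_models(n_vars, clauses):
    """clauses: lists of signed literals, e.g. [1, -2] means (x1 OR NOT x2)."""
    count = 0
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

cnf = [[1, 2], [-1, 3], [-2, -3]]          # (x1 v x2)(~x1 v x3)(~x2 v ~x3)
ln_count = math.log(count_models(3, cnf))  # ln(#models): the quantity estimated
```

Exact enumeration is infeasible beyond a few dozen variables, which motivates both the hashing-based approximate counters cited above and the learned BPNN estimator.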

Conclusion

- The authors introduced belief propagation neural networks, a strict generalization of BP that learns to find better fixed points faster.
- The authors empirically demonstrated that BPNNs can learn from tiny data sets containing only 10s of training points and generalize to test data drawn from a different distribution than seen during training.
- BPNNs significantly outperform loopy belief propagation and standard graph neural networks in terms of accuracy.
- BPNNs provide excellent computational efficiency, running orders of magnitude faster than state-of-the-art randomized hashing algorithms while maintaining comparable accuracy.

Tables

- Table 1: RMSE of SBM ln(Z) estimates. BPNN outperforms BP, GNN, and ablated versions of BPNN.
- Table 2: RMSE of BPNN for each training/validation set, along with ablation results. BPNN corresponds to a model with 5 BPNN-D layers followed by a Bethe layer that is invariant to the factor graph representation. BPNN-NI corresponds to removing invariance from the Bethe layer. BPNN-DC corresponds to performing 'double counting' as is standard for GNN, rather than subtracting previously sent messages as is standard for BP. 'Random Split' rows show that BPNNs are capable of learning a distribution from a tiny dataset of only 10s of training problems. 'Easy / Hard' rows additionally show that BPNNs are able to generalize from simple training problems to significantly more complex validation problems.
- Table 3: Root mean squared error (RMSE) of estimates of the natural logarithm of the number of satisfying solutions. The fraction of benchmarks within each category that each approximate counter was able to complete within the time limit of 5k seconds is shown in parentheses.
- Table 4: Runtime percentiles (in seconds) for DSharp, ApproxMC3, and F2, computed separately for each category's training dataset. In comparison, BPNN sequential runtime is nearly constant and BPNN parallel runtime is limited by GPU memory.
- Table 5: RMSE of ln(Z) of BPNN against BP and GNN for SBMs generated from different distributions and larger graphs than the training or validation set. BPNN outperforms both methods across different edge probabilities, class probabilities, and on larger graphs. Furthermore, it generalizes better than GNN in all these settings.

Funding

- Research supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA955019-1-0024), and FLI

References

- Emmanuel Abbe. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018. URL http://jmlr.org/papers/v18/16-480.html.
- Ralph Abboud, Ismail Ilkan Ceylan, and Thomas Lukasiewicz. Learning to reason: Leveraging neural networks for approximate DNF counting. AAAI, 2020.
- Dimitris Achlioptas and Pei Jiang. Stochastic integration via error-correcting codes. In Proc. Uncertainty in Artificial Intelligence, 2015.
- Dimitris Achlioptas and Panos Theodoropoulos. Probabilistic model counting with short XORs. In SAT, 2017.
- Dimitris Achlioptas, Zayd Hammoudeh, and Panos Theodoropoulos. Fast and flexible probabilistic model counting. In SAT, pages 148–164, 2018.
- Rodney J Baxter. Exactly solved models in statistical mechanics. Elsevier, 2016.
- Jens Behrmann, Will Grathwohl, Ricky TQ Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pages 573–582, 2019.
- Mihir Bellare, Oded Goldreich, and Erez Petrank. Uniform generation of NP-witnesses using an NP-oracle. Electronic Colloquium on Computational Complexity (ECCC), 5, 1998.
- Vaishak Belle, Guy Van den Broeck, and Andrea Passerini. Hashing-based approximate probabilistic inference in hybrid domains. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015.
- Hans A Bethe. Statistical theory of superlattices. Proceedings of the Royal Society of London. Series A-Mathematical and Physical Sciences, 150(871):552–575, 1935.
- Fabrizio Biondi, Michael A. Enescu, Annelie Heuser, Axel Legay, Kuldeep S. Meel, and Jean Quilbeuf. Scalable approximation of quantitative information flow in programs. In VMCAI, 2018.
- Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi. Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls. In IJCAI, 2016.
- David Chandler. Introduction to modern statistical mechanics. Oxford University Press, Oxford, UK, 1987.
- Zhengdao Chen, Lisha Li, and Joan Bruna. Supervised community detection with line graph neural networks. ICLR, 2019.
- Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84:066106, Dec 2011. doi: 10.1103/PhysRevE.84.066106. URL https://link.aps.org/doi/10.1103/PhysRevE.84.066106.
- Leonardo Dueñas-Osorio, Kuldeep S. Meel, Roger Paredes, and Moshe Y. Vardi. Counting-based reliability estimation for power-transmission grids. In AAAI, 2017.
- Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In ICML, pages 334–342, 2013.
- Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Low-density parity constraints for hashing-based discrete integration. In ICML, pages 271–279, 2014.
- Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
- Carla P. Gomes, A. Sabharwal, and B. Selman. Model counting: A new strategy for obtaining good bounds. In AAAI, pages 54–61, 2006.
- Tamir Hazan and Tommi S. Jaakkola. On the partition function and random maximum aposteriori perturbations. In ICML, pages 991–998. ACM, 2012.
- Nicolas Heess, Daniel Tarlow, and John Winn. Learning to pass expectation propagation messages. In NeurIPS, pages 3219–3227, 2013.
- Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella, and Stefano Ermon. Learning neural pde solvers with convergence guarantees. ICLR, 2019.
- Alexander Ivrii, Sharad Malik, Kuldeep S Meel, and Moshe Y Vardi. On computing minimal independent support and its applications to sampling and counting. Constraints, pages 1–18, 2015.
- Alexander Ivrii, Sharad Malik, Kuldeep S Meel, and Moshe Y Vardi. On computing minimal independent support and its applications to sampling and counting. Constraints, 21(1):41–58, 2016.
- Tommi S Jaakkola and Michael I Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10:291–322, 1999.
- Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169–188, 1986.
- Richard M. Karp, Michael Luby, and Neal Madras. Monte-carlo approximation algorithms for enumeration problems. J. Algorithms, 10:429–448, 1989.
- D. P. Kingma and J. L. Ba. Adam: a method for stochastic optimization. In ICLR, 2015.
- Frederic Koehler. Fast convergence of belief propagation to global optima: Beyond correlation decay. In NeurIPS, 2019.
- Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
- Frank R Kschischang, Brendan J Frey, and H-A Loeliger. Factor graphs and the sum-product algorithm. IEEE Trans. on information theory, 47(2):498–519, 2001.
- Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50(2):157–194, 1988.
- Marc Mézard, Giorgio Parisi, and Riccardo Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
- Joris M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. JMLR, 11:2169–2173, August 2010. URL http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf.
- Ryuhei Mori. New understanding of the Bethe approximation and the replica method. arXiv preprint arXiv:1303.2168, 2013.
- Christian Muise, Sheila A. McIlraith, J. Christopher Beck, and Eric Hsu. DSHARP: Fast dDNNF Compilation with sharpSAT. In Canadian Conference on Artificial Intelligence, 2012.
- Art B. Owen. Monte carlo theory, methods and examples, 2013.
- Marcelo Prates, Pedro HC Avelar, Henrique Lemos, Luis C Lamb, and Moshe Y Vardi. Learning to solve NP-complete problems: A graph neural network for decision TSP. In AAAI, volume 33, pages 4731–4738, 2019.
- Dan Roth. On the hardness of approximate reasoning. In IJCAI, 1993.
- Nicholas Ruozzi. The Bethe partition function of log-supermodular graphical models. In NeurIPS, 2012.
- Victor Garcia Satorras and Max Welling. Neural enhanced belief propagation on factor graphs. arXiv preprint arXiv:2003.01998, 2020.
- Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L Dill. Learning a SAT solver from single-bit supervision. In ICLR, 2018.
- Mate Soos and Kuldeep S. Meel. BIRD: Engineering an efficient CNF-XOR SAT solver and its applications to approximate model counting. In AAAI, 2019.
- Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending SAT solvers to cryptographic problems. In SAT, 2009.
- Larry J. Stockmeyer. The complexity of approximate counting. In STOC ’83, 1983.
- L.G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.
- Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
- Boris Weisfeiler and Andrei A Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9):12–16, 1968.
- Sam Wiseman and Yoon Kim. Amortized Bethe free energy minimization for learning MRFs. In NeurIPS, pages 15520–15531, 2019.
- Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2018.
- Jonathan S Yedidia, William T Freeman, and Yair Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. on information theory, 51 (7):2282–2312, 2005.
- KiJung Yoon, Renjie Liao, Yuwen Xiong, Lisa Zhang, Ethan Fetaya, Raquel Urtasun, Richard S. Zemel, and Xaq Pitkow. Inference in probabilistic graphical models by graph neural networks. ArXiv, abs/1803.07710, 2018.
