Scalable Belief Propagation via Relaxed Scheduling

NeurIPS 2020

We focus on efficient parallel algorithms for the key machine learning task of inference on graphical models, in particular on the fundamental belief propagation algorithm.

Abstract

The ability to leverage large-scale hardware parallelism has been one of the key enablers of the accelerated recent progress in machine learning. Consequently, there has been considerable effort invested into developing efficient parallel variants of classic machine learning algorithms. However, despite the wealth of knowledge on parallel…

Introduction
  • Hardware parallelism has been a key computational enabler for recent advances in machine learning, as it provides a way to reduce the processing time for the ever-increasing quantities of data required for training accurate models.
  • The authors will focus on efficient parallel algorithms for the fundamental task of inference on graphical models.
  • The marginalization problem is known to be computationally intractable in general [10, 32, 9], but inexact heuristics are well-studied for practical inference tasks.
  • One popular heuristic for inference on graphical models is belief propagation [27], inspired by the exact dynamic programming algorithm for marginalization on trees.
  • It remains poorly understood how to properly parallelize belief propagation.
Highlights
  • Hardware parallelism has been a key computational enabler for recent advances in machine learning, as it provides a way to reduce the processing time for the ever-increasing quantities of data required for training accurate models.
  • There has been considerable effort invested into developing efficient parallel variants of classic machine learning algorithms, e.g. [28, 22, 23, 24, 15].
  • We model a relaxed scheduler Q_q as a data structure which stores pairs corresponding to tasks and their priorities, with the operational semantics given in Section 3 (a minimal sketch appears after this list).
  • We model the behavior of relaxed priority-based belief propagation by investigating the number of message updates needed for convergence when the algorithm is executed sequentially using a relaxed scheduler Q_q satisfying the above constraints.
  • The choice of algorithm can depend on the model; one may choose the Relaxed Smart Splash algorithm of [16] (Splash), since it performs well on all our models.
  • We have investigated the use of relaxed schedulers in the context of the classic belief propagation algorithm for inference on graphical models, and have shown that this approach leads to an efficient family of algorithms which improve upon the previous state-of-the-art non-relaxed parallelization approaches in our experiments.
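
The relaxed scheduler Q_q is only described abstractly above. As a rough illustration, here is a minimal sketch, assuming a MultiQueue-style design in the spirit of Rihani et al.: each insertion goes to a random internal heap, and each deletion pops from the better of two randomly chosen heaps, so the returned task is near, but not necessarily at, the top. The class and method names (RelaxedScheduler, push, pop) are illustrative, not the authors' implementation.

    import heapq
    import random

    class RelaxedScheduler:
        """MultiQueue-style relaxed priority scheduler (illustrative sketch)."""

        def __init__(self, num_queues):
            self.queues = [[] for _ in range(num_queues)]
            self.counter = 0  # tie-breaker so heapq never compares task objects

        def push(self, priority, task):
            # insert into a uniformly random internal heap
            q = random.choice(self.queues)
            heapq.heappush(q, (priority, self.counter, task))
            self.counter += 1

        def pop(self):
            # "power of two choices": pick two random non-empty heaps and
            # pop from the one whose top element has the smaller priority
            non_empty = [q for q in self.queues if q]
            if not non_empty:
                return None
            a = random.choice(non_empty)
            b = random.choice(non_empty)
            best = a if a[0] <= b[0] else b
            return heapq.heappop(best)[-1]

For residual belief propagation, a task is a message to update and its priority is the negative of its residual (so that larger residuals are popped first); the relaxation means a popped task may be up to roughly q positions below the true maximum-residual message.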
Methods
  • The authors run the experiments on four MRFs of moderate size: a binary tree of size 10^7, an Ising model [14, 20] of size 10^3 × 10^3, a Potts model [37] of size 10^3 × 10^3, and the decoding of a (3, 6)-LDPC code [29] of size 3 · 10^5 (a sketch of such a grid model appears after this list).

    For each pair of algorithm and model, the authors run each experiment five times, and average the execution time and the number of performed updates on the messages.
  • Algorithms compared: Synch, Coarse-G, Splash (10), RS (2), Bucket, Residual, Weight-Decay, Priority, and Smart Splash (2).
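
The exact potentials and parameter distributions for these models are not given in this summary; the following is a minimal sketch, under standard assumptions, of how an n × n Ising model can be laid out as a pairwise MRF with node and edge factors over a grid graph. The helper build_ising_grid and the potentials exp(h·x) and exp(b·x·y) are illustrative, not the paper's exact experimental setup.

    import math
    import random

    def build_ising_grid(n, beta=0.5, field=0.1):
        """Pairwise MRF for an n x n Ising model (illustrative sketch).

        Variables take values in {-1, +1}. Returns node factors psi[i][x]
        and edge factors psi2[(i, j)][(x, y)] as plain dictionaries.
        """
        def vid(r, c):
            return r * n + c

        vals = (-1, +1)
        psi, psi2 = {}, {}
        for r in range(n):
            for c in range(n):
                i = vid(r, c)
                h = random.uniform(-field, field)          # local field
                psi[i] = {x: math.exp(h * x) for x in vals}
                for dr, dc in ((0, 1), (1, 0)):            # right / down neighbours
                    if r + dr < n and c + dc < n:
                        j = vid(r + dr, c + dc)
                        b = random.uniform(-beta, beta)    # coupling strength
                        psi2[(i, j)] = {(x, y): math.exp(b * x * y)
                                        for x in vals for y in vals}
        return psi, psi2

A 10^3 × 10^3 grid therefore has 10^6 variables and roughly 2 · 10^6 edges, i.e. about 4 · 10^6 directed messages for the schedulers to manage.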
Results
  • See Table 1 for the speedups versus the baseline, on 70 threads. (For ablation studies, see the full version.) On trees, the fastest algorithm is, predictably, the synchronous one, since on tree-like models with small diameter D it performs only approximately O(D) times more updates in comparison to the sequential baseline, while being almost perfectly parallelizable.
  • It works well on perfect binary trees such as the Tree model, but much worse on chain graphs.
  • The choice of algorithm can depend on the model; one may choose Relaxed Smart Splash, since it performs well on all the models.
Conclusion
  • For a more detailed analysis of the results, the authors refer to the full version of the paper.
  • The authors' worst-case instances show that relaxed residual BP is not a “silver bullet”: there exist tree instances where it may lead to Ω(qn) message updates, i.e. asymptotically no speedup.
  • The first direction is to extend the theoretical analysis to cover more types of instances; as the authors have seen, the structure of belief propagation schedules can be quite complicated, and the challenge is to figure out a proper framework for a more general analysis.
Tables
  • Table1: Algorithm speedups with respect to the sequential residual algorithm. Higher is better
  • Table2: Total updates relative to the sequential residual algorithm at 70 threads. Lower is better
Related work
  • 2.1 Belief Propagation

    We consider marginalization in pairwise Markov random fields; one can equivalently consider factor graphs or Bayesian networks [39]. A pairwise Markov random field is defined by a set of random variables X_1, X_2, ..., X_n, a graph G = (V, E) with V = {1, 2, ..., n}, and a set of factors

        ψ_i : D_i → R_+            for i ∈ V,
        ψ_ij : D_i × D_j → R_+     for {i, j} ∈ E,

    where D_i denotes the domain of random variable X_i. The edge factors ψ_ij represent the dependencies between the random variables, and the node factors ψ_i represent a priori information about the individual random variables; the Markov random field defines a joint probability distribution on X = (X_1, X_2, ..., X_n) as

        Pr[X = x] ∝ ∏_{i ∈ V} ψ_i(x_i) · ∏_{{i,j} ∈ E} ψ_ij(x_i, x_j),

    where the ‘proportional to’ notation ∝ hides the normalization constant applied to the right-hand side to obtain a probability distribution. The marginalization problem is to compute the probabilities Pr[X_i = x] for a specified subset of variables; for convenience, we assume that any observations regarding the values of other variables are encoded in the node factor functions ψ_i.

    Belief propagation is a message-passing algorithm; for each ordered pair (i, j) such that {i, j} ∈ E, we maintain a message μ_{i→j} : D_j → R, and the algorithm iteratively updates these messages until the values (approximately) converge to a fixed point. On Markov random fields, the message update rule gives the new value of message μ_{i→j} as a function of the old messages directed to node i:

        μ_{i→j}(x_j) ∝ Σ_{x_i ∈ D_i} ψ_i(x_i) ψ_ij(x_i, x_j) ∏_{k ∈ N(i) \ {j}} μ_{k→i}(x_i),

    where N(i) denotes the neighbours of node i in G. A minimal code sketch of this update follows.
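
To make the update rule concrete, here is a minimal sum-product sketch of a single message update on a pairwise MRF stored as the node/edge factor dictionaries from the Methods sketch above. The function names (update_message, residual) and the normalization to sum one are illustrative conventions, not necessarily the paper's.

    def update_message(i, j, psi, psi2, messages, neighbours):
        """Compute the new message mu_{i->j} by the sum-product rule (sketch).

        psi[i][x_i]               node factor of variable i
        psi2[(i, j)][(x_i, x_j)]  edge factor, stored once per undirected edge
        messages[(k, i)][x_i]     current message from k to i
        neighbours[i]             list of neighbours of i in the MRF graph
        """
        def edge(a, b, xa, xb):
            return psi2[(a, b)][(xa, xb)] if (a, b) in psi2 else psi2[(b, a)][(xb, xa)]

        new = {}
        for xj in psi[j]:
            total = 0.0
            for xi, node_factor in psi[i].items():
                prod = node_factor * edge(i, j, xi, xj)
                for k in neighbours[i]:
                    if k != j:
                        prod *= messages[(k, i)][xi]  # incoming messages, excluding j
                total += prod                          # sum over x_i
            new[xj] = total
        z = sum(new.values())                          # the hidden normalization constant
        return {x: v / z for x, v in new.items()}

    def residual(old, new):
        # change of a message; residual schedulers use this as the task priority
        return max(abs(new[x] - old[x]) for x in new)

Residual belief propagation (Elidan et al.) repeatedly takes the message with the largest residual, applies update_message, and re-prioritizes the outgoing messages of the receiving node; the relaxed variants studied here replace the exact priority queue in that loop with the relaxed scheduler Q_q.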
Funding
  • This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML) and from the Government of the Russian Federation (Grant 08-08).
References
  • Dan Alistarh, Trevor Brown, Justin Kopinsky, Jerry Z. Li, and Giorgi Nadiradze. Distributionally linearizable data structures. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, SPAA ’18, pages 133–142, New York, NY, USA, 2018. ACM.
  • Dan Alistarh, Trevor Brown, Justin Kopinsky, and Giorgi Nadiradze. Relaxed schedulers can efficiently parallelize iterative algorithms. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC ’18, pages 377–386, New York, NY, USA, 2018. ACM.
  • Dan Alistarh, Justin Kopinsky, Jerry Li, and Giorgi Nadiradze. The power of choice in priority scheduling. In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC ’17, pages 283–292, New York, NY, USA, 2017. ACM.
  • Dan Alistarh, Justin Kopinsky, Jerry Li, and Nir Shavit. The SprayList: A scalable relaxed priority queue. In 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, 2015. ACM.
  • Dan Alistarh, Giorgi Nadiradze, and Nikita Koval. Efficiency guarantees for parallel incremental algorithms under relaxed schedulers. In The 31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’19, page 145–154, New York, NY, USA, 2019. Association for Computing Machinery.
  • Dmitry Basin, Rui Fan, Idit Keidar, Ofer Kiselov, and Dmitri Perelman. CAFE: Scalable task pools with adjustable fairness and contention. In Proceedings of the 25th International Conference on Distributed Computing, DISC’11, pages 475–488, Berlin, Heidelberg, 2011. Springer-Verlag.
  • Guy E Blelloch, Yan Gu, Julian Shun, and Yihan Sun. Parallelism in randomized incremental algorithms. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pages 467–478. ACM, 2016.
  • Andres I Vila Casado, Miguel Griot, and Richard D Wesel. Informed dynamic scheduling for belief-propagation decoding of LDPC codes. In 2007 IEEE International Conference on Communications, pages 932–937. IEEE, 2007.
  • Gregory F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2):393–405, 1990.
  • Paul Dagum and Michael Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60(1):141–153, 1993.
  • Mark Van der Merwe, Vinu Joseph, and Ganesh Gopalakrishnan. Message scheduling for performant, many-core belief propagation. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC 2019), 2019.
  • Laxman Dhulipala, Guy Blelloch, and Julian Shun. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’17, pages 293–304, New York, NY, USA, 2017. ACM.
  • Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. Theoretically efficient parallel graph algorithms can be fast and scalable. In 30th on Symposium on Parallelism in Algorithms and Architectures (SPAA 2018), pages 393–404, 2018.
  • Gal Elidan, Ian McGraw, and Daphne Koller. Residual belief propagation: informed scheduling for asynchronous message passing. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006), pages 165–173, 2006.
  • Joseph Gonzalez, Yucheng Low, and Carlos Guestrin. Residual splash for optimally parallelizing belief propagation. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS 2009), volume 5, pages 177–184, 2009.
  • Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’12), pages 17–30, 2012.
  • Joseph E. Gonzalez, Yucheng Low, Carlos Guestrin, and David O’Hallaron. Distributed parallel inference on large factor graphs. In Proceedings of 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), page 203–212, 2009.
  • Andreas Haas, Michael Lippautz, Thomas A. Henzinger, Hannes Payer, Ana Sokolova, Christoph M. Kirsch, and Ali Sezgin. Distributed queues in shared memory: multicore performance and scalability through quantitative relaxation. In Computing Frontiers Conference, CF’13, Ischia, Italy, May 14 - 16, 2013, pages 17:1–17:9, 2013.
  • Mark C Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, and Daniel Sanchez. Unlocking ordered parallelism with the swarm architecture. IEEE Micro, 36(3):105–117, 2016.
  • Christian Knoll, Michael Rath, Sebastian Tschiatschek, and Franz Pernkopf. Message scheduling methods for belief propagation. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), pages 295–310, 2015.
  • Andrew Lenharth, Donald Nguyen, and Keshav Pingali. Priority queues are not good concurrent priority schedulers. In European Conference on Parallel Processing, pages 209–221.
  • Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. Scaling distributed machine learning with the parameter server. In 11th USENIX conference on Operating Systems Design and Implementation (OSDI’14), page 583–598, 2014.
  • Ji Liu and Stephen J Wright. Asynchronous stochastic coordinate descent: Parallelism and convergence properties. SIAM Journal on Optimization, 25(1):351–376, 2015.
  • Yucheng Low, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, and Joseph Hellerstein. GraphLab: A new framework for parallel machine learning. In 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), page 340–349, 2010.
  • Joris M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. Journal of Machine Learning Research, 11:2169–2173, August 2010.
  • Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 456–471, New York, NY, USA, 2013. ACM.
  • Judea Pearl. Reverend Bayes on inference engines: A distributed hierarchical approach. In Proceedings of the Second AAAI Conference on Artificial Intelligence (AAAI 1982), page 133–136. AAAI Press, 1982.
  • Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in neural information processing systems, pages 693–701, 2011.
  • Thomas J. Richardson and Rudiger L. Urbanke. Modern coding theory. Cambridge University Press, 2008.
  • Hamza Rihani, Peter Sanders, and Roman Dementiev. Brief announcement: Multiqueues: Simple relaxed concurrent priority queues. In Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, pages 80–82, 2015.
  • Hamza Rihani, Peter Sanders, and Roman Dementiev. Brief announcement: MultiQueues: Simple relaxed concurrent priority queues. In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pages 80–82, New York, NY, USA, 2015. ACM.
  • Dan Roth. On the hardness of approximate reasoning. Artificial Intelligence, 82(1):273–302, 1996.
  • Adones Rukundo, Aras Atalar, and Philippas Tsigas. Monotonically Relaxing Concurrent Data-Structure Semantics for Increasing Performance: An Efficient 2D Design Framework. In 33rd International Symposium on Distributed Computing (DISC 2019), volume 146 of Leibniz International Proceedings in Informatics (LIPIcs), pages 31:1–31:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
  • Konstantinos Sagonas and Kjell Winblad. A contention adapting approach to concurrent ordered sets. Journal of Parallel and Distributed Computing, 2017.
  • Nir Shavit and Itay Lotan. Skiplist-based concurrent priority queues. In Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pages 263–268. IEEE, 2000.
  • Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. Reducing contention through priority updates. In Proceedings of the Twenty-fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’13, pages 152–163, New York, NY, USA, 2013. ACM.
  • Charles Sutton and Andrew McCallum. Improved dynamic schedules for belief propagation. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), pages 376–383, 2007.
  • Martin Wimmer, Jakob Gruber, Jesper Larsson Traff, and Philippas Tsigas. The lock-free k-LSM relaxed priority queue. In 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015), pages 277–278, 2015.
  • Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium, chapter 8, pages 239–269. Morgan Kaufmann, 2003.
  • Jiangtao Yin and Lixin Gao. Scalable distributed belief propagation with prioritized block updates. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), pages 1209–1218, 2014.
Authors
Vitalii Aksenov
Janne Korhonen