
Factor Graph Grammars

NeurIPS 2020

Abstract

We propose the use of hyperedge replacement graph grammars for factor graphs, or factor graph grammars (FGGs) for short. FGGs generate sets of factor graphs and can describe a more general class of models than plate notation, dynamic graphical models, case-factor diagrams, and sum-product networks can. Moreover, inference can be done on FGGs without enumerating all the factor graphs they generate.

Introduction
  • If H is a factor graph, define an assignment ξ of H to be a mapping from nodes to values: ξ(v) ∈ Ω(v).
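The definition above can be made concrete with a tiny brute-force sketch (our own illustration, not code from the paper): each node v takes a value in Ω(v), and the sum-product totals the product of factor weights over all assignments ξ.

```python
from itertools import product

# A tiny factor graph: variables with finite domains, plus factors over
# subsets of variables. This is an illustrative sketch only; the paper's
# formalism (hyperedges, labels, external nodes) is omitted.

omega = {"A": [0, 1], "B": [0, 1]}           # Omega(v): domain of each node

def f_ab(a, b):                               # a binary factor
    return 2.0 if a == b else 1.0

def f_b(b):                                   # a unary factor
    return 3.0 if b == 1 else 1.0

factors = [(("A", "B"), f_ab), (("B",), f_b)]

def sum_product(omega, factors):
    """Sum, over all assignments xi, of the product of factor weights."""
    names = list(omega)
    total = 0.0
    for values in product(*(omega[n] for n in names)):
        xi = dict(zip(names, values))         # an assignment xi: nodes -> values
        w = 1.0
        for vars_, f in factors:
            w *= f(*(xi[v] for v in vars_))
        total += w
    return total

print(sum_product(omega, factors))   # 2 + 3 + 1 + 6 = 12.0
```

Enumerating all assignments is exponential in the number of variables; variable elimination (discussed below) avoids this when the treewidth is small.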
Highlights
  • Graphs have been used with great success as representations of probability models, both Bayesian and Markov networks (Koller and Friedman, 2009) as well as latent-variable neural networks (Schulman et al, 2015)
  • We show that hyperedge replacement graph grammars (HRGs) for factor graphs, or factor graph grammars (FGGs) for short, are expressive enough to solve both the repeated-substructure and alternative-substructure problems, and constrained enough to allow exact and tractable inference in many situations
  • We show that if a FGG generates a finite set, it can be converted to a single factor graph, to which standard graphical model inference methods can be applied (§5.2)
  • Let LV be a finite set of node labels and LE be a finite set of edge labels, and assume there is a function type : LE → (LV )∗, which says for each edge label what the number and labels of the endpoint nodes must be
  • Factor graph grammars are a powerful way of defining probabilistic models that permits practical inference
  • We will explore techniques for optimizing inference in FGGs, for example, by automatically modifying rules to reduce their treewidth (Bilmes, 2010) or reducing the cost of matrix inversions in Theorem 15 (Nederhof and Satta, 2008). Another important direction for future work is the development of approximate inference algorithms for FGGs
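As a rough illustration of how an FGG's recursive rules generate repeated substructure, the sketch below unrolls an HMM-like chain rule n times. This is our own illustration; the variable and factor names are not the paper's.

```python
# Sketch of hyperedge replacement for an HMM-like FGG: a start rule
# begins the chain, and a recursive rule either extends it by one time
# step or stops. Names (p_init, p_trans, p_emit, p_stop) are illustrative.

def expand(n):
    """Unroll the recursive rule n times, yielding the variables and
    factors of the generated factor graph (an HMM with n time steps)."""
    variables, factors = [], []
    prev = None
    for t in range(1, n + 1):
        T, W = f"T{t}", f"W{t}"              # hidden tag and observed word
        variables += [T, W]
        if prev is None:
            factors.append(("p_init", (T,)))        # initial distribution
        else:
            factors.append(("p_trans", (prev, T)))  # transition factor
        factors.append(("p_emit", (T, W)))          # emission factor
        prev = T
    factors.append(("p_stop", (prev,)))             # stopping factor
    return variables, factors

vs, fs = expand(3)
print(vs)        # ['T1', 'W1', 'T2', 'W2', 'T3', 'W3']
print(len(fs))   # 7: init + 2 transitions + 3 emissions + stop
```

The point of the grammar view is that inference can work on the rules directly, without materializing `expand(n)` for every n.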
Results
  • A left-hand side X is formally just a nonterminal symbol; the authors draw it as a hyperedge with label X, with replicas of the external nodes as its endpoints.
  • The graphs generated by a FGG can be viewed, together with Ω and F , as factor graphs, each of which defines a distribution over assignments.
  • The authors can constrain the W variables to an observed string w using another FGG, Gw, which has the same variables as G but different factors; its nonterminal edges are the same as G but with different labels.
  • The conjunction of G and Gw combines their factors and nonterminal labels and generates just one graph: the HMM for string w.
  • It follows that a HRG whose right-hand sides have at most (k + 1) nodes generates graphs with treewidth at most k (Bodlaender, 1998, Theorem 37).
  • If a FGG G generates a graph H, computing the sum–product of H by variable elimination (VE) takes time linear in the size of H and exponential in k.
  • For each right-hand side R = (V, EN ∪ ET , att, labV , labE, ext), where EN contains only nonterminal edges and ET contains only terminal edges, and for each ξ ∈ ΞX , add the equation τR(ξ) = Σξ′ Πe∈ET F(labE(e))(ξ′(att(e))) · Πe∈EN ψlabE(e)(ξ′(att(e))), where ξ′ ranges over assignments of R whose restriction to the external nodes is ξ, and ψY denotes the sum–product quantity for nonterminal Y.
  • Construct a directed graph over nonterminals with an edge from X to Y iff there is a rule X → R where R contains an edge labeled Y .
  • (Example 31 in Appendix D shows an example of this construction for a toy FGG.) First, the authors add binary variables (with label B where Ω(B) = {true, false}) that switch on or off parts of the factor graph (somewhat like the gates of Minka and Winn (2008)).
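The gating construction in the last bullet can be sketched as follows. This is a hypothetical wrapper of our own for intuition; the actual construction builds the switching behavior into the factors of the generated graph.

```python
# Sketch of gating: a binary variable (domain {True, False}) switches a
# factor on or off, somewhat like the gates of Minka and Winn (2008).
# When the gate is off, the factor contributes a neutral weight of 1.

def gated(factor):
    """Wrap a factor so it only applies when its gate variable is True."""
    def g(gate, *args):
        return factor(*args) if gate else 1.0
    return g

def f(x):
    return 5.0 if x == 1 else 2.0

gf = gated(f)
print(gf(True, 1))    # 5.0: gate on, factor applies
print(gf(False, 1))   # 1.0: gate off, neutral weight
```

Because the off state contributes weight 1, switched-off parts of the graph leave the sum-product unchanged.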
Conclusion
  • Let G be a FGG whose variable domains are N and whose factors only use the successor relation and equality with zero.
  • Another important direction for future work is the development of approximate inference algorithms for FGGs. This research is of potential benefit to anyone working with structured probability models, including latent-variable neural networks.
  • As this research is purely theoretical, the authors are not aware of any direct negative impacts.
Funding
  • We also thank Antonis Anastasopoulos, Justin DeBenedetto, Wes Filardo, Chung-Chieh Shan, and Xing Jie Zhong for their feedback. This material is based upon work supported by the National Science Foundation under Grant No. 2019291.
Study subjects and analysis
cases: 3
The Viterbi (max–product) semiring would find the highest-weight derivation and assignment, not necessarily the highest-weight graph and assignment, which is NP-hard (Lyngsø and Pedersen, 2002). We consider three cases below: finite variable domains, but possibly infinite graph languages (§5.1); finite graph languages, but possibly infinite variable domains (§5.2); and infinite variable domains and graph languages (§5.3). To help characterize these cases and their subcases, we introduce the following definitions

Solving the equations could involve inverting a matrix of this size, which takes O(|G|^3 m^(3(k+1))) time. If G is nonlinearly recursive, any of the three cases may apply. For case (3), each iteration takes O(|G| m^(k+1)) time (fixed-point iteration method) or O(|G|^3 m^(3(k+1))) time (Newton's method), but the number of iterations depends on G
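For intuition about case (3), here is a toy fixed-point iteration. The instance is our own, not the paper's: a PCFG with rules S → S S (probability p) and S → a (probability 1 − p), whose partition function z satisfies the nonlinear equation z = p·z² + (1 − p) (cf. Nederhof and Satta, 2008).

```python
# Kleene/fixed-point iteration for the nonlinear sum-product equation
# z = p*z**2 + (1 - p), starting from z = 0 and converging upward to
# the least fixed point. Toy example, not the paper's equations.

def partition(p, iters=1000):
    z = 0.0
    for _ in range(iters):
        z = p * z * z + (1 - p)
    return z

print(round(partition(0.4), 6))   # 1.0: consistent grammar
print(round(partition(0.6), 6))   # 0.666667: probability leaks to infinite trees
```

Newton's method converges in fewer iterations but each step is more expensive, matching the trade-off stated above.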

References
  • Michel Bauderon and Bruno Courcelle. 1987. Graph expressions and graph rewriting. Mathematical Systems Theory, 20:83–127.
  • Jeff Bilmes. 2010. Dynamic graphical models. IEEE Signal Processing Magazine, 27(6):29–42.
  • Jeff Bilmes and Chris Bartels. 2003. On triangulating dynamic graphical models. In Proc. UAI, pages 47–56.
  • Hans L. Bodlaender. 1993. A linear time algorithm for finding tree-decompositions of small treewidth. In Proc. STOC, pages 226–234.
  • Hans L. Bodlaender. 1998. A partial k-arboretum of graphs with bounded treewidth. Theoretical Computer Science, 209:1–54.
  • Wray L. Buntine. 1994. Operations for learning with graphical models. J. Artificial Intelligence Research, 2:159–225.
  • David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, and Kevin Knight. 2013. Parsing graphs with hyperedge replacement grammars. In Proc. ACL, volume 1, pages 924–932.
  • Shay B. Cohen, Robert J. Simmons, and Noah A. Smith. 2011. Products of weighted logic programs. Theory and Practice of Logic Programming, 11(2–3):263–296.
  • H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. 2007. Tree automata techniques and applications. Release of October 12, 2007.
  • Frank Drewes, Hans-Jörg Kreowski, and Annegret Habel. 1997. Hyperedge replacement graph grammars. In Grzegorz Rozenberg, editor, Handbook of Graph Grammars and Computing by Graph Transformation, pages 95–162. World Scientific.
  • Markus Dreyer and Jason Eisner. 2009. Graphical models over multiple strings. In Proc. EMNLP, pages 101–110.
  • Jason Eisner. 2002. Parameter estimation for probabilistic finite-state transducers. In Proc. ACL, pages 1–8.
  • Lise Getoor, Nir Friedman, Daphne Koller, Avi Pfeffer, and Ben Taskar. 2007. Probabilistic relational models. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning, pages 129–174. MIT Press.
  • Daniel Gildea. 2011. Grammar factorization by tree decomposition. Computational Linguistics, 37(1):231–248.
  • Joshua Goodman. 1999. Semiring parsing. Computational Linguistics, 25(4):573–606.
  • Annegret Habel and Hans-Jörg Kreowski. 1987. May we introduce to you: Hyperedge replacement. In Proc. Third International Workshop on Graph Grammars and Their Application to Computer Science, volume 291 of Lecture Notes in Computer Science, pages 15–26. Springer.
  • Bevan Jones, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight. 2012. Semantics-based machine translation with hyperedge replacement grammars. In Proc. COLING, pages 1359–1376.
  • Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
  • Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Trans. Information Theory, 47(2):498–519.
  • Rune B. Lyngsø and Christian N. S. Pedersen. 2002. The consensus string problem and the complexity of comparing hidden Markov models. J. Computer and System Sciences, 65:545–569.
  • David McAllester, Michael Collins, and Fernando Pereira. 2008. Case-factor diagrams for structured probabilistic modeling. J. Computer and System Sciences, 74(1):84–96.
  • Mazen Melibari, Pascal Poupart, Prashant Doshi, and George Trimponias. 2016. Dynamic sum product networks for tractable inference on sequence data. In Proc. International Conference on Probabilistic Graphical Models, pages 345–355.
  • Tom Minka and John Winn. 2008. Gates. In Proc. NeurIPS, pages 1073–1080.
  • Mark-Jan Nederhof and Giorgio Satta. 2008. Computing partition functions of PCFGs. Research on Language and Computation.
  • Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Justin Chiu, Neeraj Pradhan, Alexander Rush, and Noah Goodman. 2019. Tensor variable elimination for plated factor graphs. In Proc. ICML, pages 4871–4880.
  • Hoifung Poon and Pedro Domingos. 2011. Sum-product networks: A new deep architecture. In Proc. UAI, pages 337–346.
  • David V. Pynadath and Michael P. Wellman. 1998. Generalized queries on probabilistic context-free grammars. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(1):65–77.
  • John Schulman, Nicolas Heess, Theophane Weber, and Pieter Abbeel. 2015. Gradient estimation using stochastic computation graphs. In Proc. NeurIPS.
  • David A. Smith and Jason Eisner. 2008. Dependency parsing by belief propagation. In Proc. EMNLP, pages 145–156.
  • Andreas Stolcke. 1995. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165–201.
  • Andreas Stuhlmüller and Noah D. Goodman. 2012. A dynamic programming algorithm for inference in recursive probabilistic programs. In Proc. International Workshop on Statistical Relational AI (StarAI).
  • Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. 2018. An introduction to probabilistic programming. ArXiv:1809.10756.
Comparisons
  • Plate diagrams are extensions of graphs that describe repeated structure in Bayesian networks (Buntine, 1994) or factor graphs (Obermeyer et al., 2019). A plate is a subset of variables/factors, together with a count M, indicating that the variables/factors inside the plate are to be replicated M times. But there cannot be edges between different instances of a plate. Definition 21. A plated factor graph or PFG (Obermeyer et al., 2019) is a factor graph H = (V, E) together with a finite set B of plates and a function P : V ∪ E → 2^B that assigns each variable and factor to a set of plates. If b ∈ P(v) and e is incident to v, then b ∈ P(e).
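A minimal sketch of plate unrolling (our own code; the variable and factor names are illustrative): members of a plate with count M are replicated M times, while shared variables outside the plate stay unreplicated.

```python
# Unroll a single plate: replicate its member variables M times and
# replicate every factor that touches a plate member, indexing copies
# by the plate. Illustrative sketch, not the paper's Algorithm 1.

def unroll(variables, factors, plate_members, M):
    """Replicate plate members M times; leave other variables shared."""
    out_vars, out_factors = [], []
    for v in variables:
        if v in plate_members:
            out_vars += [f"{v}_{i}" for i in range(M)]
        else:
            out_vars.append(v)
    for name, args in factors:
        if any(a in plate_members for a in args):
            for i in range(M):
                out_factors.append(
                    (name, tuple(f"{a}_{i}" if a in plate_members else a
                                 for a in args)))
        else:
            out_factors.append((name, args))
    return out_vars, out_factors

# A shared parameter mu with M = 3 replicated observations X:
vs, fs = unroll(["mu", "X"], [("prior", ("mu",)), ("lik", ("mu", "X"))],
                {"X"}, 3)
print(vs)   # ['mu', 'X_0', 'X_1', 'X_2']
print(fs)   # prior once; lik replicated three times
```

An FGG achieves the same effect with a recursive rule, so the count M need not be fixed in advance.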
  • Obermeyer et al. (2019) give an algorithm for computing the sum–product of a PFG. It only succeeds on some PFGs; an example for which it fails is the set of all restricted Boltzmann machines (fully-connected bipartite graphs), and one of their main results is to characterize the PFGs for which their algorithm succeeds. Below, we show how to convert these PFGs to FGGs. Proposition 22. Let H be a PFG. If the sum–product algorithm of Obermeyer et al. (2019) succeeds on H, then there is a FGG G such that for any M : B → N, there is a FGG GM such that G ∧ GM generates one graph, namely the unrolling of H by M.
  • Proof. We just describe how to construct G ∧ GM directly; hopefully, it should be clear how to construct G and GM separately (G has factors but not counts; GM has counts but not factors). Algorithm 1 converts H and M to G ∧ GM. It has the same structure as the sum–product algorithm of Obermeyer et al. (2019) and therefore works on the same class of PFGs.
  • If the algorithm of Obermeyer et al. (2019) fails on a PFG, there might not be an equivalent FGG. In particular, FGGs cannot generate the set of RBMs: an m × n RBM has treewidth min(m, n), so the set of all RBMs has unbounded treewidth and cannot be generated by a HRG.
  • Example 23. The following PFG is from Obermeyer et al. (2019). (Figure not reproduced here.)
  • B.2 Dynamic graphical models. For simplicity, we only consider binary factors, which we draw as directed edges, and we ignore edge labels. Definition 24. A dynamic graphical model or DGM (Bilmes, 2010) is a tuple (H1, H2, H3, E12, E22, E23), where the Hi = (Vi, Ei) are factor graphs and the Eij ⊆ Vi × Vj are sets of edges from Hi to Hj. A DGM specifies how to construct, for any length n ≥ 2, a factor graph …
  • Bilmes (2010) gives the following example of a DGM. All factors have two endpoints, and we draw them as directed edges instead of the usual squares. We draw the edges in E22 with dotted lines.
  • Running the algorithm of Theorem 15 would not be guaranteed to achieve the same time complexity as that of Bilmes and Bartels (2003), which searches through alternative ways of dividing the unrolled factor graph into time slices.
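The unrolling described in Definition 24 can be sketched as follows (our own code; it assumes n ≥ 3 so that at least one chunk copy appears, and uses illustrative `name@slice` tags):

```python
# Unroll a DGM (H1, H2, H3, E12, E22, E23) to length n: one prologue
# slice H1, n-2 copies of the chunk H2, one epilogue H3, with the E_ij
# edge sets connecting consecutive slices. Sketch only.

def unroll_dgm(H1, H2, H3, E12, E22, E23, n):
    """Each Hi is (nodes, edges); returns the nodes and edges of the
    length-n unrolled factor graph, copies tagged by time slice."""
    assert n >= 3
    def tag(x, t):
        return f"{x}@{t}"
    nodes, edges = [], []
    slices = [H1] + [H2] * (n - 2) + [H3]
    for t, (vs_, es_) in enumerate(slices):
        nodes += [tag(v, t) for v in vs_]
        edges += [(tag(u, t), tag(v, t)) for u, v in es_]
    edges += [(tag(u, 0), tag(v, 1)) for u, v in E12]        # H1 -> first H2
    for t in range(1, n - 2):
        edges += [(tag(u, t), tag(v, t + 1)) for u, v in E22]  # H2 -> H2
    edges += [(tag(u, n - 2), tag(v, n - 1)) for u, v in E23]  # last H2 -> H3
    return nodes, edges

# A one-variable-per-slice chain, unrolled to length 4:
H = (["x"], [])
ns, es = unroll_dgm(H, H, H, [("x", "x")], [("x", "x")], [("x", "x")], 4)
print(ns)        # ['x@0', 'x@1', 'x@2', 'x@3']
print(len(es))   # 3 inter-slice edges
```

An FGG expresses the same family with one recursive rule per chunk, so n is not fixed in advance.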
  • Case–factor diagrams (McAllester et al., 2008) and sum–product networks (Poon and Domingos, 2011) are compact representations of probability distributions over assignments to Boolean variables. They generalize both Markov networks and PCFGs. Both formalisms represent models as rooted directed acyclic graphs (DAGs), with edges directed away from the root, in which some nodes mention variables. If D is a DAG, for any node v ∈ D, let scope(v) be the set of variables mentioned in v or any descendant of v.
  • Further variations of SPNs have been proposed, in particular to generate repeated substructures (Stuhlmüller and Goodman, 2012; Melibari et al., 2016). Factored SPNs (Stuhlmüller and Goodman, 2012) are especially closely related to FGGs, in that they allow one part of a SPN to “reference” another, which is analogous to a nonterminal-labeled edge in a FGG.
  • CFDs and SPNs present a rather different, lower-level view of a model than the other formalisms surveyed here do. Whereas factor graphs and the other formalisms represent the model’s variables and the dependencies among them, CFDs and SPNs (including factored SPNs) represent the computation of the sum-product. For instance, converting a factor graph H to a CFD or SPN requires forming a tree decomposition of H (McAllester et al., 2008), and the resulting CFD/SPN’s structure is that of the tree decomposition, not of H.
  • The width of a tree decomposition is maxB |VB| − 1, and the treewidth of H is the minimum width of any tree decomposition of H. A tree decomposition can always be made to have at most n nodes without changing its width (Bodlaender, 1993).
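The width computation above is simple enough to state in code (our own illustration; bags are plain sets of variable names):

```python
# Width of a tree decomposition: max bag size minus one. The treewidth
# of H is the minimum width over all of its tree decompositions.

def width(bags):
    return max(len(b) for b in bags) - 1

# The path graph A - B - C - D has a tree decomposition whose bags all
# have size 2, so its treewidth is at most 1:
bags = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
print(width(bags))   # 1
```

Every edge's endpoints share a bag and each variable's bags form a connected subtree, so this is a valid decomposition witnessing treewidth ≤ 1.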
  • Chiang et al. (2013) give a parsing algorithm for HRGs that matches right-hand sides incrementally using their tree decompositions. They observe that this is related to the concept of binarization of context-free grammars. Here, we make this connection explicit by showing how to factorize a HRG.
Author
David Chiang
Darcey Riley