On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices.
IPDPS Workshops (2018)
Abstract
Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators [1] used in benchmarking is somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values and not exact ground truth. Nonstochastic Kronecker product graphs [2] meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph. Given two medium-sized scale-free graphs with adjacency matrices A and B, their Kronecker product graph has adjacency matrix C = A ⊗ B. Such graphs are highly compressible: |E| edges are represented in O(|E|^(1/2)) memory and can be built in a distributed setting from small data structures, making them easy to share in compressed form. Many interesting graph calculations have worst-case complexity bounds O(|E|^p), and often these are reduced to O(|E|^(p/2)) for Kronecker product graphs when a Kronecker formula can be derived yielding the sought calculation on C in terms of related calculations on A and B. We focus on deriving formulas for triangle participation at vertices, t_C, a vector storing the number of triangles that every vertex is involved in, and triangle participation at edges, Δ_C, a sparse matrix storing the number of triangles at every edge. When factors A and B are undirected, C is also undirected. In the case when both factors have no self loops we show t_C = 2 t_A ⊗ t_B and Δ_C = Δ_A ⊗ Δ_B. Moreover, we derive the respective formulas when A and B have self loops, which boost the triangle counts for the associated vertices/edges in C.
We additionally identify strong assumptions on B under which the truss decomposition of C can be derived cheaply from the truss decomposition of A. We extend these results and show Kronecker formulas for triangle participation in both directed graphs and undirected, vertex-labeled graphs. In these classes of graphs each vertex/edge can participate in many different types of triangles.
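The loop-free formulas stated in the abstract can be checked numerically on small inputs. The sketch below is illustrative only and is not the authors' implementation. It assumes the standard identities t = diag(A^3)/2 for triangles at vertices and Δ = A ∘ A^2 (elementwise product) for triangles at edges. The two factor graphs and the helper names triangles_at_vertices and triangles_at_edges are hypothetical choices made for this example.

```python
import numpy as np


def triangles_at_vertices(adj):
    """Triangles per vertex of a simple undirected graph: diag(adj^3) / 2."""
    # Each triangle at a vertex is counted twice (once per orientation).
    return np.diag(np.linalg.matrix_power(adj, 3)) // 2


def triangles_at_edges(adj):
    """Triangles per edge: entry (i, j) is the number of common
    neighbours of i and j, kept only where the edge (i, j) exists."""
    return adj * (adj @ adj)


# Hypothetical factor A: a single triangle (K3), no self loops.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Hypothetical factor B: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
B = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# Kronecker product graph, built explicitly here only to obtain ground truth.
C = np.kron(A, B)

t_A, t_B, t_C = (triangles_at_vertices(M) for M in (A, B, C))
D_A, D_B, D_C = (triangles_at_edges(M) for M in (A, B, C))

# Vertex formula: t_C = 2 * (t_A kron t_B).
assert np.array_equal(t_C, 2 * np.kron(t_A, t_B))

# Edge formula: Delta_C = Delta_A kron Delta_B.
assert np.array_equal(D_C, np.kron(D_A, D_B))
```

Only the right-hand sides need the small factors, so in practice C never has to be materialized to obtain its triangle statistics. The explicit np.kron(A, B) above exists solely to compute the brute-force values being compared against.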
Keywords
graph generation, Kronecker graph, triangle counting, directed graphs, labeled graphs, truss decomposition