PROPAGATE: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large Graphs

Machine Learning and Knowledge Discovery in Databases: Research Track, ECML PKDD 2023, Part III (2023)

Abstract
We propose PROPAGATE, a fast approximation framework for estimating distance-based metrics on very large graphs, such as the (effective) diameter and the average distance, within a small error. The framework assigns seeds to nodes and propagates them in a BFS-like fashion, growing the neighborhood sets until they cover either the whole vertex set (for computing the diameter) or a given percentage of vertices (for the effective diameter). At each iteration, we derive compressed Boolean representations of the neighborhood sets discovered so far. The PROPAGATE framework yields two algorithms: PROPAGATE-P, which propagates all $s$ seeds in parallel, and PROPAGATE-S, which propagates the seeds sequentially. For each node, the compressed representation requires $s$ bits in PROPAGATE-P and only 1 bit in PROPAGATE-S. Both algorithms compute the average distance, the effective diameter, the diameter, and the connectivity rate (a measure of the sparseness of the transitive closure graph) within a small error with high probability: for any $\varepsilon > 0$ and using $s = \Theta(\log n / \varepsilon^2)$ sample nodes, the error for the average distance is bounded by $\xi = \varepsilon \Delta / \alpha$; the errors for the effective diameter and the diameter are bounded by $\xi = \varepsilon / \alpha$; and the error for the connectivity rate is bounded by $\varepsilon$, where $\Delta$ is the diameter and $\alpha$ is the connectivity rate. The time complexity is $O(\Delta \cdot m)$ for PROPAGATE-P and $O\left(\frac{\log n}{\varepsilon^2} \cdot \Delta \cdot m\right)$ for PROPAGATE-S, where $m$ is the number of edges of the graph. The experimental results show that the PROPAGATE framework improves on the current state of the art in accuracy, speed, and space. Moreover, we experimentally show that PROPAGATE is also very efficient for solving the All Pairs Shortest Path problem on very large graphs.
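The parallel variant described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `propagate_p`, the use of Python integers as $s$-bit vectors, the uniform sampling of distinct seeds, and the simple threshold rule for the effective diameter are all assumptions made for illustration. Each node keeps one bit per seed, and at every hop a node ORs in the signatures of its neighbors, so the total number of set bits after $h$ hops counts the (node, seed) pairs at distance at most $h$.

```python
import random


def propagate_p(adj, s, quantile=0.9, rng=None):
    """Illustrative seed propagation (hypothetical helper, not the paper's code).

    adj: dict mapping each node to an iterable of its neighbors.
    s:   number of distinct seed nodes to sample.
    """
    rng = rng or random.Random(0)
    nodes = list(adj)
    seeds = rng.sample(nodes, s)

    # sig[v] has bit i set iff seed i is within the current hop count of v.
    sig = {v: 0 for v in nodes}
    for i, seed in enumerate(seeds):
        sig[seed] |= 1 << i

    def popcount_total(signatures):
        return sum(bin(x).count("1") for x in signatures.values())

    # counts[h] = number of (node, seed) pairs at distance <= h.
    counts = [popcount_total(sig)]
    while True:
        new_sig = {}
        for v in nodes:
            x = sig[v]
            for u in adj[v]:
                x |= sig[u]  # v reaches, in one more hop, whatever u reaches
            new_sig[v] = x
        total = popcount_total(new_sig)
        if total == counts[-1]:  # no new pair discovered: propagation is done
            break
        counts.append(total)
        sig = new_sig

    reachable = counts[-1] - counts[0]  # pairs at distance >= 1
    if reachable == 0:
        return {"diameter": 0, "avg_distance": 0.0, "effective_diameter": 0}

    # Pairs at distance exactly h are counts[h] - counts[h - 1].
    dist_sum = sum(h * (counts[h] - counts[h - 1]) for h in range(1, len(counts)))
    # Assumed rule: smallest h covering the requested fraction of reachable pairs.
    eff = next(h for h in range(len(counts))
               if counts[h] - counts[0] >= quantile * reachable)
    return {
        "diameter": len(counts) - 1,
        "avg_distance": dist_sum / reachable,
        "effective_diameter": eff,
    }


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3; using every node as a seed makes the result exact.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(propagate_p(path, s=4))
```

With fewer seeds than nodes, the same counts yield sampled estimates rather than exact values. A sequential variant in the spirit of PROPAGATE-S would run the same loop once per seed with a single bit per node, trading running time for memory.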
Keywords
Graph mining, shortest paths, effective diameter, sampling