Topological Graph Sketching for Incremental and Scalable Analytics

ACM International Conference on Information and Knowledge Management(2016)

引用 13|浏览89
暂无评分
摘要
We propose a novel, scalable, and principled graph sketching technique based on minwise hashing of local neighborhood. For an n-node graph with e-edges (e >> n), we incrementally maintain in real-time a minwise neighbor sampled subgraph using k hash functions in O(n * k) memory, limit being user-configurable by the parameter k. Symmetrization and similarity based techniques can recover from these data structures a significant portion of the original graph. We present theoretical analysis of the minwise sampling strategy and also derive unbiased estimators for important graph properties such as triangle count and neighborhood overlap. We perform an extensive empirical evaluation of our framework on a wide variety of real-world graph data sets drawn from different application domains using three fundamental large network analysis algorithms: local and global clustering coefficient, PageRank, and local graph sparsification. With bounded memory, the quality of results using the sketch representation is competitive against baselines which use the full graph, and the computational performance is significantly better. Our framework is flexible and configurable to be leveraged by numerous other graph analytics algorithms, potentially reducing the information mining time on large streamed graphs for a variety of applications.
更多
查看译文
关键词
Graph Sampling,Min-wise Hashing,Scalable Analysis Algorithms
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要