Better Algorithms For Counting Triangles In Data Streams
SIGMOD/PODS'16: International Conference on Management of Data, San Francisco, California, USA, June 2016
Abstract
We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor of $1+\epsilon$. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles $T$ (or a lower bound on this quantity) and other parameters, including the number of nodes $n$ and the number of edges $m$. Two models are important in the literature: the arbitrary order model, in which the stream consists of the edges of the graph in arbitrary order, and the adjacency list order model, in which all edges incident to the same node appear consecutively. We improve over the state-of-the-art results in both models. For the adjacency list order model, we show that $\tilde{O}(\epsilon^{-2} m/\sqrt{T})$ space is sufficient in one pass and $\tilde{O}(\epsilon^{-2} m^{3/2}/T)$ space is sufficient in two passes, where the $\tilde{O}(\cdot)$ notation suppresses log factors. For the arbitrary order model, we show that $\tilde{O}(\epsilon^{-2} m/\sqrt{T})$ space suffices given two passes and that $\tilde{O}(\epsilon^{-2} m^{3/2}/T)$ space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for $\ell_p$ sampling such that multiple independent samples can be generated with $O(\mathrm{polylog}\, n)$ update time; this primitive is widely applicable, and the result may be of independent interest.
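For readers unfamiliar with the "wedge sampling" idea mentioned above, the following is a minimal offline sketch in Python. It is not the paper's streaming algorithm: it assumes the whole graph fits in memory, and the function names (`build_adjacency`, `estimate_triangles`) are hypothetical. A wedge is a path of length two. The sketch samples wedges uniformly, measures the fraction that are closed by a third edge, and scales by the total wedge count divided by three, since each triangle contains exactly three closed wedges.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_triangles(edges, num_samples, seed=None):
    """Estimate the triangle count by uniform wedge sampling (offline illustration)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    rng = random.Random(seed)
    adj = build_adjacency(edges)

    # A node of degree d is the center of d * (d - 1) / 2 wedges.
    centers = [v for v in adj if len(adj[v]) >= 2]
    if not centers:
        return 0.0
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    total_wedges = sum(weights)

    closed = 0
    # Choosing a center proportionally to its wedge count and then a uniform
    # pair of its neighbors yields a uniformly random wedge.
    for center in rng.choices(centers, weights=weights, k=num_samples):
        a, b = rng.sample(list(adj[center]), 2)
        if b in adj[a]:
            closed += 1

    # Each triangle accounts for exactly three closed wedges.
    return (closed / num_samples) * total_wedges / 3.0
```

On a complete graph every wedge is closed, so the estimate is exact regardless of the sample size; on sparser graphs the accuracy depends on the number of samples relative to the fraction of closed wedges. The difficulty the abstract addresses is drawing such samples when edges arrive as a stream in arbitrary order, which this sketch sidesteps by storing the full graph.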
Keywords
data streams, triangles, clustering coefficients