A generic sketch for estimating super-spreaders and per-flow cardinality distribution in high-speed data streams

Computer Networks (2023)

Abstract
In a high-speed network, an important task is to process the IP packet stream with limited memory and measure its statistical metrics of interest. While many algorithms have been proposed to estimate the cardinality of a single data stream (i.e., the number of distinct elements), the problem remains challenging when a stream contains numerous sub-streams, called flows. In this paper, we focus on designing a generic data structure that measures multiple types of per-flow statistics in a high-speed stream, including per-flow cardinality, the top-K super-spreading flows with the greatest cardinalities, per-flow cardinality moments, and the per-flow cardinality distribution. Previous solutions for generic measurement mainly focus on frequency-related statistics, while this paper takes a step forward by supporting deduplication, i.e., cardinality-related measurement. To address this new problem, we propose a generic sketch named M2D. The challenge is that the per-flow cardinality distribution is often highly skewed, with a small proportion of super-spreaders. To tame the skewness, we adopt an adjustable progressive sampling technique, which samples subsets of flows with exponentially decreasing probability according to their cardinalities. Based on the sampled super-spreaders, we estimate the moments of per-flow cardinalities of different orders. Finally, we apply the method of moments to reconstruct the per-flow cardinality distribution with no prior knowledge of its functional form. Our experiments show that, compared with other algorithms, M2D achieves high memory efficiency (average savings of 38%) and satisfactory distribution estimation accuracy (2% to 98% improvement).
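To make the four measured quantities concrete, the following is a minimal exact baseline in Python. It is not the M2D sketch: it stores every distinct element per flow, so its memory is unbounded, and it only shows the ground truth that a sketch would approximate. The function names, the (flow, element) stream format, and the definition of the order-p moment as the sum of per-flow cardinalities raised to the power p are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def per_flow_cardinality(stream):
    """Exact number of distinct elements per flow (unbounded memory)."""
    seen = defaultdict(set)
    for flow, element in stream:
        seen[flow].add(element)  # the set deduplicates repeated elements
    return {flow: len(elements) for flow, elements in seen.items()}


def top_k_super_spreaders(cardinalities, k):
    """The k flows with the greatest cardinalities, largest first."""
    ranked = sorted(cardinalities.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:k]


def cardinality_moment(cardinalities, order):
    """Assumed definition: sum over flows of cardinality ** order."""
    return sum(n ** order for n in cardinalities.values())


def cardinality_distribution(cardinalities):
    """Histogram: cardinality value -> number of flows with that cardinality."""
    return dict(Counter(cardinalities.values()))


# Hypothetical toy stream of (source IP, destination) pairs.
stream = [
    ("10.0.0.1", "a"), ("10.0.0.1", "b"), ("10.0.0.1", "a"),
    ("10.0.0.2", "a"), ("10.0.0.3", "x"), ("10.0.0.1", "c"),
]
card = per_flow_cardinality(stream)
top1 = top_k_super_spreaders(card, 1)
second_moment = cardinality_moment(card, 2)
distribution = cardinality_distribution(card)
```

In this toy stream the duplicate pair ("10.0.0.1", "a") is counted once, which is the deduplication that separates cardinality-related from frequency-related measurement.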
Keywords
Data streams, Network measurements, Generic sketch, Cardinality estimation, Vertex degree distribution