Learning-Augmented Data Stream Algorithms

Tanqiu Jiang,Yi Li,Honghao Lin,Yisong Ruan,David P. Woodruff

ICLR（2020）

引用 36|浏览118

暂无评分

摘要

The data stream model is a fundamental model for processing massive data sets with limited memory and fast processing time. Recently Hsu et al. (2019) incorporated machine learning techniques into the data stream model in order to learn relevant patterns in the input data. Such techniques were encapsulated by training an oracle to predict item frequencies in the streaming model. In this paper we explore the full power of such an oracle, showing that it can be applied to a wide array of problems in data streams, sometimes resulting in the first optimal bounds for such problems. Namely, we apply the oracle to counting distinct elements on the difference of streams, estimating frequency moments, estimating cascaded aggregates, and estimating moments of geometric data streams. For the distinct elements problem, we obtain the first memory-optimal algorithms. For estimating the $p$-th frequency moment for $0 < p < 2$ we obtain the first algorithms with optimal update time. For estimating the $p$-the frequency moment for $p > 2$ we obtain a quadratic saving in memory. We empirically validate our results, demonstrating also our improvements in practice.

查看译文

关键词

streaming algorithms, heavy hitters, F_p moment, distinct elements, cascaded norms

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要