Space Lower Bounds for Itemset Frequency Sketches.

SIGMOD/PODS'16: International Conference on Management of Data San Francisco California USA June, 2016(2016)

引用 10|浏览73
暂无评分
摘要
Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database in the sense that all sufficiently frequent itemsets and their approximate frequencies are recoverable from the sample, and the sketch size is independent of the number of rows in the original database. For many seemingly similar problems there are better sketching algorithms than uniform sampling. In this paper we show that for itemset frequency sketching this is not the case. That is, we prove that there exist classes of databases for which uniform sampling is a space optimal sketch for approximate itemset frequency analysis, up to constant or iterated-logarithmic factors.
更多
查看译文
关键词
sketching, itemset mining, marginal queries, conjunction queries, lower bounds
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要