Finding Significant Items in Data Streams

2019 IEEE 35th International Conference on Data Engineering (ICDE)(2019)

引用 12|浏览78
暂无评分
摘要
Finding top-k frequent items has been a hot issue in databases. Finding top-k persistent items is a new issue, and has attracted increasing attention in recent years. In practice, users often want to know which items are significant, i.e., not only frequent but also persistent. No prior art can address both of the above two issues at the same time. Also, for high-speed data streams, they cannot achieve high accuracy when the memory is tight. In this paper, we define a new issue, named finding top-k significant items, and propose a novel algorithm namely LTC to address this issue. It includes two key techniques: Long-tail Replacement and a modified CLOCK algorithm. We theoretically prove there is no overestimation error and derive the correct rate and error bound. We conduct extensive experiments on three real datasets. Our experimental results show that LTC achieves 300~10^8 and in average 10^5 times higher accuracy than other related algorithms.
更多
查看译文
关键词
Conferences,Data engineering,IEEE 1394 Standard
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要