Mining Robust Frequent Items in Data Streams

2020 IEEE International Conference on Joint Cloud Computing (2020)

Abstract
This paper studies robust frequent items mining in data streams, which generalizes traditional frequent items mining by accounting for noise in the data. Because of noise, different items may correspond to the same entity; examples include different images of the same object and fluctuating sensor readings taken in the same setting. Our objective is to identify the items that correspond to the same entity and whose aggregated frequency exceeds a given threshold; we call these robust frequent items. To the best of our knowledge, there is no existing work on mining robust frequent items in a data stream. We first propose a scheme that applies sampling and spatial partitioning to address the problem in low-dimensional spaces. We then extend this algorithmic framework to high-dimensional spaces by incorporating locality-sensitive hashing (LSH) to handle the approximate nearest neighbor problem. We evaluate our scheme on synthetic datasets and compare it with two adapted prior schemes. The results show that our algorithms outperform the adapted Space Saving scheme by 14.8% in precision and 9.8% in recall.
Keywords
Robust frequent items, Sampling, LSH
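
As a rough illustration of the sampling and spatial-partition idea that the abstract describes for low-dimensional spaces, the Python sketch below maps each point to a grid cell, so that noisy copies of one entity tend to share a cell, and reports cells whose sampled count exceeds a fraction of the sampled stream. It is not the authors' algorithm: the function names, the `width`, `phi`, and `sample_rate` parameters, and the use of exact per-cell counters are all assumptions made for illustration, and the sketch ignores entities that straddle a cell boundary.

```python
import math
import random
from collections import Counter


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell.

    `width` is a hypothetical cell side length, chosen to be at least
    as large as the expected noise around one entity.
    """
    return tuple(math.floor(x / width) for x in point)


def robust_frequent_cells(stream, width, phi, sample_rate=1.0, seed=0):
    """Return grid cells whose sampled count exceeds phi * (sampled size).

    Illustrative only: counts are kept exactly per cell, whereas a
    streaming algorithm would bound memory with a summary structure.
    """
    rng = random.Random(seed)
    counts = Counter()
    sampled = 0
    for point in stream:
        # Keep each point independently with probability `sample_rate`.
        if rng.random() >= sample_rate:
            continue
        sampled += 1
        counts[cell_of(point, width)] += 1
    return {cell: c for cell, c in counts.items() if c > phi * sampled}
```

With `sample_rate=1.0` every point is counted, which is convenient for checking the partition step on small inputs; lowering it trades accuracy for less work per stream element.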