Scalable KDE-based top-n local outlier detection over large-scale data streams

Knowledge-Based Systems(2020)

引用 23|浏览14
暂无评分
摘要
The detection of local outliers over high-volume data streams is critical for diverse real-time applications in the real world, where the distributions in different subsets of the data tend to be skewed. However, existing methods are not scalable to large-scale high-volume data streams owing to the high complexity of the re-detection of data updates. In this work, we propose a top-n local outlier detection method based on Kernel Density Estimation (KDE) over large-scale high-volume data streams. First, we define a KDE-based Outlier Factor (KOF) to measure the local outlierness score for the data points. Then, we propose the upper bounds of the KOF and an upper-bound-based pruning strategy to quickly eliminate the majority of the inlier points by leveraging the upper bounds without computing the expensive KOF scores. Moreover, we design an Upper-bound pruning-based top-n KOF detection method (UKOF) over data streams to efficiently address the data updates in a sliding window environment. Furthermore, we propose a Lazy update method of UKOF (LUKOF) for bulk updates in high-speed large-scale data streams to further minimize the computation cost. Our comprehensive experimental study demonstrates that the proposed method outperforms the state-of-the-art methods by up to 3,600 times in speed, while achieving the best performance in detecting local outliers over data streams.
更多
查看译文
关键词
Local outlier detection,Data streams,Kernel Density Estimation,Upper-bound based pruning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要