Scalable Top-n Local Outlier Detection

KDD (2017)

Abstract
The Local Outlier Factor (LOF) method, which labels every point with an LOF score indicating its outlier status, is known to be very effective for identifying outliers in datasets with a skewed distribution. Since outliers are by definition a small minority of a dataset, the concept of Top-N local outliers was proposed to discover the n points with the largest LOF scores. Detecting the Top-N local outliers is prohibitively expensive, since it requires a huge number of high-complexity k-nearest neighbor (kNN) searches. In this work, we present the first scalable Top-N local outlier detection approach, called TOLF. The key innovation of TOLF is a multi-granularity pruning strategy that quickly prunes most points from the set of potential outlier candidates without computing their exact LOF scores, and in many cases without conducting any kNN search for them. Our customized density-aware indexing structure not only effectively supports the pruning strategy, but also accelerates the kNN search. Our extensive experimental evaluation on the OpenStreetMap, SDSS, and TIGER datasets demonstrates the effectiveness of TOLF, which is up to 35 times faster than state-of-the-art methods.
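
To make the cost described above concrete, the following is a minimal brute-force sketch of Top-N LOF scoring using the standard LOF definitions. It is not TOLF: it does no pruning and uses no index, and the function names `knn` and `top_n_lof` are hypothetical, chosen only for illustration. It also assumes each point has exactly k neighbors (distance ties are broken arbitrarily) and that no point has more than k exact duplicates.

```python
import heapq
import math


def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs, nearest first."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    return heapq.nsmallest(k, dists)


def top_n_lof(points, k, n):
    """Brute-force Top-N LOF: score every point, then keep the n largest scores.

    Illustrative baseline only (assumption: exactly k neighbors per point,
    no more than k exact duplicates of any point).
    """
    size = len(points)
    # One kNN search per point: this is the expensive step.
    neighbors = [knn(points, i, k) for i in range(size)]
    # k-distance of a point is the distance to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to each of its neighbors.
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(size)]
    # LOF: mean ratio of the neighbors' density to the point's own density.
    lof = [sum(lrds[j] for _, j in neighbors[i]) / (k * lrds[i]) for i in range(size)]
    # Return (score, index) pairs for the n largest LOF scores.
    return heapq.nlargest(n, ((score, i) for i, score in enumerate(lof)))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_lof(data, k=2, n=1))
```

Every point gets a full kNN search and an exact LOF score here, even though only n scores are kept at the end. That is the work the abstract says TOLF avoids for most points through pruning.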
Keywords
Local Outlier Factor,Top-N,Pruning Strategy