Effective Discovery of Meaningful Outlier Relationships

ACM/IMS Transactions on Data Science(2020)

引用 4|浏览82
暂无评分
摘要
AbstractWe propose Predictable Outliers in Data-trendS (PODS), a method that, given a collection of temporal datasets, derives data-driven explanations for outliers by identifying meaningful relationships between them. First, we formalize the notion of meaningfulness, which so far has been informally framed in terms of explainability. Next, since outliers are rare and it is difficult to determine whether their relationships are meaningful, we develop a new criterion that does so by checking if these relationships could have been predicted from non-outliers, i.e., whether we could see the outlier relationships coming. Finally, searching for meaningful outlier relationships between every pair of datasets in a large data collection is computationally infeasible. To address that, we propose an indexing strategy that prunes irrelevant comparisons across datasets, making the approach scalable. We present the results of an experimental evaluation using real datasets and different baselines, which demonstrates the effectiveness, robustness, and scalability of our approach.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要