Rank correlated subgroup discovery

Journal of Intelligent Information Systems(2019)

引用 6|浏览38
暂无评分
摘要
Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective research strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of the Kendall’s τ . We define a new branch-and-bound algorithm that exploits some pruning properties based on two upper-bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.
更多
查看译文
关键词
Subgroup discovery,Exceptional model mining,Gradual patterns
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要