Sampling Rank Correlated Subgroups.

Mohamed-Ali Hammal, Bernardo Abreu, Marc Plantevit, Céline Robardet

DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 16TH INTERNATIONAL CONFERENCE (2020)

Abstract

Data mining, a key technique in knowledge discovery, is the process of identifying useful patterns in a collection of data. This process is difficult for complex data that combine, for example, numeric and symbolic attributes, or when the number of observations is large. In this paper, we present a pattern mining approach to identify local correlations in the data, that is, sets of numerical attributes that strongly co-vary in a subset of the data. Both the sets of numerical attributes and the subset of data are identified automatically (inductively) by the method. Although the space of candidate patterns is exponential, the complexity of the problem can be overcome by using sampling techniques, which have several advantages: (1) reducing the computation cost, (2) identifying the most important patterns, and (3) making it possible to process large databases by distributed computing on multiple machines.
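
To make the notion of a local correlation concrete, the following minimal Python sketch compares the rank correlation of two numeric attributes over a whole toy dataset with the same correlation restricted to a subgroup. It is an illustration only, not the method of the paper: the choice of Kendall's tau as the rank correlation measure, the toy rows, and the helper names `kendall_tau` and `tau_on` are all assumptions, and no pattern sampling is performed.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:      # concordant pair
            score += 1
        elif prod < 0:    # discordant pair (ties contribute 0)
            score -= 1
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: (symbolic attribute, numeric a, numeric b).
rows = [
    ("A", 1, 10), ("A", 2, 20), ("A", 3, 30), ("A", 4, 40),
    ("B", 1, 40), ("B", 2, 10), ("B", 3, 30), ("B", 4, 20),
]


def tau_on(rows, keep=lambda r: True):
    """Rank correlation of the two numeric attributes on the selected rows."""
    sel = [r for r in rows if keep(r)]
    return kendall_tau([r[1] for r in sel], [r[2] for r in sel])


tau_all = tau_on(rows)                       # whole dataset
tau_a = tau_on(rows, lambda r: r[0] == "A")  # subgroup with symbol "A"
print(tau_all, tau_a)
```

In this toy data the two numeric attributes are perfectly rank correlated inside the subgroup selected by the symbolic value "A", while the correlation over all rows is weaker. A pattern in the sense of the abstract pairs such a subgroup with the set of attributes that co-vary in it.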

Keywords

Data mining, Markov Chain sampling, Correlated subgroups
