Iterative sampling based frequent itemset mining for big data

Int. J. Machine Learning & Cybernetics (2015)

Abstract
Frequent pattern mining has attracted extensive research interest over the past two decades, including mining frequent itemsets from transactions, extracting frequent sequences from bio-arrays, and detecting common subgraphs in molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: the input data, the intermediate results, and the output patterns may each be too large to fit into memory, which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns, which proves inapplicable to big data. To deal with these two challenges, we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than processing the entire data set at once, ISbFIM samples computationally manageable subsets and extracts frequent itemsets from these subsets. By repeating this process a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine itemsets, sequences, or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.
Keywords
Frequent itemset mining, Big data, Iterative sampling, Parallelization, Map-reduce
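
The sample-mine-repeat idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ISbFIM algorithm from the paper: the function names, the brute-force miner, the `slack` factor that lowers the threshold on each sample, the `max_size` cap, and the final verification pass over the full data are all assumptions made here for illustration only.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size):
    """Brute-force miner (hypothetical helper): count every itemset of up to
    max_size items. Only practical on small samples."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    threshold = min_support * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}


def iterative_sampling_fim(transactions, min_support, sample_size, iterations,
                           max_size=3, slack=0.8, seed=0):
    """Mine small random samples repeatedly and pool the candidates.

    Assumptions of this sketch, not taken from the paper:
    - slack lowers the threshold on each sample to reduce missed itemsets;
    - a final pass over the full data removes false positives.
    """
    rng = random.Random(seed)
    candidates = set()
    for _ in range(iterations):
        sample = rng.sample(transactions, min(sample_size, len(transactions)))
        candidates |= frequent_itemsets(sample, min_support * slack, max_size)

    # Verification pass: keep only candidates frequent in the full data set.
    n = len(transactions)
    result = {}
    for cand in candidates:
        support = sum(1 for t in transactions if set(cand) <= set(t)) / n
        if support >= min_support:
            result[cand] = support
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"]] * 6 + [["a", "b"]] * 2 + [["c"]] * 2
    found = iterative_sampling_fim(data, min_support=0.5,
                                   sample_size=5, iterations=20)
    for itemset, support in sorted(found.items()):
        print(itemset, support)
```

Each iteration touches only `sample_size` transactions, so the iterations are independent of one another and could be distributed across workers, with the candidate union and the verification pass acting as a reduce step. How many iterations are sufficient, and what guarantee that gives, is the subject of the paper itself and is not reproduced here.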