Frequent Pattern Mining Based on Approximate Edit Distance Matrix

2016 IEEE First International Conference on Data Science in Cyberspace (DSC) (2016)

Abstract
Frequent pattern mining has been a hot topic in many research domains, such as bioinformatics and text retrieval. In real-world sequences, many seemingly disorganized constituents connected by different gap constraints appear repeatedly; these are considered latent frequent patterns (FPs). A latent FP subject to both flexible gap constraints and approximate constraints (replacement, deletion, and insertion operations) makes it harder to discover the pattern's true occurrences. We design a Mining Approximate frequent PAttern (MAPA) algorithm to handle the problem: (1) We first extend an Approximate Edit Distance Matrix (A-EDM) to handle replacement, insertion, and deletion under the gap constraint, so that various deformations of patterns can be discovered. (2) We then employ an effective backtracking Approximate Pattern Matching (APM) scheme to obtain the support of each candidate latent FP. (3) Finally, we propose an Apriori-like deterministic pruning tactic to avoid generating unnecessary candidates for mining validation. The MAPA algorithm is a unified framework for both exact mining and approximate mining. Experiments on DNA and protein sequences demonstrate that MAPA is more efficient than its peers in terms of solutions, running time, and memory consumption.
Keywords
frequent pattern, approximate operation, pattern mining, edit distance matrix, gap constraint
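
The abstract names an edit distance matrix that accounts for replacement, insertion, and deletion. As a rough illustration only, the sketch below uses the classic dynamic-programming matrix for approximate substring matching, which is not the paper's A-EDM: it ignores gap constraints entirely, and the function name `approx_match_ends` and the parameter `max_edits` are hypothetical names chosen for this example.

```python
def approx_match_ends(pattern, text, max_edits):
    """Return 1-based end positions in `text` where `pattern` matches
    some substring with at most `max_edits` unit-cost edit operations.

    Illustrative only: classic approximate matching without gap
    constraints, not the A-EDM described in the abstract.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching pattern[:i] costs i deletions.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        # Row 0 stays 0 so that an occurrence may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # exact match or replacement
                prev[i] + 1,         # extra character in the text (insertion)
                curr[i - 1] + 1,     # pattern character missing from the text (deletion)
            )
        if curr[m] <= max_edits:
            ends.append(j)
        prev = curr
    return ends


if __name__ == "__main__":
    print(approx_match_ends("ACGT", "ACGT", 0))
    print(approx_match_ends("ACGT", "ACGT", 1))
```

Counting the reported end positions gives a crude support value for a candidate pattern. A gap-constrained variant would additionally have to limit how many text characters may separate consecutive pattern characters, which this sketch does not attempt.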