Co-occurrence pattern mining based on a biological approximation scoring matrix

Pattern Anal. Appl. (2017)

Abstract
Mining co-occurrence frequent patterns from multiple sequences is a hot topic in bioinformatics. Many seemingly disorganized constituents that appear repeatedly under different biological scoring matrices, such as PAM250 and BLOSUM62, are considered hidden frequent patterns (FPs). A hidden FP that allows both gaps and flexible approximation operations (replacement, deletion or insertion) makes its true occurrences harder to discover. To effectively discover co-occurrence FPs (Co-FPs) under these conditions, we design a mining algorithm (co-fp-miner) with the following steps: (1) a biological approximation scoring matrix is designed to discover the various deformations of a single FP; (2) a data-driven intersection tactic is used to generate candidate Co-FPs; (3) a deterministic Apriori-like rule is proposed to prune unnecessary Co-FPs; and (4) a backtracking matching scheme is employed to validate true Co-FPs. The co-fp-miner algorithm is a unified framework for both exact and approximate mining on multiple sequences. Experiments on DNA and protein sequences demonstrate that co-fp-miner outperforms peer algorithms in terms of solutions found, running time and memory consumption.
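The four steps listed in the abstract can be illustrated with a small sketch. This is not the paper's co-fp-miner algorithm: the function names (`occurs`, `support`, `co_occurring_pairs`), the parameters (`min_avg`, `max_gap`, `min_support`) and the toy score table `TOY_SCORES` are all hypothetical, chosen only for illustration. The score table is a made-up stand-in for a real matrix such as PAM250 or BLOSUM62, and the matcher handles only replacement and bounded gaps, not explicit deletion or insertion in the pattern.

```python
from itertools import combinations

# Hypothetical toy scores (NOT PAM250/BLOSUM62): 2 for identity,
# 1 for a "similar" pair, -1 (the default below) for anything else.
TOY_SCORES = {("A", "A"): 2, ("C", "C"): 2, ("G", "G"): 2, ("T", "T"): 2,
              ("A", "G"): 1, ("G", "A"): 1, ("C", "T"): 1, ("T", "C"): 1}


def score(a, b):
    """Substitution score for aligning pattern symbol a with sequence symbol b."""
    return TOY_SCORES.get((a, b), -1)


def occurs(pattern, seq, min_avg, max_gap):
    """Backtracking check for an approximate, gapped occurrence.

    Assumes a non-empty pattern. Consecutive pattern symbols may be separated
    by at most max_gap skipped sequence symbols; the summed score must reach
    min_avg * len(pattern).
    """
    threshold = min_avg * len(pattern)

    def extend(i, pos, total):
        # i: index of the next pattern symbol; pos: first allowed sequence index.
        if i == len(pattern):
            return total >= threshold
        for nxt in range(pos, min(pos + max_gap + 1, len(seq))):
            if extend(i + 1, nxt + 1, total + score(pattern[i], seq[nxt])):
                return True
        return False  # no placement worked from here: backtrack

    return any(extend(1, start + 1, score(pattern[0], seq[start]))
               for start in range(len(seq)))


def support(pattern, seqs, min_avg, max_gap):
    """Indices of the sequences that contain an approximate occurrence."""
    return frozenset(k for k, s in enumerate(seqs)
                     if occurs(pattern, s, min_avg, max_gap))


def co_occurring_pairs(patterns, seqs, min_avg, max_gap, min_support):
    """Pairs of patterns that occur together in at least min_support sequences."""
    sup = {p: support(p, seqs, min_avg, max_gap) for p in patterns}
    # Apriori-style pruning: a pattern that is infrequent on its own
    # cannot be part of a frequent co-occurring pair.
    frequent = [p for p in patterns if len(sup[p]) >= min_support]
    result = {}
    for p, q in combinations(frequent, 2):
        shared = sup[p] & sup[q]  # intersect the supporting sequence sets
        if len(shared) >= min_support:
            result[(p, q)] = shared
    return result


if __name__ == "__main__":
    seqs = ["ACGTAC", "AGGTTC", "TTTTTT"]
    print(co_occurring_pairs(["AG", "TC", "CCC"], seqs,
                             min_avg=1.5, max_gap=1, min_support=2))
```

Intersecting per-pattern support sets keeps candidate generation driven by the data rather than by enumerating every possible combination, and the single-pattern frequency filter discards hopeless candidates before any intersection is computed.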
Keywords
Co-occurrence pattern, Pattern mining, Approximate, Gap, Edit distance matrix