Expectation Maximization of Frequent Patterns, a Specific, Local, Pattern-Based Biclustering Algorithm for Biological Datasets.

Erin J. Moore,Thirimachos Bourlai

IEEE/ACM Trans. Comput. Biology Bioinform.(2016)

引用 8|浏览13
暂无评分
摘要
Currently, binary biclustering algorithms are too slow and non-specific to handle biological datasets that have a large number of attributes, which is essential for the computational biology problem of microarray analysis. Specialized computers may be needed to execute an algorithm, and may fail to produce a solution, due to its large resource needs. The biclusters also include too many false positives, the type I error, which hinders biological discovery. We propose an algorithm that can analyze datasets with a large attribute set at different densities, and can operate on a laptop, which makes it accessible to practitioners. EMFP produces biclusters that have a very low Root Mean Squared Error and false positive rate, with very few type II errors. Our binary biclustering algorithm is a hybrid, axis-parallel, pattern-based algorithm that finds multiple, non-overlapping, near-constant, deterministic, binary submatricies, with a variable confidence threshold, and the novel use of local density comparisons versus the standard global threshold. EMFP introduces a new, and intuitive way to calculate internal measures for binary biclustering methods. We also introduce a framework to ease comparison with other algorithms, and compare to both binary and general biclustering algorithms using two real, and 80 synthetic databases.
更多
查看译文
关键词
Algorithm design and analysis,Databases,Clustering algorithms,Standards,Machine learning algorithms,DNA
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要