ARBic: an all-round biclustering algorithm for analyzing gene expression data.

NAR genomics and bioinformatics(2023)

引用 0|浏览2
Identifying significant biclusters of genes with specific expression patterns is an effective approach to reveal functionally correlated genes in gene expression data. However, none of existing algorithms can simultaneously identify both broader and narrower biclusters due to their failure of balancing between effectiveness and efficiency. We introduced ARBic, an algorithm which is capable of accurately identifying any significant biclusters of any shape, including broader, narrower and square, in any large scale gene expression dataset. ARBic was designed by integrating column-based and row-based strategies into a single biclustering procedure. The column-based strategy borrowed from RecBic, a recently published biclustering tool, extracts narrower biclusters, while the row-based strategy that iteratively finds the longest path in a specific directed graph, extracts broader ones. Being tested and compared to other seven salient biclustering algorithms on simulated datasets, ARBic achieves at least an average of 29% higher recovery, relevance and[Formula: see text] scores than the best existing tool. In addition, ARBic substantially outperforms all tools on real datasets and is more robust to noises, bicluster shapes and dataset types.
AI 理解论文