scSPARKL: Apache Spark based parallel analytical framework for the downstream analysis of scRNA-seq data.

Crossref (2023)

Abstract
As the field of single-cell genomics continues to develop, large-scale scRNA-seq datasets have become increasingly common. While these datasets offer tremendous potential for shedding light on the complex biology of individual cells, their sheer volume presents significant challenges for data management and analysis. To address these challenges, a new discipline known as "big single-cell data science" has emerged, and a variety of computational tools have been developed within it to facilitate the processing and interpretation of scRNA-seq data. In this paper, we present scSPARKL, a novel parallel analytical framework that builds on Apache Spark to enable efficient analysis of single-cell transcriptomic data. Our framework implements six key operations for handling single-cell big data: data reshaping, data preprocessing, cell/gene filtering, data normalization, dimensionality reduction, and clustering. By leveraging Spark's horizontal scalability, fault tolerance, and parallelism, scSPARKL allows researchers to analyze large scRNA-seq datasets efficiently. We demonstrate the utility of the framework through a series of experiments on simulated and real-world scRNA-seq data. Overall, our results suggest that scSPARKL is a powerful and flexible tool for the analysis of single-cell transcriptomic data, with broad applications across biology and medicine.
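The first steps of the pipeline listed above (reshaping, cell/gene filtering, normalization) can be illustrated with a minimal single-machine sketch in plain Python. It is not scSPARKL's code and does not use Spark. It assumes a long `(cell, gene, count)` record layout, analogous to rows of a Spark DataFrame that could be grouped by cell or by gene in parallel. The thresholds (`min_genes`, `max_mito_frac`, `min_cells`), the `"MT-"` mitochondrial prefix, the 10,000 scale factor and the toy matrix are all hypothetical choices. Dimensionality reduction and clustering are omitted.

```python
import math
from collections import defaultdict


def reshape_to_long(genes, cells, matrix):
    """Melt a wide gene-by-cell matrix into long (cell, gene, count) records, keeping only non-zero counts."""
    return [
        (cell, gene, count)
        for gene, row in zip(genes, matrix)
        for cell, count in zip(cells, row)
        if count > 0
    ]


def filter_cells_and_genes(records, min_genes=2, max_mito_frac=0.2,
                           min_cells=2, mito_prefix="MT-"):
    """Drop low-quality cells, then drop genes detected in too few of the remaining cells.

    All thresholds and the mitochondrial prefix are hypothetical defaults.
    """
    n_genes = defaultdict(int)   # detected genes per cell
    total = defaultdict(int)     # total counts per cell
    mito = defaultdict(int)      # mitochondrial counts per cell
    for cell, gene, count in records:
        n_genes[cell] += 1
        total[cell] += count
        if gene.startswith(mito_prefix):
            mito[cell] += count
    good_cells = {
        c for c in n_genes
        if n_genes[c] >= min_genes and mito[c] / total[c] <= max_mito_frac
    }
    kept = [r for r in records if r[0] in good_cells]

    # Count in how many surviving cells each gene is detected.
    n_cells = defaultdict(int)
    for _, gene, _ in kept:
        n_cells[gene] += 1
    return [r for r in kept if n_cells[r[1]] >= min_cells]


def log_normalize(records, scale=10_000):
    """Scale each cell's counts so they sum to `scale`, then apply log1p."""
    total = defaultdict(int)
    for cell, _, count in records:
        total[cell] += count
    return [
        (cell, gene, math.log1p(count / total[cell] * scale))
        for cell, gene, count in records
    ]


# Hypothetical toy data: rows are genes, columns are cells.
genes = ["GeneA", "GeneB", "MT-GeneC", "GeneD"]
cells = ["cell1", "cell2", "cell3"]
matrix = [
    [5, 0, 3],
    [2, 0, 7],
    [1, 9, 0],
    [0, 0, 0],
]

long_records = reshape_to_long(genes, cells, matrix)
clean = filter_cells_and_genes(long_records)  # keeps cell1, cell3 and GeneA, GeneB
normalized = log_normalize(clean)
for row in normalized:
    print(row)
```

In a Spark setting, each per-cell or per-gene aggregation in this sketch would roughly correspond to a `groupBy` followed by an aggregation over a distributed DataFrame, which is what allows such steps to run in parallel across a cluster.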
Keywords
Apache scSPARKL, parallel analytical framework, downstream analysis, scRNA-seq