Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2021)

Abstract
De novo genome assembly is the process of stitching short DNA sequences together to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be assembled to implement various sequencing strategies. PPA-assembler adopts the popular de Bruijn graph based approach for sequencing, and each operation is implemented as a program in Google’s Pregel framework, which can be easily deployed in a generic cluster. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than state-of-the-art assemblers while providing comparable sequencing quality. PPA-assembler has been open-sourced at https://github.com/yaobaiwei/PPA-Assembler.
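As a rough illustration of the de Bruijn graph approach mentioned in the abstract, the following single-machine Python sketch builds a graph whose vertices are k-mers and then merges an unambiguous path into a contig. It is not PPA-assembler's code: the function names `build_de_bruijn`, `in_degrees`, and `walk_unambiguous` are hypothetical, and the sketch assumes error-free reads and ignores reverse complements and the distribution of work across a cluster.

```python
def build_de_bruijn(reads, k):
    """Return a dict mapping each k-mer to the set of k-mers that follow it."""
    graph = {}
    for read in reads:
        # Each (k+1)-mer in a read yields one edge: its prefix k-mer -> its suffix k-mer.
        for i in range(len(read) - k):
            src, dst = read[i:i + k], read[i + 1:i + k + 1]
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set())  # register the target even if it has no successors
    return graph


def in_degrees(graph):
    """Count incoming edges for every vertex."""
    deg = {v: 0 for v in graph}
    for successors in graph.values():
        for v in successors:
            deg[v] += 1
    return deg


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the path has no branches."""
    indeg = in_degrees(graph)
    contig, current, seen = start, start, {start}
    while len(graph[current]) == 1:
        (nxt,) = graph[current]
        # Stop at a merge point (in-degree > 1) or when a cycle closes.
        if indeg[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # consecutive k-mers overlap by k-1 bases
        seen.add(nxt)
        current = nxt
    return contig


reads = ["ATGGCGT", "GCGTGCA"]  # toy reads that overlap on "GCGT"
graph = build_de_bruijn(reads, k=3)
contig = walk_unambiguous(graph, "ATG")
```

In a vertex-centric system such as Pregel, each k-mer would instead be a vertex that exchanges messages with its neighbors over a series of supersteps, so the path merging shown here as a sequential loop would run in parallel across the cluster.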
Keywords
Genome assembly, graph, distributed, vertex-centric, Pregel, DNA, read, contig, k-mer