De novo diploid genome assembly using long noisy reads via haplotype-aware error correction and inconsistent overlap identification

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Abstract

High sequencing error rates have impeded the application of long noisy reads to diploid genome assembly. Most existing assemblers fail to distinguish heterozygotes from sequencing errors in long noisy reads and therefore generate collapsed assemblies with many haplotype switches. Here, we present PECAT, a phased error correction and assembly tool for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that retains heterozygous alleles while correcting sequencing errors. We develop a read-level SNP caller that further reduces the SNP errors in corrected reads. We then use a read grouping method to assign reads to different haplotype groups. To accelerate assembly, PECAT performs local alignment only when it is necessary. PECAT efficiently assembles diploid genomes using only long noisy reads and generates more contiguous haplotype-specific contigs than other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison × Simmental) using Nanopore reads.
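To make the read grouping idea concrete, the following is a minimal toy sketch, not PECAT's actual algorithm, which the abstract does not specify. It assumes each read has already been reduced to a mapping from SNP position to observed allele, and it greedily assigns each read to whichever of two haplotype groups it agrees with more. The function name `group_reads` and the input layout are hypothetical and chosen only for illustration.

```python
from collections import Counter


def group_reads(reads):
    """Greedily split reads into two haplotype groups (toy illustration).

    reads: dict mapping read name -> {snp_position: allele}.
    Returns a dict mapping read name -> group index (0 or 1).
    This is an assumed, simplified scheme, not the method used by PECAT.
    """
    # For each group, track allele counts per SNP position.
    groups = [{}, {}]
    assignment = {}
    for name, alleles in reads.items():
        scores = []
        for group in groups:
            score = 0
            for pos, allele in alleles.items():
                counts = group.get(pos)
                if counts:
                    # Compare against the group's current majority allele.
                    major = counts.most_common(1)[0][0]
                    score += 1 if allele == major else -1
            scores.append(score)
        # Ties (including reads with no informative overlap) go to group 0.
        idx = 0 if scores[0] >= scores[1] else 1
        assignment[name] = idx
        for pos, allele in alleles.items():
            groups[idx].setdefault(pos, Counter())[allele] += 1
    return assignment


# Hypothetical reads covering four SNP sites on two haplotypes.
example_reads = {
    "r1": {100: "A", 200: "C", 300: "G"},
    "r2": {100: "T", 200: "G", 300: "A"},
    "r3": {200: "C", 300: "G", 400: "T"},
    "r4": {200: "G", 300: "A", 400: "C"},
    "r5": {300: "G", 400: "T"},
}
example_assignment = group_reads(example_reads)
```

A greedy pass like this depends on read order and on SNP alleles that are mostly correct, which is one reason a scheme of this kind would benefit from the error correction and read-level SNP calling steps described above.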
Keywords
de novo diploid genome assembly, long noisy reads, inconsistent overlap identification, haplotype-aware