Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

NATURE GENETICS(2022)

引用 54|浏览27
暂无评分
摘要
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k -mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k -mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation—a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
更多
查看译文
关键词
Genome informatics,Genomics,Biomedicine,general,Human Genetics,Cancer Research,Agriculture,Gene Function,Animal Genetics and Genomics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要