Efficient inference of large pangenomes with PanTA

Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao

bioRxiv (2024)

Abstract
Pangenome analysis is an indispensable step in bacterial genomics for addressing the high variability of bacterial genomes. However, speed and scalability remain a challenge for pangenome inference tools as they try to keep pace with fast-growing genomic collections. We present PanTA, a software package for constructing the pangenomes of large bacterial collections. We show that PanTA achieves unprecedented efficiency, multiple times greater than that of current state-of-the-art tools, while maintaining similar pangenome accuracy. In addition, PanTA introduces a novel mechanism for constructing the pangenome progressively, in which new samples are added to an existing pangenome without rebuilding the accumulated collection from scratch. In progressive mode, PanTA is shown to consume orders of magnitude less computational resources than existing solutions when managing the pangenomes of growing microbial datasets. We further show that PanTA can build the pangenome of the entire collection of >28,000 Escherichia coli genomes from the RefSeq database on a laptop computer in 32 hours, highlighting the scalability and practicality of PanTA. The software is open source and publicly available under an MIT license.

Competing Interest Statement: The authors have declared no competing interest.
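The progressive idea described in the abstract, adding new samples to an existing pangenome instead of re-clustering everything, can be illustrated with a toy sketch. This is not PanTA's algorithm or API: the `ToyPangenome` class, the k-mer Jaccard similarity, and the 0.7 threshold are all assumptions chosen only to make the idea concrete. Each incoming gene is compared against the representatives of existing clusters; it joins the best match if it is similar enough and otherwise starts a new cluster.

```python
from dataclasses import dataclass, field


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ToyPangenome:
    """Hypothetical gene-cluster store; illustrative only, not PanTA's data model."""
    threshold: float = 0.7  # assumed similarity cutoff
    representatives: list[str] = field(default_factory=list)
    members: list[list[tuple[str, str]]] = field(default_factory=list)

    def add_sample(self, sample_id: str, genes: list[str]) -> None:
        """Add one sample's genes without touching existing assignments."""
        for gene in genes:
            gene_kmers = kmers(gene)
            best_idx, best_sim = -1, 0.0
            # Compare only against cluster representatives, not every member.
            for idx, rep in enumerate(self.representatives):
                sim = jaccard(gene_kmers, kmers(rep))
                if sim > best_sim:
                    best_idx, best_sim = idx, sim
            if best_sim >= self.threshold:
                self.members[best_idx].append((sample_id, gene))
            else:
                # No sufficiently similar cluster: the gene founds a new one.
                self.representatives.append(gene)
                self.members.append([(sample_id, gene)])


pan = ToyPangenome()
pan.add_sample("s1", ["ATGGCTAAGCTT", "ATGTTTCCCGGG"])
pan.add_sample("s2", ["ATGGCTAAGCTT", "ATGAAACGTACG"])  # incremental update
print(len(pan.representatives), [len(m) for m in pan.members])
```

In this sketch the cost of adding a sample grows with the number of clusters rather than with the total number of genes already stored, which is the intuition behind avoiding a rebuild from scratch.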