You can move, but you can't hide: identification of mobile genetic elements with geNomad

bioRxiv (Cold Spring Harbor Laboratory)(2023)

Abstract
Identifying and characterizing mobile genetic elements (MGEs) in sequencing data is essential for understanding their diversity, ecology, biotechnological applications, and impact on public health. Here, we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a large dataset of marker proteins to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks that included diverse MGE and chromosome sequences, geNomad significantly outperformed other tools in all evaluated clades of plasmids and viruses. Leveraging geNomad's speed and scalability, we were able to process public metagenomes and metatranscriptomes, leading to the discovery of millions of new viruses and plasmids that are available through the IMG/VR and IMG/PR databases. We anticipate that geNomad will enable further advancements in MGE research, and it is available at https://portal.nersc.gov/genomad.
Keywords
mobile genetic elements, identification