MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects [version 1; peer review: 1 approved, 3 approved with reservations]

Wellcome Open Research(2024)

引用 0|浏览0
暂无评分
摘要
Contamination of public databases by mislabelled sequences has been highlighted for many years and the avalanche of novel sequencing data now being deposited has the potential to make databases difficult to use effectively. It is therefore crucial that sequencing projects and database curators perform pre-submission checks to remove obvious contamination and avoid propagating erroneous taxonomic relationships. However, it is important also to recognise that biological contamination of a target sample with unexpected species’ DNA can also lead to the discovery of fascinating biological phenomena through the identification of environmental organisms or endosymbionts. Here, we present a novel, integrated method for detection and generation of high-quality genomes of all non-target genomes co-sequenced in eukaryotic genome sequencing projects. After performing taxonomic profiling of an assembly from the raw data, and leveraging the identity of small rRNA sequences discovered therein as markers, a targeted classification approach retrieves and assembles high-quality genomes. The genomes of these cobionts are then not only removed from the target species’ genome but also available for further interrogation. Source code is available from https://github.com/CobiontID/MarkerScan. MarkerScan is written in Python and is deployed as a Docker container.
更多
查看译文
关键词
bioinformatics tools,cobionts,genome sequencing,database contamination,eukaryotic genomics,eng
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要