Assembly-free and alignment-free barcoding from genome skims

Research in Computational Molecular Biology (2018)

Abstract
The ability to quickly and inexpensively describe the taxonomic diversity in an environment is critical in this era of rapid climate and biodiversity changes. The currently preferred molecular technique is (meta)barcoding, in which taxonomically informative plastid/mitochondrial markers are sequenced. It is low-cost and widely used, but has drawbacks. As sequencing costs continue to fall, an alternative approach based on genome skimming has been proposed. This approach first applies low-pass sequencing (100 Mb to several Gb per sample) to voucher and/or query samples and then recovers marker genes and/or organelle genomes computationally. In contrast, we suggest using the unassembled sequence data for taxonomic identification, with an alignment-free approach based on the k-mer decomposition of the sequencing reads. Our approach is motivated by earlier work that connects genomic distance to the Jaccard index on k-mer collections, but improves upon that work through careful modeling of the impact of low coverage, sequencing error, and other factors on the Jaccard index. Our tool, Skmer, estimates the genomic distance between two organisms represented by the k-mer collections obtained from their genome skims, and uses the distance estimates to match a genome-skim query to a reference collection. Skmer shows excellent performance in our simulation studies, and makes the assembly-free approach to genome skimming a viable alternative to traditional barcoding. The Skmer software is publicly available at https://github.com/shahab-sarmashghi/Skmer.git.
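To make the link between the Jaccard index and genomic distance concrete, here is a minimal Python sketch. It is not Skmer's method: it uses the uncorrected, Mash-style relation D = -(1/k) ln(2J / (1 + J)), which is assumed here as a representative of the "earlier work" the abstract mentions, and it ignores the low-coverage and sequencing-error corrections that Skmer adds. The function names and the cap at 1.0 are illustrative choices, not taken from the paper.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq (no reverse-complement
    canonicalization, to keep the sketch short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_style_distance(j, k):
    """Uncorrected distance estimate from a Jaccard index j.

    Assumption: the Mash-style formula -(1/k) * ln(2j / (1 + j)),
    capped at 1.0. Skmer's coverage and error corrections are NOT applied.
    """
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return min(1.0, -math.log(2.0 * j / (1.0 + j)) / k)


if __name__ == "__main__":
    k = 5  # toy value; real analyses use much larger k
    a = kmers("ACGTACGTTGCAACGT", k)
    b = kmers("ACGTACGTTGCTACGT", k)
    j = jaccard(a, b)
    print(j, mash_style_distance(j, k))
```

On low-coverage skims many true k-mers are never sampled, so the observed Jaccard index is deflated and this uncorrected estimate overstates the distance; that bias is what the modeling described in the abstract is meant to address.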