MitoGeneExtractor: Efficient extraction of mitochondrial genes from next-generation sequencing libraries

Methods in Ecology and Evolution (2023)

Abstract
Mitochondrial DNA (mtDNA) sequences are often found as byproducts in next-generation sequencing (NGS) datasets that were originally created to capture genomic or transcriptomic information of an organism. These mtDNA sequences are often discarded, wasting valuable sequencing information. We developed MitoGeneExtractor, a tool that extracts mitochondrial protein-coding genes (PCGs) of interest from NGS libraries through multiple sequence alignments of sequencing reads to amino acid references. General references, for example at the order level, are sufficient for mining mitochondrial PCGs. In a case study, we applied MitoGeneExtractor to recently published genomic datasets of 1993 birds and were able to extract complete or nearly complete sequences for all 13 mitochondrial PCGs for a large proportion of libraries. Compared to an existing assembly-guided sequence reconstruction algorithm, MitoGeneExtractor was faster and substantially more sensitive. We compared COI sequences mined with MitoGeneExtractor to COI databases. In most samples, the mined sequences showed high similarity to database sequences, and their taxonomic assignment agreed with the assigned morphospecies. In some cases of incongruent taxonomic assignments, we found evidence for contamination in NGS libraries. MitoGeneExtractor enables fast extraction of mitochondrial PCGs from a wide range of NGS datasets. We recommend routinely harvesting and curating mitochondrial sequence information from genomic resources. MitoGeneExtractor output can be used to identify contaminated NGS libraries and to validate the species identity of the sequenced animal based on the extracted COI sequences.
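The core idea of matching nucleotide reads against an amino acid reference can be illustrated with a small sketch. This is a toy example and not MitoGeneExtractor's implementation: the abstract does not describe the alignment engine, so the ungapped six-frame comparison below is an assumption made purely for illustration. The function names, the short reference peptide, and the example read are all hypothetical. The codon table is the NCBI vertebrate mitochondrial code (translation table 2), in which TGA encodes tryptophan, ATA encodes methionine, and AGA/AGG are stop codons.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in all three offsets on both strands."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def best_ungapped_hit(read, protein_ref):
    """Hypothetical helper: best ungapped placement of a read on a protein.

    Returns (matches, strand, offset, ref_pos). A real aligner would use a
    substitution matrix and allow gaps and frameshifts; this sketch only
    counts identical residues.
    """
    best = (0, None, None, None)
    for (strand, offset), peptide in six_frame_translations(read).items():
        if not peptide:
            continue
        for pos in range(len(protein_ref) - len(peptide) + 1):
            window = protein_ref[pos:pos + len(peptide)]
            matches = sum(a == b for a, b in zip(peptide, window))
            if matches > best[0]:
                best = (matches, strand, offset, pos)
    return best


# Toy data (hypothetical): a 10-residue reference and an 18-nt read.
reference = "MFINRWLFST"
read = "AACCGATGACTATTCTCA"
print(best_ungapped_hit(read, reference))
```

Because the comparison happens in amino acid space, a read can still be placed when synonymous nucleotide differences separate the sample from the reference species, which is consistent with the abstract's statement that general, order-level references are sufficient.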
Keywords
COI, data mining, data re-use, DNA barcoding, mitochondrial genes, ND5