Take out the rubbish – Removing NUMTs and pseudogenes from the Bemisia tabaci cryptic species mtCOI database

bioRxiv (2019)

Abstract
Identification of the Bemisia tabaci cryptic whitefly species complex currently relies on molecular characterisation of the mitochondrial DNA cytochrome oxidase subunit I (mtCOI) partial gene. However, nuclear mitochondrial sequences (NUMTs), PCR-derived pseudogenes and/or poor sequence editing have hindered this effort. To date, 5,175 partial (≥ 300 bp) mtCOI sequences have been reported for species identification purposes. We reviewed 10% of the sequences representing the standard species complex mtCOI dataset and found that 333 of them (64.9%) were NUMTs, pseudogenes and/or affected by poor sequence quality. Amino acid pattern analyses of high-throughput sequencing-derived mtCOI genes from 24 'B. tabaci' and 'non-B. tabaci' species enabled differentiation between NUMT/pseudogene-affected and likely real mtCOI sequences, and showed that the SSA4, SSA5/SSA8, AsiaII-2 and AsiaII_4 species were NUMT/pseudogene artefacts. Intra-specific uncorrected nucleotide distances (p-dist) in our updated dataset ranged from 0 to 1.98%; inter-specific p-dist ranged between 2.5% and 8% for species within the same phylogenetic clade, and between 8% and >19% for species in different phylogenetic clades. Closely related species could therefore be differentiated using an 'average' p-dist of 2.5%. Despite the smaller mtCOI dataset, six putative new species were identified. Adoption of our standardised workflow and updated, clean mtCOI dataset could facilitate better diagnostics of B. tabaci and 'non-B. tabaci' cryptic species.
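As an illustration of the distance criterion described in the abstract, the following is a minimal sketch of an uncorrected pairwise distance calculation between two pre-aligned mtCOI fragments. It is not the authors' workflow: the function names `p_distance` and `likely_different_species` are hypothetical, skipping gaps and ambiguous bases is an assumption, and treating 2.5% as an inclusive cut-off is an assumption drawn only from the reported inter-specific range.

```python
# Hypothetical helper names; a sketch of the p-dist idea, not the paper's pipeline.

SPECIES_THRESHOLD = 0.025  # 2.5%, the 'average' p-dist quoted in the abstract


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance between two aligned nucleotide sequences.

    Returns the proportion of compared sites that differ. Sites where either
    sequence has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps ('-') and ambiguity codes such as 'N'
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return mismatches / compared


def likely_different_species(seq_a: str, seq_b: str,
                             threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the p-distance meets or exceeds the threshold (assumed inclusive)."""
    return p_distance(seq_a, seq_b) >= threshold
```

A distance check of this kind is only meaningful after NUMTs and pseudogenes have been filtered out, since the abstract reports that such artefacts had been mistaken for distinct species.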
Keywords
NUMTs,Pseudogenes,amino acid conservation,hemipteran whiteflies,Aleyrodidae