Take out the rubbish – Removing NUMTs and pseudogenes from the Bemisia tabaci cryptic species mtCOI database
bioRxiv (2019)
Abstract
Identification of species within the cryptic whitefly species complex Bemisia tabaci currently relies on molecular characterisation of a partial mitochondrial DNA cytochrome oxidase subunit I (mtCOI) gene sequence. However, nuclear mitochondrial sequences (NUMTs), PCR-derived pseudogenes and/or poor sequence editing have hindered this effort. To date, 5,175 partial (≥ 300 bp) mtCOI sequences have been reported for species identification purposes. We reviewed 10% of the sequences representing the standard species complex mtCOI dataset and found that 333 of them (64.9%) were NUMTs or pseudogenes and/or were affected by poor sequence quality. Amino acid pattern analyses of mtCOI genes derived from high-throughput sequencing of 24 ‘B. tabaci’ and ‘non-B. tabaci’ species enabled differentiation between NUMT/pseudogene-affected and likely real mtCOI sequences. They also showed that the SSA4, SSA5/SSA8, AsiaII-2 and AsiaII-4 species were NUMT/pseudogene artefacts. In our updated dataset, intra-specific uncorrected nucleotide distances (p-dist) ranged from 0 to 1.98%. Inter-specific p-dist ranged from 2.5 to 8% within phylogenetic clades and from 8 to >19% between species in different phylogenetic clades. Closely related species could therefore be differentiated using an ‘average’ p-dist threshold of 2.5%. Despite the smaller mtCOI dataset, six putative new species were identified. Adoption of our standardised workflow and updated, clean mtCOI dataset could facilitate better diagnostics of ‘B. tabaci’ and ‘non-B. tabaci’ cryptic species.
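The distance threshold described above can be made concrete with a short sketch. Assuming the inputs are aligned, equal-length nucleotide sequences, the code computes the uncorrected p-distance (differing sites divided by compared sites) and applies the 2.5% cut-off reported in the abstract. Skipping gaps and ambiguous bases (pairwise deletion) is our assumption, not a stated detail of the authors' workflow. The block also includes a simple in-frame stop-codon screen using the invertebrate mitochondrial code (NCBI translation table 5, where only TAA and TAG are stops). This is a common first-pass NUMT/pseudogene check swapped in for illustration; it is not the authors' amino acid pattern analysis. All function names are hypothetical.

```python
# Minimal sketch: uncorrected p-distance with a 2.5% species threshold,
# plus a simple stop-codon screen. Function names are hypothetical.

VALID_BASES = set("ACGT")
# Stop codons of the invertebrate mitochondrial code (NCBI translation table 5).
# TGA codes for Trp and AGA/AGG for Ser in this code, so they are not stops.
STOP_CODONS_TABLE5 = {"TAA", "TAG"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Assumption: sites with a gap or ambiguous base in either sequence
    are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def same_species(seq_a: str, seq_b: str, threshold: float = 0.025) -> bool:
    """Treat two sequences as conspecific if p-dist is below the threshold."""
    return p_distance(seq_a, seq_b) < threshold


def open_frames(seq: str) -> list[int]:
    """Return the reading-frame offsets (0, 1, 2) with no internal stop codon."""
    bases = seq.upper().replace("-", "")  # drop alignment gaps before reading codons
    frames = []
    for offset in range(3):
        codons = (bases[i:i + 3] for i in range(offset, len(bases) - 2, 3))
        if not any(codon in STOP_CODONS_TABLE5 for codon in codons):
            frames.append(offset)
    return frames


def lacks_open_frame(seq: str) -> bool:
    """Flag a possible NUMT/pseudogene: every reading frame contains a stop codon."""
    return not open_frames(seq)


if __name__ == "__main__":
    ref = "ACGT" * 25
    query = "TCGT" + "ACGT" * 24
    print(p_distance(ref, query))      # 1 difference over 100 compared sites
    print(same_species(ref, query))    # below the 2.5% threshold
    print(lacks_open_frame("TAAGTAGCTAG"))
```

A frame free of stop codons does not by itself show that a sequence is genuine mtCOI, since NUMTs can retain an open reading frame; the amino acid conservation analysis described in the abstract is the stronger test.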
Keywords
NUMTs,Pseudogenes,amino acid conservation,hemipteran whiteflies,Aleyrodidae