Misclassification of a whole genome sequence reference defined by the Human Microbiome Project: a detrimental carryover effect to microbiome studies

DJ Darwin R. Bandoy, B Carol Huang, Bart C. Weimer

crossref (2019)


Abstract

Taxonomic classification is an essential step in the analysis of microbiome data, and it depends on a reference database of whole genome sequences. Taxonomic classifiers are built on established reference species, such as those in the Human Microbiome Project database, which is growing rapidly. While constructing a population-wide pangenome of the bacterium Hungatella, we discovered that the Human Microbiome Project reference species Hungatella hathewayi (WAL 18680) was significantly different from other members of this genus. Specifically, the reference lacked the core genome shared by the other members. Further analysis, using average nucleotide identity (ANI) and 16S rRNA comparisons, indicated that WAL 18680 was misclassified as Hungatella. The classification error is being amplified in taxonomic classifiers and will have a compounding effect as more microbiome analyses are done, resulting in inaccurate assignment of community members and leading to fallacious conclusions and possibly to misguided treatment. As automated genome homology assessment expands for microbiome analysis and outbreak detection, and as public health reliance on whole genomes increases, this issue will likely occur at an increasing rate. These observations highlight the need to develop reference-free methods for epidemiological investigation using whole genome sequences, and the critical importance of accurate reference databases.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This work was supported by the 100K Pathogen Genome Project.

### Author Declarations

All relevant ethical guidelines have been followed and any necessary IRB and/or ethics committee approvals have been obtained. Yes

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. NA

Any clinical trials involved have been registered with an ICMJE-approved registry such as [ClinicalTrials.gov][1] and the trial ID is included in the manuscript. NA

I have followed all appropriate research reporting guidelines and uploaded the relevant Equator, ICMJE or other checklist(s) as supplementary files, if applicable. NA

The whole genome sequences are now available via the SRA for all genomes except BCW8888, which will be publicly available within 90 days.

[1]: http://ClinicalTrials.gov
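The abstract's observation that the reference lacked the core genome of the other genus members can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genome names, gene-family identifiers, and helper functions below are hypothetical, and a real analysis would derive gene presence from a pangenome tool rather than hand-written sets.

```python
def core_genome(gene_sets):
    """Return the gene families present in every genome of the collection."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


def core_fraction(candidate_genes, core):
    """Return the fraction of the core genome found in the candidate genome."""
    if not core:
        return 0.0
    return len(core & candidate_genes) / len(core)


# Hypothetical gene-family presence data (illustrative only, not from the paper).
genus_members = {
    "genome_A": {"g1", "g2", "g3", "g4"},
    "genome_B": {"g1", "g2", "g3", "g5"},
    "genome_C": {"g1", "g2", "g3", "g6"},
}
# Hypothetical candidate reference genome to be checked against the genus.
candidate = {"g1", "g7", "g8"}

core = core_genome(genus_members)
fraction = core_fraction(candidate, core)
print(f"core size: {len(core)}, fraction found in candidate: {fraction:.2f}")
```

A candidate that carries only a small fraction of the core genome would warrant follow-up with ANI and 16S rRNA comparisons, as the abstract describes; any cutoff for "small" is a judgment call and is deliberately left out of this sketch.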
