HPViewer: Sensitive and specific genotyping of human papillomavirus in metagenomic DNA

crossref(2017)

引用 0|浏览0
暂无评分
摘要
AbstractBackgroundShotgun DNA sequencing provides sensitive detection of all 182 HPV types in tissue and body fluid. However, existing computational methods either produce false positives misidentifying HPV types due to shared sequences among HPV, human, and prokaryotes, or produce false negative since they identify HPV by assembled contigs requiring large abundant of HPV reads.ResultsWe show that HPV shares extensive simple repeats with human and prokaryotes and homologous sequences among different HPV types. The shared sequences caused errors in HPV genotyping and the repeats of human origin caused false positives in HPVDetector. Programs, such as VirusTAP and Vipie, which require de novo assembly of shotgun reads into contigs, eliminated false positives at a cost of substantial reduction in sensitivity. Here, we designed HPViewer with two custom HPV reference databases masking simple repeats and homology sequences respectively and one homology distance matrix to hybridize these two databases. It directly identified HPV from short DNA reads rather than assembled contigs. Using 100,100 simulated samples, we revealed that HPViewer was robust for samples containing either high or low number of HPV reads. Using 12 shotgun sequencing samples from respiratory papillomatosis, HPViewer was equal to VirusTAP, and Vipie and better than HPVDetector with the respect to specificity and was the most sensitive method in the detection of HPV types 6 and 11. We demonstrated that contigs-based approaches had disadvantages of detection of HPV. In 1,573 sets of metagenomic data from 18 human body sites, HPViewer identified 104 types of HPV in a body-site associated pattern and 89 types of HPV co-occurring in one sample with other types of HPV at least once.ConclusionsWe demonstrated HPViewer was sensitive and specific for HPV detection in metagenomic data. It was also suggested that masking shared sequences is an effective approach to avoid false positive detection and identifying HPV from short metagenomic reads is more sensitive than assembled contigs. The innovative homology distance matrix connecting two HPV databases, repeat-mask and homology-mask, optimized the balance of sensitivity and specificity. HPViewer can be accessed at https://github.com/yuhanH/HPViewer/.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要