
Exploring the genome and protein space of viruses


Recent metagenomic studies have identified a vast number of viruses. However, the true genetic diversity of the whole virus community on our planet has not yet been systematically assessed. Here, we explored the genome and protein space of viruses by simulating the process of virus discovery in viral metagenomic studies. Among the multiple functions tested, the power function best fit the increasing trends of virus diversity and was therefore used to predict the genetic space of viruses. The estimate suggests that there are at least 8.23e+08 viral Operational Taxonomic Units (vOTUs) and 1.62e+09 viral protein clusters on Earth, assuming saturation of the virus genetic space, with saturation defined by the balance between cost and the identification of novel viruses. Notably, less than 3% of this viral genetic diversity has been uncovered thus far, which underscores how much of the viral landscape remains unexplored. Saturating the genetic space would require a total of 3.08e+08 samples. Analysis of viral genetic diversity by ecosystem yielded estimates consistent with these figures. Furthermore, the estimate of the virus genetic space remained robust when accounting for sampling redundancy, sampling time, sequencing platform, and the parameters used for protein clustering. This study provides a guide for future sequencing efforts in virus discovery and contributes to a better understanding of viral diversity in nature.

Competing Interest Statement: The authors have declared no competing interest.
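
To make the power-function extrapolation described in the abstract more concrete, here is a minimal sketch of fitting diversity = a * n^b to an accumulation curve and extrapolating it. It is not the authors' pipeline: the fitting method (least squares in log-log space), the function names, and the input numbers are all assumptions chosen for illustration, and the data are synthetic rather than taken from the study.

```python
import math


def fit_power_function(samples, diversity):
    """Fit diversity ~ a * samples**b by least squares in log-log space.

    Assumption: log-log regression is one common way to fit a power
    function; the abstract does not say which fitting method was used.
    """
    xs = [math.log(s) for s in samples]
    ys = [math.log(d) for d in diversity]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # exponent of the power function
    a = math.exp(mean_y - b * mean_x)   # scale factor
    return a, b


def predict(a, b, n_samples):
    """Predicted number of clusters (e.g. vOTUs) after n_samples samples."""
    return a * n_samples ** b


def marginal_gain(a, b, n_samples):
    """Approximate new clusters from one more sample: d/dn of a * n**b."""
    return a * b * n_samples ** (b - 1)


# Hypothetical accumulation curve generated from a known power function,
# so the fit can be checked against the parameters that produced it.
true_a, true_b = 50.0, 0.8
sample_counts = [10, 100, 1000, 10000]
observed = [true_a * n ** true_b for n in sample_counts]

a, b = fit_power_function(sample_counts, observed)
print(f"fitted a = {a:.2f}, b = {b:.2f}")
print(f"predicted clusters at 1e6 samples: {predict(a, b, 1e6):.3e}")
print(f"marginal gain at 1e6 samples: {marginal_gain(a, b, 1e6):.3f}")
```

With an exponent b below 1, the marginal gain per additional sample shrinks as sampling grows, which is why a saturation point can be defined by weighing the cost of further sampling against the number of novel viruses it would yield. The specific threshold used in the study is not given in the abstract, so none is assumed here.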