Clustering Based Identification of SARS-CoV-2 Subtypes

COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES (2021)

Abstract
With more than half a million SARS-CoV-2 sequences available and counting, many approaches have recently appeared that aim to leverage this information to understand the genomic diversity and dynamics of the virus. Early approaches involved building transmission networks or phylogenetic trees; for the latter, scalability becomes more of an issue with each day because of its high computational complexity. In this work, we propose an alternative approach based on clustering sequences to identify novel subtypes of SARS-CoV-2, using methods designed for haplotyping intra-host viral populations. We assess this approach using cluster entropy, a notion that naturally captures the underlying process of viral mutation; this is the first time entropy has been used in this context. Using our approach, we were able to identify the well-known B.1.1.7 subtype from the sequences of the EMBL-EBI (UK) database, and we also show that the associated cluster is consistent with a measure of fitness. This demonstrates that our approach is a viable and scalable alternative for unveiling trends in the spread of SARS-CoV-2.
Keywords
Clustering, Viral strains, Viral subtypes, Entropy, Fitness
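
The abstract does not define cluster entropy. As an illustration only, one common reading is the sum of per-position Shannon entropies over the aligned sequences assigned to a cluster, so that a cluster of identical sequences scores zero and each variable site adds to the total. The sketch below makes that assumption; the function names `column_entropy` and `cluster_entropy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_entropy(sequences):
    """Sum of per-column entropies for one cluster.

    Assumed definition for illustration; the paper may define it differently.
    Sequences must already be aligned to the same length.
    """
    if not sequences:
        raise ValueError("cluster must contain at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) iterates over alignment columns.
    return sum(column_entropy(col) for col in zip(*sequences))


if __name__ == "__main__":
    # Toy clusters, not real SARS-CoV-2 data.
    homogeneous = ["ACGT", "ACGT", "ACGT"]
    mixed = ["ACGT", "ACGA", "ACGT", "ACGA"]
    print(cluster_entropy(homogeneous))
    print(cluster_entropy(mixed))
```

Under this reading, a clustering whose clusters have low entropy groups sequences that differ at few positions, which is one way to compare candidate clusterings of the same sequence set.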