MRF: a tool to overcome the barrier of inconsistent genome annotations and perform comparative genomics studies for the largest animal DNA virus

Virology Journal (2023)

Abstract
Background The genome of the largest known animal virus, the white spot syndrome virus (WSSV), which is responsible for huge economic losses and loss of employment in aquaculture, suffers from inconsistent annotation nomenclature. The novelty of its genome sequence, its circular genome and its variable genome length have all led to nomenclature inconsistencies. Because a vast body of knowledge has accumulated over the past two decades under inconsistent nomenclature, insights gained on one genome cannot be easily extended to other genomes. Therefore, the present study aims to perform comparative genomics studies in WSSV under a uniform nomenclature.
Methods We combined the standard MUMmer tool with custom scripts to develop the missing regions finder (MRF), which documents the genome regions and coding sequences that are missing in virus genomes relative to a reference genome, reported in the reference's annotation nomenclature. The procedure was implemented as a web tool and as a command-line interface. Using MRF, we documented the missing coding sequences in WSSV and explored their role in virulence through phylogenomics, machine learning models and homologous genes.
Results We tabulated and depicted the missing genome regions, missing coding sequences and deletion hotspots in WSSV on a common annotation nomenclature and attempted to link them to virus virulence. We observed that ubiquitination, transcription regulation and nucleotide metabolism may be essential for WSSV pathogenesis, and that the structural proteins VP19, VP26 and VP28 are essential for virus assembly. A few minor structural proteins in WSSV may act as envelope glycoproteins. We also demonstrated the advantages of MRF in providing detailed graphic/tabular output in less time and in handling low-complexity, repeat-rich and highly similar genome regions, using examples from other viruses.
Conclusions Pathogenic virus research benefits from tools that can directly indicate the genomic regions and coding sequences missing between isolates/strains. The analyses performed in this study advance the ability to find differences between genomes and to quickly identify the important coding sequences/genomes that require early attention from researchers. To conclude, the approach implemented in MRF complements similarity-based tools in comparative genomics involving large, highly similar, length-varying and/or inconsistently annotated viral genomes.
Keywords
BLAST,Comparative genomics,Deleted CDS,Genome analysis,MRF,Virology
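
The core idea described in the Methods, reporting which reference regions and coding sequences are absent from a query genome, can be illustrated with a minimal sketch. This is not the MRF implementation. It assumes that an aligner such as MUMmer has already produced a list of reference intervals covered by the query, that coordinates are 1-based, inclusive and linear (circularity is ignored), and that a coding sequence counts as missing when at least half of it is uncovered. The function names, the 0.5 threshold and the CDS labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_regions(ref_len: int, aligned: List[Interval]) -> List[Interval]:
    """Return reference intervals not covered by any aligned interval."""
    gaps: List[Interval] = []
    pos = 1  # next reference position not yet accounted for
    for start, end in merge_intervals(aligned):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= ref_len:
        gaps.append((pos, ref_len))
    return gaps


def missing_cds(
    cds: Dict[str, Interval],
    gaps: List[Interval],
    min_fraction: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Return names of CDS whose uncovered fraction is >= min_fraction."""
    result: List[str] = []
    for name, (c_start, c_end) in cds.items():
        length = c_end - c_start + 1
        lost = sum(
            max(0, min(c_end, g_end) - max(c_start, g_start) + 1)
            for g_start, g_end in gaps
        )
        if lost / length >= min_fraction:
            result.append(name)
    return result


# Hypothetical example: a 100 bp reference with one uncovered stretch.
gaps = missing_regions(100, [(1, 20), (15, 40), (61, 100)])
reference_cds = {"cds_a": (45, 55), "cds_b": (10, 30), "cds_c": (55, 74)}
lost_cds = missing_cds(reference_cds, gaps)
```

Because the CDS coordinates and names come from the reference annotation, every query genome is described in the same nomenclature, which is the property the abstract emphasizes.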