MinSNPs: an R package for derivation of resolution-optimised SNP sets from microbial genomic data

bioRxiv (2022)

Abstract
Here we present the R package MinSNPs, which is designed to assemble resolution-optimised sets of single nucleotide polymorphisms (SNPs) from alignments such as genome-wide orthologous SNP matrices. We also demonstrate a pipeline for assembling such matrices from multiple bioprojects, so as to facilitate SNP set derivation from globally representative data sets. MinSNPs can derive sets of SNPs optimised for discriminating any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate every sequence from every other, i.e., to maximise diversity. MinSNPs provides functions for rapid and flexible SNP mining and for clear, comprehensive presentation of the results. Its running time scales linearly with input data volume and with the numbers of SNPs and SNP sets specified in the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short-read genomic data sets. MinSNPs was effective in deriving discriminatory SNP sets for potential surveillance targets and in identifying SNP sets optimised to discriminate isolates from different clonal complexes (CCs). MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix. A set of five SNPs was derived that reliably indicated the country of origin among three south-east Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to rapidly and flexibly mine these matrices for optimised surveillance marker sets.

Impact statement
We present the R package "MinSNPs", which derives resolution-optimised SNP sets from datasets of genome sequence variation. Such SNP sets can underpin targeted genetic analysis for high-throughput surveillance of microbial variants of public health concern.
MinSNPs supports considerable flexibility in search methods. The package allows non-specialist bioinformaticians to convert global-scale data on intra-specific genomic variation quickly and easily into SNP sets that are precisely and efficiently directed towards many microbial genetic analysis tasks.

Data summary
1. The source code for MinSNPs is available from GitHub under the MIT Licence (URLs – and mirrored in )
2. Staphylococcus aureus (STARRS data set) orthologous SNP matrix (URL - )
3. Plasmodium vivax data set (VCF file) (URL - )
4. Staphylococcus aureus short-read sequences (FASTQ) from bioprojects PRJEB40888 (or STARRS) (), PRJEB3174 (), PRJEB32286 (), and PRJNA400143 ()

Competing Interest Statement
The authors have declared no competing interest.
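The abstract does not state how MinSNPs searches for SNP sets. As a minimal sketch only, the following Python code assumes a greedy forward selection over the columns of an orthologous SNP matrix. Each candidate set is scored in one of two ways: by Simpson's index of diversity, for the mode that discriminates every sequence from every other, or by the fraction of non-target sequences resolved from a user-defined target group. The matrix format, the function names (`simpson_index`, `percent_resolved`, `greedy_snp_set`) and the toy data are hypothetical illustrations, not the MinSNPs R API.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical input format (not the MinSNPs format):
# sequence name -> aligned SNP alleles, one character per SNP column.
Matrix = Dict[str, str]


def profiles(matrix: Matrix, positions: Sequence[int]) -> Dict[str, str]:
    """Return each sequence's allele profile restricted to the chosen SNP columns."""
    return {name: "".join(seq[p] for p in positions) for name, seq in matrix.items()}


def simpson_index(matrix: Matrix, positions: Sequence[int]) -> float:
    """'All vs all' score: Simpson's index of diversity over groups of identical profiles."""
    counts = Counter(profiles(matrix, positions).values())
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def percent_resolved(matrix: Matrix, positions: Sequence[int], targets: Set[str]) -> float:
    """'Targets vs others' score: fraction of non-target sequences whose profile
    differs from every target profile."""
    prof = profiles(matrix, positions)
    target_profiles = {prof[t] for t in targets}
    others = [name for name in matrix if name not in targets]
    if not others:
        return 1.0
    return sum(prof[name] not in target_profiles for name in others) / len(others)


def greedy_snp_set(
    matrix: Matrix, score: Callable[[List[int]], float], max_snps: int
) -> Tuple[List[int], float]:
    """Hypothetical greedy forward selection: repeatedly add the SNP column that
    most improves the score; stop at max_snps or when no column improves it.
    Ties are broken in favour of the lowest column index."""
    n_columns = len(next(iter(matrix.values())))
    chosen: List[int] = []
    best = score(chosen)
    while len(chosen) < max_snps:
        candidates = [p for p in range(n_columns) if p not in chosen]
        if not candidates:
            break
        pick = max(candidates, key=lambda p: score(chosen + [p]))
        new_score = score(chosen + [pick])
        if new_score <= best:
            break
        chosen.append(pick)
        best = new_score
    return chosen, best


if __name__ == "__main__":
    toy = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}  # made-up alleles
    # All vs all: three columns separate all four sequences.
    print(greedy_snp_set(toy, lambda pos: simpson_index(toy, pos), max_snps=5))
    # ([0, 2, 3], 1.0)
    # Target "A" vs all others: column 3 alone suffices.
    print(greedy_snp_set(toy, lambda pos: percent_resolved(toy, pos, {"A"}), max_snps=5))
    # ([3], 1.0)
```

Greedy selection avoids enumerating every combination of SNPs, which is infeasible for matrices with tens of thousands of SNP columns. The abstract does not say whether MinSNPs uses this exact strategy.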
Keywords
MinSNPs sets, resolution-optimised