KPop: Accurate, Assembly-Free, and Scalable Comparative Analysis of Microbial Genomes
biorxiv(2022)
School of Life Sciences and Department of Statistics | Biomathematics and Statistics Scotland
Abstract
The recent explosion in the amount of available sequencing data challenges existing analysis techniques. Here we introduce KPop, a novel versatile method based on full k -mer spectra and dataset-specific transformations, through which thousands of assembled or unassembled microbial genomes can be quickly compared. Unlike minimizer-based methods that produce distances and have lower resolution, KPop is able to accurately map sequences onto a low-dimensional space. Extensive validation on simulated and real-life viral and bacterial datasets shows that KPop can correctly separate sequences at both species and sub-species levels even when the overall genomic diversity is low. KPop also rapidly identifies related sequences and systematically outperforms minimizer-based methods. KPop’s code is open-source and available on GitHub at . ### Competing Interest Statement The authors have declared no competing interest.
MoreTranslated text
求助PDF
上传PDF
View via Publisher
AI Read Science
AI Summary
AI Summary is the key point extracted automatically understanding the full text of the paper, including the background, methods, results, conclusions, icons and other key content, so that you can get the outline of the paper at a glance.
Example
Background
Key content
Introduction
Methods
Results
Related work
Fund
Key content
- Pretraining has recently greatly promoted the development of natural language processing (NLP)
- We show that M6 outperforms the baselines in multimodal downstream tasks, and the large M6 with 10 parameters can reach a better performance
- We propose a method called M6 that is able to process information of multiple modalities and perform both single-modal and cross-modal understanding and generation
- The model is scaled to large model with 10 billion parameters with sophisticated deployment, and the 10 -parameter M6-large is the largest pretrained model in Chinese
- Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single modality and multiple modalities We will continue the pretraining of extremely large models by increasing data to explore the limit of its performance
Upload PDF to Generate Summary
Must-Reading Tree
Example

Generate MRT to find the research sequence of this paper
Related Papers
2012
被引用23337 | 浏览
2011
被引用215 | 浏览
2013
被引用333 | 浏览
2018
被引用1627 | 浏览
2018
被引用115 | 浏览
2019
被引用240 | 浏览
2020
被引用530 | 浏览
2021
被引用16 | 浏览
2021
被引用712 | 浏览
Data Disclaimer
The page data are from open Internet sources, cooperative publishers and automatic analysis results through AI technology. We do not make any commitments and guarantees for the validity, accuracy, correctness, reliability, completeness and timeliness of the page data. If you have any questions, please contact us by email: report@aminer.cn
Chat Paper