kmcEx: Memory-Frugal and Retrieval-Efficient Encoding of Counted K-Mers

Bioinformatics (2019)

Abstract
Motivation: K-mers, along with their frequencies, serve as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly and other tasks, and k-mer counting has therefore been studied intensively. However, the output of k-mer counters is itself large; very often it is too large to fit into main memory, which severely limits its usability.

Results: We introduce a novel way of encoding k-mers together with their frequencies that achieves good memory savings and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure that encodes counted k-mers with coupled bit arrays: one for k-mer representation and the other for frequency encoding. Experiments on five real datasets show that, with 7 hash functions, the average memory-saving ratio on all 31-mers is as high as 13.81 compared with the raw input. At the same time, retrieval time is well controlled (effectively constant), and the false-positive rate is reduced by two orders of magnitude.

Availability and implementation: The source code of our algorithm is available at github.com/lzhLab/kmcEx.

Supplementary information: Supplementary data are available at Bioinformatics online.
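The abstract describes counted k-mers stored in a Bloom filter-like structure built from two coupled bit arrays, but it does not spell out the encoding itself. The Python sketch below is a hypothetical illustration of that general idea, not the kmcEx algorithm. It assumes that the slot chosen by the i-th hash function stores bit i of the k-mer's count in the frequency array, and it uses double hashing over a BLAKE2b digest. The names `count_kmers` and `CoupledBitBloom` are illustrative and do not come from the paper or its repository.

```python
import hashlib
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq (no reverse-complement merging)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class CoupledBitBloom:
    """Hypothetical Bloom-filter-like store for counted k-mers.

    Two bit arrays of equal length are coupled slot by slot:
      presence[p]: set when any k-mer hashes to slot p
      freq[p]:     one bit of a count; the slot picked by the i-th hash
                   function carries bit i of that k-mer's count.
    Illustrative scheme only; not the kmcEx encoding.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.h = num_hashes
        nbytes = (num_bits + 7) // 8
        self.presence = bytearray(nbytes)
        self.freq = bytearray(nbytes)

    def _slots(self, kmer: str) -> list:
        # Double hashing: derive num_hashes slots from one 128-bit BLAKE2b digest.
        # h2 is forced odd, so when num_bits is a power of two (and >= num_hashes)
        # the slots of a single k-mer are pairwise distinct.
        d = hashlib.blake2b(kmer.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.h)]

    @staticmethod
    def _get(bits: bytearray, p: int) -> int:
        return (bits[p >> 3] >> (p & 7)) & 1

    @staticmethod
    def _set(bits: bytearray, p: int) -> None:
        bits[p >> 3] |= 1 << (p & 7)

    def insert(self, kmer: str, count: int) -> None:
        # Counts must fit in num_hashes bits (1..127 when num_hashes == 7).
        if not 1 <= count < (1 << self.h):
            raise ValueError("count must be in [1, 2**num_hashes - 1]")
        for i, p in enumerate(self._slots(kmer)):
            self._set(self.presence, p)
            if (count >> i) & 1:
                self._set(self.freq, p)

    def query(self, kmer: str) -> int:
        """Return a decoded count; 0 means the k-mer was never inserted.

        Bits are only ever set, never cleared, so an inserted k-mer decodes to
        a value >= its true count; collisions may inflate it, and a false
        positive may report a spurious nonzero count.
        """
        value = 0
        for i, p in enumerate(self._slots(kmer)):
            if not self._get(self.presence, p):
                return 0  # some slot unset: definitely absent
            value |= self._get(self.freq, p) << i
        return value


if __name__ == "__main__":
    counts = count_kmers("ACGTACGTACGGACGTACGT", 5)
    store = CoupledBitBloom(num_bits=1 << 12, num_hashes=7)
    for kmer, c in counts.items():
        store.insert(kmer, c)
    for kmer, c in sorted(counts.items()):
        print(kmer, c, store.query(kmer))
```

Because bits are only ever set, collisions can inflate a stored count but never deflate it, and counts are capped at 2^h - 1 (127 with 7 hash functions), a limitation of this toy scheme. The memory-saving and false-positive figures reported in the abstract come from the authors' own design and are not reproduced by this sketch.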