Memory-Efficient FM-Index Construction for Reference Genomes.

BIBM(2022)

Abstract
The FM-index is traditionally constructed over the forward strand supplemented with the reverse strand, so that both strands can be searched by executing a single procedure. Although this expedites the indexing process, it consumes a large amount of memory. In this paper we propose a novel algorithm that is capable of computing the FM-index of a given reference sequence without appending its reverse complement to it. Instead, we deduce the ranks of the suffixes of the reverse-complemented DNA sequence from the suffix ranks on the forward strand, which reduces memory consumption significantly. Given a reference genome F of length n, FR is the concatenation of the forward and reverse strands, where R is the reverse complement of F, that is, F read in reverse order with each base replaced by its complement. The algorithm makes it possible to compute the FM-index over the 2n-symbol-long FR by using the FM-index over the n-symbol-long F, which nearly halves the memory consumption. The process is embarrassingly parallel, so it can speed up significantly as more cores/threads become available.
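To make the definitions of R and FR concrete, here is a minimal Python sketch of the conventional baseline the abstract describes: build R, concatenate it to F, and compute the Burrows-Wheeler transform (the core of an FM-index) over the full 2n-symbol string. This is not the paper's algorithm. The function names, the `$` sentinel, and the naive sorting-based suffix array are assumptions chosen for illustration, and the approach is only practical for tiny inputs.

```python
# Illustrative baseline only: indexes the full 2n-symbol string FR,
# which is the memory-hungry step the paper aims to avoid.
# All names here are hypothetical, not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return R: seq in reverse order with each base complemented."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_array(text: str) -> list:
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str) -> str:
    """BWT of text with an appended '$' sentinel (assumed smaller than any base)."""
    text = text + "$"
    sa = suffix_array(text)
    # The BWT character is the one preceding each sorted suffix;
    # index -1 wraps around to the sentinel for the suffix at position 0.
    return "".join(text[i - 1] for i in sa)


F = "AACG"
R = reverse_complement(F)   # "CGTT"
FR = F + R                  # 2n symbols long
bwt_FR = bwt(FR)
```

The sketch holds the whole of FR and its suffix array in memory at once, which is exactly the cost that grows with 2n rather than n in the conventional construction.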
Keywords
genomes, memory-efficient, FM-index