Fixed Block Compression Boosting in FM-Indexes: Theory and Practice

Algorithmica (2018)

Abstract
The FM-index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely used compressed data structure that stores a string T in compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation, and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to the optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state of the art, and thus they make a good “off-the-shelf” choice for many applications.
Keywords
Text indexing, Wavelet tree, FM-index, Compression boosting, Suffix array, Pattern matching
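
For readers unfamiliar with the pattern matching queries mentioned in the abstract, the following is a minimal sketch of the classic backward search over a Burrows-Wheeler transform. It is not the paper's method: it stores plain, uncompressed occurrence tables rather than wavelet trees, and it does not implement fixed block compression boosting. The function names `build_fm_index` and `count` and the use of `"\0"` as the sentinel are assumptions made for illustration only.

```python
def build_fm_index(text):
    """Build a naive, uncompressed FM-index (illustrative sketch only)."""
    assert "\0" not in text
    t = text + "\0"  # assumed sentinel, smaller than every other character

    # Naive suffix array construction; fine for a demo, too slow for real data.
    sa = sorted(range(len(t)), key=lambda i: t[i:])

    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    # For the suffix starting at 0, t[-1] is the sentinel, as required.
    bwt = "".join(t[i - 1] for i in sa)

    # C[c] = number of characters in t that are strictly smaller than c.
    counts = {}
    for ch in t:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    # A real FM-index answers this rank query from a compressed structure.
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    return C, occ, len(bwt)


def count(index, pattern):
    """Count occurrences of a non-empty pattern using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("abracadabra")
print(count(index, "abra"))  # 2
print(count(index, "cad"))   # 1
```

Each step of the loop narrows the range of suffix array rows using only `C` and rank queries on the transformed text, which is why the efficiency of the rank structure (wavelet trees in the paper) largely determines query speed.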