Faster & strong: string dictionary compression using sampling and fast vectorized decompression

Robert Lasch, Ismail Oukid, Roman Dementiev, Norman May, Suleyman S. Demirsoy, Kai-Uwe Sattler

The VLDB Journal (2020)


Abstract

String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front Coding (RPFC), a grammar-based compression algorithm, since it consistently offers better compression ratios than other algorithms in the literature. To accelerate compression times, we propose block-based RPFC (BRPFC), which consists in independently compressing small blocks of the dictionary. For further accelerated compression times, especially on large string dictionaries, we also propose an alternative version of BRPFC that uses sampling to speed up compression. Moreover, to accelerate access times, we devise a vectorized access method using Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Our experimental evaluation shows that sampled BRPFC offers compression times up to 190× faster than RPFC, and random string lookups 2.3× faster than RPFC on average. These results move our modified RPFC into a practical range for use in database systems because the overhead of Re-Pair-based compression for access times can be reduced by 2×.
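
As an illustration of the lightweight baseline the abstract mentions, the following is a minimal sketch of front coding applied to independently encoded blocks of a sorted dictionary. It is not the authors' implementation and does not include the Re-Pair grammar step, sampling, or AVX-512 decoding; the function names and the block size of 4 are assumptions chosen only for this example.

```python
from os.path import commonprefix  # character-wise common prefix of a list of strings


def front_encode_block(strings):
    """Front-code one sorted, non-empty block.

    The first string is stored verbatim; each later string is stored as
    (length of prefix shared with its predecessor, remaining suffix).
    """
    encoded = [(0, strings[0])]
    for prev, cur in zip(strings, strings[1:]):
        lcp = len(commonprefix([prev, cur]))
        encoded.append((lcp, cur[lcp:]))
    return encoded


def encode_dictionary(sorted_strings, block_size=4):
    """Split the sorted dictionary into blocks and encode each one independently."""
    return [
        front_encode_block(sorted_strings[i:i + block_size])
        for i in range(0, len(sorted_strings), block_size)
    ]


def lookup(blocks, index, block_size=4):
    """Random access: decode only the block that holds the requested string."""
    block = blocks[index // block_size]
    current = ""
    for lcp, suffix in block[: index % block_size + 1]:
        current = current[:lcp] + suffix
    return current


# Hypothetical example data, already sorted.
words = ["compress", "compression", "compressor",
         "decode", "dictionaries", "dictionary"]
blocks = encode_dictionary(words)
decoded = [lookup(blocks, i) for i in range(len(words))]
```

Because every block starts with an uncompressed string, a lookup touches at most one block, which is the property that lets block-wise schemes bound access cost and compress blocks independently.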


Keywords

String dictionary, Compression, Re-Pair, Vectorization
