Faster & strong: string dictionary compression using sampling and fast vectorized decompression

The VLDB Journal(2020)

引用 6|浏览71
暂无评分
摘要
String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front Coding (RPFC), a grammar-based compression algorithm, since it consistently offers better compression ratios than other algorithms in the literature. To accelerate compression times, we propose block-based RPFC (BRPFC) which consists in independently compressing small blocks of the dictionary. For further accelerated compression times especially on large string dictionaries, we also propose an alternative version of BRPFC that uses sampling to speed up compression. Moreover, to accelerate access times, we devise a vectorized access method, using Intel^ Advanced Vector Extensions 512 ( Intel^ AVX-512). Our experimental evaluation shows that sampled BRPFC offers compression times up to 190 × faster than RPFC, and random string lookups 2.3 × faster than RPFC on average. These results move our modified RPFC into a practical range for use in database systems because the overhead of Re-Pair-based compression for access times can be reduced by 2 × .
更多
查看译文
关键词
String dictionary,Compression,Re-pair,Vectorization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要