Variable Length Compression for Bitmap Indices.

Fabian Corrales, David Chiu, Jason Sawin

DEXA'11: Proceedings of the 22nd International Conference on Database and Expert Systems Applications - Volume Part II (2011)

Abstract

Modern large-scale applications are generating staggering amounts of data. To summarize and index these data sets, databases often use bitmap indices. These indices have become widely adopted because they (1) leverage fast bit-wise operations for query processing and (2) are compressible. Today, two pervasive bitmap compression schemes employ variations of run-length encoding, aligned over bytes (BBC) and words (WAH), respectively. While BBC typically offers high compression ratios, WAH can achieve faster query processing, but often at the cost of space. Recent work has further shown that reordering the rows of a bitmap can dramatically increase compression. However, these sorted bitmaps often display patterns of changing run-lengths that are optimal for neither a byte nor a word alignment. We present a general framework to facilitate a variable length compression scheme. Given a bitmap, our algorithm is able to use different encoding lengths for compression on a per-column basis. We further present an algorithm that efficiently processes queries when encoding lengths share a common integer factor. Our empirical study shows that, in the best case, our approach can out-compress BBC by 30% and WAH by 70% on real data sets. Furthermore, we report a query processing speedup of 1.6× over BBC and 1.25× over WAH. We also show that these numbers improve drastically on our synthetic, uncorrelated data sets.
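To make the idea of per-column encoding lengths concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a simplified WAH-style encoder in which a column is cut into segments of `seg_len` bits, homogeneous segments are merged into fill words, and every word costs `seg_len + 1` bits. The function names (`encode`, `decode`, `encoded_size_bits`, `choose_seg_len`), the candidate lengths (7, 14, 28, chosen only because they share the factor 7), and the exhaustive search over candidates are all illustrative assumptions.

```python
def encode(bits, seg_len):
    """Encode a list of 0/1 values with seg_len-bit segments (simplified, WAH-like).

    Returns a list of words: ("fill", bit, run) or ("lit", segment).
    """
    # Assumption: the run counter occupies seg_len - 1 bits of a fill word.
    max_run = 2 ** (seg_len - 1) - 1
    # Pad with zeros so the length is a multiple of seg_len.
    padded = bits + [0] * (-len(bits) % seg_len)
    words = []
    for i in range(0, len(padded), seg_len):
        seg = tuple(padded[i:i + seg_len])
        if len(set(seg)) == 1:  # homogeneous segment: all 0s or all 1s
            bit = seg[0]
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == bit and last[2] < max_run:
                # Extend the previous fill word instead of emitting a new one.
                words[-1] = ("fill", bit, last[2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("lit", seg))
    return words


def decode(words, seg_len, n_bits):
    """Invert encode(); n_bits strips the zero padding."""
    bits = []
    for word in words:
        if word[0] == "fill":
            bits.extend([word[1]] * (word[2] * seg_len))
        else:
            bits.extend(word[1])
    return bits[:n_bits]


def encoded_size_bits(words, seg_len):
    """Each word stores seg_len payload bits plus one flag bit."""
    return len(words) * (seg_len + 1)


def choose_seg_len(bits, candidates=(7, 14, 28)):
    """Pick the candidate length that gives the smallest encoding (hypothetical rule)."""
    return min(candidates, key=lambda s: encoded_size_bits(encode(bits, s), s))


# A sorted-looking column: a long run of 0s, a short mixed region, a run of 1s.
column = [0] * 70 + [1, 0, 1] + [1] * 35

for s in (7, 14, 28):
    print(s, encoded_size_bits(encode(column, s), s))
print("chosen:", choose_seg_len(column))
```

In this toy column the runs are too short to fill many long segments, so the shortest candidate length wins; a column with much longer runs could favor a longer one, which is the motivation for choosing the length per column.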

Keywords

query processing, high compression ratio, pervasive bitmap compression scheme, variable length compression scheme, bitmap index, data set, real data set, uncorrelated data set, different encoding length, encoding length
