White-box Compression - Learning and Exploiting Compact Table Representations.

Bogdan Ghita, Diego G. Tomé, Peter A. Boncz

CIDR (2020)

Abstract

We formulate a conceptual model for white-box compression, which represents the logical columns in tabular data as an openly defined function over some actually stored physical columns. Each block of data should thus be accompanied by a header that describes this functional mapping. Because these compression functions are openly defined, database systems can exploit them during query optimization and execution, enabling, e.g., better filter predicate pushdown. In addition, we show that white-box compression is able to identify a broad variety of new opportunities for compression, leading to much better compression factors. These opportunities are identified using an automatic learning process that learns the functions from the data. We provide a recursive pattern-driven algorithm for such learning. Finally, we demonstrate the effectiveness of white-box compression on a new benchmark that we contribute: the Public BI benchmark, which provides a rich set of real-world datasets. We believe our basic prototype for white-box compression opens the way for future research into transparent compressed data representations on the one hand and database system architectures that can efficiently exploit these on the other, and should be seen as another step in the direction of data management systems that are self-learning and optimize themselves for the data they are deployed on.
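
To make the idea of a logical column defined as an open function over physical columns more concrete, here is a minimal Python sketch. It is not the authors' prototype or their learning algorithm. The header type `PrefixedIntHeader`, the helpers `compress` and `filter_equals`, and the sample data are all hypothetical names and values chosen for illustration. The sketch assumes a string column whose values share a constant prefix followed by a fixed-width integer, stores only the integers, and shows how an equality predicate on the logical column could be rewritten against the physical column because the mapping is visible in the header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixedIntHeader:
    """Hypothetical block header: logical value = prefix + zero-padded integer."""
    prefix: str
    width: int

    def decode(self, physical: int) -> str:
        # Rebuild the logical string from the stored integer.
        return f"{self.prefix}{physical:0{self.width}d}"

    def encode(self, logical: str) -> Optional[int]:
        # Return None when the value cannot be expressed by this header.
        digits = logical[len(self.prefix):]
        if (not logical.startswith(self.prefix)
                or len(digits) != self.width
                or not (digits.isascii() and digits.isdigit())):
            return None
        return int(digits)


def compress(values: List[str], header: PrefixedIntHeader) -> List[int]:
    """Turn the logical string column into a physical integer column."""
    physical = []
    for v in values:
        code = header.encode(v)
        if code is None:
            raise ValueError(f"{v!r} does not match the header pattern")
        physical.append(code)
    return physical


def filter_equals(physical: List[int], header: PrefixedIntHeader,
                  literal: str) -> List[int]:
    """Evaluate `logical == literal` on the physical column only.

    The literal is mapped through the header once, so rows are compared
    as integers and never decompressed back to strings.
    """
    target = header.encode(literal)
    if target is None:
        return []  # the literal cannot occur in this block
    return [i for i, p in enumerate(physical) if p == target]


# Illustrative data (made up for this sketch).
logical = ["ID-00042", "ID-00007", "ID-00042", "ID-01234"]
header = PrefixedIntHeader(prefix="ID-", width=5)
physical = compress(logical, header)
matches = filter_equals(physical, header, "ID-00042")
```

In this sketch the header is written by hand. In the setting the abstract describes, such mappings would instead be learned automatically from the data.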
