Statisticum: data statistics management in SAP HANA

Anisoara Nica, Reza Sherkat, Mihnea Andrei, Xun Chen, Martin Heidel, Christian Bensberg, Heiko Gerwens

Hosted Content (2017)


Abstract

We introduce a new concept of leveraging traditional data statistics as dynamic data integrity constraints. These data statistics produce transient database constraints, which are valid as long as they can be proven to be consistent with the current data. We denote this type of data statistics by constraint data statistics, their properties needed for consistency checking by consistency metadata, and their implied integrity constraints by implied data statistics constraints (implied constraints for short). Implied constraints are valid integrity constraints and powerful query optimization tools that can be employed, just like traditional database constraints, in semantic query transformation (also known as query reformulation), partition pruning, runtime optimization, and semi-join reduction, to name a few. To our knowledge, this is the first work introducing this novel and powerful concept of deriving implied integrity constraints from data statistics. We discuss theoretical aspects of the constraint data statistics concept and their integration into query processing. We present the current architecture of data statistics management in SAP HANA and detail how constraint data statistics are designed and integrated into this architecture. As an instantiation of this framework, we consider dynamic partition pruning for data aging scenarios. We discuss our current implementation for constraint data statistics objects in SAP HANA which can be used for dynamic partition pruning. We enumerate their properties and show how consistency checking for implied integrity constraints is supported in the data statistics architecture.
Our experimental evaluations on the TPC-H benchmark and a real customer application confirm the effectiveness of the implied integrity constraints: (1) for 59% of TPC-H queries, constraint data statistics utilization results in pruning cold partitions and reducing memory consumption, and (2) we observe up to 3 orders of magnitude speed-up in query processing time for a real customer running an S/4HANA application.
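
The idea of a statistic acting as a transient constraint can be illustrated with a minimal Python sketch. It is not SAP HANA's implementation: the class names `Partition` and `MinMaxStatistic`, the per-partition write counter used as consistency metadata, and the function `scan_ge` are all hypothetical, chosen only to show how a min/max statistic might be trusted for pruning while it is provably consistent with the data and ignored otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    """Hypothetical partition holding one integer column (e.g. a year)."""
    name: str
    rows: list[int]
    version: int = 0  # assumed write counter, bumped on every insert

    def insert(self, value: int) -> None:
        self.rows.append(value)
        self.version += 1


@dataclass
class MinMaxStatistic:
    """Hypothetical constraint data statistic: min/max of a partition."""
    low: int
    high: int
    built_at_version: int  # consistency metadata (an assumption of this sketch)

    @classmethod
    def build(cls, p: Partition) -> "MinMaxStatistic":
        # Assumes the partition is non-empty.
        return cls(min(p.rows), max(p.rows), p.version)

    def implied_constraint(self, p: Partition) -> Optional[tuple[int, int]]:
        # The constraint low <= value <= high holds only while the
        # statistic is consistent with the current data.
        if self.built_at_version != p.version:
            return None
        return (self.low, self.high)


def scan_ge(partitions, stats, threshold):
    """Return rows with value >= threshold and the partitions actually scanned."""
    result, scanned = [], []
    for p in partitions:
        stat = stats.get(p.name)
        bounds = stat.implied_constraint(p) if stat else None
        if bounds is not None and bounds[1] < threshold:
            continue  # pruned: the implied constraint rules out any match
        scanned.append(p.name)
        result.extend(v for v in p.rows if v >= threshold)
    return result, scanned


hot = Partition("hot", [2020, 2021])
cold = Partition("cold", [2001, 2005])
stats = {p.name: MinMaxStatistic.build(p) for p in (hot, cold)}

# While the statistic is consistent, the cold partition is pruned.
rows, scanned = scan_ge([hot, cold], stats, 2019)

# A write makes the statistic stale, so the cold partition is scanned again.
cold.insert(2022)
rows_after, scanned_after = scan_ge([hot, cold], stats, 2019)
```

In this sketch a stale statistic never produces a wrong answer; it only loses the pruning opportunity until it is rebuilt, which mirrors the abstract's requirement that implied constraints be used only when they can be proven consistent with the current data.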
