Join Size Bounds using Lp-Norms on Degree Sequences

CoRR(2023)

Abstract
Estimating the output size of a join query is a fundamental and longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true join size by orders of magnitude, which incurs a significant performance penalty. Recently, size upper bounds have been proposed that are based on information inequalities and incorporate the sizes and max-degrees of the input relations, yet they grossly overestimate the true join size. This paper puts forward a general class of size bounds that are based on information inequalities involving Lp-norms on the degree sequences of the join columns. They generalise prior efforts and can be asymptotically tighter than the known bounds. We give two types of lower and upper bounds: some hold for all entropic vectors, while others hold for all polymatroids. Whereas the former are asymptotically tight but possibly not computable, the latter are computable but not even asymptotically tight. When every degree constraint is over a single variable, we call the constraints "simple" and prove that the polymatroid and entropic bounds are equal, that they are tight up to a query-dependent constant (which is stronger than asymptotically tight), that they are computable in time exponential in the size of the query, and that the worst-case database instance that matches the bound has a simple structure called a "normal database".
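To make the notion of an Lp-norm on a degree sequence concrete, the following is a minimal sketch for a single two-relation join R(x, y) ⋈ S(y, z). It is not the paper's general method: it only applies Hölder's inequality, which gives |R ⋈ S| ≤ ‖deg_R‖_p · ‖deg_S‖_q whenever 1/p + 1/q = 1. The relations `R` and `S` and the helper names `degree_sequence`, `lp_norm`, and `join_size` are hypothetical and chosen for illustration. Note that the L1-norm of a degree sequence is the relation size and the L∞-norm is the max-degree, so the size-based and max-degree-based bounds mentioned above appear as special cases.

```python
from collections import Counter


def degree_sequence(relation, column):
    """Degrees of the values in `column`, sorted in descending order."""
    counts = Counter(row[column] for row in relation)
    return sorted(counts.values(), reverse=True)


def lp_norm(degrees, p):
    """Lp-norm of a degree sequence; p = inf gives the max-degree."""
    if p == float("inf"):
        return float(max(degrees))
    return sum(d ** p for d in degrees) ** (1.0 / p)


def join_size(r, s):
    """Exact size of R(x, y) JOIN S(y, z) on the shared column y."""
    deg_r = Counter(y for _, y in r)
    deg_s = Counter(y for y, _ in s)
    return sum(deg_r[y] * deg_s[y] for y in deg_r)


# Hypothetical toy relations (illustrative data only).
R = [(1, "a"), (2, "a"), (3, "a"), (4, "b")]
S = [("a", 10), ("a", 11), ("b", 12), ("c", 13)]

deg_R = degree_sequence(R, 1)  # degrees of y in R
deg_S = degree_sequence(S, 0)  # degrees of y in S

true_size = join_size(R, S)

# (p, q) = (1, 1) is not a Hölder pair; it is the trivial bound |R| * |S|.
bound_sizes = lp_norm(deg_R, 1) * lp_norm(deg_S, 1)
# (p, q) = (1, inf): relation size times max-degree.
bound_l1_linf = lp_norm(deg_R, 1) * lp_norm(deg_S, float("inf"))
# (p, q) = (2, 2): Cauchy-Schwarz on the two degree sequences.
bound_l2_l2 = lp_norm(deg_R, 2) * lp_norm(deg_S, 2)

print(true_size, bound_l2_l2, bound_l1_linf, bound_sizes)
```

On this toy input the L2-based bound lies between the true join size and the size/max-degree bound, which illustrates why norms other than L1 and L∞ can give tighter estimates; on other inputs the ordering among the bounds can differ.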
Keywords
degree sequences, size bounds, join, Lp-norms