CORDS: automatic generation of correlation statistics in DB2

VLDB (2004)

Abstract
When query optimizers erroneously assume that database columns are statistically independent, they can underestimate the selectivities of conjunctive predicates by orders of magnitude. Such underestimation often leads to drastically suboptimal query execution plans. We demonstrate CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between column pairs. We apply CORDS to real, synthetic, and TPC-H benchmark data, and show that CORDS discovers correlations in an efficient and scalable manner. The output of CORDS can be visualized graphically, making CORDS a useful mining and analysis tool for database administrators. CORDS ranks the discovered correlated column pairs and recommends to the optimizer a set of statistics to collect for the "most important" of the pairs. Use of these statistics speeds up query processing by orders of magnitude for a wide range of queries.
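The underestimation described in the abstract can be illustrated with a small sketch. This is not the CORDS algorithm, only a toy demonstration of the independence assumption it is meant to correct. The table contents, the column names (`make`, `model`), and the helper `selectivity` are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy table of (make, model) rows. Each model belongs to exactly
# one make, so the two columns are strongly correlated.
rows = (
    [("Honda", "Accord")] * 30
    + [("Honda", "Civic")] * 20
    + [("Toyota", "Camry")] * 25
    + [("Toyota", "Corolla")] * 25
)


def selectivity(table, predicate):
    """Fraction of rows in `table` that satisfy `predicate`."""
    return sum(1 for row in table if predicate(row)) / len(table)


# Single-column selectivities, as an optimizer with per-column statistics sees them.
sel_make = selectivity(rows, lambda r: r[0] == "Honda")
sel_model = selectivity(rows, lambda r: r[1] == "Accord")

# Estimate for the conjunction under the independence assumption:
# multiply the individual selectivities.
independent_estimate = sel_make * sel_model

# Actual selectivity of the conjunctive predicate, measured on the data.
true_selectivity = selectivity(
    rows, lambda r: r[0] == "Honda" and r[1] == "Accord"
)

# How far off the independence-based estimate is.
underestimation_factor = true_selectivity / independent_estimate

print(f"independent estimate:   {independent_estimate:.2f}")
print(f"true selectivity:       {true_selectivity:.2f}")
print(f"underestimation factor: {underestimation_factor:.1f}x")
```

In this toy table the error is only a small constant factor, because there are just two makes. With many distinct values per column, or with conjunctions over several correlated columns, the gap widens, which is consistent with the orders-of-magnitude errors the abstract mentions.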
Keywords
TPC-H benchmark data, correlation statistic, scalable manner, query optimizers, analysis tool, database column, scalable tool, column pair, suboptimal query execution plan, correlated column pair, automatic generation, database administrator, statistical independence, query optimization, functional dependency