Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters.

SIGMOD/PODS'16: International Conference on Management of Data San Francisco California USA June, 2016(2016)

引用 173|浏览277
暂无评分
摘要
We present a system that approximates the answer to complex ad-hoc queries in big-data clusters by injecting samplers on-the-fly and without requiring pre-existing samples. Improvements can be substantial when big-data queries take multiple passes over data and when samplers execute early in the query plan. We present a new, universe, sampler which is able to sample multiple join inputs. By incorporating samplers natively into a cost-based query optimizer, we automatically generate plans with appropriate samplers at appropriate locations. We devise an accuracy analysis method using which we ensure that query plans with samplers will not miss groups and that aggregate values are within a small ratio of their true value. An implementation on a cluster with tens of thousands of machines shows that queries in the TPC-DS benchmark use a median of 2X fewer resources. In contrast, approaches that construct input samples even when given 10X the size of the input to store samples improve only 22% of the queries, i.e., a median speed up of 0X.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要