Approximate Partition Selection for Big-Data Workloads using Summary Statistics
Proc. VLDB Endow., pp. 2606-2619, 2020.
Many big-data clusters store data in large partitions that support access at a coarse, partition-level granularity. As a result, approximate query processing via row-level sampling is inefficient, often requiring reads of many partitions. In this work, we seek to answer queries quickly and approximately by reading a subset of the data p...
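To see why row-level sampling touches so many partitions, consider a simplified model. This is an illustrative assumption, not the paper's analysis. Rows are spread evenly over `P` partitions, and each of `k` sampled rows falls into a uniformly random partition independently, as in sampling with replacement. A given partition is then skipped with probability `(1 - 1/P)^k`, so the expected number of partitions that must be read is `P * (1 - (1 - 1/P)^k)`. The function name below is hypothetical.

```python
def expected_partitions_touched(num_partitions: int, sample_rows: int) -> float:
    """Expected number of distinct partitions read by a uniform row sample.

    Simplifying assumption (for illustration only): rows are spread evenly
    across partitions and each sampled row lands in an independent, uniformly
    random partition (i.e. sampling with replacement).
    """
    # Probability that one particular partition receives none of the sampled rows.
    p_untouched = (1.0 - 1.0 / num_partitions) ** sample_rows
    # Linearity of expectation over partitions.
    return num_partitions * (1.0 - p_untouched)


if __name__ == "__main__":
    # Sampling just 1,000 rows from 1,000 partitions already reads about 632 of them.
    print(round(expected_partitions_touched(1000, 1000)))
```

Under this model, a sample whose size is comparable to the number of partitions already forces reads of roughly 63% of them, since `(1 - 1/P)^P` is close to `1/e`. Because each partition is read as a whole, the I/O savings from sampling rows largely disappear.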