Approximate Partition Selection for Big-Data Workloads using Summary Statistics
Proc. VLDB Endow., pp. 2606-2619, 2020.
EI
Abstract:
Many big-data clusters store data in large partitions that support access at a coarse, partition-level granularity. As a result, approximate query processing via row-level sampling is inefficient, often requiring reads of many partitions. In this work, we seek to answer queries quickly and approximately by reading a subset of the data p...
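The claim that row-level sampling forces reads of many partitions can be made concrete with a little arithmetic. The sketch below assumes equal-size partitions, rows kept independently with probability q (Bernoulli sampling), and that touching any row means reading its whole partition. Under those assumptions a partition of m rows is skipped only with probability (1 - q)^m, so N partitions yield N(1 - (1 - q)^m) expected reads. For comparison, it counts the partitions read when the same fraction is sampled at partition granularity. The function names are hypothetical, and this is an illustration of the cost argument, not the paper's selection method.

```python
import math


def expected_partitions_read_row_sampling(
    num_partitions: int, rows_per_partition: int, row_rate: float
) -> float:
    """Expected number of partitions holding at least one sampled row.

    Assumes (hypothetically) equal-size partitions and independent
    Bernoulli row sampling with probability row_rate.
    """
    if not 0.0 <= row_rate <= 1.0:
        raise ValueError("row_rate must be in [0, 1]")
    if row_rate == 1.0:
        # Every row is sampled, so every partition is read.
        return float(num_partitions)
    # P(partition untouched) = (1 - q)^m, computed via log1p for stability
    # when q is small and m is large.
    p_untouched = math.exp(rows_per_partition * math.log1p(-row_rate))
    return num_partitions * (1.0 - p_untouched)


def partitions_read_partition_sampling(num_partitions: int, fraction: float) -> int:
    """Partitions read when the same fraction is sampled whole-partition-at-a-time."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return math.ceil(num_partitions * fraction)


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 partitions of 100,000 rows, 1% sample.
    n, m, q = 1000, 100_000, 0.01
    print("row-level sampling reads ~", expected_partitions_read_row_sampling(n, m, q))
    print("partition-level sampling reads", partitions_read_partition_sampling(n, q))
```

With these illustrative numbers, a 1% row sample is expected to touch essentially all 1,000 partitions, while a 1% partition-level sample reads 10. Row sampling only avoids partitions when q·m is small, that is, when partitions are tiny or the sample is extremely sparse.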