Load balancing and skew resilience for parallel joins

2016 IEEE 32nd International Conference on Data Engineering (ICDE)(2016)

引用 36|浏览64
暂无评分
摘要
We address the problem of load balancing for parallel joins.We show that the distribution of input data received and the output data produced by worker machines are both important for performance. As a result, previous work, which optimizes either for input or output, stands ineffective for load balancing. To that end, we propose a multi-stage load-balancing algorithm which considers the properties of both input and output data through sampling of the original join matrix. To do this efficiently, we propose a novel category of equi-weight histograms. To build them, we exploit state-of-the-art computational geometry algorithms for rectangle tiling. To our knowledge, we are the first to employ tiling algorithms for join load-balancing. In addition, we propose a novel, join-specialized tiling algorithm that has drastically lower time and space complexity than existing algorithms. Experiments show that our scheme outperforms state-of-the-art techniques by up to a factor of 15.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要