Eagle job-aware scheduling: divide and ... reorder

semanticscholar(2016)

Abstract
We present Eagle, a new hybrid cluster scheduler for data-parallel programs, consisting of a centralized scheduler for long jobs and a set of distributed schedulers for short jobs. Eagle incorporates two new techniques: succinct state sharing and sticky batch probing. With succinct state sharing, the centralized scheduler informs the distributed schedulers of the placement of long jobs in a low-overhead way. The distributed schedulers then avoid worker nodes with long jobs to minimize head-of-line blocking. Combined with a small, dedicated partition for short jobs, succinct state sharing entirely eliminates head-of-line blocking of short jobs by long jobs. With sticky batch probing, the distributed schedulers queue probes for their tasks at various worker nodes, but when a worker node finishes a task, rather than executing the next task in its queue, it requests a new task from a distributed scheduler according to the desired scheduling discipline. We use sticky batch probing to implement a distributed approximation of SRPT (Shortest Remaining Processing Time) with starvation prevention. We have implemented Eagle as a Spark plugin, and we have measured job completion times for a subset of the Google trace on a 100-node cluster for a variety of cluster loads. We show that Eagle improves at all percentiles over Hawk, an earlier hybrid scheduler with which it shares a code base. We provide simulation results for larger clusters, different traces, and for comparison with other scheduling policies. Using traces from Cloudera, Google and Yahoo, we show that Eagle outperforms other scheduling disciplines at most percentiles, and is more robust against mis-estimation of task duration.
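To make the two techniques in the abstract more concrete, the sketch below illustrates, in simplified form, (a) succinct state sharing, where distributed schedulers steer short-job probes away from nodes known to host long jobs, and (b) the worker-pull step of sticky batch probing, where a scheduler hands out the next task using SRPT with a starvation-prevention override. This is a minimal illustrative sketch with assumed names (Job, eligible_nodes, pick_next_job, starvation_threshold), not the actual Eagle/Spark implementation.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Job:
    job_id: str
    remaining_time: float   # estimated remaining processing time (SRPT key)
    wait_time: float = 0.0  # time the job has already spent waiting


# --- Succinct state sharing (sketch) ---
# The centralized scheduler shares only the set of nodes currently running
# long-job tasks; distributed schedulers probe the remaining nodes to avoid
# head-of-line blocking of short tasks.
def eligible_nodes(all_nodes: Set[str], long_job_nodes: Set[str]) -> Set[str]:
    free_of_long_jobs = all_nodes - long_job_nodes
    # Fall back to all nodes if every node currently hosts a long job.
    return free_of_long_jobs or all_nodes


# --- Sticky batch probing: worker-pull step (sketch) ---
# When a worker finishes a task it asks its distributed scheduler for the
# next one; the scheduler answers with SRPT, but first promotes any job
# that has waited longer than a starvation threshold.
def pick_next_job(pending: List[Job], starvation_threshold: float) -> Job:
    starving = [j for j in pending if j.wait_time > starvation_threshold]
    candidates = starving if starving else pending
    return min(candidates, key=lambda j: j.remaining_time)


if __name__ == "__main__":
    nodes = {"n1", "n2", "n3", "n4"}
    print(eligible_nodes(nodes, long_job_nodes={"n2", "n3"}))  # {'n1', 'n4'}

    jobs = [Job("a", 5.0, wait_time=1.0),
            Job("b", 2.0, wait_time=0.5),
            Job("c", 9.0, wait_time=30.0)]
    # "c" is chosen despite its long remaining time (starvation prevention);
    # without the override, SRPT would pick "b".
    print(pick_next_job(jobs, starvation_threshold=10.0).job_id)  # 'c'
```

In the paper's design the worker requests a task only when it becomes idle, so the scheduler can apply the desired discipline (here, SRPT with starvation prevention) at the last possible moment rather than committing to a fixed probe queue order.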