M edea : scheduling of long running applications in shared production clusters

EuroSys '18: Thirteenth EuroSys Conference 2018 Porto Portugal April, 2018(2018)

引用 125|浏览178
暂无评分
摘要
The rise in popularity of machine learning, streaming, and latency-sensitive online applications in shared production clusters has raised new challenges for cluster schedulers. To optimize their performance and resilience, these applications require precise control of their placements, by means of complex constraints, e.g., to collocate or separate their long-running containers across groups of nodes. In the presence of these applications, the cluster scheduler must attain global optimization objectives, such as maximizing the number of deployed applications or minimizing the violated constraints and the resource fragmentation, but without affecting the scheduling latency of short-running containers. We present Medea, a new cluster scheduler designed for the placement of long-and short-running containers. Medea introduces powerful placement constraints with formal semantics to capture interactions among containers within and across applications. It follows a novel two-scheduler design: (i) for long-running containers, it applies an optimization-based approach that accounts for constraints and global objectives; (ii) for short-running containers, it uses a traditional task-based scheduler for low placement latency. Evaluated on a 400-node cluster, our implementation of Medea on Apache Hadoop YARN achieves placement of long-running applications with significant performance and resilience benefits compared to state-of-the-art schedulers.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要