The Packing Server for real-time scheduling of MapReduce workflows

RTAS(2015)

引用 20|浏览41
暂无评分
摘要
This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is UB · φ-β/φ , where UB is the utilization bound of the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization. By leveraging past schedulability results for independent tasks on multiprocessors, we improve schedulable utilization of DAG workflows above 50% of total capacity, when the number of processors is large and the largest server budget is (sufficiently) smaller than its deadline. This surpasses the best known bounds fo- the generalized parallel task model. Our evaluation using a Yahoo! MapReduce trace as well as a physical cluster of 46 machines confirms the validity of the new utilization bound for MapReduce workflows.
更多
查看译文
关键词
directed graphs,parallel processing,processor scheduling,real-time systems,DAG workflows,MapReduce deployments,MapReduce job,MapReduce workflow model,MapReduce workflow schedulers,Yahoo! MapReduce trace,aperiodic tasks,budget utilization,critical path length ratio,directed acyclic graph,distributed computing paradigm,hierarchical scheduling,multiprocessors,packing server,parallel segments,parallel task model,periodically replenished budgets,physical cluster,real-time analytic applications,real-time scheduling,schedulability bounds,task scheduling policy,task set utilization,utilization bound,
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要