Sailfish: A Dependency-Aware and Resource Efficient Scheduling for Low Latency in Clouds.

2023 IEEE International Conference on Big Data (BigData) (2023)

Abstract
Efficiently scheduling jobs in clouds is critical for job performance, system throughput, and resource utilization. The growing importance of parallel applications in clouds makes scheduling data-parallel jobs challenging. Production data-parallel jobs increasingly have complex dependency structures, i.e., task dependencies expressed as directed acyclic graphs (DAGs), and heterogeneous resource demands. Even relaxing either of these challenges leaves an NP-hard problem: both scheduling homogeneous tasks with dependency constraints and scheduling independent tasks with heterogeneous demands are NP-hard. Because of the complex dependency structure and job heterogeneity, it is challenging to design a scheduler that simultaneously achieves low latency and high resource utilization. In this paper, we propose Sailfish, a dependency-aware and resource-efficient scheduler for low latency in clouds. Sailfish first uses a machine learning algorithm to classify jobs into two categories (high-priority jobs and low-priority jobs) based on extracted features. Next, Sailfish splits jobs into tasks and distributes the tasks to master nodes based on task dependencies and the load of the master nodes. Then, Sailfish uses the tasks' dependency information to determine task priorities, and packs tasks by exploiting both task dependencies and the complementarity of tasks' demands across different resource types. Finally, the master nodes use the proposed mutual reinforcement algorithm to assign tasks to workers based on the tasks' resource demands, the workers' available resources, and task dependencies. Extensive experiments on a real cluster and on the real-world Amazon EC2 cloud service show that, compared with existing schedulers, Sailfish improves average resource utilization by up to 40% and reduces latency (average job completion time) by up to 91%.
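To make the two core ideas of dependency-based task priority and multi-resource packing more concrete, here is a minimal sketch. It is not Sailfish's algorithm: the abstract does not say how priorities or packing scores are computed. As assumptions, this sketch uses the length of the longest downstream dependency chain (unit task cost) as the priority, and a dot-product alignment score between a task's demand vector and a worker's free capacity as a stand-in packing heuristic. The names `downstream_depth` and `pack_ready_tasks`, the workers `w1`/`w2`, and all numbers are hypothetical. Job classification, master-node distribution, and the mutual reinforcement algorithm are not modeled.

```python
from typing import Dict, List, Sequence, Tuple


def downstream_depth(dag: Dict[str, List[str]]) -> Dict[str, int]:
    """Longest chain of tasks starting at each task (unit task cost).

    Assumption: used here as a stand-in for a dependency-based priority.
    """
    memo: Dict[str, int] = {}

    def depth(task: str) -> int:
        if task not in memo:
            # A task heading a longer dependency chain gates more downstream work.
            memo[task] = 1 + max((depth(c) for c in dag.get(task, [])), default=0)
        return memo[task]

    for task in dag:
        depth(task)
    return memo


def pack_ready_tasks(
    ready: Sequence[str],
    priority: Dict[str, int],
    demands: Dict[str, Tuple[int, ...]],
    free: Dict[str, List[int]],
) -> Dict[str, str]:
    """Greedily place ready tasks, highest priority first, on the best-aligned worker.

    Mutates `free` to reflect the capacity consumed by placed tasks.
    """
    placement: Dict[str, str] = {}
    for task in sorted(ready, key=lambda t: -priority[t]):
        need = demands[task]
        best, best_score = None, -1
        for worker, avail in free.items():
            if all(n <= a for n, a in zip(need, avail)):
                # Hypothetical alignment score: large when the worker has spare
                # capacity in the resource dimensions this task needs most.
                score = sum(n * a for n, a in zip(need, avail))
                if score > best_score:
                    best, best_score = worker, score
        if best is not None:  # tasks that fit nowhere wait for a later round
            free[best] = [a - n for n, a in zip(need, free[best])]
            placement[task] = best
    return placement


# Hypothetical example: demands and capacities are (vCPUs, GB of memory).
dag = {"x": ["z"], "y": [], "z": []}
depth = downstream_depth(dag)
demands = {"x": (4, 1), "y": (1, 3)}
free = {"w1": [8, 4], "w2": [4, 16]}
placement = pack_ready_tasks(["x", "y"], depth, demands, free)
print(depth, placement, free)
```

In this toy run, the CPU-heavy task `x` is placed first because it heads a longer chain, and it goes to the CPU-rich worker `w1`. The memory-heavy task `y` would still fit on `w1`, but the alignment score sends it to the memory-rich `w2`, which leaves `w1`'s remaining CPU for other work.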
Keywords
scheduling, task dependency, heterogeneity, resource utilization, latency