
Hydra: Deadline-Aware and Efficiency-Oriented Scheduling for Deep Learning Jobs on Heterogeneous GPUs

IEEE Transactions on Computers (2023)

Cited by: 3
Abstract
With the rapid proliferation of deep learning (DL) jobs running on heterogeneous GPUs, scheduling DL jobs to meet various requirements, such as meeting deadlines and reducing job completion time (JCT), is critical. Unfortunately, existing efficiency-oriented and deadline-aware approaches remain rudimentary: they cannot schedule jobs to meet deadline requirements while also reducing total JCT, especially when job execution times vary across heterogeneous GPUs. Therefore, we present Hydra, a novel quantitative cost comparison approach, to address this scheduling issue. Here, the cost represents the total JCT plus a dynamic penalty calculated from the total tardiness (i.e., the delay beyond the deadline) of all jobs. Hydra adopts a sampling approach that exploits the inherent iterative periodicity of DL jobs to accurately estimate job execution times on heterogeneous GPUs. Hydra then searches over combinations of job sequences and GPU assignments to minimize this cost using an efficient branch-and-bound algorithm. Finally, evaluation experiments on Alibaba traces show that Hydra reduces total tardiness by 85.8% while reducing total JCT as much as possible, compared with state-of-the-art efforts.
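The cost model described in the abstract (total JCT plus a penalty derived from total tardiness) can be sketched as follows. This is a minimal illustration, not Hydra's actual implementation: the penalty weight `alpha` and the job data are illustrative assumptions, and the abstract's "dynamic penalty" is simplified here to a fixed linear weight.

```python
# Hypothetical sketch of the cost objective from the abstract:
# cost = total JCT + penalty * total tardiness.
# `alpha` (the penalty weight) is an assumed parameter for illustration.

def total_cost(jobs, alpha=2.0):
    """jobs: list of (completion_time, deadline) pairs."""
    total_jct = sum(c for c, _ in jobs)
    # Tardiness is the delay beyond the deadline, zero if the job finishes on time.
    total_tardiness = sum(max(0.0, c - d) for c, d in jobs)
    return total_jct + alpha * total_tardiness

# Example: two jobs; the second finishes 3 time units past its deadline.
jobs = [(10.0, 12.0), (8.0, 5.0)]
print(total_cost(jobs))  # 18 (JCT) + 2.0 * 3 (tardiness) = 24.0
```

A scheduler comparing candidate job/GPU orderings would evaluate this cost for each candidate and keep the minimum, which is what Hydra's branch-and-bound search does at scale by pruning orderings whose partial cost already exceeds the best found so far.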
Keywords
Deadline-aware scheduler, deep learning, GPU cluster