Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning
arXiv (2024)
Abstract
This paper addresses the critical challenge of managing Quality of Service
(QoS) in cloud services, focusing on the nuances of individual tenant
expectations and varying Service Level Indicators (SLIs). It introduces a novel
approach utilizing Deep Reinforcement Learning for tenant-specific QoS
management in multi-tenant, multi-accelerator cloud environments. The chosen
SLI, deadline hit rate, allows clients to tailor QoS for each service request.
A novel online scheduling algorithm for Deep Neural Networks in
multi-accelerator systems is proposed, with a focus on guaranteeing
tenant-wise, model-specific QoS levels while considering real-time constraints.
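
As an illustration of the SLI named above, the following minimal sketch computes a deadline hit rate per tenant and model. The abstract does not define the metric formally, so this assumes it is the fraction of requests that finish at or before their deadline. The `Request` record, its field names, and the helpers `deadline_hit_rates` and `qos_violations` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one completed inference request.
    tenant: str
    model: str
    deadline: float     # absolute deadline, in seconds
    finish_time: float  # observed completion time, in seconds


def deadline_hit_rates(requests):
    """Return {(tenant, model): fraction of requests finished by their deadline}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in requests:
        key = (r.tenant, r.model)
        totals[key] += 1
        # Assumption: finishing exactly at the deadline counts as a hit.
        if r.finish_time <= r.deadline:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def qos_violations(rates, targets):
    """List the (tenant, model) pairs whose measured hit rate is below its target.

    Pairs with no recorded requests are skipped rather than flagged.
    """
    return [
        key
        for key, target in targets.items()
        if key in rates and rates[key] < target
    ]


if __name__ == "__main__":
    log = [
        Request("tenant_a", "resnet", deadline=1.0, finish_time=0.9),
        Request("tenant_a", "resnet", deadline=1.0, finish_time=1.2),
        Request("tenant_b", "bert", deadline=2.0, finish_time=1.5),
    ]
    rates = deadline_hit_rates(log)
    print(rates)
    print(qos_violations(rates, {("tenant_a", "resnet"): 0.9, ("tenant_b", "bert"): 0.9}))
```

A scheduler of the kind the abstract describes would presumably track such rates online and compare them with each tenant's requested level. This sketch covers only the offline bookkeeping, not the reinforcement-learning policy.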