PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Cited 111 | Viewed 64
Abstract
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8×, 1.4×, and 4.8× improvement in latency, throughput, and SLA satisfaction, respectively.
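The abstract describes a preemptible NPU paired with a priority-aware multi-task scheduler. As a minimal illustrative sketch (not the paper's actual PREMA algorithm), the core idea of preemptive priority scheduling can be modeled as a simulation in which inference tasks checkpoint at a fixed quantum and the highest-priority ready task always runs next; all task names and the quantum below are hypothetical:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    # heapq is a min-heap, so we store negative priority:
    # higher-priority tasks sort first.
    sort_key: int
    name: str = field(compare=False)
    remaining_ms: int = field(compare=False)

def run(tasks, quantum_ms=10):
    """Simulate preemptive priority scheduling.

    tasks: iterable of (name, priority, total_latency_ms).
    Each quantum, the highest-priority unfinished task runs;
    unfinished tasks are re-queued, so a higher-priority task
    effectively preempts a lower-priority one at each checkpoint.
    Returns the order in which quanta were granted.
    """
    heap = [Task(-p, n, r) for (n, p, r) in tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        t = heapq.heappop(heap)
        t.remaining_ms -= quantum_ms
        order.append(t.name)
        if t.remaining_ms > 0:
            heapq.heappush(heap, t)  # re-queue the preempted task
    return order

# A hypothetical high-priority request finishes before the longer
# low-priority one, even though both are "resident" on the NPU.
print(run([("low", 1, 20), ("high", 3, 10)]))  # → ['high', 'low', 'low']
```

A real design would additionally need the preemption *mechanisms* the paper evaluates (checkpointing on-chip state at layer or tile boundaries) and a *predictive* component that estimates each request's remaining latency to balance SLA satisfaction against throughput; this sketch only captures the policy skeleton.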
Keywords
DNN acceleration, deep neural network, SLA satisfaction, PREMA, cloud vendors, preemptible neural processing units, predictive multitask scheduling algorithm, preemptive NPU multitasking, scheduling objectives, high-priority inference, predictive multitask scheduler, preemptible neural processing unit, multiple DNN service requests, virtualization