Nexus: a GPU cluster engine for accelerating DNN-based video analysis

Haichen Shen, Lequn Chen, Yuchen Jin, Liangyu Zhao, Bingyu Kong, Matthai Philipose, Arvind Krishnamurthy, Ravi Sundaram

Proceedings of the 27th ACM Symposium on Operating Systems Principles (2019)

Abstract

We address the problem of serving Deep Neural Networks (DNNs) efficiently from a cluster of GPUs. In order to realize the promise of very low-cost processing made by accelerators such as GPUs, it is essential to run them at sustained high utilization. Doing so requires cluster-scale resource management that performs detailed scheduling of GPUs, reasoning about groups of DNN invocations that need to be co-scheduled, and moving from the conventional whole-DNN execution model to executing fragments of DNNs. Nexus is a fully implemented system that includes these innovations. In large-scale case studies on 16 GPUs, when required to stay within latency constraints at least 99% of the time, Nexus can process requests at rates 1.8-12.7× higher than state-of-the-art systems can. A long-running multi-application deployment stays within 84% of optimal utilization and, on a 100-GPU cluster, violates latency SLOs on 0.27% of requests.
