Profiling DNN Workloads on a Volta-based DGX-1 System

2018 IEEE International Symposium on Workload Characterization (IISWC)

Abstract
High-performance multi-GPU systems are widely used to accelerate the training of deep neural networks (DNNs) by exploiting the inherently massive parallelism of the training process. Typically, DNN training on multi-GPU systems uses a data-parallel model in which the DNN is replicated on every GPU, and each GPU performs Forward Propagation (FP), Backward Propagation (BP), and Weight Update (WU). We analyze the WU stage, which is composed of collective communication (e.g., allReduce, broadcast) and therefore demands very efficient communication among the GPUs to avoid diminishing returns as the number of GPUs in the system scales. To address this issue, NVIDIA has introduced different data transfer mechanisms and libraries, which have been adopted by high-level frameworks for training DNNs. In this work, we evaluate and compare the performance of the peer-to-peer (P2P) data transfer method and the NCCL library-based communication method for training DNNs on a DGX-1 system consisting of 8 NVIDIA Volta-based GPUs. We profile and analyze the training of five popular DNNs (GoogLeNet, AlexNet, Inception-v3, ResNet, and LeNet) using 1, 2, 4, and 8 GPUs. We break down the training time into the FP+BP stage and the WU stage to provide insight into the limiting factors of the training algorithm and to identify bottlenecks in the multi-GPU system architecture. Our detailed profiling and analysis can help programmers and DNN model designers accelerate the DNN training process.
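To make the WU-stage communication described above concrete, the following is a minimal sketch (not taken from the paper) of how per-GPU gradients can be summed across replicas with NCCL's allReduce in a single-process, multi-GPU setup. The device count matches the DGX-1's 8 GPUs, but the gradient buffer size is an illustrative assumption, and error checking is omitted for brevity.

/*
 * Minimal sketch: NCCL allReduce over gradient buffers, as in the
 * Weight Update (WU) stage of data-parallel DNN training.
 * GRAD_COUNT is a hypothetical gradient size, not from the paper.
 */
#include <cuda_runtime.h>
#include <nccl.h>

#define NGPUS 8              /* DGX-1: 8 Volta GPUs */
#define GRAD_COUNT (1 << 20) /* hypothetical gradient element count */

int main(void) {
    ncclComm_t comms[NGPUS];
    cudaStream_t streams[NGPUS];
    float *grads[NGPUS];
    int devs[NGPUS];

    for (int i = 0; i < NGPUS; ++i) devs[i] = i;

    /* One NCCL communicator per GPU, all within a single process. */
    ncclCommInitAll(comms, NGPUS, devs);

    for (int i = 0; i < NGPUS; ++i) {
        cudaSetDevice(i);
        cudaMalloc((void **)&grads[i], GRAD_COUNT * sizeof(float));
        cudaStreamCreate(&streams[i]);
        /* In real training, grads[i] would hold the local gradients
           produced by BP on this GPU's model replica. */
    }

    /* WU stage: sum gradients across all replicas, in place.
       Grouping the calls lets NCCL launch them without deadlock. */
    ncclGroupStart();
    for (int i = 0; i < NGPUS; ++i) {
        ncclAllReduce(grads[i], grads[i], GRAD_COUNT,
                      ncclFloat, ncclSum, comms[i], streams[i]);
    }
    ncclGroupEnd();

    /* Wait for the collective before applying the weight update. */
    for (int i = 0; i < NGPUS; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }

    /* Each GPU now holds the summed gradients; dividing by NGPUS
       (omitted here) yields the average used for the SGD step. */

    for (int i = 0; i < NGPUS; ++i) {
        cudaSetDevice(i);
        cudaFree(grads[i]);
        cudaStreamDestroy(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}

The time spent inside this allReduce (and the analogous P2P-based transfers) is precisely the WU-stage cost the paper profiles against the FP+BP stage.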
Keywords
high-performance multi-GPU systems, deep neural networks, forward propagation, backward propagation, weight update, data transfer mechanisms, profiling, DNN workloads, NVIDIA Volta-based GPUs, Volta-based DGX-1 system, DNN model designers, multi-GPU system architecture, NCCL library-based communication method, peer-to-peer data transfer method, data-parallel model