Deep Learning at Scale on NVIDIA V100 Accelerators

Rengan Xu, Frank Han, Quy Ta

2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

Cited by 25
Abstract
The recent explosion in the popularity of Deep Learning (DL) is due to a combination of improved algorithms, access to large datasets and increased computational power. This has led to a plethora of open-source DL frameworks, each with varying characteristics and capabilities. End users are then left with the difficult task of determining the software and hardware configurations that get optimal performance from each framework. We share our experiences and develop best practices for DL training with TensorFlow, MXNet and Caffe2. The paper also looks at DL inferencing with TensorRT on NVIDIA V100 Volta GPUs. It focuses on one of the more prominent neural network architectures, ResNet-50, combined with the ImageNet dataset. We quantify the impact of hardware attributes on DL workloads, such as the use of PCIe vs. NVLink GPUs, scaling beyond a single worker node, the effect of a high-speed interconnect such as InfiniBand EDR on training, and the implications and advantages of utilizing network-attached storage.
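The paper itself does not include code in the abstract; as a hedged illustration only, the sketch below shows the kind of data-parallel ResNet-50 training workload it benchmarks, assuming Horovod with TensorFlow/Keras on one process per V100 GPU. The launcher, batch size, learning-rate scaling and synthetic-data pipeline are assumptions for the sketch, not details taken from the paper.

```python
# Illustrative sketch (not from the paper): data-parallel ResNet-50 training
# with Horovod on TensorFlow/Keras, one process per GPU.
# Example launch (assumed): horovodrun -np 4 python train_resnet50.py
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to its local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Synthetic ImageNet-shaped data stands in for a real input pipeline.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([256, 224, 224, 3]),
     tf.random.uniform([256], maxval=1000, dtype=tf.int64))
).batch(64).repeat()

model = tf.keras.applications.ResNet50(weights=None, classes=1000)

# Scale the learning rate by the number of workers and wrap the optimizer
# so gradients are averaged across GPUs via allreduce.
opt = hvd.DistributedOptimizer(
    tf.keras.optimizers.SGD(learning_rate=0.1 * hvd.size(), momentum=0.9))

model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

# Broadcast initial weights from rank 0 so all workers start in sync.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(dataset, steps_per_epoch=32, epochs=1,
          callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)
```

Images-per-second throughput of a loop like this, measured across GPUs, nodes and interconnects, is the kind of metric the paper compares across frameworks.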
Keywords
Deep Learning, Distributed Training, GPU, Benchmarking, V100