Time-Based Roofline for Deep Learning Performance Analysis

Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams

2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS) (2020)

Abstract

Deep learning applications based on neural networks are generating considerable interest in various fields due to their high accuracy. Such applications are usually very compute-intensive and thus require long run times. Researchers and engineers are actively exploring new solutions to this issue from both the hardware and the software/algorithm sides. However, little previous work has focused on providing a practical methodology to characterize deep learning performance bottlenecks and potentially guide subsequent optimization efforts. In this paper, we introduce an extension of the Roofline model and use it to analyze two representative computation kernels in deep learning, 2D convolution and long short-term memory (LSTM), on NVIDIA GPUs. This new time-based Roofline model incorporates both compute/bandwidth complexity and run time in its formulae to demonstrate performance issues that cannot be reflected by the classic Roofline. Factors such as arithmetic intensity, data transfer, kernel launch overhead, and Tensor Core usage are examined by varying parameters such as batch size and feature size. This work helps form a more systematic way to understand the performance issues of deep learning applications. Last but not least, this generic performance model can also be applied to a wide range of applications beyond deep learning.

Keywords

Roofline, Performance Analysis, Deep Learning, NVIDIA GPU, PyTorch, TensorFlow
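
The abstract contrasts the classic Roofline with a time-based variant but does not give the formulae. The sketch below is an illustration only, not the paper's model: it assumes the standard Roofline bound (attainable throughput is the lesser of peak compute and arithmetic intensity times peak bandwidth) and one plausible time-based reading, in which run time is bounded below by the slower of the compute time and the data-movement time. The function names, the additive `launch_overhead` term, and all device and kernel numbers are hypothetical.

```python
def attainable_flops(peak_flops, peak_bw, flops, bytes_moved):
    """Classic Roofline: throughput ceiling (FLOP/s) for a kernel.

    Arithmetic intensity is FLOPs performed per byte moved.
    """
    arithmetic_intensity = flops / bytes_moved
    return min(peak_flops, arithmetic_intensity * peak_bw)


def time_lower_bound(peak_flops, peak_bw, flops, bytes_moved,
                     launch_overhead=0.0):
    """Time-based view (assumed form): run-time floor in seconds.

    The kernel cannot finish faster than its compute time or its
    data-movement time; launch_overhead is a hypothetical additive term.
    """
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    return max(compute_time, memory_time) + launch_overhead


# Hypothetical device: 10 TFLOP/s peak compute, 1 TB/s peak bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 1e12

# Hypothetical kernel: 2 GFLOPs of work moving 1 GB of data.
ceiling = attainable_flops(PEAK_FLOPS, PEAK_BW, flops=2e9, bytes_moved=1e9)
floor_s = time_lower_bound(PEAK_FLOPS, PEAK_BW, flops=2e9, bytes_moved=1e9)
print(f"throughput ceiling: {ceiling:.3e} FLOP/s")
print(f"run-time floor:     {floor_s:.3e} s")
```

With these made-up numbers the kernel is bandwidth-bound in both views. The time-based form also makes fixed costs visible: a very small kernel can have a run time dominated by launch overhead, which a throughput-only plot does not show directly.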
