Hierarchical Roofline Performance Analysis for Deep Learning Applications

arXiv (2020)

Abstract
This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It extends the Empirical Roofline Toolkit to support additional data precisions and Tensor Cores, and introduces an Nsight Compute based method for accurately collecting application performance information. The methodology enables automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs, and it is validated with a complex deep learning application used for climate image segmentation. Two versions of the code, in TensorFlow and PyTorch respectively, are used to demonstrate the use and effectiveness of the methodology. The analysis highlights how the application utilizes the compute and memory capabilities of the GPU and how implementation and performance differ between the two deep learning frameworks.
Keywords
Roofline model, Performance analysis, Memory hierarchy, NVIDIA GPUs, Deep learning, Image segmentation
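
As a rough illustration of what a hierarchical Roofline analysis computes, the sketch below applies the standard Roofline bound, min(peak compute, arithmetic intensity × bandwidth), once per memory level. It is not the paper's tooling: the function name `attainable_gflops`, the level names, and every number are hypothetical placeholders. In the methodology described above, the ceilings would come from machine characterization and the FLOP and byte counts from application profiling.

```python
def attainable_gflops(peak_gflops, bandwidths_gbs, bytes_moved, flops):
    """Return the Roofline bound (GFLOP/s) for each memory level.

    For every level: bound = min(peak, AI * bandwidth), where the
    arithmetic intensity AI = flops / bytes moved at that level.
    """
    bounds = {}
    for level, bandwidth in bandwidths_gbs.items():
        intensity = flops / bytes_moved[level]  # FLOP per byte
        bounds[level] = min(peak_gflops, intensity * bandwidth)
    return bounds


# Hypothetical machine ceilings (not measured values).
peak = 1000.0                                      # GFLOP/s
bandwidths = {"L1": 4000.0, "L2": 2000.0, "HBM": 800.0}  # GB/s

# Hypothetical kernel counts (not profiler output).
flops = 1.0e9
bytes_moved = {"L1": 8.0e9, "L2": 2.0e9, "HBM": 1.0e9}

bounds = attainable_gflops(peak, bandwidths, bytes_moved, flops)
limiting_level = min(bounds, key=bounds.get)  # level with the lowest bound

for level, bound in bounds.items():
    print(f"{level}: {bound:.1f} GFLOP/s")
print("Most restrictive level:", limiting_level)
```

Looking at all levels together, rather than device memory alone, is what shows which part of the memory hierarchy constrains a kernel. In this made-up case the L1 traffic gives the lowest bound.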