I/O Performance Evaluation of Large-Scale Deep Learning on an HPC System

Minho Bae, Minjoong Jeong, Sangho Yeo, Sangyoon Oh, Oh-Kyoung Kwon

2019 International Conference on High Performance Computing & Simulation (HPCS) (2019)

Abstract

Recently, deep learning has become important in diverse fields. Because the training process requires a huge amount of computing resources, many researchers have proposed methods that use large-scale clusters to reduce training time. Despite many proposals concerning the training process for large-scale clusters, there remain areas to be developed. In this study, we benchmark the performance of Intel-Caffe, a general-purpose distributed deep learning framework, on the Nurion supercomputer of the Korea Institute of Science and Technology Information. We particularly focus on identifying the file I/O factors that affect the performance of Intel-Caffe, and on evaluating performance in a container-based environment. To the best of our knowledge, we present the first benchmark results for distributed deep learning in a container-based environment on a large-scale cluster.

Keywords

component, distributed deep learning, large-scale cluster, HPC, Intel-Caffe, large mini-batch
