I/O Performance Evaluation of Large-Scale Deep Learning on an HPC System

2019 International Conference on High Performance Computing & Simulation (HPCS) (2019)

Abstract
Recently, deep learning has become important in diverse fields. Because training requires a huge amount of computing resources, many researchers have proposed methods that use large-scale clusters to reduce training time. Despite many proposals concerning the training process for large-scale clusters, there remain areas to be developed. In this study, we benchmark the performance of Intel-Caffe, a general-purpose distributed deep learning framework, on the Nurion supercomputer of the Korea Institute of Science and Technology Information. We focus in particular on identifying the file I/O factors that affect the performance of Intel-Caffe, and on evaluating its performance in a container-based environment. To the best of our knowledge, we present the first benchmark results for distributed deep learning in a container-based environment on a large-scale cluster.
Keywords
component, distributed deep learning, large-scale cluster, HPC, Intel-Caffe, large mini-batch