MLPs: Efficient Training of MiniGo on Large-scale Heterogeneous Computing System

2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), 2023

Abstract
Deep reinforcement learning has been successfully applied in a wide range of applications and achieves impressive performance compared with traditional methods, but it suffers from high computation cost and long training time. MLPerf includes deep reinforcement learning as one of its benchmark tracks and provides a single-node training version of MiniGo as the reference implementation. A key challenge is to train MiniGo efficiently on a large-scale computing system. Based on the training computation pattern of MiniGo and the characteristics of our large-scale heterogeneous computing system, we propose a Multi-Level Parallel strategy, MLPs, which combines task-level parallelism between nodes, CPU-DSP heterogeneous parallelism, and DSP multi-core parallelism. The proposed method reduces the overall execution time from 43 hours to 16 hours while scaling the number of nodes from 1067 to 4139, a scaling efficiency of 69.1%. According to our fitting method, the scaling efficiency is 46.5% when scaling to 8235 nodes. The experimental results show that the proposed method achieves efficient training of MiniGo on the large-scale heterogeneous computing system.
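As a sanity check (not from the abstract itself, just the standard definition of scaling efficiency as the achieved speedup divided by the increase in node count), the reported 69.1% is consistent with the quoted times and node counts; the small gap is presumably rounding of the 43-hour and 16-hour figures:

\[
E = \frac{T_{1067} / T_{4139}}{4139 / 1067}
  = \frac{43 / 16}{3.88}
  \approx \frac{2.69}{3.88}
  \approx 69.3\%.
\]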
Keywords
deep reinforcement learning, deep neural networks, MLPerf, heterogeneous architecture, large-scale parallel computing