Horizontal or Vertical?: A Hybrid Approach to Large-Scale Distributed Machine Learning

Proceedings of the 10th Workshop on Scientific Cloud Computing (2019)

Abstract
Data parallelism and model parallelism are the two typical parallel modes for distributed machine learning (DML). Traditionally, DML has mainly leveraged data parallelism, which maintains one model instance on each node and synchronizes the model parameters at the end of every iteration. However, as models grow larger, communication cost and GPU memory consumption become significant; data parallelism thus fails to work efficiently at large scale, and model-parallel solutions have been proposed in recent years. In this paper, we comprehensively discuss the benefits and drawbacks of both approaches. Based on this comparative analysis, we propose Hove, a hybrid approach that incorporates data parallelism and model parallelism to balance the overheads and achieve high performance for large-scale DML.
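To make the data-parallel pattern described above concrete, here is a minimal sketch of one training iteration with per-iteration gradient synchronization, assuming PyTorch's torch.distributed API. It is an illustration of the general technique, not of the paper's Hove system; the function name and its arguments are hypothetical.

```python
# Minimal sketch of a data-parallel training step: each node holds a full
# model replica, computes gradients on its local batch, then averages
# gradients across all replicas before updating. The allreduce over every
# parameter is the communication cost that grows with model size.
import torch
import torch.distributed as dist
import torch.nn as nn

def data_parallel_step(model: nn.Module, optimizer, inputs, targets, world_size: int):
    """One iteration: local forward/backward, then gradient allreduce."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    # Synchronize model parameters across nodes by averaging gradients.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```

Model parallelism, by contrast, partitions the model itself across devices, so no device holds (or synchronizes) the full parameter set; the trade-off is transferring activations between partitions, which motivates the hybrid design the abstract proposes.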
Keywords
GPU utilization, communication overhead, data parallelism, hybrid approach, model parallelism