Accelerating Distributed Machine Learning by Smart Parameter Server

Proceedings of the 3rd Asia-Pacific Workshop on Networking 2019 (2019)

Abstract
The Parameter Server (PS) architecture is widely used in distributed machine learning (DML), but how to improve DML performance within this framework remains an open issue. Existing work focuses mainly on the workers' side. In this paper, we tackle the problem from the opposite perspective, by leveraging the central control available at the PS. Specifically, we propose SmartPS, which transforms the PS from its traditionally passive role and fully exploits its intelligence. First, the PS holds a global view of parameter dependencies, enabling it to update workers' parameters selectively and proactively. Second, the PS records the workers' speeds and prioritizes parameter transmission to narrow the gap between stragglers and fast workers. Third, the PS considers parameter dependencies across consecutive training iterations and opportunistically blocks unnecessary pushes from workers. We conduct comparative experiments on two typical benchmarks, Matrix Factorization (MF) and PageRank (PR). The results show that, compared with the baseline algorithms (standard BSP, ASP, and SSP), SmartPS reduces overall training time by 65.7%-84.9% while achieving the same training accuracy.
Keywords
Distributed machine learning (DML), global view, opportunistically block, parameter dependency, prioritize parameter transmission