Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix Factorization

2019 IEEE 35th International Conference on Data Engineering (ICDE)

Abstract
Matrix factorization (MF) is a fundamental technique in machine learning and data mining, with wide application across many fields. When the matrix becomes large, MF cannot be processed on a single machine. To handle this, many distributed SGD algorithms (e.g., DSGD) have been developed to solve large-scale MF across multiple machines in a model-parallel way. Existing distributed algorithms are primarily implemented on MapReduce or parameter-server (PS) architectures, which incur significant communication overhead. Moreover, existing solutions cannot fully exploit the benefits of RDMA/RoCE transport and suffer from scalability problems. To address these drawbacks, we propose Rima, which uses ring-based model parallelism to solve large-scale MF with higher communication efficiency. Compared with PS-based SGD algorithms, Rima also consumes fewer queue pairs (QPs) and can thus better leverage RDMA/RoCE to accelerate training. Our experiments show that, when solving 1M × 1M MF, Rima achieves convergence comparable to PS-based DSGD after an equal number of iterations, while reducing training time by 68.7% over TCP and 85.4% over RDMA.
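The core computation that DSGD and Rima parallelize is the SGD update for matrix factorization. Below is a minimal sequential sketch of that update, assuming the standard squared-error objective with L2 regularization; the function name, hyperparameters, and NumPy-style implementation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sgd_mf_epoch(entries, P, Q, lr=0.01, reg=0.05):
    """One SGD pass over the observed entries of a rating matrix R.

    entries: iterable of (i, j, r) with r = R[i, j] observed.
    P: (num_rows, k) row (e.g., user) latent factors.
    Q: (num_cols, k) column (e.g., item) latent factors.
    Minimizes sum over observed (i, j) of
        (r - P[i] . Q[j])^2 + reg * (||P[i]||^2 + ||Q[j]||^2).
    """
    for i, j, r in entries:
        err = r - P[i] @ Q[j]                    # prediction error on entry (i, j)
        p_old = P[i].copy()                      # keep old row factor for Q's gradient
        P[i] += lr * (err * Q[j] - reg * P[i])   # gradient step on row factor
        Q[j] += lr * (err * p_old - reg * Q[j])  # gradient step on column factor
    return P, Q
```

Distributed schemes such as DSGD partition R into blocks so that blocks touching disjoint rows of P and disjoint columns of Q can be updated by different workers in parallel; Rima's contribution, per the abstract, is to circulate the model partitions around a ring rather than synchronizing them through a parameter server.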
Keywords
Computer architecture,Distributed algorithms,Partitioning algorithms,Training,Machine learning,Data mining,Scalability