Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix Factorization

2019 IEEE 35th International Conference on Data Engineering (ICDE)

Abstract
Matrix factorization (MF) is a fundamental technique in machine learning and data mining, with wide application across many fields. When the matrix becomes large, MF cannot be processed on a single machine. To handle this, many distributed SGD algorithms (e.g., DSGD) have been developed to solve large-scale MF across multiple machines in a model-parallel way. Existing distributed algorithms are primarily implemented on MapReduce or parameter-server (PS) architectures, which incur significant communication overhead. Moreover, existing solutions cannot fully exploit the benefits of RDMA/RoCE transport and suffer from scalability problems. To address these drawbacks, we propose Rima, which uses ring-based model parallelism to solve large-scale MF with higher communication efficiency. Compared with PS-based SGD algorithms, Rima also consumes fewer queue pairs (QPs) and can thus better leverage RDMA/RoCE to accelerate training. Our experiments show that, when solving 1M × 1M MF, Rima achieves convergence comparable to PS-based DSGD after an equal number of iterations, while reducing training time by 68.7% over TCP and 85.4% over RDMA.
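The core computation that DSGD and Rima parallelize is the SGD update for matrix factorization. Below is a minimal sequential sketch of that update, assuming the standard squared-error objective with L2 regularization; the function name, hyperparameters, and NumPy-style implementation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sgd_mf_epoch(entries, P, Q, lr=0.01, reg=0.05):
    """One SGD pass over the observed entries of a rating matrix R.

    entries: iterable of (i, j, r) with r = R[i, j] observed.
    P: (num_rows, k) row (e.g., user) latent factors.
    Q: (num_cols, k) column (e.g., item) latent factors.
    Minimizes sum over observed (i, j) of
        (r - P[i] . Q[j])^2 + reg * (||P[i]||^2 + ||Q[j]||^2).
    """
    for i, j, r in entries:
        err = r - P[i] @ Q[j]                    # prediction error on entry (i, j)
        p_old = P[i].copy()                      # keep old row factor for Q's gradient
        P[i] += lr * (err * Q[j] - reg * P[i])   # gradient step on row factor
        Q[j] += lr * (err * p_old - reg * Q[j])  # gradient step on column factor
    return P, Q
```

Distributed schemes such as DSGD partition R into blocks so that blocks touching disjoint rows of P and disjoint columns of Q can be updated by different workers in parallel; Rima's contribution, per the abstract, is to circulate the model partitions around a ring rather than synchronizing them through a parameter server.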
Keywords
Computer architecture,Distributed algorithms,Partitioning algorithms,Training,Machine learning,Data mining,Scalability