Dolphin: Runtime Optimization for Distributed Machine Learning

Proc. of ICML ML Systems Workshop (2016)

Abstract
Large-scale machine learning (ML) systems are becoming widely used. Typically, these ML systems run on fixed resources, but it is difficult to find their optimal configurations (e.g., how many nodes to use, how to distribute data), since the best choice depends on multiple factors such as the hardware environment, the ML algorithm, and the input dataset. Furthermore, optimal configurations can often change over time due to fluctuating cluster resources and changing ML algorithm patterns. In this paper, we present Dolphin, an elastic machine learning framework that addresses the configuration problem at runtime. Dolphin solves a cost-based optimization problem to find an optimal configuration and reconfigures the system dynamically at runtime. Dolphin introduces a new distributed memory abstraction that changes resource and data configurations transparently and efficiently, based on the optimizer's plan.
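
To give a concrete sense of the cost-based configuration search the abstract describes, the sketch below enumerates candidate configurations (number of workers and data partitioning) and picks the one with the lowest estimated per-iteration cost. It is a minimal illustrative sketch only: the cost model, the names (Config, estimate_cost, choose_config), and the numeric parameters are assumptions for illustration, not Dolphin's actual optimizer.

    # Illustrative sketch of a cost-based configuration search (assumed model,
    # not Dolphin's actual optimizer).
    import math
    from dataclasses import dataclass

    @dataclass
    class Config:
        num_workers: int        # how many nodes to use
        blocks_per_worker: int  # how the input data is partitioned

    def estimate_cost(cfg: Config, total_blocks: int,
                      comp_time_per_block: float,
                      comm_time_per_worker: float) -> float:
        # Estimated per-iteration cost: computation shrinks as data is spread
        # over more workers, while communication (e.g., model synchronization)
        # grows with the number of workers.
        computation = cfg.blocks_per_worker * comp_time_per_block
        communication = cfg.num_workers * comm_time_per_worker
        return computation + communication

    def choose_config(total_blocks: int, max_workers: int,
                      comp_time_per_block: float,
                      comm_time_per_worker: float) -> Config:
        # Enumerate candidate worker counts and return the cheapest one.
        candidates = [
            Config(num_workers=n,
                   blocks_per_worker=math.ceil(total_blocks / n))
            for n in range(1, max_workers + 1)
        ]
        return min(candidates,
                   key=lambda c: estimate_cost(c, total_blocks,
                                               comp_time_per_block,
                                               comm_time_per_worker))

    if __name__ == "__main__":
        # In an elastic system these costs would come from runtime measurements;
        # the numbers here are placeholders.
        best = choose_config(total_blocks=128, max_workers=32,
                             comp_time_per_block=0.5,
                             comm_time_per_worker=1.2)
        print(best)

In such a scheme, re-running the search whenever measured costs or available resources change yields a new plan, and the system then migrates data and adds or removes workers to match it, which is the kind of runtime reconfiguration the abstract attributes to Dolphin's distributed memory abstraction.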