Hatch: Self-distributing systems for data centers

Future Generation Computer Systems（2022）

引用 4|浏览6

暂无评分

摘要

Designing and maintaining distributed systems remains highly challenging: there is a high-dimensional design space of potential ways to distribute a system’s sub-components over a large-scale infrastructure; and the deployment environment for a system tends to change in unforeseen ways over time. For engineers, this is a complex prediction problem to gauge which distributed design may best suit a given environment. We present the concept of self-distributing systems, in which any local system built using our framework can learn, at runtime, the most appropriate distributed design given its perceived operating conditions. Our concept abstracts distribution of a system’s sub-components to a list of simple actions in a reward matrix of distributed design alternatives to be used by reinforcement learning algorithms. By doing this, we enable software to experiment, in a live production environment, with different ways in which to distribute its software modules by placing them in different hosts throughout the system’s infrastructure. We implement this concept in a framework we call Hatch, which has three major elements: (i) a transparent and generalized RPC layer that supports seamless relocation of any local component to a remote host during execution; (ii) a set of primitives, including relocation, replication and sharding, from which to create an action/reward matrix of possible distributed designs of a system; and (iii) a decentralized reinforcement learning approach to converge towards more optimal designs in real time. Using an example of a self-distributing web-serving infrastructure, Hatch is able to autonomously select the most suitable distributed design from among ≈700,000 alternatives in about 5 min.

查看译文

关键词

Self-distributing systems,Emergent systems,Autonomic computing

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要