Hatch: Self-distributing systems for data centers

Future Generation Computer Systems (2022)

Abstract
Designing and maintaining distributed systems remains highly challenging: there is a high-dimensional design space of ways to distribute a system’s sub-components over a large-scale infrastructure, and a system’s deployment environment tends to change in unforeseen ways over time. For engineers, gauging which distributed design best suits a given environment is a complex prediction problem. We present the concept of self-distributing systems, in which any local system built using our framework can learn, at runtime, the most appropriate distributed design given its perceived operating conditions. Our concept abstracts the distribution of a system’s sub-components into a list of simple actions in a reward matrix of distributed design alternatives, to be used by reinforcement learning algorithms. This enables software to experiment, in a live production environment, with different ways of distributing its software modules by placing them on different hosts throughout the system’s infrastructure. We implement this concept in a framework we call Hatch, which has three major elements: (i) a transparent and generalized RPC layer that supports seamless relocation of any local component to a remote host during execution; (ii) a set of primitives, including relocation, replication and sharding, from which to create an action/reward matrix of possible distributed designs of a system; and (iii) a decentralized reinforcement learning approach that converges towards better designs in real time. Using an example of a self-distributing web-serving infrastructure, Hatch is able to autonomously select the most suitable distributed design from among ≈700,000 alternatives in about 5 minutes.
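
To make the idea of an action/reward matrix concrete, the following is a minimal sketch and not Hatch's implementation. It assumes each component is assigned one of four hypothetical options (the three primitives named above plus "local", meaning the component stays in place), and it substitutes a simple single-agent epsilon-greedy bandit for the decentralized reinforcement learning the abstract describes. The names `enumerate_designs`, `epsilon_greedy` and `measure_reward` are illustrative assumptions.

```python
import itertools
import random

# Hypothetical per-component options: the three primitives named in the
# abstract, plus "local" (leave the component where it is).
PRIMITIVES = ("local", "relocate", "replicate", "shard")


def enumerate_designs(components):
    """Return every assignment of one option to each component."""
    return [
        dict(zip(components, choice))
        for choice in itertools.product(PRIMITIVES, repeat=len(components))
    ]


def epsilon_greedy(designs, measure_reward, rounds=200, epsilon=0.1, seed=0):
    """Pick a design by trial: try each once, then explore or exploit.

    measure_reward(design) is an assumed callback that deploys a design
    and returns an observed reward (higher is better).
    """
    rng = random.Random(seed)
    counts = [0] * len(designs)
    means = [0.0] * len(designs)
    for t in range(rounds):
        if t < len(designs):
            i = t  # initial sweep: observe every design once
        elif rng.random() < epsilon:
            i = rng.randrange(len(designs))  # explore a random design
        else:
            i = max(range(len(designs)), key=means.__getitem__)  # exploit
        reward = measure_reward(designs[i])
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # running average
    best = max(range(len(designs)), key=means.__getitem__)
    return designs[best]
```

The initial sweep over every design is only practical for a toy design space like this one; with hundreds of thousands of alternatives, a learner would need to generalize across designs rather than sample each one.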
Keywords
Self-distributing systems, Emergent systems, Autonomic computing