Accordia - Adaptive Cloud Configuration Optimization for Recurring Data-Intensive Applications.
SoCC '19: ACM Symposium on Cloud Computing, Santa Cruz, CA, USA, November 2019
Abstract
Recognizing the diversity of big data analytic jobs, cloud providers offer a wide range of virtual machine (VM) instances for different use cases. The choice of cloud instance configuration can have a significant impact on the response time and running cost of data-intensive, recurring production jobs. A poor choice of cloud instance type or configuration can degrade the response time by 5x or increase the cost by 10x. Identifying the best cloud configuration under a low search budget is a challenging problem due to i) the large and high-dimensional configuration-parameter space, ii) the dynamically varying price of some instance types, iii) job response time variation even under the same configuration, and iv) gradual drifts or unexpected changes in the characteristics of the recurring jobs. To tackle this problem, we have designed and implemented Accordia, a system which enables Adaptive Cloud Configuration Optimization for Recurring Data-Intensive Applications.
Accordia extends the Gaussian-Process Upper Confidence Bound (GP-UCB) approach in [3] to search for and track the potentially dynamic optimal cloud configuration within a high-dimensional parameter space. Unlike other state-of-the-art schemes, such as CherryPick [1] and Arrow [2], Accordia can handle time-varying instance pricing while providing a performance guarantee of sub-linear regret when compared with the static, offline optimal solution.
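To make the GP-UCB idea concrete, the following is a minimal sketch of the plain algorithm, not Accordia's extension: it does not model time-varying prices or job drift. Because the goal here is to minimize cost, the sketch picks the candidate with the lowest confidence bound on cost (mean minus beta times standard deviation), which is equivalent to UCB on negated cost. The kernel, the noise level, beta, the two-parameter candidate grid, and the toy_cost function are all illustrative assumptions rather than values from the paper.

```python
import math
from itertools import product

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two configuration vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def posterior(X, y, x, noise=1e-2):
    """GP posterior mean and standard deviation at x, given data (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x) for a in X]
    alpha = solve(K, y)          # K^{-1} y
    v = solve(K, k)              # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)

def gp_ucb_minimize(candidates, run_job, rounds=15, beta=2.0):
    """Run the job `rounds` times; return the cheapest observed configuration."""
    X, y = [], []
    for _ in range(rounds):
        if not X:
            x = candidates[0]    # no data yet: start from an arbitrary point
        else:
            avg = sum(y) / len(y)
            centered = [v - avg for v in y]   # zero-mean GP prior on centered costs

            def lower_bound(c):
                m, s = posterior(X, centered, c)
                return m - beta * s           # optimistic estimate of the cost

            x = min(candidates, key=lower_bound)
        X.append(x)
        y.append(run_job(x))
    best = min(range(len(y)), key=y.__getitem__)
    return X[best], y[best]

def toy_cost(cfg):
    """Hypothetical stand-in for a measured job cost (not from the paper)."""
    executors, cores = cfg
    return (executors - 4) ** 2 + (cores - 2) ** 2 + 1.0

# Hypothetical 2-D grid: (number of executors, cores per executor).
candidates = list(product(range(1, 7), range(1, 5)))
best_cfg, best_cost = gp_ucb_minimize(candidates, toy_cost)
```

In a real deployment, run_job would launch the recurring job under the chosen configuration and return its measured cost; the search budget corresponds to the rounds parameter.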
Figure 1 depicts the system architecture of our implementation of Accordia for Apache Spark running over Kubernetes. When a job is submitted, a Spark driver and multiple Spark executors are deployed as containers, each within its own Kubernetes pod. Accordia then dynamically adjusts the resource types and allocation for the containers within their respective pods to minimize the job completion cost using the GP-UCB online-learning approach.
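As an illustration of what a candidate configuration could look like for Spark on Kubernetes, the sketch below maps driver and executor resource settings onto standard Spark configuration properties and builds a spark-submit command line. This is one plausible way to apply a configuration at submission time and is an assumption, not a description of how Accordia itself adjusts pod resources. The CloudConfig class, the spark_submit_args helper, the master URL, the container image name, and the application path are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CloudConfig:
    """Hypothetical container for one candidate resource configuration."""
    executors: int
    executor_cores: int
    executor_memory_gb: int
    driver_cores: int
    driver_memory_gb: int

def spark_submit_args(cfg, master_url, image, app):
    """Build a spark-submit argument list that applies cfg in cluster mode."""
    conf = {
        "spark.executor.instances": cfg.executors,
        "spark.executor.cores": cfg.executor_cores,
        "spark.executor.memory": f"{cfg.executor_memory_gb}g",
        "spark.driver.cores": cfg.driver_cores,
        "spark.driver.memory": f"{cfg.driver_memory_gb}g",
        "spark.kubernetes.container.image": image,
    }
    args = ["spark-submit", "--master", master_url, "--deploy-mode", "cluster"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args

# Placeholder endpoint, image, and application path (assumptions).
example_args = spark_submit_args(
    CloudConfig(executors=4, executor_cores=2, executor_memory_gb=8,
                driver_cores=1, driver_memory_gb=4),
    master_url="k8s://https://example.invalid:6443",
    image="example/spark:latest",
    app="local:///opt/app/job.py",
)
```

Spark then creates one driver pod and the requested number of executor pods, each sized according to these properties.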
To evaluate the performance of Accordia, we have run different mixes of recurring Spark jobs over the Google public cloud. In our experiments, Accordia dynamically learns the best cloud configuration from over 7000 candidate choices within a 5-dimensional parameter space, covering the number of executors as well as the number of CPU cores and the memory (RAM) allocation for the driver and the executor pods. Empirical measurements show that Accordia can find a near-cost-optimal configuration for a recurring job (i.e., within 10% of the optimal cost) in fewer than 20 runs, which translates to a 2x speedup and a 20.9% cost saving compared with CherryPick. To highlight Accordia's ability to handle abrupt or unexpected changes in the characteristics of a recurring job, we also switch the type of a recurring job at exponentially distributed time intervals without notifying Accordia. In such cases, Accordia still achieves an average cost saving of 18.4% over CherryPick. The full technical report is available at http://mobitec.ie.cuhk.edu.hk/cloudComputing/Accordia.pdf.
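The size of such a search space and the evaluation criteria can be sketched as follows. The grid values here are hypothetical and yield 2048 candidates; the abstract does not list the actual per-dimension ranges behind its 7000-plus candidates. The helper names near_optimal and cost_saving are also assumptions, and "within 10% of the optimal cost" is read here as cost <= 1.10 x optimal.

```python
from itertools import product

# Hypothetical per-dimension choices for the five parameters.
GRID = {
    "executors": range(1, 9),               # 8 choices
    "executor_cores": range(1, 5),          # 4 choices
    "executor_memory_gb": (2, 4, 8, 16),    # 4 choices
    "driver_cores": range(1, 5),            # 4 choices
    "driver_memory_gb": (2, 4, 8, 16),      # 4 choices
}

# Every combination of one value per dimension is one candidate configuration.
all_candidates = [dict(zip(GRID, values)) for values in product(*GRID.values())]

def near_optimal(cost, optimal_cost, tolerance=0.10):
    """True if cost is within `tolerance` (as a fraction) of the optimal cost."""
    return cost <= optimal_cost * (1.0 + tolerance)

def cost_saving(baseline_cost, new_cost):
    """Fractional saving of new_cost relative to baseline_cost."""
    return (baseline_cost - new_cost) / baseline_cost
```

Even this modest grid has 8 x 4 x 4 x 4 x 4 = 2048 combinations, which is why a search budget of fewer than 20 runs rules out exhaustive evaluation.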
Keywords
Big data analytics, Cloud configuration, Gaussian-Process UCB, Kubernetes