Exploiting cloud heterogeneity for optimized cost/performance MapReduce processing.

EuroSys (2014)

Abstract
Cloud computing enables a user to quickly provision a Hadoop cluster of any size, execute a given MapReduce workload, and then pay for the time the resources were used. Typically, there is a choice of different types of VM instances in the Cloud (e.g., small, medium, or large EC2 instances). The capacity differences of the offered VMs are reflected in their pricing. Therefore, for the same price a user can get a variety of Hadoop clusters based on different VM instance types. We observe that the performance of MapReduce applications may vary significantly on different platforms. This makes selecting the best cost/performance platform for a given workload a non-trivial problem, especially when different jobs exhibit different platform preferences. In this work, we aim to solve the following problem: given a completion time target for a set of MapReduce jobs, determine a homogeneous or heterogeneous Hadoop cluster configuration (i.e., the number and types of VMs, and the job schedule) for processing these jobs within a given deadline while minimizing the rented infrastructure cost. We offer a simulation-based framework for solving this problem. Our evaluation study and experiments with the Amazon EC2 platform reveal that for different workload mixes, an optimized platform choice may result in 41-67% cost savings for achieving the same performance objectives when using different (but seemingly equivalent) choices. Moreover, depending on the workload, the heterogeneous cluster solution may outperform the homogeneous one by 26-42%. The results of our simulation study are validated through experiments with Hadoop clusters deployed on Amazon EC2 instances.
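The problem statement above (meet a deadline for a set of jobs at minimal rental cost) can be illustrated with a small brute-force search over homogeneous clusters. This is only a sketch under stated assumptions and not the paper's simulation-based framework: the wave-based makespan model, the `InstanceType` and `Job` classes, the slot counts, the prices, and the task durations are all hypothetical, and billing is assumed to be proportional to usage time with no hourly rounding.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, List, Optional, Tuple


@dataclass
class InstanceType:
    name: str
    price_per_hour: float  # hypothetical price per VM-hour
    slots: int             # hypothetical number of task slots per VM


@dataclass
class Job:
    name: str
    num_tasks: int
    # Hypothetical average task duration (seconds), keyed by instance type name.
    task_seconds: Dict[str, float]


def estimate_makespan(jobs: List[Job], itype: InstanceType, num_vms: int) -> float:
    """Toy model: jobs run one after another; each job's tasks run in
    waves that fill every slot in the cluster."""
    total_slots = itype.slots * num_vms
    return sum(
        ceil(job.num_tasks / total_slots) * job.task_seconds[itype.name]
        for job in jobs
    )


def cheapest_homogeneous_cluster(
    jobs: List[Job],
    itypes: List[InstanceType],
    deadline_s: float,
    max_vms: int = 100,
) -> Optional[Tuple[float, str, int, float]]:
    """Return (cost, instance type name, VM count, makespan in seconds) for
    the cheapest cluster that meets the deadline, or None if none does."""
    best = None
    for itype in itypes:
        for n in range(1, max_vms + 1):
            makespan = estimate_makespan(jobs, itype, n)
            if makespan > deadline_s:
                continue  # this configuration misses the deadline
            # Assumed billing: pay only for the time the VMs are in use.
            cost = n * itype.price_per_hour * makespan / 3600.0
            if best is None or cost < best[0]:
                best = (cost, itype.name, n, makespan)
    return best


# Made-up workload in which the two jobs prefer different instance types.
small = InstanceType("small", price_per_hour=0.06, slots=1)
large = InstanceType("large", price_per_hour=0.24, slots=4)
jobs = [
    Job("sort", 40, {"small": 100.0, "large": 60.0}),
    Job("wordcount", 80, {"small": 50.0, "large": 70.0}),
]
print(cheapest_homogeneous_cluster(jobs, [small, large], deadline_s=1200.0))
```

A heterogeneous variant would additionally have to split the jobs across sub-clusters of different instance types and choose a job schedule, which makes the search space much larger than this exhaustive loop can reasonably cover.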