TR-Spark: Transient Computing for Big Data Analytics

SoCC (2016)

Citations: 70 | Views: 141
Abstract
Large-scale public cloud providers invest billions of dollars into their cloud infrastructure and operate hundreds of thousands of servers across the globe. For various reasons, much of this provisioned server capacity runs at low average utilization, and there is tremendous competitive pressure to increase utilization. Conceptually, the way to increase utilization is clear: run time-insensitive batch workloads as secondary background tasks whenever server capacity is underutilized, and evict these workloads when the server's primary task requires more resources. Big data analytics tasks would seem to be an ideal fit to run opportunistically on such transient resources in the cloud. In reality, however, modern distributed data processing systems such as MapReduce or Spark are designed to run as the primary task on dedicated hardware, and they perform badly on transiently available resources because of the excessive cost of cascading re-computations after evictions.

In this paper, we propose a new framework for big data analytics on transient resources. Specifically, we design and implement TR-Spark, a version of Spark that runs efficiently as a secondary background task on transient (evictable) resources. The design of TR-Spark rests on two principles: resource-stability- and data-size-reduction-aware scheduling, and lineage-aware checkpointing. Together, these principles allow TR-Spark to adapt naturally to the stability characteristics of the underlying compute infrastructure. Evaluation results show that while regular Spark effectively fails to finish a job on clusters of even moderate instability, TR-Spark performs nearly as well as Spark running on stable resources.
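To make the two principles concrete, below is a minimal, self-contained Scala sketch of the kind of cost model they imply. It is an illustrative assumption, not the paper's actual implementation: the names (`placementScore`, `shouldCheckpoint`, `Vm`, `Task`) and the specific scoring formulas are hypothetical. The idea it captures is that (a) tasks should land on instances likely to outlive them, preferring tasks that shrink their data so an eviction loses less work, and (b) an output should be checkpointed only when the expected cost of replaying its lineage after an eviction exceeds the cost of writing it to stable storage now.

```scala
// Illustrative sketch of transient-resource scheduling and lineage-aware
// checkpointing. All names and formulas here are assumptions for exposition,
// not TR-Spark's actual code.

final case class Vm(id: String, expectedLifetimeSec: Double)

final case class Task(id: String,
                      estRuntimeSec: Double,
                      inputBytes: Long,
                      outputBytes: Long)

object TrSparkSketch {

  /** Stability- and data-size-reduction-aware placement score.
    * Prefer VMs likely to survive the task, and tasks that reduce data
    * volume, so less work is lost if the VM is later evicted.
    * Higher is better. */
  def placementScore(task: Task, vm: Vm): Double = {
    val survivalRatio = vm.expectedLifetimeSec / (task.estRuntimeSec + 1e-9)
    val shrinkage     = task.inputBytes.toDouble / (task.outputBytes + 1)
    math.min(survivalRatio, 10.0) * math.log1p(shrinkage)
  }

  /** Lineage-aware checkpointing decision: checkpoint only if the expected
    * recomputation cost after an eviction exceeds the checkpoint cost. */
  def shouldCheckpoint(evictionProb: Double,        // P(VM evicted before reuse)
                       recomputeLineageSec: Double, // replay upstream stages
                       checkpointWriteSec: Double): Boolean =
    evictionProb * recomputeLineageSec > checkpointWriteSec

  def main(args: Array[String]): Unit = {
    val vm   = Vm("spot-42", expectedLifetimeSec = 600)
    val task = Task("reduce-7", estRuntimeSec = 120,
                    inputBytes = 4L << 30, outputBytes = 256L << 20)
    println(f"placement score = ${placementScore(task, vm)}%.2f")
    // 0.3 * 900s = 270s expected replay cost > 60s write cost => checkpoint
    println(s"checkpoint? ${shouldCheckpoint(0.3, 900, 60)}")
  }
}
```

Note how a decision rule of this shape adapts on its own: as instability rises (higher eviction probability, shorter expected lifetimes), it checkpoints more aggressively and schedules more conservatively, while on stable clusters it degrades toward plain Spark behavior with little checkpointing overhead, which is consistent with the adaptivity the abstract describes.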
Keywords
Transient computing, Spark, Checkpointing