Quartet: Harmonizing Task Scheduling and Caching for Cluster Computing

Francis Deslauriers, Peter McCormick, George Amvrosiadis, Ashvin Goel, Angela Demke Brown

HotStorage'16: Proceedings of the 8th USENIX Conference on Hot Topics in Storage and File Systems (2016)

Abstract
Cluster computing frameworks such as Apache Hadoop and Apache Spark are commonly used to analyze large data sets. The analysis often involves running multiple similar queries on the same data sets. This data reuse should improve query performance, but we find that these frameworks schedule query tasks independently of each other and are thus unable to exploit the data sharing across these tasks. We present Quartet, a system that leverages information about cached data to schedule tasks that share data together. Our preliminary results are promising, showing that Quartet can increase the cache hit rate of Hadoop and Spark jobs by up to 54%. Our results suggest a shift in how we think about job and task scheduling today, as Quartet is expected to perform better as more jobs are dispatched on the same data.
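The abstract does not spell out Quartet's scheduling policy, so the following is only a minimal Python sketch of the general idea, not Quartet's implementation. It assumes a single node, an LRU cache sized in whole data blocks, and tasks that each read exactly one block. The names `cache_aware_order` and `simulate_hit_rate` are hypothetical. The sketch compares a FIFO order, where each job's tasks run independently, with an order that runs tasks whose block is already cached first and keeps tasks that read the same block adjacent.

```python
from collections import OrderedDict


def simulate_hit_rate(order, capacity, initial=()):
    """Replay each task's block read through an LRU cache; return the hit fraction."""
    cache = OrderedDict()
    for block in initial:  # warm the cache with blocks assumed to be resident already
        cache[block] = True
        if len(cache) > capacity:
            cache.popitem(last=False)
    hits = 0
    for _task_id, block in order:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return hits / len(order) if order else 0.0


def cache_aware_order(tasks, cached_blocks):
    """Hypothetical policy: group tasks by the block they read, then run groups
    whose block is already cached first (stable sort keeps arrival order otherwise)."""
    groups = {}
    for task in tasks:
        groups.setdefault(task[1], []).append(task)
    blocks = sorted(groups, key=lambda b: b not in cached_blocks)
    return [task for b in blocks for task in groups[b]]


# Two jobs scan the same four blocks; the cache holds two blocks, C and D initially.
tasks = [(f"{job}-{block}", block) for job in ("job1", "job2") for block in "ABCD"]
fifo = simulate_hit_rate(tasks, capacity=2, initial=["C", "D"])
aware = simulate_hit_rate(cache_aware_order(tasks, {"C", "D"}), capacity=2, initial=["C", "D"])
print(fifo, aware)  # prints 0.0 0.75
```

In this toy setting, FIFO order evicts each block before the second job rereads it, so every read misses. Grouping the two jobs' reads of each block and starting with the resident blocks turns six of the eight reads into hits. Real workloads are far less regular, so this illustrates only the direction of the effect, not the magnitude reported above.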