Musketeer: all for one, one for all in data processing systems.

EuroSys(2015)

引用 162|浏览256
暂无评分
摘要
Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is \"best\" for a given workflow. Porting workflows between systems is tedious. Hence, users become \"locked in\", despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%--30% of the performance of hand-optimized implementations.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要