Optimizing Interactive Development of Data-Intensive Applications

SoCC '16: ACM Symposium on Cloud Computing, Santa Clara, CA, USA, October 2016

Abstract
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.
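The abstract describes re-running a slightly modified program without recomputing everything from scratch. The following sketch shows one simple form of that idea in plain Python rather than Spark: it keeps the materialized output of every pipeline prefix and, when a modified pipeline is submitted, resumes from the longest prefix that is unchanged. This is not Vega's implementation. The `PrefixCache` class, the string labels standing in for operator fingerprints, and the assumption that the base data stays fixed for the whole session are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (label, function) pair. The label is a hypothetical stand-in for
# a structural fingerprint of the operator; a real system would compare plans.
Step = Tuple[str, Callable[[list], list]]


class PrefixCache:
    """Caches the output of every pipeline prefix so that a slightly modified
    pipeline only re-executes the steps after its first changed step.

    Assumes the base data is the same on every call (as in a session of
    ad-hoc queries over fixed base data) and that steps are pure functions.
    """

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, ...], list] = {}
        self.steps_executed = 0  # counts work actually performed

    def run(self, data: list, steps: List[Step]) -> list:
        labels = tuple(label for label, _ in steps)
        # Find the longest prefix of this pipeline that is already materialized.
        start, current = 0, data
        for i in range(len(steps), 0, -1):
            if labels[:i] in self._results:
                start, current = i, self._results[labels[:i]]
                break
        # Execute only the remaining suffix, caching each new prefix result.
        for i in range(start, len(steps)):
            current = steps[i][1](current)
            self.steps_executed += 1
            self._results[labels[: i + 1]] = current
        return current


if __name__ == "__main__":
    cache = PrefixCache()
    base = list(range(10))
    parse = ("parse", lambda xs: [x * 2 for x in xs])
    keep_gt5 = ("filter>5", lambda xs: [x for x in xs if x > 5])
    keep_gt10 = ("filter>10", lambda xs: [x for x in xs if x > 10])
    total = ("sum", lambda xs: [sum(xs)])

    print(cache.run(base, [parse, keep_gt5, total]))   # [84], runs 3 steps
    print(cache.run(base, [parse, keep_gt10, total]))  # [60], reuses "parse"
    print(cache.steps_executed)                        # 5
```

Keying the cache on operator labels alone is only safe because the base data is assumed constant; a practical system would also have to fingerprint inputs and exploit changes in the middle of a plan, which the keywords (incremental evaluation, query rewriting) suggest Vega handles with more general techniques than prefix reuse.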
Keywords
Big Data, H.2.4 [Information Systems]: Database Management—query processing, Incremental Evaluation, Interactive Development, Languages, Performance, Query Rewriting, Spark, Theory, parallel databases