Building Pipelines for Heterogeneous Execution Environments for Big Data Processing.

IEEE Software(2016)

引用 49|浏览40
暂无评分
摘要
Many real-world data analysis scenarios require pipelining and integration of multiple (big) data-processing and data-analytics jobs, which often execute in heterogeneous environments, such as MapReduce; Spark; or R, Python, or Bash scripts. Such a pipeline requires much glue code to get data across environments. Maintaining and evolving these pipelines are difficult. Pipeline frameworks that try to solve such problems are usually built on top of a single environment. They might require rewriting the original job to take into account a new API or paradigm. The Pipeline61 framework supports the building of data pipelines involving heterogeneous execution environments. It reuses the existing code of the deployed jobs in different environments and provides version control and dependency management that deals with typical software engineering issues. A real-world case study shows its effectiveness. This article is part of a special issue on Software Engineering for Big Data Systems.
更多
查看译文
关键词
Big data,Pipeline processing,Software engineering,Context modeling,Data analysis,Programming,Software development
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要