Getting More For Less In Optimized Mapreduce Workflows

Integrated Network Management (2013)

Abstract
Many companies are piloting the use of Hadoop for advanced data analytics over large datasets. Typically, such MapReduce programs represent workflows of MapReduce jobs. Currently, a user must specify the number of reduce tasks for each MapReduce job. Choosing the right number of reduce tasks is non-trivial and depends on the cluster size, the input dataset of the job, and the amount of resources available for processing the job. In a workflow of MapReduce jobs, the output of one job becomes the input of the next job, and therefore the number of reduce tasks in the previous job may impact the performance and processing efficiency of the next job. In this work, we offer a novel performance evaluation framework that eases the user effort of tuning the reduce task settings while achieving performance objectives. The proposed framework is based on two performance models: a platform performance model and a workflow performance model. The platform performance model characterizes the execution time of each generic phase in the MapReduce processing pipeline as a function of the processed data. The complementary workflow performance model evaluates the completion time of a given workflow as a function of i) the input dataset size(s) and ii) the reduce task settings of the jobs that comprise the workflow. We validate the accuracy, effectiveness, and performance benefits of the proposed framework using a set of realistic MapReduce applications and queries from the TPC-H benchmark.
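The interplay between the two models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear phase models, their coefficients, the slot counts, the block size, and the names `PhaseModel`, `map_task_count`, `job_time`, and `workflow_time` are all hypothetical, chosen only to show how the reduce task setting of one job can change the estimated completion time of the next.

```python
import math
from dataclasses import dataclass


@dataclass
class PhaseModel:
    """Hypothetical platform model: seconds = intercept + slope * megabytes."""
    intercept: float
    slope: float

    def time(self, mb: float) -> float:
        return self.intercept + self.slope * mb


# Made-up coefficients, for illustration only (not measured on any cluster).
MAP_PHASE = PhaseModel(intercept=1.0, slope=0.05)
REDUCE_PHASE = PhaseModel(intercept=2.0, slope=0.10)


def map_task_count(data_mb, prev_reducers, block_mb):
    """Assumption: each reduce task of the previous job writes one output file,
    and every file is split into block-sized inputs for the next job's maps."""
    if prev_reducers is None:  # first job reads the original dataset
        return max(1, math.ceil(data_mb / block_mb))
    file_mb = data_mb / prev_reducers
    return prev_reducers * max(1, math.ceil(file_mb / block_mb))


def job_time(data_mb, num_maps, num_reduces, map_slots, reduce_slots):
    """Estimate one job: tasks run in waves over the available slots."""
    map_waves = math.ceil(num_maps / map_slots)
    map_stage = map_waves * MAP_PHASE.time(data_mb / num_maps)
    reduce_waves = math.ceil(num_reduces / reduce_slots)
    reduce_stage = reduce_waves * REDUCE_PHASE.time(data_mb / num_reduces)
    return map_stage + reduce_stage


def workflow_time(input_mb, reduce_settings, map_slots=4, reduce_slots=4,
                  block_mb=64.0):
    """Workflow model sketch: sequential jobs; output size is assumed equal
    to input size (selectivity 1) to keep the example small."""
    total, prev_reducers = 0.0, None
    for num_reduces in reduce_settings:
        num_maps = map_task_count(input_mb, prev_reducers, block_mb)
        total += job_time(input_mb, num_maps, num_reduces,
                          map_slots, reduce_slots)
        prev_reducers = num_reduces
    return total


if __name__ == "__main__":
    # Compare two reduce task settings for the same two-job workflow.
    for settings in ([1, 8], [4, 8]):
        print(settings, round(workflow_time(640.0, settings), 2))
```

In this toy setup, changing only the first job's reduce task count alters both that job's reduce stage and the number of map tasks in the second job, which is the kind of coupling the workflow model is meant to capture.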
Keywords
data analysis, parallel programming, pipeline processing, software performance evaluation, task analysis, workflow management software, Hadoop, MapReduce job workflow, MapReduce processing pipeline, MapReduce programs, MapReduce queries, TPC-H benchmark, advanced data analytics, cluster size, complementary workflow performance model, generic phase, input dataset, input dataset size, job processing, optimized MapReduce workflows, performance evaluation framework, performance impact, platform performance model, processing efficiency, task reduction, workflow completion time evaluation, workflow performance model