Improving Scientific Workflow Performance Using Policy Based Data Placement

Policies for Distributed Systems and Networks (2012)

Abstract
I/O-intensive jobs such as stage-in, stage-out, or data clean-up jobs account for a significant fraction of the execution time of scientific workflows. Workflow managers typically add these data management operations as supporting jobs for computational tasks, with scheduling emphasis on compute jobs only. We present the integration of the Pegasus Workflow Management System with a Policy Based Data Placement Service (PDPS) to reduce overall workflow execution time. Pegasus delegates all data staging jobs to PDPS, which schedules and executes stage-in jobs based on selected data placement policies, and simply executes stage-out and clean-up jobs independent of the workflow execution state. We measure the impact of using PDPS with Pegasus first with the Montage workflow, and then with a synthetic workflow. We enforce two policies and demonstrate the advantage of using PDPS for asynchronous data placement for scientific workflows. Our results show that the influence of PDPS on overall workflow runtimes depends on the data characteristics of the executable workflow and on the data placement policy being enforced.
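The delegation scheme the abstract describes can be illustrated with a minimal sketch. This is not the actual Pegasus or PDPS API; the class names, the `largest_first` policy, and the job model are all hypothetical, chosen only to show the split between policy-scheduled stage-in jobs and independently executed stage-out/clean-up jobs.

```python
# Hypothetical sketch (not the real PDPS interface): a workflow manager
# hands all staging jobs to a placement service; the service orders
# stage-in jobs according to a selected placement policy, while
# stage-out and clean-up jobs are simply executed as they arrive,
# independent of the workflow's execution state.

from dataclasses import dataclass


@dataclass
class StagingJob:
    name: str
    kind: str      # "stage-in", "stage-out", or "clean-up"
    size_mb: int   # data volume moved by the job


class PlacementService:
    """Toy stand-in for a policy-based data placement service."""

    def __init__(self, policy):
        self.policy = policy   # ordering policy applied to stage-in jobs
        self.executed = []     # names of jobs run, in order

    def submit(self, jobs):
        stage_in = [j for j in jobs if j.kind == "stage-in"]
        others = [j for j in jobs if j.kind != "stage-in"]
        # Stage-in jobs are scheduled according to the selected policy...
        for job in self.policy(stage_in):
            self.executed.append(job.name)
        # ...while stage-out and clean-up jobs run in submission order,
        # decoupled from the workflow execution state.
        for job in others:
            self.executed.append(job.name)


# One plausible (assumed) placement policy: transfer the largest inputs
# first, so big transfers overlap with early computation.
largest_first = lambda jobs: sorted(jobs, key=lambda j: -j.size_mb)

jobs = [
    StagingJob("in_small", "stage-in", 10),
    StagingJob("in_big", "stage-in", 500),
    StagingJob("out_results", "stage-out", 50),
]
svc = PlacementService(largest_first)
svc.submit(jobs)
print(svc.executed)  # ['in_big', 'in_small', 'out_results']
```

Swapping the policy function changes only the stage-in ordering, which mirrors the paper's observation that the overall runtime effect depends on the enforced placement policy.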
Keywords
scientific workflows, Montage workflow, selected data placement policy, improving scientific workflow performance, data placement policy, executable workflow, data clean-up job, data placement, data management operation, asynchronous data placement, overall workflow execution time, data characteristic, servers, scheduling, schedules, workflows