Towards efficient resource management for data-analytic platforms

Integrated Network Management (2011)

Abstract
We present architectural and experimental work exploring the role of intermediate data handling in the performance of MapReduce workloads. Our findings show that: (a) certain jobs are more sensitive to disk cache size than others and (b) this sensitivity is mostly due to the local file I/O for the intermediate data. We also show that a small amount of memory is sufficient for the normal needs of map workers to hold their intermediate data until it is read. We introduce Hannibal, which exploits the modesty of that need in a simple and direct way - holding the intermediate data in application-level memory for precisely the needed time - to improve performance when the disk cache is stressed. We have implemented Hannibal and show through experimental evaluation that Hannibal can make MapReduce jobs run faster than Hadoop when little memory is available to the disk cache. This provides better performance insulation between concurrent jobs.
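The core idea described above is that a map worker can buffer its intermediate output in application-level memory and hold it only until the corresponding reduce task has read it, sidestepping local file I/O and the disk-cache pressure it causes. The following is an illustrative sketch of that buffering pattern, not the paper's actual Hannibal implementation; the class and method names are hypothetical.

```python
# Hypothetical sketch of Hannibal's buffering idea: intermediate
# (key, value) pairs stay in application-level memory and are freed
# as soon as the single reduce-side read completes, so no local-disk
# spill competes with other jobs for the disk cache.

class InMemoryIntermediateStore:
    def __init__(self):
        # partition id -> list of buffered (key, value) pairs
        self._partitions = {}

    def emit(self, partition, key, value):
        # Map task buffers its output in memory instead of spilling to disk.
        self._partitions.setdefault(partition, []).append((key, value))

    def fetch(self, partition):
        # Reduce task reads each partition exactly once; pop() releases
        # the buffer immediately, holding memory only as long as needed.
        return self._partitions.pop(partition, [])

store = InMemoryIntermediateStore()
store.emit(0, "word", 1)
store.emit(0, "word", 1)
pairs = store.fetch(0)        # reducer pulls partition 0
leftover = store.fetch(0)     # buffer already freed -> empty
```

This mirrors the paper's observation that only a small amount of memory suffices, since each buffer lives just for the window between a map task producing a partition and the reducer consuming it.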
Keywords
public domain software, mapreduce, hannibal, disk, cache storage, application-level memory, mapreduce workloads, open source middleware, resource management, map-reduce, data handling, intermediate data handling, middleware, disk cache size, data-analytic platforms, performance, hadoop, reliability, resource manager