Pfimbi: Accelerating big data jobs through flow-controlled data replication

2016 32nd Symposium on Mass Storage Systems and Technologies (MSST), 2016

Abstract
The performance of HDFS is critical to big data software stacks and has been at the forefront of recent efforts from industry and the open-source community. A key problem is the lack of flexibility in how data replication is performed. To address this problem, this paper presents Pfimbi, the first alternative to HDFS that supports both synchronous and flow-controlled asynchronous data replication. Pfimbi has numerous benefits: it accelerates jobs, exploits under-utilized storage I/O bandwidth, and supports hierarchical storage I/O bandwidth allocation policies. We demonstrate that for a job trace derived from a Facebook workload, Pfimbi improves the average job runtime by 18% and by up to 46% in the best case. We also demonstrate that flow control is crucial to fully exploiting the benefits of asynchronous replication; removing Pfimbi's flow control mechanisms resulted in a 2.7× increase in job runtime.
Keywords
Pfimbi, Big Data job acceleration, HDFS, Big Data software stacks, open source community, synchronous data replication, flow-controlled asynchronous data replication, under-utilized storage I/O bandwidth, hierarchical storage I/O bandwidth allocation policies, Facebook workload, job runtime, Hadoop distributed file system