Fast Recovery MapReduce (FAR-MR) to accelerate failure recovery in big data applications

Yongqing Zhu, Juniarto Samsudin, Renuga Kanagavelu, Weiwen Zhang, Long Wang, Theint Theint Aye, Rick Siow Mong Goh

The Journal of Supercomputing (2018)

Abstract

The existing Hadoop MapReduce fault tolerance strategy causes computing jobs to suffer a high performance penalty during failure recovery. In this paper, we propose Fast Recovery MapReduce (FAR-MR) to improve MapReduce performance during failure recovery. FAR-MR includes a novel fault tolerance strategy that combines distributed checkpointing with a proactive push mechanism to support fast recovery from both task failure and node failure. With distributed checkpointing, the computing progress of each task is periodically recorded as checkpoints and kept in distributed data storage. During failure recovery, the recovered task can obtain the last recorded progress of the failed task from the distributed storage. In addition, the proactive push mechanism enables the computing results of map tasks to be proactively transmitted to the nodes hosting the reduce tasks of the same computing job. When a failure happens, the partial output already pushed to the reducer nodes can be used by the reduce tasks without recomputation. FAR-MR allows a failed task to be recovered efficiently at any node in the cluster. The performance evaluation shows that FAR-MR can improve computing job performance by up to 62% and 45% compared to Hadoop MapReduce in the cases of task failure recovery and node failure recovery, respectively.
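
The checkpoint-and-resume idea described in the abstract can be illustrated with a minimal sketch. This is not FAR-MR's implementation: the in-memory dictionary standing in for distributed storage, the `run_task` function, and the `interval` and `fail_at` parameters are all hypothetical names introduced here for illustration, and the "task" simply sums integers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for distributed storage:
# task_id -> (index of next record to process, partial result so far)
CheckpointStore = Dict[str, Tuple[int, int]]


def run_task(task_id: str, records: List[int], store: CheckpointStore,
             interval: int, fail_at: Optional[int] = None) -> Tuple[int, int]:
    """Sum `records`, writing a checkpoint every `interval` records.

    Returns (result, number of records processed in this attempt).
    Raises RuntimeError just before processing index `fail_at`
    to simulate a task failure.
    """
    # Resume from the last checkpoint if one exists, else start from scratch.
    start, total = store.get(task_id, (0, 0))
    processed = 0
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at record {i}")
        total += records[i]
        processed += 1
        if (i + 1) % interval == 0:
            # Record progress so a recovered task can pick up from here.
            store[task_id] = (i + 1, total)
    return total, processed
```

With ten records, a checkpoint interval of 3, and a simulated failure at index 7, the last checkpoint covers the first six records, so a retry that shares the same store reprocesses four records instead of all ten. Because the store is keyed by task rather than by node, the retry could in principle run anywhere that can reach the store, which mirrors the abstract's claim that a failed task can be recovered at any node.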

Keywords

Parallel computing, MapReduce, Fault tolerance, Checkpointing, Big data
