Push-Based Network-efficient Hadoop YARN Scheduling Mechanism for In-Memory Computing

2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS) (2019)

Abstract
In the big data era, data-intensive cluster computing systems such as Hadoop have gained much popularity, and YARN, the second generation of Hadoop, has become the general resource manager in the Hadoop ecosystem. In distributed computing scenarios, data locality (scheduling tasks where the data resides) is essential to performance, since higher data locality brings lower network transmission cost and higher throughput. However, we find that the native YARN scheduling mechanism achieves little data locality, and that the delay scheduling strategy, while achieving data locality, leads to a long-tail effect in in-memory computing scenarios. Therefore, in this paper we propose a push-based YARN scheduling mechanism for in-memory computing environments. First, we classify the Resource Requests into several categories. Then, we prune the non-local Resource Requests to achieve fast data-local in-memory computation. Finally, we push the remaining long-tail Resource Requests to the data-local nodes to avoid the long-tail effect. The experimental results demonstrate that the proposed scheduling mechanism achieves a data-locality percentage of nearly 100%, compared with only 10% to 20% for the native YARN scheduling mechanism. At an identical data-locality percentage, the proposed push-based scheduling mechanism improves throughput by nearly 20% and reduces application running time by nearly 10% compared with the existing delay scheduling mechanism used in YARN.
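The three steps named in the abstract (classify, prune, push) can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' implementation or any YARN API: each request prefers at most one node (the node caching its input in memory), every container uses one slot, and nodes are normally offered work only when they heartbeat. The names `ResourceRequest`, `classify`, `schedule` and `locality_percentage` are hypothetical. Pruning is modeled by never offering a node-local request to any other node, and pushing is modeled by assigning still-queued local requests directly to their preferred node when it has free slots, rather than waiting for that node's next heartbeat as delay scheduling would.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ResourceRequest:
    """Hypothetical, simplified stand-in for a YARN resource request."""
    task_id: str
    preferred_node: Optional[str]  # node caching the input in memory; None = no preference


def classify(requests: List[ResourceRequest]) -> Tuple[List[ResourceRequest], List[ResourceRequest]]:
    """Split requests into node-local requests and requests with no locality preference."""
    local = [r for r in requests if r.preferred_node is not None]
    anywhere = [r for r in requests if r.preferred_node is None]
    return local, anywhere


def schedule(requests: List[ResourceRequest], capacity: Dict[str, int],
             heartbeat_order: List[str]) -> Dict[str, str]:
    """Return {task_id: node}; every container is assumed to use one slot."""
    local, anywhere = classify(requests)
    free = dict(capacity)
    placement: Dict[str, str] = {}

    # Prune: a local request is queued only for its preferred node, so it can
    # never be placed off-node (its rack/ANY alternatives are dropped).
    pending: Dict[str, List[ResourceRequest]] = defaultdict(list)
    for r in local:
        pending[r.preferred_node].append(r)

    # Pull phase: as each node heartbeats, it first takes requests that prefer it,
    # then fills any leftover slots with no-preference requests.
    for node in heartbeat_order:
        queue = pending[node]
        while queue and free.get(node, 0) > 0:
            placement[queue.pop(0).task_id] = node
            free[node] -= 1
        while anywhere and free.get(node, 0) > 0:
            placement[anywhere.pop(0).task_id] = node
            free[node] -= 1

    # Push phase: local requests still queued (e.g. their node has not heartbeated
    # yet) are assigned directly to their preferred node if it has free slots.
    for node, queue in pending.items():
        while queue and free.get(node, 0) > 0:
            placement[queue.pop(0).task_id] = node
            free[node] -= 1
    return placement


def locality_percentage(requests: List[ResourceRequest], placement: Dict[str, str]) -> float:
    """Share of locality-preferring requests placed on their preferred node."""
    local = [r for r in requests if r.preferred_node is not None]
    if not local:
        return 100.0
    hits = sum(1 for r in local if placement.get(r.task_id) == r.preferred_node)
    return 100.0 * hits / len(local)


reqs = [ResourceRequest("t1", "n1"), ResourceRequest("t2", "n2"),
        ResourceRequest("t3", "n3"), ResourceRequest("t4", None)]
placement = schedule(reqs, {"n1": 1, "n2": 1, "n3": 1, "n4": 1}, heartbeat_order=["n1", "n4"])
print(placement, locality_percentage(reqs, placement))
```

In the usage example only `n1` and `n4` heartbeat, so `t2` and `t3` would otherwise wait; the push phase places them on `n2` and `n3` immediately. A request whose preferred node is full simply stays queued in this sketch, which keeps locality at the cost of waiting.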
Keywords
Scheduling, Data locality, Hadoop YARN, In-memory computing