Dependency-Aware Data Locality For Mapreduce

IEEE CLOUD (2018)

Abstract
MapReduce effectively partitions and distributes computation workloads to a cluster of servers, facilitating today's big data processing. Given the massive data to be dispatched, and the intermediate results to be collected and aggregated, there have been significant studies on data locality that seek to co-locate computation with data, so as to reduce cross-server traffic in MapReduce. They generally assume that the input data have little dependency on each other, which, however, is not necessarily true for many real-world applications, and we show strong evidence that the finishing time of MapReduce tasks can be greatly prolonged by such data dependency. In this paper, we present Dependency-Aware Locality for MapReduce (DALM) for processing real-world input data that can be highly skewed and dependent. DALM accommodates data dependency in a data-locality framework, organically synthesizing the key components of data reorganization, replication, and placement. Besides the algorithmic design within the framework, we have also closely examined the deployment challenges, particularly in public virtualized cloud environments, and have implemented DALM on Hadoop 1.2.1 with Giraph 1.0.0. Its performance has been evaluated through both simulations and real-world experiments, and compared with that of state-of-the-art solutions.
Keywords
MapReduce,data locality,data dependency