A Framework for Performance Analysis and Tuning in Hadoop Based Clusters

semanticscholar(2014)

Abstract
Big Data computing platforms such as MapReduce frameworks are foraying into the domain of high-performance computing, with stringent non-functional requirements, namely execution time and throughput. Over the last couple of years, several hundred sequential programs in domains such as biological informatics, health care, and finance have been converted to parallel paradigms. The movement of such time-sensitive applications makes optimal resource utilization on MapReduce frameworks a harder problem. Traditional scheduling has predominantly handled similar workflows with predefined non-functional requirements on a diverse set of resources. Hadoop gives us the flexibility to vary many parameters as we choose, but this flexibility proves to be the main bottleneck: configuring so many parameters, and balancing them against one another to get the best result, is a time-consuming and challenging job. In our work, we attempt to analyze the effect of various configuration parameters on Hadoop MapReduce performance under various conditions, with the goal of achieving maximum throughput. Using these methodologies, we have been able to achieve performance improvements. Through extensive experiments, we study the impact of various configuration parameters and suggest an optimal value in each case.