Spark deployment and performance evaluation on the MareNostrum supercomputer
Big Data (2015)
Abstract
In this paper we present a framework that enables data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. To the best of our knowledge, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present benchmark data that provide insight into the scalability of the system. We examine the impact of different configurations, including parallelism, storage and networking alternatives, and we discuss several aspects of executing Big Data workloads on a computing system built on the compute-centric paradigm. Further, we derive conclusions that aim to pave the way towards systematic, optimized methodologies for fine-tuning data-intensive applications on large clusters, with an emphasis on parallelism configurations.
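The abstract singles out parallelism configuration as a central tuning knob. As a minimal sketch, assuming a Spark standalone cluster launched on a fixed allocation of identical nodes, the snippet derives executor and parallelism settings from the node layout. The helper `parallelism_config` and the example node and core counts are hypothetical placeholders, not MareNostrum's specifications or the paper's method. The Spark property names are real configuration keys, and the default of 2 tasks per core follows the Spark tuning guide's general advice of 2 to 3 tasks per CPU core, not a finding of this paper.

```python
from typing import Dict, Tuple


def parallelism_config(nodes: int, cores_per_node: int,
                       cores_per_executor: int,
                       tasks_per_core: int = 2) -> Tuple[Dict[str, str], int]:
    """Hypothetical helper: derive Spark parallelism settings for a node allocation.

    Returns the Spark properties and the number of executors the layout implies.
    """
    if cores_per_executor <= 0 or cores_per_node % cores_per_executor != 0:
        raise ValueError("cores_per_executor must evenly divide cores_per_node")
    # Executors are packed so that every core on every node is used.
    executors_per_node = cores_per_node // cores_per_executor
    total_cores = nodes * cores_per_node
    conf = {
        # Cores given to each executor JVM.
        "spark.executor.cores": str(cores_per_executor),
        # Upper bound on cores the application may claim (standalone mode).
        "spark.cores.max": str(total_cores),
        # Default partition count for shuffles: a few tasks per core (assumed rule of thumb).
        "spark.default.parallelism": str(total_cores * tasks_per_core),
    }
    return conf, nodes * executors_per_node


if __name__ == "__main__":
    # Illustrative allocation only: 4 nodes with 16 cores each, 8 cores per executor.
    conf, num_executors = parallelism_config(nodes=4, cores_per_node=16,
                                             cores_per_executor=8)
    print(num_executors, conf)
```

Sweeping `cores_per_executor` and `tasks_per_core` over a grid and timing a benchmark for each setting is one simple way to explore the parallelism space the abstract refers to.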
Keywords
parallelism configurations, compute-centric paradigm, big data workloads, petascale HPC setup, optimized deployment configurations, compute-intensive applications, petascale supercomputer, data-intensive Spark workloads, MareNostrum supercomputer, performance evaluation, Spark deployment