Efficient data redistribution for malleable applications.

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2023)

Abstract
Process malleability can be defined as the ability of a distributed MPI parallel job to change its number of processes on-the-fly, without stopping its execution, reallocating the compute resources originally assigned to the job, or storing application data to disk. MPI malleability consists of four stages: resource reallocation, process management, data redistribution, and execution resumption. Among them, data redistribution is the most time-consuming and determines the reconfiguration time. In this paper, we compare different implementations of this stage using point-to-point and collective MPI operations, and discuss the impact of overlapping computation and communication. We then combine these strategies with different methods to expand/shrink jobs, using a synthetic application to emulate MPI-based codes and their malleable counterparts, in order to evaluate the effect of the different malleability methods on parallel distributed applications. The results show that the use of asynchronous techniques speeds up execution by factors of 1.14 and 1.21, depending on the network used.
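The sketch below is a minimal, illustrative example (not the authors' framework) of the kind of asynchronous data redistribution the abstract describes: a non-blocking collective (MPI_Ialltoallv) starts moving data between processes while independent local computation proceeds, and the receiving side only waits once the overlapped work is done. The counts, displacements, and placeholder workload are assumptions for illustration; the paper's actual expand/shrink machinery (process spawning, inter-communicators) is not reproduced here.

/* Minimal sketch of overlapping data redistribution with computation
 * using a non-blocking collective. Payload and workload are placeholders. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk = 4;  /* elements exchanged with each peer (illustrative) */
    double *sendbuf = malloc(chunk * size * sizeof(double));
    double *recvbuf = malloc(chunk * size * sizeof(double));
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < size; ++i) {
        counts[i] = chunk;
        displs[i] = i * chunk;
    }
    for (int i = 0; i < chunk * size; ++i)
        sendbuf[i] = rank + 0.001 * i;  /* dummy payload */

    /* Start the redistribution asynchronously... */
    MPI_Request req;
    MPI_Ialltoallv(sendbuf, counts, displs, MPI_DOUBLE,
                   recvbuf, counts, displs, MPI_DOUBLE,
                   MPI_COMM_WORLD, &req);

    /* ...and overlap it with computation that does not depend on the
     * incoming data (placeholder work). */
    double local = 0.0;
    for (int i = 0; i < 1000000; ++i)
        local += (rank + 1) * 1e-6;

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* data is now redistributed */

    if (rank == 0)
        printf("overlapped work = %f, first received = %f\n", local, recvbuf[0]);

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}

A blocking MPI_Alltoallv would serialize the same exchange before any computation; the non-blocking variant is one way to hide part of the redistribution cost, which is the effect the paper quantifies.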