A Process Management Runtime with Dynamic Reconfiguration.

HPC Asia Workshops (2022)

Abstract
This paper proposes DyProReconf, a runtime system that can dynamically change the number of processes of a running job. When the system software coordinates with the DyProReconf runtime during system operation, the system configuration can be changed flexibly according to the amount of power used, and priority jobs can be executed even when urgent computing is being performed. DyProReconf allows users to dynamically modify a large number of processes in response to external input, using user-level checkpoint/restart programs together with ULFM (User Level Failure Mitigation) to handle user process failures. We implemented DyProReconf with a fault injection mechanism on top of ULFM-enabled Open MPI and applied it to pHEAT-3D, an application that solves 3D unsteady-state heat transfer problems with the finite element method (FEM) using iterative linear solvers. The evaluation shows that DyProReconf is easily applied to pHEAT-3D, and that U-pHEAT-3D (pHEAT-3D with DyProReconf) can dynamically change the number of processes and continue the calculation after injected process failures.
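The idea of changing the process count through user-level checkpoint/restart can be illustrated with a small, purely sequential sketch. It is not the DyProReconf implementation and uses neither MPI nor ULFM: the names `partition`, `checkpoint`, `restart`, `step`, and `run` are hypothetical, the "processes" are plain Python lists, and the solver iteration is a placeholder. It only shows how a global snapshot can be redistributed when the requested number of processes changes between iterations.

```python
from typing import List, Tuple


def partition(n_items: int, n_procs: int) -> List[range]:
    """Split indices 0..n_items-1 into n_procs contiguous blocks
    whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def checkpoint(local_states: List[List[float]]) -> List[float]:
    """Concatenate the per-rank pieces into one global snapshot
    (a stand-in for writing a checkpoint file)."""
    return [v for piece in local_states for v in piece]


def restart(snapshot: List[float], n_procs: int) -> List[List[float]]:
    """Redistribute a snapshot over n_procs simulated ranks."""
    return [[snapshot[i] for i in block]
            for block in partition(len(snapshot), n_procs)]


def step(local_states: List[List[float]]) -> List[List[float]]:
    """Placeholder for one solver iteration: halve every value."""
    return [[0.5 * v for v in piece] for piece in local_states]


def run(initial: List[float], schedule: List[int]) -> Tuple[List[float], int]:
    """Run one iteration per entry of `schedule`; each entry is the
    process count requested (the 'external input') for that iteration."""
    state = restart(initial, schedule[0])
    for n_procs in schedule:
        if n_procs != len(state):
            # Process count changed: checkpoint, then restart on the new count.
            state = restart(checkpoint(state), n_procs)
        state = step(state)
    return checkpoint(state), len(state)
```

For instance, `run([8.0] * 10, [4, 4, 2, 3])` shrinks the simulated job from four ranks to two and then grows it to three, while the computed values are unaffected by the redistribution.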