Fault Recovery Methods for Asynchronous Linear Solvers

International Journal of Parallel Programming (2020)

Abstract
This study examines the soft error vulnerability of asynchronous iterative methods, focusing on stationary iterative solvers such as Jacobi. A theoretical investigation into the performance of asynchronous iterative methods is presented and used to motivate several fault recovery methods for asynchronous linear solvers. The numerical experiments use a hybrid-parallel implementation in which the computational work is distributed over multiple nodes using MPI and parallelized on each node using OpenMP. A series of runs measures both the impact of soft faults and the effectiveness of the recovery methods, and trials compare two models for simulating the occurrence of a fault as well as techniques for recovering from its effects. The results show that the proposed strategies can effectively recover from the impact of a fault, and that the numerical model for simulating soft faults consistently produces fault effects that enable the investigation and tuning of recovery techniques in action.
Keywords
Soft fault, Fault tolerance, Fault models, Asynchronous iteration, Linear system solver
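
The abstract names Jacobi iteration and soft faults but gives no algorithmic detail. As a rough illustration only, the sketch below runs a sequential (synchronous) Jacobi iteration, injects a single bit flip into one solution component, and recovers by rolling back to the last accepted iterate when the residual jumps. The bit-flip fault model, the residual-growth detector, the rollback strategy, and all names (`flip_bit`, `jacobi_with_recovery`, `growth`) are assumptions made for this example; they are not taken from the paper, whose solvers are asynchronous and run on MPI and OpenMP.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0..63) of the IEEE 754 double representation of value."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def residual_norm(A, x, b):
    """Infinity norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def jacobi_sweep(A, x, b):
    """One synchronous Jacobi update: x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def jacobi_with_recovery(A, b, fault_at=None, fault_index=0, fault_bit=62,
                         tol=1e-10, max_iters=500, growth=10.0):
    """Jacobi with a hypothetical detect-and-rollback recovery step.

    fault_at    : iteration at which to inject one bit flip (None = no fault)
    fault_index : which component of x is corrupted
    fault_bit   : which bit of that double is flipped (assumed fault model)
    growth      : residual growth factor treated as evidence of a fault
    """
    x = [0.0] * len(b)
    checkpoint = list(x)
    checkpoint_res = residual_norm(A, x, b)
    rollbacks = 0
    for k in range(max_iters):
        x = jacobi_sweep(A, x, b)
        if k == fault_at:
            # Simulated soft fault: corrupt one component after the sweep.
            x[fault_index] = flip_bit(x[fault_index], fault_bit)
        res = residual_norm(A, x, b)
        # "not (<=)" also rejects NaN and infinite residuals.
        if not (res <= growth * checkpoint_res):
            x = list(checkpoint)  # discard the suspect iterate
            rollbacks += 1
            continue
        checkpoint, checkpoint_res = list(x), res
        if res < tol:
            return x, k + 1, rollbacks
    return x, max_iters, rollbacks


# Small diagonally dominant system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x_clean, iters_clean, rb_clean = jacobi_with_recovery(A, b)
x_fault, iters_fault, rb_fault = jacobi_with_recovery(A, b, fault_at=3)
```

In this sketch a flip in a high exponent bit produces a very large residual and triggers a rollback, while a flip in a low mantissa bit would usually pass the check and be damped by later sweeps. The detector here only catches faults large enough to exceed the chosen `growth` threshold.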