Soft fault detection and correction for multigrid

Periodicals(2018)

Abstract
We introduce a novel algorithm-based fault-tolerance scheme to detect and repair soft transient faults (silent data corruption, bitflips) in multigrid solvers. By applying the full approximation scheme (FAS) variant of multigrid to linear systems, we prove invariants that enable fault detection and correction, and ultimately lead to a black-box protection of the smoothing stage. A statistical analysis for a wide range of prototypical problems demonstrates the efficiency of our approach, especially compared with full checksum protection. In particular, the overhead of our new method is negligible in the fault-free case, since we only employ readily available quantities.
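For orientation, the checksum protection that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration of a classic column-checksum test on a matrix-vector product, not the FAS-based invariants of the paper, which the abstract does not spell out. The helper names (`flip_bit`, `column_checksum`, `checksum_ok`), the 1D Poisson test matrix, and the tolerance are assumptions made for this example only.

```python
import struct


def flip_bit(value, bit):
    """Return `value` with one bit of its IEEE-754 double encoding flipped."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def poisson_1d(n):
    """Dense tridiagonal [-1, 2, -1] matrix (hypothetical test problem)."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matvec(A, x):
    """Plain dense matrix-vector product y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_checksum(A):
    """Checksum vector c = 1^T A, computed once per matrix."""
    return [sum(col) for col in zip(*A)]


def checksum_ok(y, c, x, rel_tol=1e-10):
    """Check the identity sum(y) == c . x up to a relative tolerance."""
    lhs = sum(y)
    rhs = sum(cj * xj for cj, xj in zip(c, x))
    scale = max(abs(lhs), abs(rhs), 1.0)
    return abs(lhs - rhs) <= rel_tol * scale


n = 8
A = poisson_1d(n)
x = [float((i + 1) ** 2) for i in range(n)]
c = column_checksum(A)

# Fault-free product: the checksum identity holds.
y = matvec(A, x)
clean_ok = checksum_ok(y, c, x)

# Inject a single bitflip (an exponent bit) into one entry of the result.
y_faulty = list(y)
y_faulty[3] = flip_bit(y_faulty[3], 62)
faulty_ok = checksum_ok(y_faulty, c, x)

print("fault-free passes:", clean_ok)
print("faulty passes:", faulty_ok)
```

A check of this kind costs an extra dot product and a reduction for every protected operation, even when no fault occurs, and flips in low-order mantissa bits can fall below the tolerance. The abstract's claim is that the FAS-based scheme avoids most of this fault-free overhead by reusing quantities the solver already computes.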
Keywords
Fault tolerance, resilience, robust multigrid, robust iterative solvers, high-performance computing