Error Checking and Snapshot-Based Recovery in a Preconditioned Conjugate Gradient Solver

semanticscholar(2013)

Abstract
Soft errors are a significant concern for high-performance computing systems in the exascale time frame. We apply our group’s Global View Resilience (GVR) library to a preconditioned conjugate gradient solver, evaluating per-data-structure snapshots and varied error detection approaches to tolerate soft errors. Using 14 real-world matrices from the University of Florida Sparse Matrix Collection, we use error injection to assess the viability of several detection and correction schemes. These studies show: 1) Though inexpensive, residual-based detection performs poorly: achieving acceptably low false negative rates requires much higher (20x) false positive rates. 2) Though more expensive, algorithm-based detection performs better overall, achieving much lower false negative rates at one fifth the false positive rate. Even this “expensive” error detection is inexpensive compared to a single iteration, and is therefore viable for linear solvers, particularly in high-fault-rate systems.

Keywords: fault-tolerant computing; high-performance computing; numerical computing
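To make the two detection families concrete, the following is a minimal sketch and not the paper's implementation. It assumes a small dense matrix stored as a list of rows, a column-sum checksum as one common form of algorithm-based detection for the matrix-vector product, and a comparison of the recurrence residual against the recomputed true residual as one common form of residual-based detection. All function names and the tolerance `tol` are hypothetical choices for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x (A is a list of rows)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_sums(A):
    """Checksum vector c = 1^T A, computed once before the solve."""
    return [sum(col) for col in zip(*A)]


def checksum_ok(c, x, y, tol=1e-9):
    """Algorithm-based check (assumed form): sum(y) should equal c . x."""
    expected = sum(ci * xi for ci, xi in zip(c, x))
    scale = max(1.0, abs(expected))
    return abs(sum(y) - expected) <= tol * scale


def residual_ok(A, x, b, r, tol=1e-9):
    """Residual-based check (assumed form): recurrence r vs. b - A x."""
    true_r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    diff = max(abs(ti - ri) for ti, ri in zip(true_r, r))
    scale = max(1.0, max(abs(bi) for bi in b))
    return diff <= tol * scale


# Small symmetric example; values are illustrative only.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]
c = column_sums(A)

y = matvec(A, x)      # fault-free product
y_bad = list(y)
y_bad[1] += 0.5       # simulated soft error in one output entry

print(checksum_ok(c, x, y))      # fault-free product passes the check
print(checksum_ok(c, x, y_bad))  # corrupted product fails the check
```

In this sketch the checksum check costs one extra dot product per matrix-vector product, while the residual check needs an additional matrix-vector product each time it runs. The relative costs and error rates reported in the abstract refer to the authors' own schemes, which may differ from these simplified forms.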