GASPI/GPI In-memory Checkpointing Library.

Euro-Par (2017)

Abstract
Fault tolerance is becoming an important feature of large computer systems, where the mean time between failures decreases. Checkpointing is a method often used to provide resilience. We present an in-memory checkpointing library based on a PGAS API implemented with GASPI/GPI. It offers a substantial benefit when recovering from failure and leverages existing fault tolerance features of GASPI/GPI. The overhead of the library is negligible when tested with a simple stencil code and a real-life seismic imaging method.