Checkpoint/Restart Approaches for a Thread-Based MPI Runtime

Parallel Computing(2019)

Abstract
• Transparent checkpoint/restart can be applied to high-speed networks with collaboration from the MPI runtime (particularly network modularity).
• Thread-based MPI runtimes can be checkpointed both transparently and at application level without blocking difficulties, in contrast to their process-based counterparts.
• We introduce an asynchronous checkpointing interface for transparent checkpointing.
Keywords
Checkpoint-restart, Fault-tolerance, DMTCP, InfiniBand, Multilevel checkpointing, MPI oversubscribing