Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems

SC '12 Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (2012)

Abstract
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. A containment domain is a programming construct that enables applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested to take advantage of the machine and application hierarchies and to enable hierarchical state preservation, restoration, and recovery. We evaluate the scalability and efficiency of containment domains using generalized trace-driven simulation and analytical analysis and show that containment domains are superior to both checkpoint restart and redundant execution approaches.
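The nesting and preserve/detect/recover cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's API: `run_domain`, `Escalate`, the `detect` callback, and the dictionary-based state are all hypothetical names chosen for illustration, and full deep copies stand in for whatever tuned preservation mechanism a real system would use.

```python
import copy


class Escalate(Exception):
    """Hypothetical signal: a domain could not recover, so its parent must."""


def run_domain(state, body, detect, max_retries=1):
    """Run body(state) inside one illustrative containment domain.

    state       -- mutable dict the body updates in place
    body        -- callable; may call run_domain again to nest child domains
    detect      -- callable returning True when state looks erroneous
    max_retries -- local re-executions allowed before escalating
    """
    preserved = copy.deepcopy(state)            # preserve state on entry
    for _ in range(max_retries + 1):
        try:
            body(state)
            if not detect(state):
                return state                    # no error detected: commit
        except Escalate:
            pass                                # a child gave up: recover here
        state.clear()
        state.update(copy.deepcopy(preserved))  # restore, then re-execute
    raise Escalate("retries exhausted")


# Usage: one injected transient fault corrupts the child's first attempt.
faults = [True]


def child_body(s):
    s["x"] += 1
    if faults:
        faults.pop()
        s["x"] = -999                           # simulated corruption


def parent_body(s):
    run_domain(s, child_body, detect=lambda st: st["x"] < 0)
    s["y"] = s["x"] * 2


result = run_domain({"x": 1, "y": 0}, parent_body, detect=lambda st: False)
```

In this sketch the child domain detects the corrupted value, restores its own preserved copy, and re-executes, so the parent never observes the fault. If the child were given `max_retries=0`, it would raise `Escalate` and the parent would restore and re-execute instead, which loosely mirrors the hierarchical recovery the abstract describes.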
Keywords
hierarchical state restoration, application hierarchies, state preservation, exascale system, application program interfaces, containment domain, containment domain efficiency, analytical analysis, error detection, generalized trace-driven simulation, transactional semantics, software performance evaluation, programming language semantics, programming construct, scalable resilience scheme, system recovery, hierarchical state preservation, application hierarchy, exascale systems, hierarchical state recovery, efficient resilience scheme, resilience need, flexible resilience scheme, recovery scheme, quantum chemistry, heuristic algorithm, optimization