Simulation Framework for Studying Optical Cable Failures in Dragonfly Topologies

Tiffany Connors, Taylor L. Groves, Tony Quan, K. Scott Hemmert

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2019

Abstract
In high performance computing (HPC) systems, optical links are often used for the system interconnect, but they fail at a relatively high rate compared to their electrical counterparts. Because of this high failure rate, evaluating the impact of link failures on HPC workloads is of particular interest. We extended the Merlin network module of the Structural Simulation Toolkit (SST) to evaluate the impact of link failures on applications running on HPC systems that use dragonfly network topologies. We focus on dragonfly topologies because they are frequently found in HPC systems, including the NERSC Cori and Edison systems. We demonstrate our changes to SST by providing a sample of performance results and routing statistics for a dragonfly network of 8,192 nodes running three HPC workloads with 1% of optical links failed. For the three motifs under consideration, we show that the impact of link failure depends largely on the underlying workload running on the system.
Keywords
Networks, Resilience, HPC, Dragonfly, Optical Networks, Simulation