Regression Testing on Shaheen Cray XC40: Implementation and Lessons Learned

Semantic Scholar (2017)

Abstract
Leadership-class supercomputers are becoming larger and more complex: they are tightly integrated systems consisting of many different hardware components, tens of thousands of processors and memory chips, kilometers of networking cables, large numbers of disks, and hundreds of applications and libraries. To increase scientific productivity and ensure that applications efficiently and effectively exploit a system’s full potential, all the components must deliver reliable, stable, and performant service. Therefore, to deliver the best computing environment to our users, system performance assessments are critical, especially after unplanned downtime or a scheduled maintenance session. This paper describes the design and implementation of the regression testing methodology used on the Shaheen2 XC40 to detect and track issues related to the performance and functionality of compute nodes, storage, network, and programming environment. We also present an analysis of the results over 24 months, along with the lessons learned.