Application of Comprehensive Data Analysis for Interactive, Hierarchical Views of HPC Workloads

Matthew Dwyer, John Hwang, Alexander Shires, Jacob Cohen

2018 IEEE International Conference on Big Data (Big Data), 2018

Abstract
Alongside advancements in computer-related technologies, High Performance Computing (HPC) systems continue to grow in both complexity and scale. As the compute capabilities and space efficiency of these machines improve, their complexity increases as well, which results in higher acquisition costs, more node failures, and higher operational costs. To address these concerns, there have been attempts to improve cost efficiency through data analysis and data monitoring. Facilities use data analysis to understand the causes of degraded performance, the causes of failure, and the requirements for future acquisitions; this information is often obtained through ad-hoc programs. Data monitoring, in turn, is used by HPC facility managers to detect node failures in real time and decrease downtime, thereby minimizing the impact of failures on operational costs. In this paper, we present an application to ingest, store, analyze, and display this data at HPC scale. Our approach brings monitoring, alerting, ad-hoc analysis, and exploratory analysis into a single integrated solution. Beyond supporting existing diagnostic data, the analysis pipeline makes it simple to link diagnostic data across multiple data sources. With this linking capability and the features available in the data analysis software stack, the user can create interactive, hierarchical views of diagnostic data.
Keywords
hierarchical views, diagnostic data, comprehensive data analysis, HPC workloads, computer related technologies, High Performance Computing systems, space efficiency, node failures, operational costs, cost efficiency, data monitoring, degraded performance, HPC facility managers, ad-hoc analysis, exploratory analysis, analysis pipeline, multiple data sources, data analysis software stack, interactive views, acquisition costs