Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application

Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun

2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020), 2020

Abstract

Performance modeling is an important problem in high-performance computing (HPC). Machine learning (ML) is a powerful approach to HPC performance modeling: from historical execution data, it can learn complex relations between application parameters and the performance of HPC applications. However, using ML to extrapolate large-scale performance from only small-scale execution data is difficult, because the independent and identically distributed (i.i.d.) hypothesis, the basic hypothesis of most ML algorithms, does not hold in this situation. To solve the extrapolation problem, we propose a two-level model consisting of an interpolation level and an extrapolation level. The interpolation level predicts small-scale performance from small-scale execution data. The extrapolation level predicts the large-scale performance for a fixed input parameter from its small-scale performance predictions. In the interpolation level, we use random forests to build interpolation models that predict small-scale performance. In the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multi-task lasso with clustering to construct scalability models that predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform and build models for two HPC applications. Compared with existing ML methods, our method achieves higher prediction accuracy.
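The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical synthetic runtime t(n, p) = n/p + 0.5 log2(p), and it swaps in simpler stand-ins so that it runs with the standard library alone (linear interpolation over the input size in place of the random forest, and an ordinary least-squares fit of the assumed form a/p + b log2(p) in place of the multi-task lasso with clustering). All function names, scales, and input sizes are illustrative assumptions.

```python
import math

def true_runtime(n, p):
    """Hypothetical synthetic runtime: parallel work n/p plus a log2(p) communication term."""
    return n / p + 0.5 * math.log2(p)

# Assumed history data: only small process counts were ever measured.
SMALL_SCALES = [2, 4, 8, 16]
HISTORY_SIZES = [100, 200, 300, 400]
history = {(n, p): true_runtime(n, p) for n in HISTORY_SIZES for p in SMALL_SCALES}

def interpolate_small_scale(n, p):
    """Level 1 (stand-in for the random forest): interpolate over n at a fixed small scale p."""
    sizes = sorted(s for (s, q) in history if q == p)
    for lo, hi in zip(sizes, sizes[1:]):
        if lo <= n <= hi:
            w = (n - lo) / (hi - lo)
            return (1 - w) * history[(lo, p)] + w * history[(hi, p)]
    raise ValueError("n lies outside the range covered by the history data")

def fit_scalability(scales, times):
    """Level 2 (stand-in for the multi-task lasso): least squares for t(p) = a/p + b*log2(p)."""
    f = [1.0 / p for p in scales]
    g = [math.log2(p) for p in scales]
    sff = sum(x * x for x in f)
    sgg = sum(x * x for x in g)
    sfg = sum(x * y for x, y in zip(f, g))
    sft = sum(x * t for x, t in zip(f, times))
    sgt = sum(x * t for x, t in zip(g, times))
    det = sff * sgg - sfg * sfg
    # Solve the 2x2 normal equations with Cramer's rule.
    a = (sft * sgg - sgt * sfg) / det
    b = (sgt * sff - sft * sfg) / det
    return a, b

def predict_large_scale(n, p_large):
    """Chain both levels: small-scale predictions for n, then extrapolate in p."""
    small = [interpolate_small_scale(n, p) for p in SMALL_SCALES]
    a, b = fit_scalability(SMALL_SCALES, small)
    return a / p_large + b * math.log2(p_large)

def nearest_neighbor_baseline(n, p_large):
    """Single-level baseline: runtime of the closest history point in (n, log2 p)."""
    key = min(history, key=lambda k: (k[0] - n) ** 2
              + (math.log2(k[1]) - math.log2(p_large)) ** 2)
    return history[key]

if __name__ == "__main__":
    n, p_large = 250, 256
    print("true      ", true_runtime(n, p_large))
    print("two-level ", predict_large_scale(n, p_large))
    print("1-NN      ", nearest_neighbor_baseline(n, p_large))
```

The nearest-neighbor baseline can only return a runtime it has already seen at a small scale, which is a simple way to see why a model trained on small-scale data alone struggles outside that range. The two-level sketch recovers the large-scale value here only because the assumed scalability form matches the synthetic data exactly; real applications would not be this forgiving.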

Keywords

performance modeling, machine learning, extrapolation, multi-task learning
