Decreasing the Learning Cost of Offline Parallel Application Optimization Strategies

2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (2020)

Abstract
Many parallel applications do not scale as the number of threads increases, which means that executing them with the maximum possible number of threads will not always deliver the best outcome in performance, energy consumption, or the tradeoff between both (represented by the energy-delay product, EDP). Given that, several strategies, both online and offline, have already been proposed to properly tune the number of threads according to the application. While the former can capture behaviors that are only known at runtime, the latter impose no execution overhead and can use more powerful but costly algorithms. However, the learning algorithms in such static strategies may take several hours, precluding their use or a smooth migration across different systems. In this scenario, we propose a generic methodology for such offline strategies that significantly decreases the learning time by inferring the execution behavior of parallel applications from smaller input sets than the ones used by the target applications. Through the execution of eighteen well-known benchmarks on two multicore processors, we show that our methodology converges to results that are very close to those obtained with the regular input set, while converging 84.7% faster, on average. We also show that such a strategy delivers better results than a dynamic one, presenting an EDP 7.7% lower, on average, when executing the applications with the number of threads found during learning. Finally, we compare our learning methodology with an exhaustive search: it has an average learning cost (i.e., the time spent by our search algorithm to find the best configuration) of only 3.1% of that of the exhaustive search when optimizing the EDP of the entire benchmark set.
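The core metric above, the energy-delay product, is simply the product of the energy consumed and the execution time, so lower values are better. A minimal sketch of how an offline strategy might pick the thread count that minimizes EDP from measured (energy, time) pairs is shown below; the measurement values and the `edp` helper are illustrative, not the paper's actual search algorithm or data.

```python
# Illustrative sketch (hypothetical numbers, not the paper's algorithm):
# pick the thread count that minimizes the energy-delay product,
# EDP = energy (J) x execution time (s).

def edp(energy_joules, time_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * time_seconds

# Hypothetical offline measurements: thread count -> (energy in J, runtime in s).
# Note that 16 threads runs slightly faster than 8 but costs more energy,
# so it loses on EDP -- the kind of non-obvious tradeoff the paper targets.
measurements = {
    1:  (120.0, 40.0),   # EDP = 4800.0
    4:  (90.0,  12.0),   # EDP = 1080.0
    8:  (85.0,   8.0),   # EDP =  680.0
    16: (110.0,  7.5),   # EDP =  825.0
}

best_threads = min(measurements, key=lambda t: edp(*measurements[t]))
print(best_threads)  # prints 8
```

In a real offline strategy these measurements would come from profiling runs, and the paper's contribution is that such profiling can be done on smaller input sets to cut the learning time.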
Keywords
Parallel computing, runtime optimization systems, thread-level parallelism exploitation, energy-delay product