NUMA-Aware Task Performance Analysis

Lecture Notes in Computer Science (2016)

Abstract
The tasking feature enriches OpenMP with a way to express parallelism more generally than before: tasks can be applied to loops, but also to recursive algorithms, without the need for nested parallel regions. However, the performance of a task-parallel program is strongly influenced by the task scheduling inside the OpenMP runtime. The runtime's influence is especially significant on large NUMA systems and when tasks work on shared data structures that are split across NUMA nodes. A programmer has no easy way to examine these performance-relevant decisions taken by the runtime, either with functionality provided by OpenMP or with external performance tools. Therefore, we present a method based on the Score-P measurement infrastructure that allows task-parallel programs on NUMA systems to be analyzed in more depth, showing the user whether tasks were executed by the creating thread or remotely, on the same or on a different socket. As an example, the Intel and GNU compilers were used to execute the same task-parallel code, and a performance difference of 8x was observed, mainly due to task scheduling. We evaluate the presented method by investigating both execution runs and highlight the differences in the task scheduling applied.
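The core classification the abstract describes (was a task executed by its creating thread, by another thread on the same socket, or by a thread on a different socket?) can be illustrated with a small sketch. This is not Score-P's implementation or API. It assumes that, for every task, the creating and executing thread IDs have already been recorded (for example, extracted from an event trace), and that threads are pinned so that a fixed thread-to-socket map is known. The names `TaskRecord`, `classify`, `summarize`, and `socket_of` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    """Where a task ran relative to the thread that created it."""
    LOCAL = "executed by creating thread"
    SAME_SOCKET = "remote, same socket"
    OTHER_SOCKET = "remote, different socket"


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical per-task record, e.g. reconstructed from an event trace."""
    task_id: int
    creator: int   # ID of the thread that created the task
    executor: int  # ID of the thread that executed the task


def classify(rec: TaskRecord, socket_of: dict[int, int]) -> Placement:
    """Classify one task; assumes socket_of maps every (pinned) thread ID to its socket."""
    if rec.executor == rec.creator:
        return Placement.LOCAL
    if socket_of[rec.executor] == socket_of[rec.creator]:
        return Placement.SAME_SOCKET
    return Placement.OTHER_SOCKET


def summarize(records: list[TaskRecord], socket_of: dict[int, int]) -> Counter:
    """Count how many tasks fall into each placement category."""
    return Counter(classify(r, socket_of) for r in records)


if __name__ == "__main__":
    # Hypothetical setup: 4 threads pinned 2 per socket.
    socket_of = {0: 0, 1: 0, 2: 1, 3: 1}
    trace = [
        TaskRecord(0, creator=0, executor=0),  # local
        TaskRecord(1, creator=0, executor=1),  # same socket
        TaskRecord(2, creator=0, executor=3),  # other socket
        TaskRecord(3, creator=2, executor=2),  # local
    ]
    for placement, count in summarize(trace, socket_of).items():
        print(f"{placement.value}: {count}")
```

Comparing such per-category counts between two runs (for instance, builds with different compilers and thus different OpenMP runtimes) is one simple way to make differences in task scheduling visible; a real tool would additionally attribute execution time to each category.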
Keywords
Task Schedule, Runtime System, Work Item, Task Creation, Event Trace