OpenMP Extension for Explicit Task Allocation on NUMA Architecture

Lecture Notes in Computer Science (2016)

Abstract
Most modern HPC systems consist of many cores grouped into multiple NUMA nodes, and the latest Intel processors have multiple NUMA nodes inside a single chip. Task parallelism using OpenMP dependent tasks is a promising programming model for many-core architectures because it can exploit parallelism in irregular applications with fine-grained synchronization. However, the current OpenMP specification lacks functionality to improve data locality in task parallelism. In this paper, we propose an extension to the OpenMP task construct that specifies the location of tasks, so that locality can be exploited explicitly. The prototype compiler is implemented based on GCC. A performance evaluation using the KASTORS benchmark suite shows that our approach can reduce remote page accesses. The Jacobi kernel using our approach achieves 3.6 times better performance than GCC when using 36 threads on a 36-core machine with 4 NUMA nodes.
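For readers unfamiliar with OpenMP dependent tasks, the following is a minimal sketch of a blocked 1D Jacobi sweep in which each block update is a task ordered by standard OpenMP `depend` clauses instead of a global barrier. It is not the KASTORS Jacobi kernel and does not show the paper's proposed task-location extension, whose syntax the abstract does not give. The grid size `N`, block size `BS`, and the functions `jacobi_block` and `jacobi_tasks` are hypothetical names chosen for illustration.

```c
#define N 64          /* hypothetical grid size */
#define BS 16         /* hypothetical block size; N must be a multiple of BS */
#define NB (N / BS)   /* number of blocks */

/* One Jacobi update of block b: interior points take the average of their
   two neighbours in src; the two boundary points are copied unchanged. */
void jacobi_block(const double *src, double *dst, int b) {
    int lo = b * BS, hi = lo + BS;
    for (int i = lo; i < hi; i++) {
        if (i == 0 || i == N - 1)
            dst[i] = src[i];
        else
            dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
    }
}

/* Runs `iters` sweeps, alternating between u and v, with one task per block
   per sweep. The first element of each block acts as a sentinel for the
   whole block in the depend clauses. Returns the array holding the result. */
double *jacobi_tasks(double *u, double *v, int iters) {
    #pragma omp parallel
    #pragma omp single
    for (int it = 0; it < iters; it++) {
        double *src = (it % 2 == 0) ? u : v;
        double *dst = (it % 2 == 0) ? v : u;
        for (int b = 0; b < NB; b++) {
            int left = (b > 0) ? b - 1 : b;        /* clamp at the edges */
            int right = (b < NB - 1) ? b + 1 : b;
            /* The task reads its own and its neighbours' blocks of src and
               writes its own block of dst. The paper's extension would also
               state where (on which NUMA node) this task should run; that
               syntax is not given in the abstract, so it is omitted here. */
            #pragma omp task firstprivate(src, dst, b) \
                depend(in: src[left * BS], src[b * BS], src[right * BS]) \
                depend(out: dst[b * BS])
            jacobi_block(src, dst, b);
        }
    }
    /* The implicit barrier at the end of the parallel region waits for all tasks. */
    return (iters % 2 == 0) ? u : v;
}
```

Compiled with `-fopenmp`, a block of sweep `it + 1` can start as soon as its three source blocks from sweep `it` are written, which is the fine-grained synchronization the abstract refers to. Without `-fopenmp` the pragmas are ignored and the same code runs sequentially, producing identical results. Where each task runs, and therefore whether its blocks are in local or remote NUMA memory, is left entirely to the runtime here; that is the gap the proposed extension targets.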
Keywords
OpenMP, Task parallelism, NUMA optimization