Towards a portable hierarchical view of distributed shared memory systems: challenges and solutions

PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, February 2020

Abstract
The ever-growing architectural diversity of modern supercomputers has made scientific software harder to develop. Exploiting heterogeneous and disruptive architectures (e.g., off-chip and, in the near future, on-chip accelerators) has increased software complexity and worsened maintainability. We therefore need a productive software ecosystem that improves the usability and portability of applications on such systems while still allowing every parallelism opportunity to be exploited. In this paper, we outline several challenges that we encountered while implementing Gecko, a hierarchical model for distributed shared memory architectures, using a directive-based programming model, and we discuss our solutions. These challenges include: 1) inferring kernel execution with respect to data placement, 2) workload distribution, 3) hierarchy maintenance, and 4) memory management. We evaluated our implementation experimentally using the Stream and Rodinia benchmarks, which represent several major scientific software applications commonly used by domain scientists. Our results show that the Stream benchmark reaches a sustainable bandwidth of 80 GB/s on a single Intel Xeon processor and 1.8 TB/s on four NVIDIA V100 GPUs. Additionally, srad_v2 in the Rodinia benchmark suite reaches 88% speedup efficiency when using four GPUs.
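To put the efficiency figure in context, the short sketch below converts a speedup efficiency into the speedup it implies. It assumes the conventional definition, efficiency = speedup / number of devices, which the abstract does not state explicitly; the function name `implied_speedup` is hypothetical and chosen only for this illustration.

```python
def implied_speedup(efficiency: float, num_devices: int) -> float:
    """Return the speedup implied by a parallel efficiency.

    Assumes efficiency = speedup / num_devices (the conventional
    definition; an assumption, not confirmed by the abstract).
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return efficiency * num_devices


if __name__ == "__main__":
    # Figures taken from the abstract: 88% efficiency on four GPUs.
    speedup = implied_speedup(0.88, 4)
    print(f"Implied speedup on 4 GPUs: {speedup:.2f}x")
```

Under that assumption, the 88% efficiency reported for srad_v2 on four GPUs corresponds to a speedup of roughly 3.5x over a single GPU.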