A Case Study and Characterization of a Many-socket, Multi-tier NUMA HPC Platform

2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar)

Abstract
As the number of processor cores and sockets on HPC compute nodes increases and systems expose more hierarchical non-uniform memory access (NUMA) architectures, efficiently scaling applications within even a single shared-memory system is becoming more challenging. It is now common for HPC compute nodes to have two or more sockets and dozens of cores, but future-generation systems may contain an order of magnitude more of each. We conduct experiments on a state-of-the-art Intel Xeon Platinum system with 12 processor sockets, totaling 288 cores (576 hardware threads), arranged in a multi-tier NUMA hierarchy. Platforms of this scale and memory hierarchy are uncommon today, providing us a unique opportunity to empirically evaluate, rather than model or simulate, an architecture potentially representative of future HPC compute nodes. We quantify the platform's multi-tier NUMA patterns, then evaluate its suitability for HPC workloads using a modern HPC metagenome assembler application as a case study, along with other HPC benchmarks employing a variety of parallelization techniques, to characterize the system's performance, scalability, I/O patterns, and performance/power behavior. Our results demonstrate near-perfect scaling for embarrassingly parallel and weak-scaling workloads, but challenges for random-memory-access workloads. For the latter, we find poor scaling with the default scheduling approaches (e.g., ones that do not pin threads), suggesting that userspace or kernel schedulers may require changes to better manage the multi-tier NUMA hierarchies of very large shared-memory platforms.
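To illustrate the explicit thread pinning the abstract contrasts with the default (unpinned) scheduling, the sketch below uses the Linux-specific pthread_attr_setaffinity_np API to bind each worker thread to one core before it starts. The thread count, the identity core-to-thread mapping, and the worker body are illustrative assumptions, not the paper's placement policy; on a real multi-tier NUMA node the mapping would follow the socket/NUMA topology (e.g., discovered via hwloc or libnuma).

/* Minimal sketch of explicit thread pinning on Linux (GNU extensions).
 * Assumption: cores 0..NTHREADS-1 exist; a real HPC run would map
 * threads onto the node's NUMA topology instead of this identity map. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NTHREADS 8  /* illustrative; the studied node has 288 cores */

static void *worker(void *arg) {
    long id = (long)arg;
    /* NUMA-sensitive work would run here, now fixed to a single core,
     * so the kernel scheduler cannot migrate it across NUMA domains. */
    printf("thread %ld running on CPU %d\n", id, sched_getcpu());
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++) {
        /* Set the affinity in the creation attributes so the thread
         * begins execution on its assigned core. */
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&threads[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

A similar effect is often achieved without code changes via numactl --cpunodebind/--membind or OpenMP's OMP_PROC_BIND and OMP_PLACES environment variables.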
Keywords
high performance computing, performance analysis