OpenSHMEM over MPI as a Performance Contender: Thorough Analysis and Optimizations

OpenSHMEM and Related Technologies: OpenSHMEM in the Era of Exascale and Smart Networks (2022)

Abstract
OpenSHMEM is a Partitioned Global Address Space (PGAS) style programming model for one-sided, scalable communication over distributed-memory systems. The community has always focused on high performance for specific communication operations such as RMA, atomics, and collectives, and it encourages native implementations ported directly to each network hardware so that the path from the application to the network involves as few instructions as possible. OSHMPI is an OpenSHMEM implementation built on top of MPI that aims to provide portable support for OpenSHMEM communication on mainstream HPC systems. Because of the generalized functionality of MPI, however, OSHMPI incurs heavy software overheads on the performance-critical path. Why does OpenSHMEM over MPI not perform well? To answer this question, this paper provides an in-depth analysis of the software overheads on the OSHMPI performance-critical path, from the perspectives of both semantics and library implementation. We also present various optimizations in the MPI and OSHMPI implementations while preserving full MPI functionality. For the remaining overheads that fundamentally cannot be avoided under the MPI-3.1 standard, we recommend extensions to the MPI standard to provide efficient support for OpenSHMEM-like PGAS programming models. We evaluate the optimized OSHMPI against a native OpenSHMEM implementation on an Intel Broadwell cluster with the Omni-Path interconnect. The results demonstrate that the optimized OSHMPI/MPI environment can deliver performance similar to that of the native implementation.
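To make the OpenSHMEM-over-MPI layering concrete, the sketch below shows one plausible way a put operation could be mapped onto MPI-3 RMA, which is the general approach an implementation like OSHMPI takes. The window name `symm_win`, the helper `my_putmem`, and the completion strategy are illustrative assumptions for this sketch, not the actual OSHMPI code.

```c
/* Illustrative sketch only: a hypothetical OpenSHMEM-style put layered on
 * MPI-3 RMA. Assumes symm_win was created over the symmetric heap and that
 * MPI_Win_lock_all was called on it at initialization time. */
#include <mpi.h>
#include <stddef.h>

static MPI_Win  symm_win;   /* window exposing the symmetric heap (assumed) */
static void    *symm_base;  /* local base address of the symmetric heap     */

/* Hypothetical put: copy nbytes from local src into the symmetric address
 * dest on processing element pe. */
static void my_putmem(void *dest, const void *src, size_t nbytes, int pe)
{
    /* Symmetric allocation means dest has the same offset from the heap
     * base on every PE, so the local offset is valid as the remote
     * displacement. */
    MPI_Aint disp = (MPI_Aint)((char *)dest - (char *)symm_base);

    MPI_Put(src, (int)nbytes, MPI_BYTE,
            pe, disp, (int)nbytes, MPI_BYTE, symm_win);

    /* OpenSHMEM put semantics require only local completion on return;
     * remote completion can be deferred to a later shmem_quiet-like flush. */
    MPI_Win_flush_local(pe, symm_win);
}
```

Even in this simple mapping, each put passes through MPI's generic window, datatype, and completion machinery; the per-call bookkeeping that this generality implies is the kind of software overhead on the critical path that the paper analyzes and optimizes.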
Keywords
MPI, performance, optimizations