PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
CoRR (2024)
Abstract
Recent dual in-line memory modules (DIMMs) are starting to support
processing-in-memory (PIM) by associating their memory banks with processing
elements (PEs), allowing applications to overcome the data movement bottleneck
by offloading memory-intensive operations to the PEs. Many highly parallel
applications have been shown to benefit from these PIM-enabled DIMMs, but
further speedup is often limited by the huge overhead of inter-PE
communication. This overhead mainly comes from the slow CPU-mediated inter-PE
communication methods, which incur significant performance costs and make it
difficult for PIM-enabled DIMMs to accelerate a wider range of applications.
Prior studies have tried to alleviate the communication bottleneck, but they
lack the flexibility and performance needed to serve a wide range of
applications. In this paper, we present PID-Comm, a fast and flexible
collective inter-PE communication framework for commodity PIM-enabled DIMMs.
The key idea of PID-Comm is to abstract the PEs as a multi-dimensional
hypercube and allow multiple instances of collective inter-PE communication
between the PEs belonging to certain dimensions of the hypercube. Leveraging
this abstraction, PID-Comm first defines eight collective inter-PE
communication patterns that allow applications to easily express their complex
communication patterns. Then, PID-Comm provides high-performance
implementations of the collective inter-PE communication patterns optimized for
the DIMMs. Our evaluation using 16 UPMEM DIMMs and representative parallel
algorithms shows that PID-Comm improves performance by up to 4.20x
compared to existing inter-PE communication implementations. The
implementation of PID-Comm is available at https://github.com/AIS-SNU/PID-Comm.
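
To make the hypercube abstraction concrete, the following is a minimal Python sketch of how PEs could be grouped along chosen hypercube dimensions, with one collective running independently in each group. It is a host-side simulation only and is not PID-Comm's API: the function names (`pe_coords`, `comm_groups`, `all_reduce_sum`), the coordinate ordering, and the use of AllReduce as the example collective are all assumptions made for illustration.

```python
def pe_coords(pe_id, shape):
    """Convert a flat PE id into hypercube coordinates.

    Assumption: the first dimension varies fastest.
    """
    coords = []
    for size in shape:
        coords.append(pe_id % size)
        pe_id //= size
    return tuple(coords)


def comm_groups(shape, comm_dims):
    """Partition PE ids into communication groups.

    PEs in the same group share every coordinate except those
    in comm_dims, so each group is one instance of the collective.
    """
    total = 1
    for size in shape:
        total *= size
    groups = {}
    for pe in range(total):
        coords = pe_coords(pe, shape)
        # The key is the set of coordinates that stay fixed within a group.
        key = tuple(v for d, v in enumerate(coords) if d not in comm_dims)
        groups.setdefault(key, []).append(pe)
    return list(groups.values())


def all_reduce_sum(values, shape, comm_dims):
    """Simulated AllReduce: each PE receives the sum over its own group."""
    out = list(values)
    for group in comm_groups(shape, comm_dims):
        total = sum(values[pe] for pe in group)
        for pe in group:
            out[pe] = total
    return out
```

For example, with eight PEs arranged as a hypothetical 2x2x2 hypercube, `comm_groups((2, 2, 2), {0})` yields four groups of two PEs each, while `comm_groups((2, 2, 2), {0, 1})` yields two groups of four, so the same collective can be reused at different granularities simply by changing the selected dimensions.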