ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems


引用 1|浏览31
Partitioning applications between near-data processing (NDP) and host CPU cores causes inter-segment data movement overhead, which is caused by moving data generated by one segment (e.g., instructions, functions) and used in other consecutive segments. Prior works take two approaches to this problem. The first approach maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second approach partitions applications based on the overall memory bandwidth savings, and does not offload each segment to the best-fitting core if they incur high inter-segment data movement. We show that 1) mapping each segment to its best-fitting core ideally can provide substantial benefits, and 2) the inter-segment data movement reduces this benefit significantly. We introduce ALP, a new programmer-transparent technique to alleviate the inter-segment data movement overhead between host and memory in NDP systems. ALP proactively and accurately transfers the required data between the segments based on the key observation that the instructions that generate the inter-segment data stay the same across different executions of a program. ALP uses a compiler pass to identify these instructions and uses specialized hardware to transfer their produced data at runtime. We evaluate ALP across a wide range of workloads and demonstrate 54.3% and 45.4% average speedup over CPU-only and NDP-only executions, respectively.
Near-data processing,inter-segment data movement,application partitioning
AI 理解论文