ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

arXiv (2023)

Abstract
Partitioning applications between near-data processing (NDP) and host CPU cores incurs inter-segment data movement overhead: data generated by one segment (e.g., instructions, functions) must be moved to the consecutive segments that use it. Prior works take two approaches to this problem. The first approach maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second approach partitions applications based on the overall memory bandwidth savings, and does not offload a segment to its best-fitting core if doing so incurs high inter-segment data movement. We show that 1) mapping each segment to its best-fitting core can, in the ideal case, provide substantial benefits, and 2) inter-segment data movement reduces this benefit significantly. We introduce ALP, a new programmer-transparent technique to alleviate the inter-segment data movement overhead between host and memory in NDP systems. ALP proactively and accurately transfers the required data between segments, based on the key observation that the instructions that generate the inter-segment data stay the same across different executions of a program. ALP uses a compiler pass to identify these instructions and specialized hardware to transfer the data they produce at runtime. We evaluate ALP across a wide range of workloads and demonstrate 54.3% and 45.4% average speedup over CPU-only and NDP-only executions, respectively.
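To make the idea of "instructions that generate the inter-segment data" concrete, the following is a minimal sketch of how such instructions could be found in a toy instruction trace. It is an illustration only, not ALP's compiler pass or hardware: the `Instr` record, the `placement` map, and the function `inter_segment_producers` are hypothetical names introduced here, and the trace is made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical record for one instruction in a toy trace."""
    pc: int                      # static address of the instruction
    segment: int                 # segment the instruction belongs to
    dst: Optional[str] = None    # value it writes, if any
    srcs: Tuple[str, ...] = ()   # values it reads


def inter_segment_producers(trace: Iterable[Instr],
                            placement: Dict[int, str]) -> Set[int]:
    """Return the PCs of instructions whose output is read by a different
    segment that is placed on a different core (host vs. NDP)."""
    last_writer: Dict[str, Instr] = {}  # value name -> most recent writer
    producers: Set[int] = set()
    for ins in trace:
        for name in ins.srcs:
            writer = last_writer.get(name)
            if writer is None or writer.segment == ins.segment:
                continue  # no known producer, or produced in the same segment
            if placement[writer.segment] != placement[ins.segment]:
                producers.add(writer.pc)  # data must cross host <-> NDP
        if ins.dst is not None:
            last_writer[ins.dst] = ins
    return producers


# Assumed example: segment 0 runs on the host, segments 1 and 2 on NDP cores.
trace = [
    Instr(pc=0x10, segment=0, dst="a"),
    Instr(pc=0x14, segment=0, dst="b", srcs=("a",)),
    Instr(pc=0x20, segment=1, dst="c", srcs=("b",)),
    Instr(pc=0x24, segment=1, dst="d", srcs=("c",)),
    Instr(pc=0x30, segment=2, dst="e", srcs=("d", "a")),
]
placement = {0: "host", 1: "ndp", 2: "ndp"}

print(sorted(hex(pc) for pc in inter_segment_producers(trace, placement)))
# prints ['0x10', '0x14']
```

In this toy trace, the instructions at 0x10 and 0x14 run on the host and produce values read by segments on the NDP side, so they are the ones flagged. The value `d` also crosses a segment boundary, but both segments are on NDP cores, so no host-memory transfer is needed. Because the flagged set depends only on static instruction addresses, it is the kind of information that could be identified once, ahead of time, which is the observation the abstract relies on.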
Keywords
Near-data processing, inter-segment data movement, application partitioning