At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericas, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

ACM Transactions on Architecture and Code Optimization (2023)

Abstract

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and conclude an average boost of 9.56x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

Keywords

HPC workloads, cache, performance, 3D-stacked
