HBM3 RAS: Enhancing Resilience at Scale

IEEE Computer Architecture Letters (2021)

Abstract
HBM3 is the next-generation technology in the JEDEC High Bandwidth Memory™ die-stacked DRAM standard. HBM3 is expected to be widely used in future SoCs to accelerate data center and automotive workloads. Reliability, Availability, and Serviceability (RAS) are key requirements in most of these computing domains and use cases, and memory reliability is especially important for attaining resilience at scale. This paper presents the RAS challenges facing HBM3 and how they are addressed by a novel memory RAS architecture that is now part of the HBM3 standard. The paper shows how this HBM3 RAS architecture can reduce the uncorrected memory error rate by 7X compared to HBM2 in future large-scale systems, for assumed DRAM fault rates and modes. HBM3 also provides architected metadata to further enhance RAS or enable innovations in memory system design.
Keywords
B.3.1.a DRAM, B.3.4.b Error-checking, B.3.4.a Diagnostics