NOMAD: Enabling Non-blocking OS-managed DRAM Cache via Tag-Data Decoupling.

HPCA (2023)

Abstract
This paper introduces a DRAM cache architecture that provides near-ideal access time and non-blocking miss handling. Previous DRAM cache (DC) designs fall into two categories: hardware-based and OS-managed schemes. Hardware-based designs implement non-blocking caches that can handle multiple DC misses using MSHRs, but they have drawbacks in metadata management, since storing tags in on-package DRAM significantly increases the effective cycle time of DC accesses. In contrast, OS-managed schemes store tags in PTEs and cache them in TLBs, which can achieve ideal DC access time. However, they implement blocking caches that stall application threads on misses until cache fills are completed. To overcome the limitations of both hardware-based and OS-managed schemes, this paper introduces a DRAM cache architecture named Non-blocking OS-managed DRAM cache (NOMAD). Unlike conventional caches, which guarantee the presence of data on tag hits, NOMAD decouples tag and data management to enable non-blocking miss handling in an OS-managed DRAM cache. The front-end OS routines of NOMAD manage DC tags using PTEs and TLBs, and its back-end hardware handles data management in the DRAM cache. On a DC miss, the OS updates a tag, offloads a cache-fill command to the back-end, and immediately resumes the application thread without waiting for the cache fill to complete. The back-end hardware then handles the cache fill without blocking the application thread. Because tag and data management are decoupled in NOMAD, a tag hit does not necessarily guarantee the presence of data in the DRAM cache. The back-end therefore tracks which DC lines are still being transferred and, on every DC access, checks whether the demanded part of the cache line has already been transferred. Notably, this back-end procedure does not require OS intervention, thereby implementing a non-blocking DRAM cache.
Experimental results show that NOMAD reduces application stall cycles by 76.1% and improves IPC by 16.7% over a state-of-the-art OS-managed scheme.
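The miss-handling flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the class names `FrontEnd` and `BackEnd`, the dictionary standing in for PTEs and TLBs, the sub-block granularity `SUBBLOCKS_PER_LINE`, and the absence of eviction are all hypothetical choices made for illustration.

```python
# Toy model of tag-data decoupling. All names and sizes are illustrative
# assumptions; the abstract does not specify them.

SUBBLOCKS_PER_LINE = 64  # assumption: a DC line is filled in 64 sub-blocks


class BackEnd:
    """Stand-in for the back-end hardware that manages data."""

    def __init__(self):
        # DC line -> set of sub-blocks that have arrived so far.
        # A line is present here only while its fill is in flight.
        self.in_flight = {}

    def start_fill(self, line):
        """Accept a cache-fill command offloaded by the front-end."""
        self.in_flight[line] = set()

    def transfer(self, line, subblock):
        """Record that one sub-block of an in-flight line has arrived."""
        arrived = self.in_flight[line]
        arrived.add(subblock)
        if len(arrived) == SUBBLOCKS_PER_LINE:
            del self.in_flight[line]  # fill complete, stop tracking

    def data_ready(self, line, subblock):
        """Check whether the demanded part of a line is already in the DC."""
        arrived = self.in_flight.get(line)
        return arrived is None or subblock in arrived


class FrontEnd:
    """Stand-in for the OS routines that manage tags."""

    def __init__(self, backend):
        self.backend = backend
        self.tags = {}  # page number -> DC line (models PTE/TLB tags)
        self.next_line = 0

    def access(self, page, subblock):
        """Return (dc_line, data_ready) without waiting for any fill."""
        if page not in self.tags:
            # DC miss: update the tag, offload the fill, resume at once.
            line = self.next_line
            self.next_line += 1
            self.tags[page] = line
            self.backend.start_fill(line)
        line = self.tags[page]
        # A tag hit does not guarantee data presence, so ask the back-end.
        return line, self.backend.data_ready(line, subblock)


backend = BackEnd()
frontend = FrontEnd(backend)
line, ready = frontend.access(page=7, subblock=3)  # miss: fill starts
backend.transfer(line, 3)                          # demanded part arrives
_, ready_after = frontend.access(page=7, subblock=3)
```

In this model, the first access returns immediately with the data not yet ready, and the second access succeeds as soon as the demanded sub-block has arrived, even though the rest of the line is still in transfer.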