X-cache: a modular architecture for domain-specific caches

ISCA: International Symposium on Computer Architecture(2022)

引用 2|浏览21
暂无评分
摘要
With Dennard scaling ending, architects are turning to domain-specific accelerators (DSAs). State-of-the-art DSAs work with sparse data [37] and indirectly-indexed data structures [18, 30]. They introduce non-affine and dynamic memory accesses [7, 35], and require domain-specific caches. Unfortunately, cache controllers are notorious for being difficult to architect; domain-specialization compounds the problem. DSA caches need to support custom tags, data-structure walks, multiple refills, and preloading. Prior DSAs include ad-hoc cache structures, and do not implement the cache controller. We propose X-Cache, a reusable caching idiom for DSAs. We will be open-sourcing a toolchain for both generating the RTL and programming X-Cache. There are three key ideas: i) DSA-specific Tags (Meta-tag) : The designer can use any combination of fields from the DSA-metadata as the tag. Meta-tags eliminate the overhead of walking and translating metadata to global addresses. This saves energy, and improves load-to-use latency. ii) DSA-programmable walkers (X-Actions) : We find that a common set of microcode actions can be used to implement the DSA-specific walking, data block, and tag management. We develop a programmable microcode engine that can efficiently realize the data orchestration. iii) DSA-portable controller (X-Routines) : We use a portable abstraction, coroutines, to let the designer express walking and orchestration. Coroutines capture the block-level parallelism, remain lightweight, and minimize controller occupancy. We create caches for four different DSA families: Sparse GEMM [35, 37], GraphPulse [30], DASX [22], and Widx [18]. X-Cache outperforms address-based caches by 1.7 × and remains competitive with hardwired DSAs (even 50% improvement in one case). We demonstrate that meta-tags save 26--79% energy compared to address-tags. In X-Cache , meta-tags consume 1.5--6.5% of data RAM energy and the programmable microcode adds a further 7%.
更多
查看译文
关键词
Domain-specific Architectures, Caches, Coroutines, Dataflow architectures
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要