Work-in-Progress: NoRF: A Case Against Register File Operands in Tightly-Coupled Accelerators

2022 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), 2022

Abstract
Accelerators are often used to increase the performance and/or energy efficiency of general-purpose CPUs. However, Tightly-Coupled Accelerators (TCAs) often operate on data structures that are not a natural fit for general-purpose registers. The designer can use the existing register file (RF), use an RF tailored to the accelerator, or eschew an RF entirely (NoRF) and access operands directly from the memory hierarchy. Designers of embedded and edge devices are particularly concerned with energy-efficient compute and data transfer. We explore mini-DGEMM accelerators (an example TCA) in the context of CPUs and edge devices, where DGEMM compute is also finding a growing number of applications. At a high level, register files help reduce memory accesses (steps 1, 2, 5, and 6 in Figure 1) when the compiler finds operand reuse in the program dataflow. On the other hand, direct memory access simplifies data movement by eliminating the intermediate reads and writes to a register file entirely, but it issues more memory requests. This paper evaluates the difference between these operand-delivery options. Figure 2 shows that all recent vector extensions use a register file implementation. Following this trend, it may seem natural to incorporate mini-matrices into the RF. However, we present quantitative and qualitative evidence advocating direct cache access for operands.
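To make the trade-off concrete, here is a toy operand-traffic count for an n×n mini-DGEMM computed in t×t tiles. It is a sketch under simplifying assumptions of our own, not the paper's evaluation model: every tile load or store counts as one memory request, the RF variant keeps only the C accumulator tile resident across the k loop (no other compiler reuse), and the NoRF variant reads the A, B, and C tiles from the cache and writes C back on every tile operation. The names `OperandTraffic`, `norf_traffic`, and `rf_traffic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperandTraffic:
    memory_requests: int  # tile-sized loads/stores sent to the cache hierarchy
    rf_accesses: int      # tile-sized reads/writes of the register file


def _tile_counts(n: int, t: int) -> tuple[int, int]:
    """Return (output tiles, tile-level multiply-accumulate ops) for an n x n GEMM."""
    if n % t:
        raise ValueError("n must be a multiple of the tile size t")
    tiles = n // t
    return tiles ** 2, tiles ** 3


def norf_traffic(n: int, t: int) -> OperandTraffic:
    """NoRF (assumed model): each tile op reads A, B, C from cache and writes C back."""
    _, ops = _tile_counts(n, t)
    return OperandTraffic(memory_requests=4 * ops, rf_accesses=0)


def rf_traffic(n: int, t: int) -> OperandTraffic:
    """RF (assumed model): the C tile stays in the RF for the whole k loop."""
    out_tiles, ops = _tile_counts(n, t)
    # Memory: load and store each C tile once; load one A and one B tile per op.
    memory = 2 * out_tiles + 2 * ops
    # RF: each C load writes the RF and each C store reads it (2 per output tile).
    # Per op: 2 writes (A, B arrive from memory), 3 reads (A, B, C feed the
    # accelerator), 1 write (updated C).
    rf = 2 * out_tiles + 6 * ops
    return OperandTraffic(memory_requests=memory, rf_accesses=rf)


if __name__ == "__main__":
    for name, fn in (("NoRF", norf_traffic), ("RF", rf_traffic)):
        print(name, fn(16, 4))
```

Under these assumptions, for n = 16 and t = 4 the RF variant issues fewer memory requests (160 vs. 256) but adds 416 RF accesses that the NoRF variant avoids entirely. Which option costs less energy then depends on the relative per-access cost of the RF and the cache, which this sketch does not model.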
Keywords
Tightly-Coupled Accelerators,data structures,natural fit,general-purpose registers,existing register file,RF,NoRF,memory hierarchy,embedded edge devices,energy-efficient compute,data transfer,mini-DGEMM accelerators,example TCAs,DGEMM compute,memory accesses,direct memory access,data movement,memory requests,operand delivery,register file implementation,direct cache access,register file operands,energy efficiency,general-purpose CPUs