A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

2019 Symposium on VLSI Technology (2019)

Abstract
A sparse matrix-matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The $2.0\ \text{mm}\times 2.6\ \text{mm}$ chip exhibits a $12.6\times$ ($8.4\times$) energy efficiency gain, an $11.7\times$ ($77.6\times$) off-chip bandwidth efficiency gain, and a $17.1\times$ ($36.9\times$) compute density gain over a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.
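The abstract refers to distinct phases of the SpMM algorithm without naming them. As an illustration only, the sketch below assumes an outer-product formulation with a multiply phase followed by a merge phase. That choice, the function name `spmm_outer_product`, and the dictionary-based storage are assumptions made for this example and are not taken from the abstract.

```python
from collections import defaultdict


def spmm_outer_product(a_cols, b_rows):
    """Multiply sparse A (stored by column) by sparse B (stored by row).

    a_cols[k] is a list of (row, value) pairs for column k of A.
    b_rows[k] is a list of (col, value) pairs for row k of B.
    Returns {(row, col): value} holding only the non-zero outputs.
    """
    # Phase 1 (multiply): column k of A times row k of B yields
    # partial products for the output matrix.
    partials = []
    for k, a_col in a_cols.items():
        for i, a_val in a_col:
            for j, b_val in b_rows.get(k, []):
                partials.append((i, j, a_val * b_val))

    # Phase 2 (merge): accumulate partial products that land on the
    # same output coordinate.
    merged = defaultdict(float)
    for i, j, value in partials:
        merged[(i, j)] += value

    # Drop entries that cancelled to zero so only output non-zeros remain.
    return {coord: v for coord, v in merged.items() if v != 0.0}


# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
a_cols = {0: [(0, 1.0)], 1: [(1, 2.0)]}
b_rows = {0: [(1, 3.0)], 1: [(0, 4.0)]}
product = spmm_outer_product(a_cols, b_rows)
```

In this formulation the two phases have different memory access patterns: the multiply phase streams its inputs, while the merge phase gathers scattered partial products. This is one plausible reason a design might reconfigure its memories between scratchpad and cache modes from phase to phase.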
Keywords
Sparse matrix multiplier, synthesizable crossbar, decoupled access-execution, reconfigurability, accelerator