A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
2019 Symposium on VLSI Technology (2019)
Abstract
A sparse matrix-matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The $2.0\ \text{mm}\times 2.6\ \text{mm}$ chip exhibits a $12.6\times(8.4\times)$ energy efficiency gain, an $11.7\times(77.6\times)$ off-chip bandwidth efficiency gain, and a $17.1\times(36.9\times)$ compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
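To make the workload and the "output non-zeros" unit in the title concrete, here is a minimal software sketch of SpMM. It is an illustration only: the abstract does not say which SpMM formulation the chip uses, so the row-wise scheme, the dict-of-dicts storage, and the names `spmm` and `output_nonzeros` are all assumptions made for this example. A metric such as output non-zeros per joule would presumably divide the count returned by `output_nonzeros` by the energy spent producing the result.

```python
from collections import defaultdict


def spmm(a, b):
    """Multiply sparse matrices stored as {row: {col: value}} (hypothetical format).

    Row-wise formulation chosen for brevity; not necessarily the
    algorithm implemented on the accelerator.
    """
    c = defaultdict(dict)
    for i, a_row in a.items():
        for k, a_ik in a_row.items():
            # Only rows of B that match a stored column of A contribute.
            for j, b_kj in b.get(k, {}).items():
                c[i][j] = c[i].get(j, 0) + a_ik * b_kj
    return {i: row for i, row in c.items() if row}


def output_nonzeros(c):
    """Count the non-zero entries of the result matrix."""
    return sum(1 for row in c.values() for v in row.values() if v != 0)


# Small example: A is 2x3, B is 3x2, both mostly zero.
a = {0: {0: 1, 2: 2}, 1: {1: 3}}
b = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
c = spmm(a, b)
print(c, output_nonzeros(c))
```

The inner loops touch only stored entries, so the memory access pattern is irregular and data dependent, which is the kind of behavior that a hierarchy switchable between scratchpad and cache modes is meant to accommodate.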
Keywords
Sparse matrix multiplier, synthesizable crossbar, decoupled access-execution, reconfigurability, accelerator