Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)(2020)
Abstract
This paper presents a Look-Up Table (LUT) based Processing-In-Memory (PIM) technique for running neural network inference tasks. We implement a bitline-computing-free technique that avoids frequent bitline accesses to the cache sub-arrays, considerably reducing the memory-access energy overhead. The LUT, in conjunction with the compute engines, enables sub-array-level parallelism while executing, through data lookup, complex operations that would otherwise require multiple cycles. Sub-array-level parallelism and a systolic input data flow confine data movement to the SRAM slice. Our proposed LUT-based PIM methodology exploits substantial parallelism using look-up tables without altering the memory structure or organization, that is, it preserves the bit-cells and peripherals of the existing monolithic SRAM arrays. Our solution achieves 1.72× higher performance and 3.14× lower energy compared to a state-of-the-art processing-in-cache solution. Sub-array-level design modifications to incorporate the LUT along with the compute engines increase the overall cache area by 5.6%. We achieve a 3.97× speedup over a neural-network systolic accelerator of similar area. The reconfigurable nature of the compute engines enables various neural network operations, thereby supporting sequential networks (RNNs) and transformer models. Our quantitative analysis demonstrates 101× and 3× faster execution, and 91× and 11× lower energy, than a CPU and a GPU respectively while running the transformer model BERT-Base.
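To make the core idea concrete, here is a minimal software sketch of LUT-based computation: instead of performing a multi-cycle multiply for each operand pair, all products of low-precision operands are precomputed once, so each "compute" becomes a single table lookup. The function names and the 4-bit operand width are illustrative assumptions for this sketch, not details taken from the paper.

```python
BITS = 4
N = 1 << BITS  # 16 distinct 4-bit operand values

# Precompute the 16x16 product table once (analogous to loading the LUT
# into the cache sub-array before inference begins).
LUT = [[a * b for b in range(N)] for a in range(N)]

def lut_mac(weights, activations):
    """Multiply-accumulate using table lookups instead of multipliers."""
    acc = 0
    for w, x in zip(weights, activations):
        acc += LUT[w][x]  # one lookup replaces a multi-cycle multiply
    return acc

# Example dot product of two 4-bit vectors:
print(lut_mac([3, 5, 2], [4, 1, 7]))  # 3*4 + 5*1 + 2*7 = 31
```

In hardware, the lookup is served from the LUT stored in the SRAM sub-array itself, so the operation cost is a local read rather than arithmetic in a distant functional unit.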
Keywords
Processing-in-memory, SRAM, Look-up table, Neural networks