HARVEST: Towards Efficient Sparse DNN Accelerators Using Programmable Thresholds

International Conference on VLSI Design (2024)

Abstract
Although deep neural networks (DNNs) have become the AI standard due to algorithmic advancements, their computational and memory demands pose challenges for deployment on edge devices. Extensive research has explored optimizations to efficiently run DNN models on resource-constrained mobile devices, encompassing both software and hardware enhancements. Sparsity exploitation is a prominent optimization technique that aims to boost DNN inference efficiency and speed by eliminating redundant multiply-accumulate (MAC) operations resulting from zero operands. In this paper, we propose HARVEST, a hardware-software co-design approach that utilizes existing sparsity engines in accelerators to introduce sparsity in DNN weights and activations during inference. This technique involves two methods: (1) Activation sparsity is achieved by applying thresholds to intermediate activations. These thresholds are determined based on constraints and statistics of the activations stored in the accelerator’s SRAM banks. This approach applies thresholding to all layer types, including convolution, element-wise, fully connected, attention, and normalization layers, as well as non-linear activation functions, thereby increasing sparsity without any additional area or power overhead. (2) Weight sparsity is introduced before deployment using customized thresholds for each layer. These methods collectively reduce memory and compute energy consumption, improving accelerator energy efficiency at the cost of minimal accuracy loss. Results on state-of-the-art Transformer and CNN models demonstrate gains of up to 61% and 80% in activation and weight sparsity, respectively. Exploiting this sparsity, an in-house fully-sparse accelerator provides up to 24%, 36%, and 32% reductions in memory, compute, and overall accelerator energy, respectively, for minimal (< 0.5%) loss in accuracy. Furthermore, HARVEST provides up to 32% and 36% reductions in memory and compute cycle counts during DNN inference, leading to increased throughput.
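To make the two threshold-based mechanisms concrete, the following is a minimal, framework-free sketch of the idea, not the paper's implementation. The function names (sparsify_activations, sparsify_weights) and the percentile-based activation threshold are assumptions for illustration; in HARVEST the activation thresholds are derived from constraints and statistics of activations held in the accelerator's SRAM banks, and the weight thresholds are tuned per layer before deployment.

```python
import numpy as np

def sparsify_activations(acts, target_fraction=0.5):
    """Zero out activations whose magnitude falls below a statistics-derived
    threshold. Hypothetical stand-in for on-accelerator thresholding; here the
    threshold is simply a magnitude percentile of the tensor."""
    threshold = np.quantile(np.abs(acts), target_fraction)
    return np.where(np.abs(acts) >= threshold, acts, 0.0)

def sparsify_weights(weights_per_layer, per_layer_thresholds):
    """Apply a customized magnitude threshold to each layer's weights before
    deployment (illustrative; actual thresholds would be chosen per layer
    against an accuracy budget)."""
    pruned = {}
    for name, w in weights_per_layer.items():
        t = per_layer_thresholds[name]
        pruned[name] = np.where(np.abs(w) >= t, w, 0.0)
    return pruned

# Toy usage: sparsity is the fraction of zeros after thresholding.
acts = np.random.randn(64, 128).astype(np.float32)
sparse_acts = sparsify_activations(acts, target_fraction=0.6)
print("activation sparsity:", float(np.mean(sparse_acts == 0.0)))
```

In this sketch, increasing target_fraction or the per-layer weight thresholds trades accuracy for sparsity; the zeroed operands are what a sparsity engine can skip to save MAC operations and memory traffic.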
Keywords
Deep Neural Networks,activation sparsity,weight sparsity,DNN accelerators,thresholding,energy-efficiency