Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs.

ACM Great Lakes Symposium on VLSI(2018)

引用 22|浏览74
暂无评分
摘要
Both industry and academia have extensively investigated hardware accelerations. To address the demands in increasing computational capability and memory requirement, in this work, we propose the structured weight matrices (SWM)-based compression technique for both Field Programmable Gate Array (FPGA) and application-specific integrated circuit (ASIC) implementations. In the algorithm part, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique can reduce computational complexity from O(n2) to O(nlog n) and storage complexity from O(n2) to O(n) for each layer and both training and inference phases. For FPGA implementations on deep convolutional neural networks (DCNNs), we achieve at least 152X and 72X improvement in performance and energy efficiency, respectively using the SWM-based framework, compared with the baseline of IBM TrueNorth processor under same accuracy constraints using the data set of MNIST, SVHN, and CIFAR-10. For FPGA implementations on long short term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the ESE accelerator. For ASIC implementations, the proposed SWM-based ASIC design exhibits impressive advantages in terms of power, throughput, and energy efficiency. Experimental results indicate that this method is greatly suitable for applying DNNs onto both FPGAs and mobile/IoT devices.
更多
查看译文
关键词
Deep learning, FPGA, ASIC, Accelerator, Structured weight matrices
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要