Variable Precision Multiplication for Software-Based Neural Networks

2020 IEEE High Performance Extreme Computing Conference (HPEC)

Abstract
As the number of applications of neural networks continues to grow, so does the need to efficiently perform inference computations on highly constrained devices. In this paper, we propose a methodology to accelerate neural networks in software. We exploit the limited-precision requirements of typical neural networks by formulating recurring operations in a bit-slice computation format. Bit-slice computation ensures that every bit of an M-bit processor word contributes useful work, even while computing a limited-precision n-bit (with n < M) operation. This paper makes the following contributions. We first present an environment to efficiently create bit-slice descriptions in software by synthesizing them from Verilog. We then develop bitsliced designs of matrix multiplication and evaluate their performance. Our target is a small microcontroller, and we rely solely on software optimization. Our driving application is a neural network classifier for the MNIST database. Range-based linear quantization in symmetric mode quantizes pre-trained 32-bit floating-point weights and activations to low-precision data widths. Experiments on RISC-V with varying levels of hardware support show that, for data widths common to neural network applications, the bitsliced code produces a speedup over traditional methods, leading to faster and more efficient inference without significant loss in accuracy. For example, 8-bit matrix multiplications are sped up by a factor of 2.62x compared with a non-bitsliced rv32i ISA implementation with no hardware multiplier.
Keywords
Bitslice compilation, Neural networks, Software acceleration