Evaluating Low-Memory GEMMs for Convolutional Neural Network Inference on FPGAs

2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Cited by 4 | Views 126
Abstract
FPGAs are becoming increasingly important for implementing low-latency convolutional neural networks, driven by performance demands and power constraints. Convolutional layers are conventionally implemented as direct convolution, with nested loops over channels, feature maps, and filters. Explicit general matrix multiplications (GEMMs) cost extra memory space, and the limited on-chip RAM prevents an efficient GEMM-based implementation. In this paper, we evaluate a low-memory GEMM method on FPGA-based systolic arrays. We design a novel accelerator that saves bandwidth and increases parallelism. We evaluate our design on MobileNet V1 and Inception V4, achieving a throughput of around 3.5 TOP/s for both models. Compared to an explicit GEMM implementation, we also reduce memory usage by 21% for MobileNet V1 and 44% for Inception V4.
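To illustrate why explicit GEMMs "cost extra memory space", the sketch below (not from the paper; sizes, function names, and the im2col helper are hypothetical) lowers a convolution to a single matrix multiply via the standard im2col transformation. The unrolled operand is roughly K*K times larger than the original input, which is exactly the on-chip buffer pressure that a low-memory (implicit) GEMM avoids by generating those windows on the fly.

import numpy as np

def im2col(x, k, stride=1):
    """Unroll a (C, H, W) input into a (C*k*k, out_h*out_w) matrix for explicit GEMM."""
    c, h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((c * k * k, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            # Each k*k*C window becomes one column of the GEMM operand.
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols

# Hypothetical layer sizes, for illustration only.
c, h, w, k, m = 32, 56, 56, 3, 64            # input channels, spatial dims, kernel size, output channels
x = np.random.randn(c, h, w).astype(np.float32)
weights = np.random.randn(m, c * k * k).astype(np.float32)

cols = im2col(x, k)                          # explicit GEMM operand (materialized in memory)
y = weights @ cols                           # one GEMM computes all output pixels

print("input elements: ", x.size)            # C*H*W
print("im2col elements:", cols.size)         # ~k*k times larger: the extra memory an implicit GEMM avoids

With k = 3, the unrolled matrix holds about nine copies of each input element, which is why the paper's low-memory GEMM on a systolic array can cut buffer usage substantially relative to an explicit GEMM.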
Keywords
convolutional neural network inference,FPGA,low-latency convolutional neural networks,power constraints,convolutional layers,direct convolution,feature maps,explicit general matrix multiplications,extra memory space,on-chip RAM,low-memory method,MobileNet V1,memory usage,explicit GEMM implementation,low-memory GEMM,systolic arrays,Inception V4