A 119.64 GOPs/W FPGA-Based ResNet50 Mixed-Precision Accelerator Using the Dynamic DSP Packing

Yaozhong Ou, Wei-Han Yu, Ka-Fai Un, Chi-Hang Chan, Yan Zhu

IEEE Transactions on Circuits and Systems II: Express Briefs (2024)

Abstract

This paper presents a precision-sensitivity-aware quantization (PSAQ) mixed-precision (MP) compression scheme for both weights and activations. The PSAQ MP method achieves a better trade-off between accuracy and energy efficiency: on ResNet-50 it maintains 75.6% top-1 accuracy and reduces normalized operations by 2.06×, with less than 1% accuracy difference from the baseline. We propose two DSP-pipeline-friendly methods, dynamic DSP packing (DDP) and fully pre-calibrated (FPC) unpacking, which pack multiple operations into a single DSP without error, at the cost of only one additional clock cycle and slight logic overhead compared with an unpacked design. With these methods, the accelerator both supports MP algorithms and uses the DSP bandwidth efficiently. Combined with a router network and an optimized dataflow, our MP accelerator achieves 330.15 GOP/s throughput and 119.64 GOPs/W energy efficiency with 2.27-b weights and 3.61-b input feature maps (ifmaps).
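The abstract does not spell out how DDP and FPC unpacking work, so the following is only a minimal sketch of the general idea behind DSP packing: two low-precision products that share one operand are computed with a single wide multiplication and then separated again. The function names, the 4-bit signed operand range, and the 8-bit field offset are all illustrative assumptions, not details taken from the paper.

```python
# Generic illustration of multiplier packing, NOT the paper's DDP/FPC scheme.
# All names and bit widths below are assumptions chosen for the example.

def pack_and_multiply(a: int, w_lo: int, w_hi: int, shift: int) -> int:
    """Compute a*w_lo and a*w_hi with one wide multiplication.

    w_hi is placed `shift` bits above w_lo, so the single product equals
    (a*w_hi << shift) + a*w_lo.
    """
    packed = (w_hi << shift) + w_lo
    return a * packed


def unpack_products(product: int, shift: int) -> tuple[int, int]:
    """Split the wide product back into (a*w_lo, a*w_hi).

    Exact only if -2**(shift-1) <= a*w_lo < 2**(shift-1), i.e. the low
    product fits in a signed field of `shift` bits.
    """
    mask = (1 << shift) - 1
    lo = product & mask                # low field, read as unsigned
    if lo >= 1 << (shift - 1):         # reinterpret it as a signed value
        lo -= 1 << shift
    hi = (product - lo) >> shift       # remove the low product, then shift down
    return lo, hi


if __name__ == "__main__":
    # Signed 4-bit operands: products lie in [-56, 64], so an 8-bit field suffices.
    wide = pack_and_multiply(3, -2, 5, shift=8)
    print(unpack_products(wide, shift=8))  # (-6, 15)
```

The subtraction of the signed low field before shifting is what keeps the upper product exact when the lower product is negative; without it, the borrow from the low field would leave the upper result off by one. A hardware design would apply an equivalent correction in logic next to the DSP.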

Keywords

convolutional neural network (CNN), mixed-precision quantization, field programmable gate array (FPGA), digital signal processor (DSP), image classification
