Unified Scaling-Based Pure-Integer Quantization for Low-Power Accelerator of Complex CNNs

ELECTRONICS(2023)

Abstract
Although optimizing deep neural networks is becoming crucial for deploying them on edge AI devices, it faces increasing challenges due to the scarce hardware resources of modern IoT and mobile devices. This study proposes a quantization method that can quantize all internal computations and parameters in the memory modification. Unlike most previous methods, which focused primarily on relatively simple CNN models for image classification, the proposed method, Unified Scaling-Based Pure-Integer Quantization (USPIQ), can handle more complex CNN models for object detection. USPIQ aims to provide a systematic approach for converting all floating-point operations to pure-integer operations in every model layer. This significantly reduces the computational overhead and makes the model more suitable for low-power neural network accelerator hardware, which consists of pure-integer datapaths and small memory and targets low power consumption and small chip size. The proposed method optimally calibrates the scale parameters for each layer using a subset of unlabeled representative images. Furthermore, we introduce the notion of the Unified Scale Factor (USF), which combines the two conventional scaling steps (quantization and dequantization) into a single step for each layer. As a result, it improves both the inference speed and the accuracy of the resulting quantized model. Our experiments on YOLOv5 models demonstrate that USPIQ can reduce the on-chip memory for parameters and activation data by approximately 75% and 43.68%, respectively, compared with the floating-point model. These reductions are achieved with a minimal loss in mAP@0.5 of at most 0.61%. In addition, USPIQ improves the inference speed over ONNX Runtime quantization, achieving a speedup of 1.64 to 2.84 times. We also demonstrate that USPIQ outperforms previous methods in terms of accuracy and hardware reduction for 8-bit quantization of all YOLOv5 versions.
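
The abstract does not give the formula for the Unified Scale Factor, so the following is only a rough sketch of the general idea of folding dequantization and quantization into one integer rescale. It assumes a common fixed-point requantization scheme (symmetric int8, no zero points, one scale per layer), which may differ from what USPIQ actually does. All function names and the example scale values are hypothetical.

```python
import math


def to_fixed_point(scale, bits=15):
    """Approximate a positive real scale as (m, shift) so that scale ~= m / 2**shift.

    Hypothetical helper; the bit width is an illustrative assumption.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    mant, exp = math.frexp(scale)      # scale == mant * 2**exp, mant in [0.5, 1)
    m = round(mant * (1 << bits))      # integer mantissa
    shift = bits - exp                 # total right shift
    if shift < 1:
        raise ValueError("scale too large for this sketch")
    return m, shift


def clamp_int8(v):
    """Saturate to the signed 8-bit range."""
    return max(-128, min(127, v))


def requant_two_step(acc, s_in, s_w, s_out):
    """Conventional float path: dequantize the accumulator, then quantize again."""
    real = acc * (s_in * s_w)          # step 1: dequantize to a real value
    return clamp_int8(round(real / s_out))  # step 2: quantize for the next layer


def requant_unified(acc, m, shift):
    """Single integer step: multiply by the folded scale, round, shift, saturate."""
    rounded = (acc * m + (1 << (shift - 1))) >> shift
    return clamp_int8(rounded)


# Assumed example scales for one layer (not taken from the paper).
s_in, s_w, s_out = 0.02, 0.005, 0.05
m, shift = to_fixed_point(s_in * s_w / s_out)   # folded once, offline

acc = 12345                                     # an int32 accumulator value
print(requant_two_step(acc, s_in, s_w, s_out), requant_unified(acc, m, shift))
```

In this sketch the folded multiplier is computed once per layer ahead of time, so the inference path needs only an integer multiply, an add, and a shift. The two paths can differ by one quantization level near rounding boundaries because the fixed-point multiplier is an approximation.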
Keywords
scaling-based,pure-integer,low-power