Adaptive Quantization Method for CNN with Computational-Complexity-Aware Regularization

2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021

Abstract
Quantization is a typical approach to reducing inference time for convolutional neural networks (CNNs). The key to reducing inference time without a drastic loss in accuracy is allocating the optimal bit width to each layer or filter. One of the most promising ways to find optimal bit allocations is to learn the quantization step sizes and weight parameters through gradient descent. The conventional method optimizes those parameters under a constraint on the model size or memory footprint of the CNN. However, the bit allocations obtained by the conventional method are not always optimal for inference time, because the arithmetic intensity of CNNs is significantly high and the time spent on computation is the bottleneck for inference time. In this paper, we propose a regularization method using a computational-complexity metric (which we call MACxbit) that correlates with the inference time of quantized CNN models. The proposed method obtains a bit allocation that achieves better recognition accuracy under a specified computational-complexity target than the conventional method. At similar recognition accuracy on an optimized ResNet-18 model, the proposed method reduces inference time by 21.0% compared with the conventional method.
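The abstract describes regularizing the training loss with a MACxbit term, i.e., the sum over layers of multiply-accumulate operations (MACs) times the allocated bit width, so that gradient descent steers bit allocations toward a computational-complexity target. The paper does not give code here; the following is a minimal sketch under assumed names, where the layer list, MAC counts, penalty weight, and target value are all hypothetical and bit widths are treated as plain numbers rather than learned parameters.

```python
# Illustrative sketch (not the authors' implementation): a MACxbit-style
# penalty that discourages bit allocations whose total MAC-times-bit
# complexity exceeds a specified target.

def macxbit(layers):
    """Complexity metric: sum over layers of MACs * bit width."""
    return sum(layer["macs"] * layer["bits"] for layer in layers)

def regularized_loss(task_loss, layers, target, lam=1e-9):
    """Task loss plus a hinge penalty on complexity above the target."""
    excess = max(0.0, macxbit(layers) - target)
    return task_loss + lam * excess

# Hypothetical per-layer profile for a small CNN (values made up).
layers = [
    {"name": "conv1", "macs": 118_013_952, "bits": 8},
    {"name": "conv2", "macs": 462_422_016, "bits": 4},
    {"name": "fc",    "macs":     512_000, "bits": 8},
]

total = macxbit(layers)  # MACxbit complexity of this allocation
loss = regularized_loss(0.35, layers, target=2_000_000_000)
```

In the actual method, the bit widths are differentiable functions of learned quantization parameters, so the penalty's gradient trades bits away from layers with many MACs first, which is what makes the resulting allocation complexity-aware rather than size-aware.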
Keywords
CNN, quantization, mixed-precision computing