Inter-Layer Hybrid Quantization Scheme for Hardware Friendly Implementation of Embedded Deep Neural Networks

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023(2023)

Abstract
Compression techniques are widely deployed to reduce the model size and inference computation of Deep Neural Networks (DNNs), particularly on embedded systems. In this work, we propose an inter-layer approach that combines a weight-distribution-aware quantization scheme (a hybrid of fixed-point and power-of-two) with multi-precision (3-bit and 4-bit) assignment to better exploit the heterogeneity of FPGA resources. In our evaluation, with similar hardware logic and memory resource usage, the proposed approach improved the throughput of ResNet-18 by 37% with negligible accuracy loss compared to the state of the art on an embedded FPGA.
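The two quantizer families mentioned in the abstract can be illustrated in isolation. The following is a minimal sketch, not the paper's implementation: the function names, bit-width conventions, and exponent clipping range are illustrative assumptions. Fixed-point quantization rounds to a uniform grid, while power-of-two quantization keeps only a sign and a power-of-two magnitude, so multiplications can be realized as bit shifts in hardware.

```python
import math

def quantize_fixed_point(w, bits, frac_bits):
    """Uniform fixed-point quantization: round w to the nearest
    multiple of 2^-frac_bits, clipped to the signed integer range
    of the given total bit width (illustrative convention)."""
    step = 2.0 ** -frac_bits
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(w / step)))
    return q * step

def quantize_power_of_two(w, bits):
    """Power-of-two quantization: keep the sign and the nearest
    power-of-two magnitude, so a multiply becomes a shift.
    The exponent clipping range here is an assumed example, not
    the paper's exact codebook."""
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    # Assume exponents are restricted to [-(2^(bits-1)), -1],
    # i.e. magnitudes below 1.0, as is common for normalized weights.
    exp = max(-(2 ** (bits - 1)), min(-1, exp))
    return math.copysign(2.0 ** exp, w)
```

For example, with 3 bits a weight of 0.3 maps to 0.25 under the power-of-two scheme (exponent -2), while a 4-bit fixed-point grid with 2 fractional bits also yields 0.25 but supports non-power-of-two values such as 0.75; an inter-layer scheme can pick whichever quantizer suits each layer's weight distribution.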
Keywords
Quantization, hardware accelerator, deep neural network