Inter-Layer Hybrid Quantization Scheme for Hardware Friendly Implementation of Embedded Deep Neural Networks

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023(2023)

Abstract
Compression techniques are widely deployed to reduce the model size and inference computation of Deep Neural Networks (DNNs), particularly on embedded systems. In this work, we propose an inter-layer approach that combines a weight-distribution-aware quantization scheme (a hybrid of fixed-point and power-of-two) with multi-precision (3-bit and 4-bit) assignment to better exploit the heterogeneity of FPGA resources. In our evaluation, with similar hardware logic and memory resource usage, the proposed approach improved the throughput of ResNet-18 by 37% with negligible accuracy loss compared to the state of the art on an embedded FPGA.
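The two quantizer families mentioned in the abstract can be illustrated in isolation. The following is a minimal sketch, not the paper's implementation: the function names, bit-width conventions, and exponent clipping range are illustrative assumptions. Fixed-point quantization rounds to a uniform grid, while power-of-two quantization keeps only a sign and a power-of-two magnitude, so multiplications can be realized as bit shifts in hardware.

```python
import math

def quantize_fixed_point(w, bits, frac_bits):
    """Uniform fixed-point quantization: round w to the nearest
    multiple of 2^-frac_bits, clipped to the signed integer range
    of the given total bit width (illustrative convention)."""
    step = 2.0 ** -frac_bits
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(w / step)))
    return q * step

def quantize_power_of_two(w, bits):
    """Power-of-two quantization: keep the sign and the nearest
    power-of-two magnitude, so a multiply becomes a shift.
    The exponent clipping range here is an assumed example, not
    the paper's exact codebook."""
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    # Assume exponents are restricted to [-(2^(bits-1)), -1],
    # i.e. magnitudes below 1.0, as is common for normalized weights.
    exp = max(-(2 ** (bits - 1)), min(-1, exp))
    return math.copysign(2.0 ** exp, w)
```

For example, with 3 bits a weight of 0.3 maps to 0.25 under the power-of-two scheme (exponent -2), while a 4-bit fixed-point grid with 2 fractional bits also yields 0.25 but supports non-power-of-two values such as 0.75; an inter-layer scheme can pick whichever quantizer suits each layer's weight distribution.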
Keywords
Quantization, hardware accelerator, deep neural network