O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference

IEEE Transactions on Parallel and Distributed Systems(2021)

引用 34|浏览70
暂无评分
摘要
Binarized Neural Networks (BNN), which significantly reduce computational complexity and memory demand, have shown potential in cost- and power-restricted domains, such as IoT and smart edge-devices, where reaching certain accuracy bars is sufficient and real-time is highly desired. In this article, we demonstrate that the highly-condensed BNN model can be shrunk significantly by dynamically pruning irregular redundant edges. Based on two new observations on BNN-specific properties, an out-of-order (OoO) architecture, O3BNN-R, which can curtail edge evaluation in cases where the binary output of a neuron can be determined early at runtime during inference, is proposed. Similar to instruction level parallelism (ILP), fine-grained, irregular, and runtime pruning opportunities are traditionally presumed to be difficult to exploit. To further enhance the pruning opportunities, we conduct an algorithm/architecture co-design approach where we augment the loss function during the training stage with specialized regularization terms favoring edge pruning. We evaluate our design on an embedded FPGA using networks that include VGG-16, AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that O3BNN-R without regularization can prune, on average, 30 percent of the operations, without any accuracy loss, bringing 2.2× inference-speedup, and on average 34× energy-efficiency improvement over state-of-the-art BNN implementations on FPGA/GPU/CPU. With regularization at training, the performance is further improved, on average, by 15 percent.
更多
查看译文
关键词
Machine learning,BNN,high-performance computing,neural network pruning,out-of-order architecture
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要