Hardware-Aware Evolutionary Filter Pruning

EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, SAMOS 2022 (2022)

Abstract
Compression techniques for Convolutional Neural Networks (CNNs) are key to efficient inference. One common technique is filter pruning, which can effectively reduce the memory footprint and the number of arithmetic operations, and consequently the inference time. Recently, several approaches have been presented for automatic CNN compression using filter pruning, where the number of pruned filters is optimized by nature-inspired metaheuristics (e.g., artificial bee colony algorithms). However, these approaches focus on finding an optimal pruned network structure without considering the device targeted for CNN deployment. In this work, we show that the typical objective of reducing the number of operations does not necessarily lead to a maximum reduction in inference time, which is usually the main goal of compressing CNNs besides reducing the memory footprint. We then propose a hardware-aware multi-objective Design Space Exploration (DSE) technique for filter pruning that takes the targeted device (i.e., Graphics Processing Units (GPUs)) into account. For each layer, the number of filters to be pruned is optimized with the objectives of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach speeds up inference by a further 1.24x and 1.09x for VGG-16 on the CIFAR-10 dataset and ResNet-101 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
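The abstract describes the approach only at a high level. As a rough illustration of how such a hardware-aware multi-objective evolutionary search over per-layer pruning decisions can be set up, the following minimal Python sketch encodes each candidate as the number of filters kept per layer and scores it on two objectives, measured latency and error rate. Everything here (LAYER_FILTERS, measure_latency, estimate_error, and the crude Pareto selection) is a hypothetical placeholder, not the paper's implementation; in a real setting the latency would be timed on the target GPU and the error rate obtained by fine-tuning and validating the pruned CNN.

```python
import random

# Illustrative sketch (not the authors' code): a candidate assigns each CNN
# layer a number of filters to keep; fitness is (latency, error rate),
# both to be minimized.

LAYER_FILTERS = [64, 128, 256, 512]  # hypothetical per-layer filter counts


def measure_latency(keep):
    # Placeholder: a hardware-aware DSE would time the pruned CNN directly
    # on the target GPU instead of counting arithmetic operations.
    return sum(k * k for k in keep) * 1e-6


def estimate_error(keep):
    # Placeholder: in practice, briefly fine-tune the pruned network and
    # evaluate it on a validation set. Here, a toy surrogate that grows
    # with the fraction of pruned filters.
    pruned_fraction = 1.0 - sum(keep) / sum(LAYER_FILTERS)
    return 0.06 + 0.3 * pruned_fraction ** 2


def dominates(a, b):
    # Pareto dominance for minimization: a is no worse in all objectives
    # and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def evolve(pop_size=20, generations=30):
    pop = [[random.randint(1, n) for n in LAYER_FILTERS] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p, q = random.sample(pop, 2)
            child = [random.choice(g) for g in zip(p, q)]  # uniform crossover
            i = random.randrange(len(child))               # mutate one layer
            child[i] = random.randint(1, LAYER_FILTERS[i])
            children.append(child)
        union = pop + children
        scores = [(measure_latency(c), estimate_error(c)) for c in union]
        # Survivor selection by domination count: candidates dominated by
        # fewer others survive (a crude stand-in for NSGA-II-style ranking).
        ranked = sorted(range(len(union)),
                        key=lambda i: sum(dominates(scores[j], scores[i])
                                          for j in range(len(union))))
        pop = [union[i] for i in ranked[:pop_size]]
    return pop


if __name__ == "__main__":
    for cand in evolve()[:3]:
        print(cand, measure_latency(cand), estimate_error(cand))
```

The domination-count ranking is a simplification of a full multi-objective survivor selection; the essential hardware-aware element is that measure_latency evaluates the candidate on the deployment device rather than using an operation count as a proxy.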
Keywords
pruning, hardware-aware