# NISP: Pruning Networks using Neuron Importance Score Propagation

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.


Abstract:

To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by considering only the statistics of an individual layer or of two consecutive layers (e.g., pruning one layer to minimize the reconstruction error of the next), ignoring the effect of error propagation in deep networks.

Introduction

- CNNs require a large number of parameters and high computational cost in both training and testing phases.
- Recent studies have investigated the significant redundancy in deep networks [6] and reduced the number of neurons and filters [3, 13, 22, 26] by pruning the unimportant ones.
- Figure: pipeline overview: pre-trained network, feature selection on the final response layer (FRL), neuron importance score propagation (NISP), pruning, and fine-tuning

Highlights

- Convolutional Neural Networks (CNNs) require a large number of parameters and high computational cost in both training and testing phases
- Our experiments reveal that greedy layer-by-layer pruning leads to significant reconstruction error propagation, especially in deep networks, which indicates the need for a global measurement of neuron importance across different layers of a CNN. We argue that a pruned model must retain the most important responses of the second-to-last layer before classification, the final response layer (FRL), to preserve its predictive power, since those responses are the direct inputs to the classification task
- We introduce a generic network pruning algorithm, formulating the pruning problem as a binary integer optimization and deriving a closed-form solution based on final response importance
- We propose a generic framework for network compression and acceleration based on identifying the importance levels of neurons
- We present the Neuron Importance Score Propagation (NISP) algorithm that efficiently propagates the importance to every neuron in the whole network
- Experiments demonstrate that our method effectively reduces CNN redundancy and achieves full-network acceleration and compression
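The backward propagation of importance scores described above can be sketched as follows. This is an illustrative reconstruction for fully connected layers, not the authors' released code; the function name `propagate_importance` and the use of absolute weight magnitudes are assumptions based on the paper's formulation.

```python
import numpy as np

def propagate_importance(weights, frl_scores):
    """Sketch of NISP for a stack of fully connected layers.

    weights[i] has shape (n_out, n_in) and maps layer i to layer i+1;
    frl_scores holds the importance of the final response layer (FRL),
    e.g. obtained from a feature-ranking method such as Inf-FS.
    Importance is propagated backward: s_k = |W_{k+1}|^T s_{k+1}.
    """
    scores = [np.asarray(frl_scores, dtype=float)]
    for W in reversed(weights):
        scores.append(np.abs(W).T @ scores[-1])
    return scores[::-1]  # scores[k] is the importance of layer k

# Toy example: one weight matrix feeding a 2-neuron FRL.
W1 = np.array([[1.0, -2.0, 0.5],
               [0.0,  1.0, 3.0]])  # maps 3 neurons -> 2 FRL neurons
frl = np.array([1.0, 0.5])         # FRL importance scores
layer_scores = propagate_importance([W1], frl)
print(layer_scores[0])             # importance of the 3 hidden neurons
```

Because the pass is a single sweep of matrix-vector products, scoring every neuron in the network costs about as much as one backward pass, which is what makes the method efficient even for deep models.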

Methods

- The authors evaluate the approach on standard datasets with popular CNN networks.
- The authors first compare to random pruning and training-from-scratch baselines to demonstrate the effectiveness of the method.
- The authors benchmark the pruning results and compare to existing methods such as [11, 18, 22, 33].
- The authors evaluate using five commonly used CNN architectures: LeNet [21], CIFAR-net, AlexNet [20], GoogLeNet [34], and ResNet [14]

Results

- With almost zero accuracy loss on ResNet-56, the authors achieve a 43.61% FLOP reduction, significantly higher than the 27.60% reduction by Li et al. [22].
- The authors' method has less than 1% top-1 accuracy loss with 50% pruning ratio for each layer.
- On GoogLeNet, the authors' method achieves a similar accuracy loss with a larger FLOPs reduction (58.34% vs. 51.50%).
- Using ResNet on the CIFAR-10 dataset, with top-1 accuracy loss similar to [22] (56-A, 56-B, 110-A and 110-B), the method reduces more FLOPs and parameters.
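The FLOP numbers above follow from simple channel arithmetic: pruning a fraction p of the channels on both the input and output side of a convolutional layer shrinks its cost by roughly (1-p)^2. A back-of-the-envelope sketch; the layer shapes here are illustrative, not taken from the paper.

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k

# Pruning 50% of channels on both sides of a layer keeps (1-0.5)^2 = 25%
# of its FLOPs, i.e. a 75% reduction for that layer. Whole-network numbers
# such as the 43.61% above are smaller because the first/last layers and
# shortcut connections are not pruned as aggressively.
full = conv_flops(32, 32, 64, 64, 3)
pruned = conv_flops(32, 32, 32, 32, 3)
print(1 - pruned / full)  # 0.75
```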

Conclusion

- The authors proposed a generic framework for network compression and acceleration based on identifying the importance levels of neurons.
- The authors formulated the network pruning problem as a binary integer program and obtained a closed-form solution to a relaxed version of the formulation.
- The authors presented the Neuron Importance Score Propagation algorithm that efficiently propagates the importance to every neuron in the whole network.
- Experiments demonstrated that the method effectively reduces CNN redundancy and achieves full-network acceleration and compression
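The closed-form solution to the relaxed binary program mentioned above amounts to keeping, in each layer, the neurons with the highest propagated importance scores. A minimal sketch, where the function name `prune_mask` and the per-layer keep-ratio interface are assumptions for illustration:

```python
import numpy as np

def prune_mask(scores, keep_ratio):
    """Binary keep-mask: retain the top keep_ratio fraction of neurons
    by importance score and mark the rest for pruning."""
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    mask = np.zeros(scores.size, dtype=int)
    mask[np.argsort(scores)[::-1][:n_keep]] = 1
    return mask

print(prune_mask([0.1, 3.0, 0.7, 2.2], 0.5))  # keeps the two highest-scored neurons
```

After masking, the pruned network is fine-tuned to recover accuracy, as in the paper's pipeline.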


- Table 1: Compression benchmark. [Accu.↓%] denotes the absolute accuracy loss; [FLOPs↓%] denotes the reduction in computation; [Params.↓%] denotes the reduction in parameter count

Related work

- There has been recent interest in reducing the redundancy of deep CNNs to achieve acceleration and compression. The redundancy in the parameterization of deep learning models was studied and demonstrated in [6]. Cheng et al. [2] exploited properties of structured matrices, using circulant matrices to represent FC layers and reduce storage cost. Han et al. [13] studied weight sparsity and compressed CNNs by combining pruning, quantization, and Huffman coding. Sparsity regularization terms have been used to learn sparse CNN structures in [23, 35, 33]. Miao et al. [27] studied network compression based on float-data quantization for the purpose of massive model storage.

Funding

- The research was partially supported by the Office of Naval Research under Grant N000141612713: Visual Common Sense Reasoning for Multi-agent Activity Prediction and Recognition

References

- W. Chen, J. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2285–2294, 2015.
- Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, and S. F. Chang. An exploration of parameter redundancy in deep networks with circulant projections. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2857–2865, Dec 2015.
- D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI’11, pages 1237–1242, 2011.
- M. Courbariaux, Y. Bengio, and J. David. Training deep neural networks with low precision multiplications. In ICLR Workshop, 2015.
- J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248– 255, June 2009.
- M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. D. Freitas. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems 26 (NIPS), pages 2148–2156. Curran Associates, Inc., 2013.
- E. L. Denton, W. Zaremba, J. Bruna, Y. Lecun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems 27 (NIPS), pages 1269–1277, 2014.
- B. Hassibi and D. G. Stork. Second order derivatives for network pruning: Optimal brain surgeon. In NIPS, 1993.
- P. Molchanov et al. Pruning convolutional neural networks for resource efficient transfer learning. CoRR, abs/1611.06440, 2016.
- Y. LeCun, J. S. Denker, and S. A. Solla. Optimal brain damage. In NIPS, 1990.
- M. Figurnov, A. Ibraimova, D. P. Vetrov, and P. Kohli. Perforatedcnns: Acceleration through elimination of redundant convolutions. In Advances in Neural Information Processing Systems 29 (NIPS), pages 947–955. 2016.
- M. Gao, R. Yu, A. Li, V. I. Morariu, and L. S. Davis. Dynamic zoom-in network for fast object detection in large images. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. In International Conference on Learning Representations (ICLR), 2016.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
- G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS 2014 Deep Learning Workshop, 2014.
- M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. In British Machine Vision Conference (BMVC), 2014.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia, MM’14, pages 675–678, New York, NY, USA, 2014. ACM.
- Y. Kim, E. Park, S. Yoo, T. Choi, L. Yang, and D. Shi. Compression of deep convolutional neural networks for fast and low power mobile applications. In International Conference on Learning Representations (ICLR), 2016.
- A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS), pages 1097–1105. Curran Associates, Inc., 2012.
- Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent signal processing, pages 306–351. IEEE Press, 2001.
- H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient convnets. In International Conference on Learning Representations (ICLR), 2017.
- B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Penksy. Sparse convolutional neural networks. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 806–814, June 2015.
- W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV), 2016.
- Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang. Learning efficient convolutional networks through network slimming. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
- J.-H. Luo, J. Wu, and W. Lin. Thinet: A filter level pruning method for deep neural network compression. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
- H. Miao, A. Li, L. S. Davis, and A. Deshpande. Towards unified data and lifecycle management for deep learning. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 571–582, April 2017.
- P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. Pruning convolutional neural networks for resource efficient inference. International Conference on Learning Representations (ICLR), 2017.
- M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision (ECCV), 2016.
- S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
- G. Roffo, S. Melzi, and M. Cristani. Infinite feature selection. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 4202–4210, 2015.
- S. Srinivas and R. V. Babu. Data-free parameter pruning for deep neural networks. In Proceedings of the British Machine Vision Conference (BMVC), pages 31.1–31.12. BMVA Press, 2015.
- S. Srinivas and R. V. Babu. Learning the architecture of deep neural networks. In Proceedings of the British Machine Vision Conference (BMVC), pages 104.1– 104.11, September 2016.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems 29 (NIPS), pages 2074–2082. 2016.
- Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris. Blockdrop: Dynamic inference paths in residual networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Z. Yang, M. Moczulski, M. Denil, N. d. Freitas, A. Smola, L. Song, and Z. Wang. Deep fried convnets. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1476–1483, Dec 2015.
- R. Yu, H. Wang, and L. S. Davis. Remotenet: Efficient relevant motion event detection for large-scale home surveillance videos. IEEE Winter Conference on Applications of Computer Vision (WACV), 2018.
- X. Zhang, J. Zou, X. Ming, K. He, and J. Sun. Efficient and accurate approximations of nonlinear convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
