NISP: Pruning Networks using Neuron Importance Score Propagation

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Cited by: 230
Keywords:
binary integer, network compression, final response layer, reconstruction error, predictive power
Weibo:
We introduce a generic network pruning algorithm, formulating the pruning problem as a binary integer optimization and deriving a closed-form solution based on final response importance

Abstract:

To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering the statistics of an individual layer or of two consecutive layers (e.g., pruning one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons across the entire network jointly, with the unified goal of preserving the important responses of the final response layer (FRL), the second-to-last layer before classification, so that the pruned network retains its predictive power. We formulate network pruning as a binary integer optimization problem, derive a closed-form solution to a relaxed version of it, and propose the Neuron Importance Score Propagation (NISP) algorithm to propagate the importance scores of final responses to every neuron in the network; the network is then pruned by removing the least important neurons and fine-tuned to retain its predictive power.

Introduction
  • CNNs require a large number of parameters and high computational cost in both training and testing phases.
  • Recent studies have investigated the significant redundancy in deep networks [6] and reduced the number of neurons and filters [3, 13, 22, 26] by pruning the unimportant ones.
  • [Figure: NISP framework overview (pre-trained network, FRL feature selection, neuron importance, NISP, fine-tuning)]
Highlights
  • Convolutional Neural Networks (CNNs) require a large number of parameters and high computational cost in both training and testing phases
  • Our experiments reveal that greedy layer-by-layer pruning leads to significant reconstruction error propagation, especially in deep networks, which indicates the need for a global measurement of neuron importance across different layers of a CNN. We argue that it is essential for a pruned model to retain the most important responses of the second-to-last layer before classification, the final response layer (FRL), in order to preserve its predictive power, since those responses are the direct inputs to the classification task
  • We introduce a generic network pruning algorithm, formulating the pruning problem as a binary integer optimization and deriving a closed-form solution based on final response importance
  • We proposed a generic framework for network compression and acceleration based on identifying the importance levels of neurons
  • We presented the Neuron Importance Score Propagation algorithm that efficiently propagates the importance to every neuron in the whole network (a minimal propagation sketch follows this list)
  • Experiments demonstrated that our method effectively reduces CNN redundancy and achieves full-network acceleration and compression
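
As a concrete illustration of the propagation step described in the highlights above, the sketch below scores each neuron by back-propagating the final-response importance through the absolute values of the weights and then keeps the top-scoring fraction of neurons per layer. This is a minimal sketch assuming fully connected layers stored as NumPy matrices; the function names and the fixed per-layer keep ratio are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def propagate_importance(weights, frl_importance):
        # weights[k] is the (out_dim x in_dim) matrix mapping layer k to layer k+1;
        # frl_importance holds the importance scores of the final response layer.
        # A neuron's score is the sum of the scores of the neurons it feeds,
        # weighted by the absolute values of the connecting weights.
        scores = [None] * (len(weights) + 1)
        scores[-1] = np.asarray(frl_importance, dtype=float)
        for k in range(len(weights) - 1, -1, -1):
            scores[k] = np.abs(weights[k]).T @ scores[k + 1]
        return scores  # scores[k] ranks the neurons of layer k

    def prune_mask(layer_scores, keep_ratio=0.5):
        # Keep the top keep_ratio fraction of neurons by importance (assumed ratio).
        k = max(1, int(round(keep_ratio * layer_scores.size)))
        mask = np.zeros(layer_scores.size, dtype=bool)
        mask[np.argsort(layer_scores)[-k:]] = True
        return mask

The pruned network would then be fine-tuned, as described above; convolutional layers need the same rule applied per channel, which this sketch omits.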
Methods
  • The authors evaluate the approach on standard datasets with popular CNN networks.
  • The authors first compare to random pruning and training-from-scratch baselines to demonstrate the effectiveness of the method.
  • The authors benchmark the pruning results and compare to existing methods such as [11, 18, 33, 22].
  • The authors evaluate on five commonly used CNN architectures: LeNet [21], CIFAR-net, AlexNet [20], GoogLeNet [34], and ResNet [14]
Results
  • With almost zero accuracy loss on ResNet-56, the authors achieve a 43.61% FLOPs reduction, significantly higher than the 27.60% reduction by Li et al. [22].
  • The authors' method has less than 1% top-1 accuracy loss with 50% pruning ratio for each layer.
  • On GoogLeNet, the authors' method achieves a similar accuracy loss with a larger FLOPs reduction (58.34% vs. 51.50%).
  • Using ResNet on the CIFAR-10 dataset, with top-1 accuracy loss similar to [22] (56-A, 56-B, 110-A and 110-B), the method reduces more FLOPs and parameters
Conclusion
  • The authors proposed a generic framework for network compression and acceleration based on identifying the importance levels of neurons.
  • The authors formulated the network pruning problem as a binary integer program and obtained a closed-form solution to a relaxed version of the formulation (a schematic version is sketched after this list).
  • The authors presented the Neuron Importance Score Propagation algorithm that efficiently propagates the importance to every neuron in the whole network.
  • Experiments demonstrated that the method effectively reduces CNN redundancy and achieves full-network acceleration and compression
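
To make the binary integer program and its relaxed closed-form solution concrete, a schematic version (simplified from the description above, not the paper's exact objective) selects which neurons of a layer to keep, given their propagated importance scores s and a neuron budget q:

    \max_{x \in \{0,1\}^{N}} \ s^{\top} x
    \quad \text{subject to} \quad \sum_{i=1}^{N} x_i = q

Relaxing x to [0,1]^N makes the problem linear, and the optimum is attained by setting x_i = 1 for the q neurons with the largest scores s_i, which matches the prune-the-least-important-neurons rule described in the conclusion.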
Tables
  • Table 1: Compression benchmark. [Accu.↓%] denotes the absolute accuracy loss; [FLOPs↓%] denotes the reduction in computation; [Params.↓%] denotes the reduction in the number of parameters
Related work
  • There has been recent interest in reducing the redundancy of deep CNNs to achieve acceleration and compression. The redundancy in the parameterization of deep learning models was studied and demonstrated in [6]. Cheng et al. [2] exploited properties of structured matrices and used circulant matrices to represent FC layers, reducing storage cost. Han et al. [13] studied weight sparsity and compressed CNNs by combining pruning, quantization, and Huffman coding. Sparsity regularization terms have been used to learn sparse CNN structures in [23, 35, 33]. Miao et al. [27] studied network compression based on float data quantization for the purpose of massive model storage.
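
As one hedged example of the sparsity-regularization idea mentioned above, a group-lasso penalty over convolutional filters encourages entire filters to shrink toward zero so they can be removed. The snippet below is a generic sketch of that idea; the array layout and the coefficient name lam are assumptions, not the code of [23, 33, 35].

    import numpy as np

    def group_lasso_penalty(conv_weight):
        # conv_weight has shape (num_filters, in_channels, kh, kw); each filter is
        # one group, penalized by the l2 norm of all of its weights, so the penalty
        # pushes whole filters (rather than individual weights) toward zero.
        flat = conv_weight.reshape(conv_weight.shape[0], -1)
        return np.sqrt((flat ** 2).sum(axis=1)).sum()

    # Added to the task loss with a small coefficient lam (hypothetical name):
    #   total_loss = task_loss + lam * sum(group_lasso_penalty(W) for W in conv_weights)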
Funding
  • The research was partially supported by the Office of Naval Research under Grant N000141612713: Visual Common Sense Reasoning for Multi-agent Activity Prediction and Recognition
References
  • W. Chen, J. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 2285–2294, 2015.
  • Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, and S. F. Chang. An exploration of parameter redundancy in deep networks with circulant projections. In IEEE International Conference on Computer Vision (ICCV), pages 2857–2865, 2015.
  • D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pages 1237–1242, 2011.
  • M. Courbariaux, Y. Bengio, and J. David. Training deep neural networks with low precision multiplications. In ICLR Workshop, 2015.
  • J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
  • M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems 26 (NIPS), pages 2148–2156, 2013.
  • E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems 27 (NIPS), pages 1269–1277, 2014.
  • B. Hassibi and D. G. Stork. Second order derivatives for network pruning: Optimal brain surgeon. In NIPS, 1993.
  • P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. Pruning convolutional neural networks for resource efficient transfer learning. CoRR, abs/1611.06440, 2016.
  • Y. LeCun, J. S. Denker, and S. A. Solla. Optimal brain damage. In NIPS, 1990.
  • M. Figurnov, A. Ibraimova, D. P. Vetrov, and P. Kohli. PerforatedCNNs: Acceleration through elimination of redundant convolutions. In Advances in Neural Information Processing Systems 29 (NIPS), pages 947–955, 2016.
  • M. Gao, R. Yu, A. Li, V. I. Morariu, and L. S. Davis. Dynamic zoom-in network for fast object detection in large images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR), 2016.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS 2014 Deep Learning Workshop, 2014.
  • M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. In British Machine Vision Conference (BMVC), 2014.
  • Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), pages 675–678, 2014.
  • Y. Kim, E. Park, S. Yoo, T. Choi, L. Yang, and D. Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. In International Conference on Learning Representations (ICLR), 2016.
  • A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS), pages 1097–1105, 2012.
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306–351. IEEE Press, 2001.
  • H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. In International Conference on Learning Representations (ICLR), 2017.
  • B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Penksy. Sparse convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 806–814, 2015.
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV), 2016.
  • Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang. Learning efficient convolutional networks through network slimming. In IEEE International Conference on Computer Vision (ICCV), 2017.
  • J.-H. Luo, J. Wu, and W. Lin. ThiNet: A filter level pruning method for deep neural network compression. In IEEE International Conference on Computer Vision (ICCV), 2017.
  • H. Miao, A. Li, L. S. Davis, and A. Deshpande. Towards unified data and lifecycle management for deep learning. In IEEE 33rd International Conference on Data Engineering (ICDE), pages 571–582, 2017.
  • P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. Pruning convolutional neural networks for resource efficient inference. In International Conference on Learning Representations (ICLR), 2017.
  • M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision (ECCV), 2016.
  • S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28 (NIPS), pages 91–99, 2015.
  • G. Roffo, S. Melzi, and M. Cristani. Infinite feature selection. In IEEE International Conference on Computer Vision (ICCV), pages 4202–4210, 2015.
  • S. Srinivas and R. V. Babu. Data-free parameter pruning for deep neural networks. In British Machine Vision Conference (BMVC), pages 31.1–31.12, 2015.
  • S. Srinivas and R. V. Babu. Learning the architecture of deep neural networks. In British Machine Vision Conference (BMVC), pages 104.1–104.11, 2016.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems 29 (NIPS), pages 2074–2082, 2016.
  • Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris. BlockDrop: Dynamic inference paths in residual networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Z. Yang, M. Moczulski, M. Denil, N. de Freitas, A. Smola, L. Song, and Z. Wang. Deep fried convnets. In IEEE International Conference on Computer Vision (ICCV), pages 1476–1483, 2015.
  • R. Yu, H. Wang, and L. S. Davis. ReMotENet: Efficient relevant motion event detection for large-scale home surveillance videos. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2018.
  • X. Zhang, J. Zou, X. Ming, K. He, and J. Sun. Efficient and accurate approximations of nonlinear convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.