# Neuron Merging: Compensating for Pruned Neurons

NeurIPS 2020

Abstract

Network pruning is widely used to lighten and accelerate neural network models. Structured network pruning discards the whole neuron or filter, leading to accuracy loss. In this work, we propose a novel concept of neuron merging applicable to both fully connected layers and convolution layers, which compensates for the information loss …

Introduction

- Modern Convolutional Neural Network (CNN) models have shown outstanding performance in many computer vision tasks.
- Due to their large number of parameters and heavy computation, it remains challenging to deploy them on mobile phones or edge devices.
- The most prevalent structured pruning method for CNN models is to prune the filters of each convolution layer and the corresponding output feature map channels.
- The filter or channel to be removed is determined by various saliency criteria [15, 26, 27].
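As a concrete illustration of one such saliency criterion, the ℓ1-norm ranking of Li et al. [15] scores each filter by the sum of its absolute weights. The sketch below is our own minimal NumPy version (function name and shapes are ours), not the paper's code:

```python
import numpy as np

def l1_filter_saliency(conv_weight):
    """Score each filter by the sum of its absolute weights (as in Li et al. [15])
    and return filter indices sorted from least to most salient.

    conv_weight: array of shape (out_channels, in_channels, kH, kW).
    """
    scores = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    return np.argsort(scores)

# Toy layer: 4 filters over 3 input channels with 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
print(l1_filter_saliency(w))  # least salient filter first
```

Filters at the front of the returned order are the candidates for removal under this criterion.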

Highlights

- Modern Convolutional Neural Network (CNN) models have shown outstanding performance in many computer vision tasks
- Unstructured pruning produces sparse weight matrices, which cannot lead to actual speedup and compression without specialized hardware or libraries [3]
- Since structured pruning maintains the original weight structure, no specialized hardware or libraries are necessary for acceleration
- Our contributions are as follows: (1) We propose and formulate a novel concept of neuron merging that compensates for the information loss due to the pruned neurons/filters in both fully connected layers and convolution layers
- (3) We show that our merged model preserves the original model better than the pruned model does, under various measures such as the accuracy immediately after pruning, feature map visualization, and Weighted Average Reconstruction Error (WARE) [27]
- For VGG-16 on CIFAR-10, we achieve an accuracy of 93.16% while reducing 64% of total parameters, without any fine-tuning
- We propose and formulate a novel concept of neuron merging that compensates for the accuracy loss of the pruned neurons
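The hardware-friendliness of structured pruning noted above follows from simple shape arithmetic: removing a filter shrinks this layer's output channels and the next layer's matching input channels, and both weight tensors stay dense. A minimal sketch (function name and shapes are our own):

```python
import numpy as np

def prune_filter(conv_w, next_w, idx):
    """Drop filter `idx`: shrink this layer's output channels and the
    next layer's matching input channels. Both results remain dense,
    so no sparse kernels or specialized hardware are needed.

    conv_w: (C_out, C_in, kH, kW); next_w: (C_next, C_out, kH, kW).
    """
    return np.delete(conv_w, idx, axis=0), np.delete(next_w, idx, axis=1)

w1 = np.ones((8, 3, 3, 3))    # 8 filters over 3 input channels
w2 = np.ones((16, 8, 3, 3))   # next layer consumes 8 channels
w1p, w2p = prune_filter(w1, w2, 5)
print(w1p.shape, w2p.shape)   # (7, 3, 3, 3) (16, 7, 3, 3)
```

Contrast this with unstructured pruning, which only zeroes individual entries and leaves the tensor shapes unchanged.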

Methods

- The authors mathematically formulate the new concept of neuron merging in the fully connected layer.
- The authors show how merging is applied to the convolution layer.
- The authors start with a fully connected layer without bias.
- Let N_i denote the length of the input column vector of the i-th fully connected layer.
- The i-th fully connected layer transforms the input vector x_i ∈ ℝ^{N_i} into the output vector x_{i+1} ∈ ℝ^{N_{i+1}}.
- The network weights of the i-th layer are denoted as W_i ∈ ℝ^{N_{i+1}×N_i}, so that x_{i+1} = W_i x_i before the activation.
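To make the idea concrete, here is a minimal NumPy sketch of merging for two consecutive bias-free fully connected layers with ReLU: the pruned neuron's outgoing weights are folded into the most cosine-similar surviving neuron, scaled so that ReLU(s·a) = s·ReLU(a) holds for s > 0. The function name and the similarity/scale choices are our simplification; the paper's decomposition is more general.

```python
import numpy as np

def merge_neuron(W1, W2, p):
    """Remove hidden neuron p and fold its contribution into the most
    cosine-similar remaining neuron (a simplified merging sketch).

    W1: (N_hidden, N_in)  weights producing the hidden activations.
    W2: (N_out, N_hidden) weights consuming them.
    Assumes bias-free layers with ReLU, where ReLU(s*a) = s*ReLU(a) for s > 0.
    """
    keep = [j for j in range(W1.shape[0]) if j != p]
    wp = W1[p]
    # Pick the most cosine-similar surviving neuron q and the positive
    # scale s such that w_p ≈ s * w_q.
    sims = W1[keep] @ wp / (np.linalg.norm(W1[keep], axis=1) * np.linalg.norm(wp) + 1e-12)
    q = keep[int(np.argmax(sims))]
    s = np.linalg.norm(wp) / (np.linalg.norm(W1[q]) + 1e-12)
    # Compensate: route neuron p's outgoing weights through neuron q.
    W2m = W2.copy()
    W2m[:, q] += s * W2m[:, p]
    return W1[keep], np.delete(W2m, p, axis=1)

# Example: neuron 2 is exactly twice neuron 0, so merging is lossless here.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
W2 = np.array([[1.0, 1.0, 1.0]])
W1m, W2m = merge_neuron(W1, W2, 2)
relu = lambda a: np.maximum(a, 0)
x = np.array([0.5, -0.3])
print(W2 @ relu(W1 @ x), W2m @ relu(W1m @ x))  # identical outputs
```

When the pruned neuron's weight vector is an exact positive multiple of a surviving one, the merged two-layer network matches the original output on every input; otherwise merging only approximates it, but still better than dropping the neuron outright.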

Results

**Image Classification Results on ImageNet**

- In Table 4, the authors present the test results of VGG-16 and ResNet-34 on ImageNet. The authors prune only the last convolution layer of VGG-16, as most of the parameters come from the fully connected layers.
- Due to the large scale of the dataset, the initial accuracy right after pruning drops rapidly as the pruning ratio increases.
- The authors' merging recovers the accuracy in all cases, showing the idea is effective even for large-scale datasets like ImageNet.

Conclusion

- The authors propose and formulate a novel concept of neuron merging that compensates for the accuracy loss of the pruned neurons.
- The authors' one-shot and data-free method better reconstructs the output feature maps of the original model than vanilla pruning.
- To demonstrate the effectiveness of merging over network pruning, the authors compare the initial accuracy, WARE, and feature map visualization on image-classification tasks.
- It is worth noting that the way of decomposing the weights can vary within the neuron merging formulation.
- The authors plan to generalize the neuron merging formulation to more diverse activation functions and model architectures.
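The feature-map comparison above can be illustrated with a simple relative L2 reconstruction error. This is our own stand-in measure, not the paper's exact WARE definition [27]: pruning zeroes a channel outright, while a merging-style compensation approximates it from a similar channel.

```python
import numpy as np

def relative_reconstruction_error(original, reconstructed):
    """Relative L2 error between original and reconstructed feature maps
    (a simple stand-in for the paper's WARE metric [27])."""
    return np.linalg.norm(original - reconstructed) / (np.linalg.norm(original) + 1e-12)

# Toy comparison on a hypothetical 3-neuron hidden layer: neuron 2 is
# nearly a positive multiple of neuron 0.
relu = lambda a: np.maximum(a, 0)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # 100 input samples
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1]])
h = relu(X @ W1.T)                                 # original hidden features
h_pruned = h.copy(); h_pruned[:, 2] = 0            # prune neuron 2 outright
h_merged = h.copy(); h_merged[:, 2] = 2 * h[:, 0]  # approximate it via neuron 0
print(relative_reconstruction_error(h, h_pruned))
print(relative_reconstruction_error(h, h_merged))  # smaller than pruning here
```

Because the compensated channel tracks the removed one up to the small 0.1-weighted term, its reconstruction error is far below that of plain pruning, mirroring the paper's qualitative finding.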

Objectives

- The authors' goal is to maintain the activation feature map of the (i + 1)-th layer.

- Table1: Performance comparison of pruning and merging for LeNet-300-100 on FashionMNIST without fine-tuning. ‘Acc.↑’ denotes the accuracy gain of merging compared to pruning
- Table2: Performance comparison of pruning and merging for VGG-16 on CIFAR datasets without fine-tuning. ‘M-P’ denotes the accuracy recovery of merging compared to pruning. ‘B-M’ denotes the accuracy drop of the merged model compared to the baseline model. ‘Param. ↓ (#)’ denotes the parameter reduction rate and the absolute parameter count of the pruned/merged models
- Table3: WARE comparison of pruning and merging for various models on CIFAR-10. ‘WARE ↓’ denotes the WARE drop of the merged model compared to the pruned model
- Table4: Performance comparison of pruning and merging for VGG-16 and ResNet-34 on ImageNet dataset without fine-tuning. ‘Param. #’ denotes absolute parameter number of pruned/merged models. For VGG, ‘Last-{}%’ denotes the pruning ratio of the last convolution layer

Related work

- A variety of criteria [5, 6, 15, 18, 26, 27] have been proposed to evaluate the importance of a neuron, or, in the case of a CNN, a filter. However, all of them suffer from a significant accuracy drop immediately after pruning. Therefore, fine-tuning the pruned model often requires as many epochs as training the original model in order to restore accuracy close to the original. Several works [16, 25] add trainable parameters to each feature map channel to obtain data-driven channel sparsity, enabling the model to automatically identify redundant filters. In this case, training the model from scratch is inevitable to obtain the channel sparsity, which is a time- and resource-consuming process.

Among filter pruning works, Luo et al. [17] and He et al. [7] share a motivation similar to ours, likewise aiming to reconstruct the output feature map of the next layer. Luo et al. [17] search for the subset of filters that has the smallest effect on the output feature map of the next layer. He et al. [7] propose LASSO-regression-based channel selection and least-squares reconstruction of output feature maps. Both papers require data samples to obtain the feature maps. Our method, however, is novel in that it compensates for the loss of the removed filters in a one-shot and data-free way.

Funding

- This research was a result of a study on the “HPC Support” Project, supported by the Ministry of Science and ICT and NIPA.
- This work was also supported by Korea Institute of Science and Technology (KIST) under the project “HERO Part 1: Development of core technology of ambient intelligence for proactive service in digital in-home care.”

References

- Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando De Freitas. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, 2013.
- Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, 2015.
- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Computer Architecture News, 44(3):243–254, 2016.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. Soft filter pruning for accelerating deep convolutional neural networks. In International Joint Conference on Artificial Intelligence, 2018.
- Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, and Yi Yang. Filter pruning via geometric median for deep convolutional neural networks acceleration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
- Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
- Geoffrey E Hinton. Learning multiple layers of representation. Trends in cognitive sciences, 11(10):428–434, 2007.
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. In 4th International Conference on Learning Representations, 2016.
- Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
- Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan V. Oseledets, and Victor S. Lempitsky. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. In 3rd International Conference on Learning Representations, 2015.
- Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Namhoon Lee, Thalaiyasingam Ajanthan, and Philip HS Torr. Snip: Single-shot network pruning based on connection sensitivity. In 7th International Conference on Learning Representations, 2019.
- Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. In 5th International Conference on Learning Representations, 2017.
- Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
- Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
- Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. Pruning convolutional neural networks for resource efficient inference. In 5th International Conference on Learning Representations, 2017.
- Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, and Dan Feldman. Data-independent neural pruning via coresets. In 8th International Conference on Learning Representations, 2020.
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, 2015.
- Suraj Srinivas and R. Venkatesh Babu. Data-free parameter pruning for deep neural networks. In Proceedings of the British Machine Vision Conference, 2015.
- Chaoqi Wang, Guodong Zhang, and Roger Grosse. Picking winning tickets before training by preserving gradient flow. In 8th International Conference on Learning Representations, 2020.
- Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Jianbo Ye, Xin Lu, Zhe Lin, and James Z. Wang. Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In 6th International Conference on Learning Representations, 2018.
- Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, and Ping Wang. Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2019.
- Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, and Larry S Davis. Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference, 2016.
- Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
