We present a method that simultaneously achieves model-size reduction and inference speedup when pruning CNNs
Faster CNNs with Direct Sparse Convolutions and Guided Pruning
International Conference on Learning Representations (ICLR), 2017
Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. …
- Due to the success of deep neural networks in a broad set of practical and even critical artificial intelligence tasks, they are widely deployed in a spectrum of platforms: smart phones, autonomous cars, data center servers, and even supercomputers.
- It is easy to see that large neural network models incur costs in memory, energy, and inference speed.
- This motivated a line of research (Han et al. (2015; 2016b); Guo et al. (2016); Denton et al. (2014), to name a few) that tries to prune the parameters of a CNN after it has been trained and proven useful.
- The benefits of CNN pruning, however, seem not to be fully realized.
- We present a highly efficient direct sparse convolution design formulated as sparse-matrix-dense-matrix multiplication with the dense matrix columns generated on-the-fly from a single column vector
- We develop a performance model that projects speedup over different sparsity levels and on different processor architectures
- We aim to more fully realize the potential performance benefits due to the reduced FLOP counts resulting from pruned convolution kernels
- By combining our high-performance direct sparse convolution method with a performance model, we developed a guided approach that prunes convolutional neural networks in a co-design fashion, tailored to different computer architectures and to the different layers of the network in question
- As this paper shows that pruning can significantly boost inference speed in addition to reducing model size, further pruning techniques should be explored
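The core idea of direct sparse convolution can be sketched in a few lines: the pruned kernel tensor is stored as a sparse matrix, and each "dense matrix column" is a shifted view of the input feature map, generated on the fly instead of being materialized via im2col. The sketch below is a minimal NumPy/SciPy illustration (stride 1, no padding, names of our own choosing), not the paper's optimized implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def direct_sparse_conv(weights, x):
    """Direct sparse convolution sketch.
    weights: dense (M, C, R, S) kernel tensor with many zeros after pruning
    x: (C, H, W) input feature map; stride 1, no padding assumed."""
    M, C, R, S = weights.shape
    _, H, W = x.shape
    Hout, Wout = H - R + 1, W - S + 1
    # Flatten the pruned kernel into an M x (C*R*S) CSR sparse matrix.
    Wsp = csr_matrix(weights.reshape(M, -1))
    out = np.zeros((M, Hout, Wout))
    for m in range(M):
        start, end = Wsp.indptr[m], Wsp.indptr[m + 1]
        for idx, w in zip(Wsp.indices[start:end], Wsp.data[start:end]):
            # Decode the flat column index back into (channel, row, col).
            c, rem = divmod(idx, R * S)
            r, s = divmod(rem, S)
            # The "dense matrix column" is a shifted view of the input,
            # generated on the fly rather than materialized (no im2col).
            out[m] += w * x[c, r:r + Hout, s:s + Wout]
    return out
```

The work performed is proportional to the number of nonzero weights, which is where the FLOP reduction from pruning turns into actual speedup.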
- The authors' sparse CNN design is evaluated on three platforms shown in Table 1.
- Intel C2750 (Atom) represents resource-constrained mobile platforms or micro servers optimized for energy efficiency.
- Xeon E5-2697 v4 (BDW) represents data-center servers.
- Xeon Phi 7250 (KNL) is designed for high-performance computing, but its successor, Knights Mill, will target machine learning.
- The authors' sparse CNN is implemented as an extension of the Caffe deep learning framework (Jia et al., 2014) and is available at https://github.com/IntelLabs/SkimCaffe.
- The SGEMM performance and achievable memory bandwidth listed are measured with Intel MKL version 2017 and STREAM benchmark (McCalpin), respectively
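A back-of-the-envelope version of the performance model can be written down directly from the roofline idea (Williams et al., 2009): a layer's time is bounded by either compute throughput or memory traffic, whichever is larger. The sketch below uses made-up machine numbers and a simplified byte count of our own (8 bytes per stored nonzero for value plus index, fp32 activations), not the paper's calibrated model:

```python
def conv_time(density, M, C, R, S, Hout, Wout, peak_flops, bandwidth, eff=1.0):
    """Roofline-style time estimate: max of compute time and memory time.
    density is the fraction of nonzero weights; eff discounts the sparse
    kernel's efficiency relative to dense GEMM."""
    flops = 2.0 * density * M * C * R * S * Hout * Wout
    # Bytes moved: sparse weights (4B value + 4B index per nonzero) plus
    # fp32 input and output activations, each touched once (simplified).
    bytes_moved = (density * M * C * R * S * 8
                   + (C * Hout * Wout + M * Hout * Wout) * 4)
    return max(flops / (eff * peak_flops), bytes_moved / bandwidth)

# Example: an AlexNet-conv2-like layer (M=256, C=48, 5x5 kernel, 27x27
# output) on a hypothetical machine with 1 TFLOP/s effective dense GEMM
# and 60 GB/s bandwidth (assumed values, not measurements).
dense = conv_time(1.0, 256, 48, 5, 5, 27, 27, 1e12, 60e9)
sparse = conv_time(0.2, 256, 48, 5, 5, 27, 27, 1e12, 60e9, eff=0.5)
print(f"projected speedup at 20% density: {dense / sparse:.1f}x")
```

Even with the sparse kernel assumed only half as efficient per FLOP as dense GEMM, the model projects a useful speedup at 20% density; at higher densities the efficiency discount eats the FLOP savings, which is why guided pruning targets layers and sparsity levels where the projection is favorable.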
- In AlexNet, using the same element-wise regularization factor across all layers provides non-zero densities around 0.4 for conv2-5
- This sparsity is adequate when the primary goal is reducing model size, but not high enough to speed up inference.
- Guided ESL (GESL) reduces the regularization factor of fc layers and avoids pruning conv1 entirely
- This leads to non-zero densities below 0.2 for conv2-5, the range where the authors can obtain speedups from sparse convolution.
- Applying GSL to dynamic network surgery (DNS), a recent proposal for obtaining high sparsity, to form Guided DNS (GDNS), the authors show that GSL effectively improves the sparsity obtained for accelerating inference by de-prioritizing the conv1 and fc layers
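The "guided" part can be illustrated with a toy helper (our own simplification, not the paper's actual GESL formulation): per-layer regularization strength is set from the performance model's projected speedup, so layers that cannot benefit from sparse convolution (e.g. conv1 and fc layers) are pruned less aggressively:

```python
def guided_l1_factors(layer_speedups, base_lambda=1e-4):
    """Toy per-layer L1 regularization factors: prune hard only where the
    performance model projects a sparse-convolution speedup (> 1.0).
    layer_speedups maps layer name -> projected speedup at target sparsity."""
    return {
        layer: base_lambda if projected > 1.0 else base_lambda * 0.1
        for layer, projected in layer_speedups.items()
    }

# Hypothetical projections: conv2 benefits from sparsity, conv1/fc6 do not.
factors = guided_l1_factors({"conv1": 0.8, "conv2": 2.5, "fc6": 0.9})
print(factors)
```

The same gating idea applies on top of other pruning methods, which is how GSL combines with DNS to form GDNS in the summary above.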
- Pruning as a post-processing step has been effective in drastically reducing the model size while boosting inference speed moderately.
- While the direct sparse convolution algorithm is successful, the performance model reveals that sparse convolution cannot speed up all convolution layers, as seen in the 1×1 convolutions of GoogLeNet. The authors plan to expand the performance model to cover other FLOP-reduction methods such as FFT, Winograd, and tensor factorization, so that they can make informed decisions, choosing the best-performing method for each layer and guiding the training process accordingly.
- Table1: Evaluated Platforms
- Table2: Design space of techniques in reducing model size and accelerating inference, categorized as 3 groups
- Recent research has achieved great success in reducing model size and accelerating inference of CNNs while maintaining accuracy, exploring a large design space as shown in Table 2. Regularization-based and factorization-based approaches are the two main camps.
- A (regularization, group-wise): Lebedev & Lempitsky (2015)∗, Wen et al. (2016)∗
- B (regularization, element-wise): Han et al. (2015), Han et al. (2016b), Liu et al. (2015)∗, Guo et al. (2016), GESL∗
- C (factorization): Denton et al. (2014), Jaderberg et al. (2014), Lebedev et al. (2015), Zhang et al. (2015), Kim et al. (2016), Ioannou et al. (2016), Tai et al. (2016), Denil et al. (2013)
- Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2009.
- Soumith Chintala. convnet-benchmarks: Layer-wise Benchmarking, 2015. https://github.com/soumith/convnet-benchmarks.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
- Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, and Nando de Freitas. Predicting Parameters in Deep Learning. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2013.
- Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann Lecun, and Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2014.
- Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic Network Surgery for Efficient DNNs. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.
- Stefan Hadjis, Firas Abuzaid, Ce Zhang, and Christopher Re. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning. arXiv preprint arXiv:1504.04343, 2015.
- Song Han, Jeff Pool, John Tran, and William J. Dally. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2015.
- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. EIE: efficient inference engine on compressed deep neural network. CoRR, 2016a.
- Song Han, Huizi Mao, and William J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. In International Conference on Learning Representations (ICLR), 2016b.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385, 2015.
- Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, and Antonio Criminisi. Training CNNs with Low-Rank Filters for Efficient Image Classification. In International Conference on Learning Representations (ICLR), 2016.
- Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up Convolutional Neural Networks with Low Rank Expansions. In British Machine Vision Conference (BMVC), 2014.
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, 2014.
- Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. In International Conference on Learning Representations (ICLR), 2016.
- Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM review, 51(3): 455–500, 2009.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2012.
- Andrew Lavin and Scott Gray. Fast Algorithms for Convolutional Neural Networks. arXiv preprint arXiv:1509.09308, 2015.
- Vadim Lebedev and Victor Lempitsky. Fast ConvNets Using Group-wise Brain Damage. arXiv preprint arXiv:1506.02515, 2015.
- Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition. In International Conference on Learning Representations (ICLR), 2015.
- Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Penksy. Sparse Convolutional Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- John D. McCalpin. STREAM: Sustainable Memory Bandwidth in High Performance Computers. http://www.cs.virginia.edu/stream.
- Jongsoo Park, Sheng Li, Wei Wen, Hai Li, Yiran Chen, and Pradeep Dubey. Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size. arXiv preprint arXiv:1608.01409, 2016.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going Deeper with Convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, and Weinan E. Convolutional neural networks with low-rank regularization. In International Conference on Learning Representations (ICLR), 2016.
- Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. Fast Convolutional Nets with fbfft: A GPU Performance Evaluation. In International Conference on Learning Representations (ICLR), 2015.
- Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning Structured Sparsity in Deep Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.
- Samuel Williams, Andrew Waterman, and David Patterson. Roofline: An Insightful Visual Performance Model for Multicore Architectures. Communications of the ACM, 52(4):65–76, April 2009. ISSN 0001-0782. doi: 10.1145/1498765.1498785. URL http://doi.acm.org/10.1145/1498765.1498785.
- Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.