TL;DR: We present a method that achieves model-size reduction and inference-speed improvement simultaneously while pruning CNNs.

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

International Conference on Learning Representations (ICLR), 2017

Cited by: 165 | Views: 98

Abstract

Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Ne…

Code: https://github.com/IntelLabs/SkimCaffe

Introduction
  • Due to the success of deep neural networks in a broad set of practical and even critical artificial intelligence tasks, they are widely deployed in a spectrum of platforms: smart phones, autonomous cars, data center servers, and even supercomputers.
  • That large neural network models incur costs in memory, energy, and inference speed is easy to see.
  • This motivated a line of research (Han et al. (2015; 2016b); Guo et al. (2016); Denton et al. (2014), to name a few) that tries to prune the parameters after a CNN design is trained and proven useful.
  • The benefits of CNN pruning, however, seem not to be fully realized.
Highlights
  • We present a highly efficient direct sparse convolution design, formulated as sparse-matrix-dense-matrix multiplication with the dense matrix columns generated on the fly from a single column vector (a minimal sketch follows this list)
  • We develop a performance model that projects speedup over different sparsity levels and on different processor architectures
  • We aim to more fully realize the potential performance benefits due to the reduced FLOP counts resulting from pruned convolution kernels
  • By combining our high-performance direct sparse convolution method with a performance model, we developed a guided approach that prunes convolutional neural networks in a co-design fashion for different computer architectures and for the different layers of the convolutional neural network in question
  • As this paper shows that pruning can boost inference speed significantly in addition to reducing model size, further pruning techniques should be explored
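
The direct sparse convolution idea can be illustrated with a minimal NumPy sketch: the pruned kernels are stored as one CSR sparse matrix of shape (M, C*R*S), and the columns of the lowered (im2col-style) dense matrix are never materialized; each surviving weight simply scales a shifted window of the input tensor. The function below is an illustrative stride-1, unpadded sketch under those assumptions, not code from the authors' SkimCaffe implementation.

    import numpy as np
    from scipy.sparse import csr_matrix

    def direct_sparse_conv(w_csr, x, R, S):
        """Stride-1, unpadded convolution with a pruned (sparse) kernel.

        w_csr : CSR matrix of shape (M, C*R*S); row m is output channel m's
                flattened kernel with the pruned (zero) weights removed.
        x     : input tensor of shape (C, H, W).
        Returns y of shape (M, H-R+1, W-S+1).
        """
        C, H, W = x.shape
        M = w_csr.shape[0]
        H_out, W_out = H - R + 1, W - S + 1
        y = np.zeros((M, H_out, W_out), dtype=x.dtype)
        for m in range(M):                              # one output channel at a time
            for idx in range(w_csr.indptr[m], w_csr.indptr[m + 1]):
                col, wv = w_csr.indices[idx], w_csr.data[idx]
                c, rs = divmod(col, R * S)              # unflatten (c, r, s)
                r, s = divmod(rs, S)
                # The SpMM's dense-matrix column is never materialized: it is
                # just a shifted H_out x W_out window of the input tensor.
                y[m] += wv * x[c, r:r + H_out, s:s + W_out]
        return y

    # Example: prune a random 3x3 kernel tensor to ~10% density and run it.
    rng = np.random.default_rng(0)
    C, M, R, S, H, W = 4, 8, 3, 3, 16, 16
    w = rng.standard_normal((M, C * R * S))
    w[rng.random(w.shape) > 0.1] = 0.0                  # keep roughly 10% of weights
    y = direct_sparse_conv(csr_matrix(w), rng.standard_normal((C, H, W)), R, S)
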
Methods
  • The authors' sparse CNN design is evaluated on three platforms shown in Table 1.
  • Intel C2750 (Atom) represents resource-constrained mobile platforms or micro servers optimized for energy efficiency.
  • Xeon E5-2697 v4 (BDW) represents data-center servers.
  • Xeon Phi 7250 (KNL) is designed for high-performance computing, but its next version, Knights Mill, will target machine learning.
  • The authors' sparse CNN is implemented as an extension of the Caffe deep learning framework (Jia et al., 2014) and is available at https://github.com/IntelLabs/SkimCaffe.
  • The SGEMM performance and achievable memory bandwidth listed are measured with Intel MKL version 2017 and the STREAM benchmark (McCalpin), respectively; these two measurements feed the speedup projection sketched below.
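
The paper's calibrated performance model is not reproduced on this page; what follows is only a simplified roofline-style sketch of how such a projection can combine a platform's measured dense SGEMM rate and STREAM bandwidth with a layer's FLOP count and non-zero density. The sparse_efficiency penalty for irregular access and all example numbers are illustrative assumptions, not parameters or results from the paper.

    def projected_speedup(flops_dense, bytes_moved, density,
                          dense_gflops, stream_gbs, sparse_efficiency=0.3):
        """Roofline-style projection of sparse-over-dense speedup for one layer.

        flops_dense       : FLOPs of the dense convolution.
        bytes_moved       : bytes of weight/activation traffic for the layer.
        density           : fraction of weights left non-zero after pruning.
        dense_gflops      : measured dense SGEMM rate of the platform (GFLOP/s).
        stream_gbs        : measured STREAM bandwidth of the platform (GB/s).
        sparse_efficiency : assumed fraction of the dense rate that the
                            irregular sparse kernel sustains (illustrative).
        """
        t_dense = max(flops_dense / (dense_gflops * 1e9),
                      bytes_moved / (stream_gbs * 1e9))
        t_sparse = max(flops_dense * density /
                       (dense_gflops * sparse_efficiency * 1e9),
                       bytes_moved / (stream_gbs * 1e9))
        return t_dense / t_sparse

    # Example with placeholder numbers (not figures from the paper): a layer at
    # 20% non-zero density on a server-class CPU.
    print(projected_speedup(flops_dense=3.0e9, bytes_moved=2.5e7, density=0.2,
                            dense_gflops=2000.0, stream_gbs=120.0))
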
Results
  • In AlexNet, using the same element-wise regularization factor across all layers provides non-zero densities around 0.4 for conv2-5.
  • This level of sparsity is adequate when the primary goal is reducing model size, but it is not high enough to speed up inference.
  • Guided ESL (GESL) reduces the regularization factor of the fc layers and avoids pruning conv1 entirely.
  • This leads to non-zero densities below 0.2 for conv2-5, the range where the authors can get speedups from sparse convolution.
  • Applying GSL to dynamic network surgery (DNS), a recent proposal for obtaining high sparsity, to form Guided DNS (GDNS), shows that GSL effectively improves the obtained sparsity for accelerating inference by de-prioritizing the conv1 and fc layers (a rough sketch of such guided, per-layer regularization follows this list).
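
As a rough illustration of the "guided" part of GSL/GESL, the sketch below assigns per-layer L1 regularization factors so that layers with little projected benefit from sparse convolution (such as conv1) and the size-dominated fc layers are de-prioritized. The layer names, projected speedups, threshold, and scaling are hypothetical placeholders, not the paper's actual guided sparsity learning settings.

    import numpy as np

    def guided_l1_factors(projected_speedups, base_decay, min_useful_speedup=1.0):
        """Per-layer L1 decay factors: prune aggressively only where the
        performance model projects a worthwhile sparse-convolution speedup."""
        factors = {}
        for name, speedup in projected_speedups.items():
            if name.startswith('fc') or speedup <= min_useful_speedup:
                factors[name] = 0.1 * base_decay   # de-prioritize pruning here
            else:
                factors[name] = base_decay         # prune aggressively here
        return factors

    def l1_subgradient(weights, decay):
        """Element-wise L1 term added to a layer's weight gradient in training."""
        return decay * np.sign(weights)

    # Hypothetical projected speedups at a target density for AlexNet-like layers.
    factors = guided_l1_factors(
        {'conv1': 0.8, 'conv2': 2.5, 'conv3': 3.1, 'conv4': 3.0, 'conv5': 2.8,
         'fc6': 1.2, 'fc7': 1.1, 'fc8': 1.0},
        base_decay=5e-5)
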
Conclusion
  • Pruning as a post-processing step has been effective in drastically reducing the model size while boosting inference speed moderately.
  • While the direct sparse convolution algorithm is successful, the performance model reveals that sparse convolution cannot speed up all convolution layers, as seen from the 1×1 convolutions in GoogLeNet.
  • The authors plan to expand the performance model to cover other FLOP-reduction methods such as FFT, Winograd, and tensor factorization, so that they can make informed decisions about the best-performing method for each layer and use the model to guide the training process (a trivial per-layer selection sketch follows this list).
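
The planned layer-wise selection is essentially an argmin over projected execution times. A hypothetical sketch under that reading, with method names and timings as stand-ins for whatever the expanded model would produce:

    def choose_method_per_layer(layer_times):
        """Pick, for each layer, the FLOP-reduction method with the lowest
        projected time. `layer_times` maps layer -> {method: seconds}; both
        the methods and the numbers in the example are placeholders."""
        return {layer: min(times, key=times.get)
                for layer, times in layer_times.items()}

    choice = choose_method_per_layer({
        'conv1': {'dense': 1.0e-3, 'winograd': 6.0e-4, 'sparse': 1.2e-3},
        'conv2': {'dense': 2.0e-3, 'winograd': 1.5e-3, 'sparse': 9.0e-4},
    })
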
Tables
  • Table1: Evaluated Platforms
  • Table2: Design space of techniques in reducing model size and accelerating inference, categorized as 3 groups
Related work
  • Recent research has achieved great success in reducing model size and accelerating inference of CNNs while maintaining accuracy, exploring a large design space as shown in Table 2. Regularization-based and factorization-based approaches are the two main camps.

    Table 2 organizes the techniques by regularization type (group-wise vs. element-wise) and by how the pruned model is computed (dense vs. sparse), plus a factorization-based group:

    • A (group-wise regularization, dense computing): Lebedev & Lempitsky (2015)∗, Wen et al. (2016)∗
    • B (element-wise regularization, sparse computing): Han et al. (2015), Han et al. (2016b), Liu et al. (2015)∗, Guo et al. (2016), GESL∗
    • C (factorization-based): Denton et al. (2014), Jaderberg et al. (2014), Lebedev et al. (2015), Zhang et al. (2015), Kim et al. (2016), Ioannou et al. (2016), Tai et al. (2016), Denil et al. (2013)
References
  • Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2009.
  • Soumith Chintala. convnet-benchmarks: Layer-wise Benchmarking, 2015. https://github.com/soumith/convnet-benchmarks.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, and Nando de Freitas. Predicting Parameters in Deep Learning. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2013.
  • Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2014.
  • Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic Network Surgery for Efficient DNNs. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.
  • Stefan Hadjis, Firas Abuzaid, Ce Zhang, and Christopher Re. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning. arXiv preprint arXiv:1504.04343, 2015.
  • Song Han, Jeff Pool, John Tran, and William J. Dally. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2015.
  • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. EIE: Efficient Inference Engine on Compressed Deep Neural Network. CoRR, 2016a.
  • Song Han, Huizi Mao, and William J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. In International Conference on Learning Representations (ICLR), 2016b.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385, 2015.
  • Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, and Antonio Criminisi. Training CNNs with Low-Rank Filters for Efficient Image Classification. In International Conference on Learning Representations (ICLR), 2016.
  • Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up Convolutional Neural Networks with Low Rank Expansions. In British Machine Vision Conference (BMVC), 2014.
  • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, 2014.
  • Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. In International Conference on Learning Representations (ICLR), 2016.
  • Tamara G. Kolda and Brett W. Bader. Tensor Decompositions and Applications. SIAM Review, 51(3):455–500, 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2012.
  • Andrew Lavin and Scott Gray. Fast Algorithms for Convolutional Neural Networks. arXiv preprint arXiv:1509.09308, 2015.
  • Vadim Lebedev and Victor Lempitsky. Fast ConvNets Using Group-wise Brain Damage. arXiv preprint arXiv:1506.02515, 2015.
  • Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition. In International Conference on Learning Representations (ICLR), 2015.
  • Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Penksy. Sparse Convolutional Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • John D. McCalpin. STREAM: Sustainable Memory Bandwidth in High Performance Computers. http://www.cs.virginia.edu/stream.
  • Jongsoo Park, Sheng Li, Wei Wen, Hai Li, Yiran Chen, and Pradeep Dubey. Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size. arXiv preprint arXiv:1608.01409, 2016.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going Deeper with Convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, and Weinan E. Convolutional Neural Networks with Low-Rank Regularization. In International Conference on Learning Representations (ICLR), 2016.
  • Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. Fast Convolutional Nets with fbfft: A GPU Performance Evaluation. In International Conference on Learning Representations (ICLR), 2015.
  • Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning Structured Sparsity in Deep Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.
  • Samuel Williams, Andrew Waterman, and David Patterson. Roofline: An Insightful Visual Performance Model for Multicore Architectures. Communications of the ACM, 52(4):65–76, April 2009. doi: 10.1145/1498765.1498785.
  • Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.