Exploiting Local Structures with the Kronecker Layer in Convolutional Networks

CoRR, Volume abs/1512.09194, 2015.


Abstract:

In this paper, we propose and study a technique to reduce the number of parameters and computation time in convolutional neural networks. We use the Kronecker product to exploit the local structures within convolution and fully-connected layers, by replacing the large weight matrices by combinations of multiple Kronecker products of smaller...

Introduction
  • Convolutional neural networks (CNNs) have achieved great success in many computer vision and machine learning tasks.
  • This success facilitates the development of industrial applications using CNNs. However, there are two major challenges for the practical use of these networks, especially on resource-limited devices:
  • CNNs achieving state-of-the-art results may require billions of parameters for storage (Dean et al, 2012; Le, 2013; Jaderberg et al, 2014a), and evaluating them is computationally expensive
Highlights
  • Convolutional neural networks (CNNs) have achieved great success in many computer vision and machine learning tasks
  • We have proposed and studied a framework to reduce the number of parameters and computation time in convolutional neural networks
  • Our framework uses Kronecker products to exploit the local structure within convolutional layers and fully-connected layers
  • As the Kronecker product is a generalization of the outer product, our method generalizes low-rank approximation methods for matrices and tensors (see the sketch at the end of this list)
  • We explored combining Kronecker products of different shapes to further balance the drop in accuracy against the reduction in parameters
  • Through a series of experiments on different datasets, our method is shown to be effective and efficient on different tasks. It can reduce computation time and model size with a minor loss in accuracy, or improve on previous state-of-the-art performance at a similar model size
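  • A minimal NumPy sketch of the Kronecker-factored layer referenced above (the helper name kfc_forward and the shapes in the check are illustrative, not taken from the paper): a weight matrix stored as a sum of Kronecker products can be applied without ever materializing it, using the standard identity (A ⊗ B) vec(X) = vec(B X A^T)

    import numpy as np

    def kfc_forward(x, factors):
        # x: input vector of length n1 * n2
        # factors: list of (A_k, B_k) with A_k of shape (m1, n1) and B_k of shape (m2, n2)
        # returns y = (sum_k kron(A_k, B_k)) @ x, of length m1 * m2
        A0, B0 = factors[0]
        (m1, n1), (m2, n2) = A0.shape, B0.shape
        X = x.reshape(n1, n2).T                    # column-major "unvec" of x
        Y = sum(B @ X @ A.T for A, B in factors)   # small matrix products per term
        return Y.T.reshape(m1 * m2)                # column-major vec of the result

    # sanity check against the explicitly materialized matrix
    rng = np.random.default_rng(0)
    factors = [(rng.standard_normal((3, 4)), rng.standard_normal((5, 2))) for _ in range(2)]
    W = sum(np.kron(A, B) for A, B in factors)
    x = rng.standard_normal(4 * 2)
    assert np.allclose(W @ x, kfc_forward(x, factors))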
Methods
  • Baseline and KConv configurations:

    Model      Configuration (r, o1, c1, h1, w1)   Validation Error
    Baseline   /                                   7.84%
    KConv-a    1,128,24,9,1; 1,256,64,8,1          8.76%
    KConv-b    1,128,48,1,9; 1,512,64,1,8
    KConv-c    2,64,24,9,1; 2,256,64,8,1

    The authors experimented with replacing the first convolutional layer with a KConv layer.
  • The KFC model with the highest accuracy uses a configuration with shapes (26, 15, 719, 122), (26, 15, 122, 719), (13, 30, 61, 1438), (130, 3, 1438, 61), each shape of rank 10, giving a KFC layer of total rank 40
  • This layer itself still saves 92% of the parameters compared to its FC counterpart (a parameter-count check appears at the end of this list).
  • The scatter diagram indicates that the KFC layer requires fewer parameters at the same accuracy, or achieves higher accuracy with the same number of parameters
  • This demonstrates that the technique works well before the softmax layer
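  • As a check on the figures above (reading each shape tuple as (m1, m2, n1, n2), so that every shape factors the same (m1·m2) × (n1·n2) = 390 × 87,718 matrix with factors of sizes m1 × n1 and m2 × n2; this reading is inferred from the numbers rather than stated here), the parameter count reproduces roughly the quoted 92% saving:

    # each rank-1 term of a given shape contributes an m1 x n1 and an m2 x n2 factor
    shapes = [(26, 15, 719, 122), (26, 15, 122, 719),
              (13, 30, 61, 1438), (130, 3, 1438, 61)]
    rank_per_shape = 10

    kfc_params = sum(rank_per_shape * (m1 * n1 + m2 * n2) for m1, m2, n1, n2 in shapes)
    m1, m2, n1, n2 = shapes[0]
    fc_params = (m1 * m2) * (n1 * n2)   # the dense FC weight matrix is 390 x 87,718

    print(kfc_params, fc_params, 1 - kfc_params / fc_params)
    # 2,655,370 vs 34,210,020 parameters, i.e. about a 92% reduction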
Results
  • The authors replace the second and third convolutional layers with KConv layers, since these two layers account for more than 90% of the running time.
  • The KConv layers achieve about a 3.3× speedup on the whole model with less than 1% accuracy loss (sketched below)
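  • A sketch of how such a kernel can be parameterized (the mode-wise Kronecker composition, the helper name kconv_kernel, and the factor shapes below are an assumed reading of the (r, o1, c1, h1, w1) notation; the paper evaluates the layer through smaller convolutions rather than by materializing the kernel): factors of shapes (o1, c1, h1, w1) and (o2, c2, h2, w2) combine into a kernel of shape (o1·o2, c1·c2, h1·h2, w1·w2), and the parameter saving, and with a factored evaluation the speedup, comes from r·(o1·c1·h1·w1 + o2·c2·h2·w2) being much smaller than the dense kernel size

    import numpy as np

    def kconv_kernel(factors, full_shape):
        # factors: list of (A_k, B_k), A_k of shape (o1, c1, h1, w1), B_k of shape (o2, c2, h2, w2)
        # returns the kernel of shape (o1*o2, c1*c2, h1*h2, w1*w2) as a sum of
        # mode-wise (tensor) Kronecker products, materialized only for illustration
        K = np.zeros(full_shape)
        for A, B in factors:
            K += np.einsum('abcd,efgh->aebfcgdh', A, B).reshape(full_shape)
        return K

    # illustrative rank-2 example with h x 1 and 1 x w factors
    rng = np.random.default_rng(0)
    factors = [(rng.standard_normal((4, 3, 3, 1)), rng.standard_normal((16, 8, 1, 3)))
               for _ in range(2)]
    K = kconv_kernel(factors, (64, 24, 3, 3))
    dense_params = 64 * 24 * 3 * 3                          # 13,824 weights in the dense kernel
    factored_params = 2 * (4 * 3 * 3 * 1 + 16 * 8 * 1 * 3)  # 840 weights in the factors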
Conclusion
  • The authors have proposed and studied a framework to reduce the number of parameters and computation time in convolutional neural networks.
  • Through a series of experiments on different datasets, the method is shown to be effective and efficient on different tasks.
  • It can reduce computation time and model size with a minor loss in accuracy, or improve on previous state-of-the-art performance at a similar model size
Tables
  • Table1: Comparison of SVD method and KFC layers on SVHN digit recognition
  • Table2: Comparison of different methods on SVHN sequence recognition
  • Table3: Recognition Performances of different methods on CASIA-HWDB
  • Table4: Speedup of KConv on scene text character recognition dataset. Parameters of the different layers are separated by a semicolon
  • Table5: Comparison of using SVD method and using KFC layers on ImageNet
Related work
  • In this section we discuss some related work not yet covered. In addition to the low-rank methods mentioned earlier, hashing methods have also been used to reduce the number of parameters (Chen et al, 2015; Bakhtiary et al, 2015), and distillation offers another way of compressing neural networks (Hinton et al, 2015). Furthermore, Mathieu et al (2013) used FFTs to speed up convolution. Yang et al (2015) used the adaptive Fastfood transform to reparameterize the matrix-vector multiplication of fully-connected layers. Han et al (2015) iteratively pruned redundant connections to reduce the number of parameters. Gong et al (2014) used vector quantization to compress the fully-connected layer. Gupta et al (2015) suggested using low-precision arithmetic to compress the neural network.
Funding
  • Proposes and studies a technique to reduce the number of parameters and computation time in convolutional neural networks
  • Introduces combinations of different shapes of Kronecker products to increase modeling capacity
  • Explores a framework for approximating the weight matrices and weight tensors in neural networks by sums of Kronecker products
  • Can achieve a 3.3× speedup or a 3.6× parameter reduction with less than a 1% drop in accuracy, showing the effectiveness and efficiency of the method
Reference
  • Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):898–916, May 2011. ISSN 0162-8828. doi: 10.1109/TPAMI.2010.161. URL http://dx.doi.org/10.1109/TPAMI.2010.161.
  • J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. In Proc. of ICLR, 2015.
  • Amir H Bakhtiary, Agata Lapedriza, and David Masip. Speeding up neural networks for large scale classification using wta hashing. arXiv preprint arXiv:1504.07488, 2015.
  • Frederic Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, and Yoshua Bengio. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
  • James Bergstra, Olivier Breuleux, Frederic Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010. Oral Presentation.
  • Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, and Yixin Chen. Compressing neural networks with the hashing trick. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 2285–2294, 2015. URL http://jmlr.org/proceedings/papers/v37/chenc15.html.
  • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.
  • Misha Denil, Babak Shakibi, Laurent Dinh, Nando de Freitas, et al. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, pages 2148–2156, 2013.
  • Emily L Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems, pages 1269–1277, 2014.
  • Weiguang Ding, Ruoyan Wang, Fei Mao, and Graham Taylor. Theano-based large-scale visual recognition with multiple gpus. arXiv preprint arXiv:1412.2302, 2014.
  • Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
  • I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. In arXiv, 2013a.
  • Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C. Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 1319–1327, 2013b. URL http://jmlr.org/proceedings/papers/v28/goodfellow13.html.
  • Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1737–1746, 2015. URL http://jmlr.org/proceedings/papers/v37/gupta15.html.
  • Song Han, Jeff Pool, John Tran, and William J Dally. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626, 2015.
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227, 2014a.
  • M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In arXiv, 2015.
  • Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up convolutional neural networks with low rank expansions. In British Machine Vision Conference, BMVC 2014, Nottingham, UK, September 1-5, 2014, 2014b. URL http://www.bmva.org/bmvc/2014/papers/paper073/index.html.
  • Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Deep features for text spotting. In Computer Vision – ECCV 2014, pages 512–528. Springer, 2014c.
  • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675–678. ACM, 2014.
  • Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Mikio Iwamura, Lluis Gomez i Bigorda, Sergi Robles Mestre, Jordi Mas, David Fernandez Mota, Jon Almazan Almazan, and Lluis-Pere de las Heras. Icdar 2013 robust reading competition. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, pages 1484–1493. IEEE, 2013.
  • Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Rev., 51(3):455–500, August 2009. ISSN 0036-1445. doi: 10.1137/07070111X. URL http://dx.doi.org/10.1137/07070111X.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • Quoc V Le. Building high-level features using large scale unsupervised learning. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8595–8598. IEEE, 2013.
  • Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. arXiv preprint arXiv:1412.6553, 2014.
  • Cheng-Lin Liu, Fei Yin, Da-Han Wang, and Qiu-Feng Wang. Casia online and offline chinese handwriting databases. In Document Analysis and Recognition (ICDAR), 2011 International Conference on, pages 37– 41. IEEE, 2011.
  • Michael Mathieu, Mikael Henaff, and Yann LeCun. Fast training of convolutional networks through ffts. arXiv preprint arXiv:1312.5851, 2013.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 5. Granada, Spain, 2011.
  • Roberto Rigamonti, Amos Sironi, Vincent Lepetit, and Pascal Fua. Learning separable filters. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 2754–2761. IEEE, 2013.
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), pages 1–42, April 2015. doi: 10.1007/s11263-015-0816-y.
  • Tara N Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, and Bhuvana Ramabhadran. Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6655–6659. IEEE, 2013.
  • Pierre Sermanet, Sandhya Chintala, and Yann LeCun. Convolutional neural networks applied to house numbers digit classification. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 3288– 3291. IEEE, 2012.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014a.
  • Charles F Van Loan and Nikos Pitsianis. Approximation with Kronecker products. Springer, 1993.
  • Chunpeng Wu, Wei Fan, Yuan He, Jun Sun, and Satoshi Naoi. Handwritten character recognition by alternately trained relaxation convolutional neural network. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pages 291–296. IEEE, 2014.
  • Jian Xue, Jinyu Li, and Yifan Gong. Restructuring of deep neural network acoustic models with singular value decomposition. In INTERSPEECH, pages 2365–2369, 2013.
  • Zichao Yang, Andrew Gordon Wilson, Alexander J. Smola, and Le Song. A la carte - learning fast kernels. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9-12, 2015, 2015. URL http://jmlr.org/proceedings/papers/v38/yang15b.html.
  • Fei Yin, Qiu-Feng Wang, Xu-Yao Zhang, and Cheng-Lin Liu. Icdar 2013 chinese handwriting recognition competition. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, pages 1464–1470. IEEE, 2013.
  • Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating very deep convolutional networks for classification and detection. arXiv preprint arXiv:1505.06798, 2015.
  • Zhuoyao Zhong, Lianwen Jin, and Zecheng Xie. High performance offline handwritten chinese character recognition using googlenet and directional feature maps. CoRR, abs/1505.04925, 2015. URL http://arxiv.org/abs/1505.04925.