Recent advances in convolutional neural networks
Pattern Recognition, pp. 354-377, 2018.
Abstract:
We give an overview of the basic components of CNN. We discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization an...
Introduction
- The Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures.
- LeCun et al. [3] published the seminal paper establishing the modern framework of CNN, and later improved it in [4].
- They developed a multi-layer artificial neural network called LeNet-5 which could classify handwritten digits.
- LeNet-5 has multiple layers and can be trained with the backpropagation algorithm [5].
- It can obtain effective representations of the original image, which makes it possible to recognize visual patterns directly from raw pixels with little to no preprocessing.
- Due to the lack of large training data and computing power at that time, their networks could not perform well on more complex problems, e.g., large-scale image and video classification.
Highlights
- The Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures.
- Kunihiko Fukushima proposed the neocognitron in 1980 [2], which could be regarded as the predecessor of the Convolutional Neural Network.
- XNOR-Net [148] applies convolutional Binarized Neural Networks on the ImageNet dataset with topologies inspired by AlexNet, Residual Nets and GoogLeNet, reporting top-1 accuracies of up to 51.2% for full binarization and 65.5% for partial binarization.
- Beyond surveying the advances in each aspect of the Convolutional Neural Network, we have introduced its application to many tasks, including image classification, object detection, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing.
- To speed up training, some asynchronous Stochastic Gradient Descent algorithms have already shown promising results using CPU and GPU clusters, but it is still worthwhile to develop effective and scalable parallel training algorithms. These deep models are also highly memory-demanding and time-consuming, which makes them unsuitable for deployment on mobile platforms with limited resources.
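As a rough illustration of the weight binarization that binarized networks such as XNOR-Net [148] rely on, the sketch below approximates a real-valued filter by one scaling factor times a vector of signs. The function names and the sample weights are hypothetical, and using the mean absolute weight as the scaling factor is an assumption of this sketch, not a detail stated in the highlights above.

```python
def binarize_weights(weights):
    """Approximate a real-valued filter by alpha * sign(w).

    alpha is taken as the mean absolute weight (an assumption of this
    sketch); zeros are mapped to +1 so every entry is strictly +1 or -1.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def reconstruct(alpha, signs):
    """Rebuild the approximate real-valued filter from its binary form."""
    return [alpha * s for s in signs]


# Hypothetical 4-element filter, flattened to a list.
filter_weights = [0.5, -1.5, 0.25, -0.75]
alpha, signs = binarize_weights(filter_weights)
approx = reconstruct(alpha, signs)
print(alpha, signs, approx)
```

Storing only the signs plus one scalar per filter is what makes such models attractive for the memory-constrained platforms mentioned above, at the cost of the accuracy drop the reported top-1 figures reflect.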
Results
- Xue et al. [135] apply singular value decomposition on each layer of a deep CNN to reduce the model size by 71% with less than 1% relative accuracy loss.
- XNOR-Net [148] applies convolutional BNNs on the ImageNet dataset with topologies inspired by AlexNet, ResNet and GoogLeNet, reporting top-1 accuracies of up to 51.2% for full binarization and 65.5% for partial binarization.
- The authors introduce some recent works that apply CNNs to achieve state-of-the-art performance on tasks including image classification, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing.
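To make the singular value decomposition result more concrete: truncating an m x n weight matrix to rank k replaces it with two factors of sizes m x k and k x n, so the parameter count drops from m*n to k*(m+n). The sketch below works through that arithmetic only. The 2048 x 2048 layer size and the helper names are assumptions for illustration, not values taken from [135], and the 71% target is applied here to a single layer rather than to a whole model.

```python
def dense_params(m, n):
    """Parameters in a full m x n weight matrix."""
    return m * n


def factored_params(m, n, k):
    """Parameters after a rank-k factorization into (m x k) and (k x n)."""
    return k * (m + n)


def max_rank_for_reduction(m, n, reduction):
    """Largest rank k whose factorization meets the requested size reduction.

    Solves k * (m + n) <= (1 - reduction) * m * n for integer k.
    """
    budget = (1.0 - reduction) * m * n
    return int(budget // (m + n))


# Hypothetical fully connected layer.
m, n = 2048, 2048
k = max_rank_for_reduction(m, n, 0.71)
kept = factored_params(m, n, k) / dense_params(m, n)
print(f"rank {k}: keeps {kept:.1%} of the original parameters")
```

For this layer the factorization only pays off when k is well below m*n/(m+n) = 1024; whether accuracy survives at the chosen rank depends on how quickly the layer's singular values decay, which has to be checked empirically.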
Conclusion
- Deep CNNs have made breakthroughs in processing image, video, speech and text.
- To speed up training, some asynchronous SGD algorithms have already shown promising results using CPU and GPU clusters, but it is still worthwhile to develop effective and scalable parallel training algorithms.
- At testing time, these deep models are highly memory-demanding and time-consuming, which makes them unsuitable for deployment on mobile platforms with limited resources.
- It is important to investigate how to reduce the complexity and obtain fast-to-execute models without loss of accuracy.
Funding
- The ROSE Lab is supported by the Infocomm Media Development Authority, Singapore
Reference
- [1] D. H. Hubel, T. N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of physiology (1968) 215–243.
- [2] K. Fukushima, S. Miyake, Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition, in: Competition and cooperation in neural nets, 1982, pp. 267–285.
- [3] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Handwritten digit recognition with a back-propagation network, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1989, pp. 396–404.
- [4] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of IEEE 86 (11) (1998) 2278–2324.
- [5] R. Hecht-Nielsen, Theory of the backpropagation neural network, Neural Networks 1 (Supplement-1) (1988) 445–448.
- W. Zhang, K. Itoh, J. Tanida, Y. Ichioka, Parallel distributed processing model with local space-invariant interconnections and its optical architecture, Applied optics 29 (32) (1990) 4790–4797.
- X.-X. Niu, C. Y. Suen, A novel hybrid cnn–svm classifier for recognizing handwritten digits, Pattern Recognition 45 (4) (2012) 1318–1325.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision (IJCV) 115 (3) (2015) 211–252.
- K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
- M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 818–833.
- K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- Y. A. LeCun, L. Bottou, G. B. Orr, K.-R. Muller, Efficient backprop, in: Neural Networks: Tricks of the Trade - Second Edition, 2012, pp. 9–48.
- V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 807–814.
- T. Wang, D. J. Wu, A. Coates, A. Y. Ng, End-to-end text recognition with convolutional neural networks, in: Proceedings of the International Conference on Pattern Recognition (ICPR), 2012, pp. 3304–3308.
- Y. Boureau, J. Ponce, Y. LeCun, A theoretical analysis of feature pooling in visual recognition, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 111–118.
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, CoRR abs/1207.0580.
- M. Lin, Q. Chen, S. Yan, Network in network, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- Y. Tang, Deep learning using linear support vector machines, in: Proceedings of the International Conference on Machine Learning (ICML) Workshops, 2013.
- G. Madjarov, D. Kocev, D. Gjorgjevikj, S. Dzeroski, An extensive experimental comparison of methods for multi-label learning, Pattern Recognition 45 (9) (2012) 3084–3104.
- R. G. J. Wijnhoven, P. H. N. de With, Fast training of object detection using stochastic gradient descent, in: International Conference on Pattern Recognition (ICPR), 2010, pp. 424–427.
- M. Zinkevich, M. Weimer, L. Li, A. J. Smola, Parallelized stochastic gradient descent, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2010, pp. 2595–2603.
- J. Ngiam, Z. Chen, D. Chia, P. W. Koh, Q. V. Le, A. Y. Ng, Tiled convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2010, pp. 1279–1287.
- Z. Wang, T. Oates, Encoding time series as images for visual inspection and classification using tiled convolutional neural networks, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Workshops, 2015.
- Y. Zheng, Q. Liu, E. Chen, Y. Ge, J. L. Zhao, Time series classification using multi-channels deep convolutional neural networks, in: Proceedings of the International Conference on Web-Age Information Management (WAIM), 2014, pp. 298–310.
- M. D. Zeiler, D. Krishnan, G. W. Taylor, R. Fergus, Deconvolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2528–2535.
- M. D. Zeiler, G. W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: Proceedings of the International Conference on Computer Vision (ICCV), 2011, pp. 2018–2025.
- J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (4) (2017) 640–651.
- F. Visin, K. Kastner, A. Courville, Y. Bengio, M. Matteucci, K. Cho, Reseg: A recurrent neural network for object segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015.
- H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1520–1528.
- C. Cao, X. Liu, Y. Yang, Y. Yu, J. Wang, Z. Wang, Y. Huang, L. Wang, C. Huang, W. Xu, et al., Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2956–2964.
- J. Zhang, Z. Lin, J. Brandt, X. Shen, S. Sclaroff, Top-down neural attention by excitation backprop, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 543–559.
- Y. Zhang, K. Lee, H. Lee, Augmenting supervised neural networks with unsupervised objectives for large-scale image classification, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 612–621.
- B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2921–2929.
- A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, D. Batra, Human attention in visual question answering: Do humans and deep networks look at the same regions?, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016, pp. 932–937.
- C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 38 (2) (2016) 295–307.
- F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- N. Kalchbrenner, L. Espeholt, K. Simonyan, A. v. d. Oord, A. Graves, K. Kavukcuoglu, Neural machine translation in linear time, CoRR abs/1610.10099.
- [40] T. Sercu, V. Goel, Dense prediction on sequences with time-dilated convolutions for speech recognition, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS) Workshops, 2016.
- [41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
- [42] C. Szegedy, S. Ioffe, V. Vanhoucke, Inception-v4, inception-resnet and the impact of residual connections on learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2017, pp. 4278–4284.
- [43] A. Hyvarinen, U. Koster, Complex cell pooling and the statistics of natural images, Network: Computation in Neural Systems 18 (2) (2007) 81–100.
- [44] J. B. Estrach, A. Szlam, Y. Lecun, Signal recovery from pooling representations, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 307–315.
- [45] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, R. Fergus, Regularization of neural networks using dropconnect, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1058–1066.
- [46] D. Yu, H. Wang, P. Chen, Z. Wei, Mixed pooling for convolutional neural networks, in: Proceedings of the Rough Sets and Knowledge Technology (RSKT), 2014, pp. 364–375.
- [47] M. D. Zeiler, R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2013.
- [48] O. Rippel, J. Snoek, R. P. Adams, Spectral representations for convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2449–2457.
- [49] M. Mathieu, M. Henaff, Y. LeCun, Fast training of convolutional networks through ffts, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [50] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 37 (9) (2015) 1904–1916.
- [51] S. Singh, A. Gupta, A. Efros, Unsupervised discovery of mid-level discriminative patches, in: Proceedings of the European Conference on Computer Vision (ECCV), 2012, pp. 73–86.
- [52] Y. Gong, L. Wang, R. Guo, S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 392–407.
- [53] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, C. Schmid, Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 34 (9) (2012) 1704–1716.
- [54] A. L. Maas, A. Y. Hannun, A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proceedings of the International Conference on Machine Learning (ICML), Vol. 30, 2013.
- [55] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, et al., On rectified linear units for speech processing, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 3517–3521.
- [56] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
- [57] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, in: Proceedings of the International Conference on Machine Learning (ICML) Workshop, 2015.
- [58] D.-A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (elus), in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- [59] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio, Maxout networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1319–1327.
- [60] J. T. Springenberg, M. Riedmiller, Improving deep neural networks with probabilistic maxout units, CoRR abs/1312.6116.
- [61] T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, in: Proceedings of the International Conference on Machine Learning (ICML), 2004.
- [62] L. Deng, The mnist database of handwritten digit images for machine learning research, IEEE Signal Processing Magazine 29 (6) (2012) 141–142.
- [63] W. Liu, Y. Wen, Z. Yu, M. Yang, Large-margin softmax loss for convolutional neural networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 507–516.
- [64] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Sackinger, R. Shah, Signature verification using a siamese time delay neural network, International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) 7 (4) (1993) 669–688.
- [65] S. Chopra, R. Hadsell, Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 539–546.
- [66] R. Hadsell, S. Chopra, Y. LeCun, Dimensionality reduction by learning an invariant mapping, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006, pp. 1735–1742.
- [67] U. Shaham, R. R. Lederman, Learning by coincidence: Siamese networks and common variable learning, Pattern Recognition.
- [68] J. Lin, O. Morere, V. Chandrasekhar, A. Veillard, H. Goh, Deephash: Getting regularization, depth and fine-tuning right, in: Proceedings of the International Conference on Multimedia Retrieval (ICMR), 2017, pp. 133–141.
- [69] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
- [70] H. Liu, Y. Tian, Y. Yang, L. Pang, T. Huang, Deep relative distance learning: Tell the difference between similar vehicles, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2167–2175.
- [71] S. Ding, L. Lin, G. Wang, H. Chao, Deep feature learning with relative distance comparison for person re-identification, Pattern Recognition 48 (10) (2015) 2993–3003.
- [72] Z. Liu, P. Luo, S. Qiu, X. Wang, X. Tang, Deepfashion: Powering robust clothes recognition and retrieval with rich annotations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1096–1104.
- [73] D. P. Kingma, M. Welling, Auto-encoding variational bayes, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [74] D. J. Im, S. Ahn, R. Memisevic, Y. Bengio, Denoising criterion for variational auto-encoding framework, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 2059–2065.
- [75] D. P. Kingma, S. Mohamed, D. J. Rezende, M. Welling, Semi-supervised learning with deep generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3581–3589.
- [76] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
- [77] M. Mirza, S. Osindero, Conditional generative adversarial nets, CoRR abs/1411.1784.
- [78] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the International Conference on Machine Learning (ICML), 2008, pp. 1096–1103.
- [79] W. W. Ng, G. Zeng, J. Zhang, D. S. Yeung, W. Pedrycz, Dual autoencoders features for imbalance classification problem, Pattern Recognition 60 (2016) 875–889.
- [80] J. Mehta, A. Majumdar, Rodeo: robust de-aliasing autoencoder for real-time medical image reconstruction, Pattern Recognition 63 (2017) 499–510.
- [81] B. A. Olshausen, et al., Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (6583) (1996) 607.
- [82] H. Lee, A. Battle, R. Raina, A. Y. Ng, Efficient sparse coding algorithms, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2006, pp. 801–808.
- [83] S. Eslami, N. Heess, T. Weber, Y. Tassa, K. Kavukcuoglu, G. E. Hinton, Attend, infer, repeat: Fast scene understanding with generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 3225–3233.
- [84] K. Sohn, H. Lee, X. Yan, Learning structured output representation using deep conditional generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 3483–3491.
- [85] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 1060–1069.
- [86] E. L. Denton, S. Chintala, R. Fergus, et al., Deep generative image models using a laplacian pyramid of adversarial networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 1486–1494.
- [87] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2226–2234.
- [88] A. Dosovitskiy, T. Brox, Generating images with perceptual similarity metrics based on deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 658–666.
- [89] A. N. Tikhonov, On the stability of inverse problems, in: Dokl. Akad. Nauk SSSR, Vol. 39, 1943, pp. 195–198.
- [90] S. Wang, C. Manning, Fast dropout training, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 118–126.
- [91] J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 3084–3092.
- [92] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, C. Bregler, Efficient object localization using convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 648–656.
- [93] H. Yang, I. Patras, Mirror, mirror on the wall, tell me, is the error small?, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4685–4693.
- [94] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1395–1403.
- [95] J. Salamon, J. P. Bello, Deep convolutional neural networks and data augmentation for environmental sound classification, Signal Processing Letters (SPL) 24 (3) (2017) 279–283.
- [96] D. Eigen, R. Fergus, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2650–2658.
- [97] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, C. Schmid, Transformation pursuit for image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3646–3653.
- [98] S. Hauberg, O. Freifeld, A. B. L. Larsen, J. W. Fisher III, L. K. Hansen, Dreaming more data: Class-dependent distributions over diffeomorphisms for learned data augmentation, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, pp. 342–350.
- [99] S. Xie, T. Yang, X. Wang, Y. Lin, Hyper-class augmented and regularized deep learning for fine-grained image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 2645–2654.
- [100] Z. Xu, S. Huang, Y. Zhang, D. Tao, Augmenting strong supervision using web data for fine-grained categorization, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2524–2532.
- [101] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, Y. LeCun, The loss surfaces of multilayer networks, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
- [102] D. Mishkin, J. Matas, All you need is a good init, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- [103] I. Sutskever, J. Martens, G. Dahl, G. Hinton, On the importance of initialization and momentum in deep learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1139–1147.
- [104] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 249–256.
- [105] A. M. Saxe, J. L. McClelland, S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [106] C. Doersch, A. Gupta, A. A. Efros, Unsupervised visual representation learning by context prediction, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1422–1430.
- [107] P. Agrawal, J. Carreira, J. Malik, Learning to see by moving, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 37–45.
- [108] N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks 12 (1) (1999) 145–151.
- [109] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- [110] I. Loshchilov, F. Hutter, Sgdr: Stochastic gradient descent with warm restarts, in: Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- [111] T. Schaul, S. Zhang, Y. LeCun, No more pesky learning rates, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 343–351.
- [112] S. Zhang, A. E. Choromanska, Y. LeCun, Deep learning with elastic averaging sgd, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 685–693.
- [113] B. Recht, C. Re, S. Wright, F. Niu, Hogwild: A lock-free approach to parallelizing stochastic gradient descent, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2011, pp. 693–701.
- [114] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al., Large scale distributed deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1232–1240.
- [115] T. Paine, H. Jin, J. Yang, Z. Lin, T. Huang, Gpu asynchronous stochastic gradient descent to speed up neural network training, CoRR abs/1107.2490.
- [116] Y. Zhuang, W.-S. Chin, Y.-C. Juan, C.-J. Lin, A fast parallel sgd for matrix factorization in shared memory systems, in: Proceedings of the ACM conference on Recommender systems RecSys, 2013, pp. 249–256.
- [117] Y. Yao, L. Rosasco, A. Caponnetto, On early stopping in gradient descent learning, Constructive Approximation 26 (2) (2007) 289–315.
- [118] L. Prechelt, Early stopping - but when?, in: Neural Networks: Tricks of the Trade - Second Edition, 2012, pp. 53–67.
- [119] C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning requires rethinking generalization, in: Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- [120] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Journal of Machine Learning Research (JMLR) (2015) 448–456.
- [121] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
- [122] R. K. Srivastava, K. Greff, J. Schmidhuber, Training very deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2377–2385.
- [123] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 630–645.
- [124] F. Shen, R. Gan, G. Zeng, Weighted residuals for very deep networks, in: Proceedings of the International Conference on Systems and Informatics (ICSAI), 2016, pp. 936–941.
- [125] S. Zagoruyko, N. Komodakis, Wide residual networks, in: Proceedings of the British Machine Vision Conference (BMVC), 2016, pp. 87.1–87.12.
- [126] S. Singh, D. Hoiem, D. Forsyth, Swapout: Learning an ensemble of deep architectures, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 28–36.
- [127] S. Targ, D. Almeida, K. Lyman, Resnet in resnet: Generalizing residual architectures, CoRR abs/1603.08029.
- [128] K. Zhang, M. Sun, T. X. Han, X. Yuan, L. Guo, T. Liu, Residual networks of residual networks: Multilevel residual networks, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) PP (99) (2016) 1–1.
- [129] G. Huang, Z. Liu, K. Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4700–4708.
- [130] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, E. Shelhamer, cudnn: Efficient primitives for deep learning, CoRR abs/1410.0759.
- [131] N. Vasilache, J. Johnson, M. Mathieu, S. Chintala, S. Piantino, Y. LeCun, Fast convolutional nets with fbfft: A gpu performance evaluation, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- [132] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, Overfeat: Integrated recognition, localization and detection using convolutional networks.
- [133] A. Lavin, Fast algorithms for convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4013–4021.
- [134] T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, B. Ramabhadran, Low-rank matrix factorization for deep neural network training with high-dimensional output targets, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 6655–6659.
- [135] J. Xue, J. Li, Y. Gong, Restructuring of deep neural network acoustic models with singular value decomposition, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2013, pp. 2365–2369.
- [136] M. Denil, B. Shakibi, L. Dinh, N. de Freitas, et al., Predicting parameters in deep learning, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2148–2156.
- [137] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, R. Fergus, Exploiting linear structure within convolutional networks for efficient evaluation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1269–1277.
- [138] M. Jaderberg, A. Vedaldi, A. Zisserman, Speeding up convolutional neural networks with low rank expansions, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
- [139] A. Novikov, D. Podoprikhin, A. Osokin, D. P. Vetrov, Tensorizing neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 442–450.
- [140] I. V. Oseledets, Tensor-train decomposition, SIAM J. Scientific Computing 33 (5) (2011) 2295–2317.
- [141] Q. Le, T. Sarlos, A. Smola, Fastfood-approximating kernel expansions in loglinear time, in: Proceedings of the International Conference on Machine Learning (ICML), Vol. 85, 2013.
- [142] A. Dasgupta, R. Kumar, T. Sarlos, Fast locality-sensitive hashing, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2011, pp. 1073–1081.
- [143] F. X. Yu, S. Kumar, Y. Gong, S.-F. Chang, Circulant binary embedding, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 946–954.
- [144] Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, S.-F. Chang, An exploration of parameter redundancy in deep networks with circulant projections, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2857–2865.
- [145] M. Moczulski, M. Denil, J. Appleyard, N. de Freitas, Acdc: A structured efficient linear layer, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- [146] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- [147] M. Kim, P. Smaragdis, Bitwise neural networks, in: Proceedings of the International Conference on Machine Learning (ICML) Workshops, 2016.
- [148] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 525–542.
- [149] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, Y. Zou, Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, CoRR abs/1606.06160.
- [150] M. Courbariaux, Y. Bengio, Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016.
- [151] G. J. Sullivan, Efficient scalar quantization of exponential and laplacian random variables, IEEE Trans. Information Theory 42 (5) (1996) 1365–1374.
- [152] Y. Gong, L. Liu, M. Yang, L. Bourdev, Compressing deep convolutional networks using vector quantization, CoRR abs/1412.6115.
- [153] Y. Chen, T. Guan, C. Wang, Approximate nearest neighbor search by residual vector quantization, Sensors 10 (12) (2010) 11259–11273.
- [154] W. Zhou, Y. Lu, H. Li, Q. Tian, Scalar quantization for large scale image search, in: Proceedings of the 20th ACM international conference on Multimedia, 2012, pp. 169–178.
- [155] L. Y. Pratt, Comparing biases for minimal network construction with back-propagation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1988, pp. 177–185.
- [156] S. Han, J. Pool, J. Tran, W. Dally, Learning both weights and connections for efficient neural network, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 1135–1143.
- [157] Y. Guo, A. Yao, Y. Chen, Dynamic network surgery for efficient dnns, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 1379–1387.
- [158] T.-J. Yang, Y.-H. Chen, V. Sze, Designing energy-efficient convolutional neural networks using energy-aware pruning, CoRR abs/1611.05128.
- [159] H. Hu, R. Peng, Y.-W. Tai, C.-K. Tang, Network trimming: A data-driven neuron pruning approach towards efficient deep architectures, CoRR abs/1607.03250.
- [160] S. Srinivas, R. V. Babu, Data-free parameter pruning for deep neural networks, in: Proceedings of the British Machine Vision Conference (BMVC), 2015.
- [161] Z. Mariet, S. Sra, Diversity networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- [162] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, Y. Chen, Compressing neural networks with the hashing trick, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 2285–2294.
- [163] Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, S. Vishwanathan, Hash kernels for structured data, Journal of Machine Learning Research (JMLR) 10 (2009) 2615–2637.
- [164] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, J. Attenberg, Feature hashing for large scale multitask learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2009, pp. 1113–1120.
- [165] B. Liu, M. Wang, H. Foroosh, M. Tappen, M. Pensky, Sparse convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 806–814.
- [166] W. Wen, C. Wu, Y. Wang, Y. Chen, H. Li, Learning structured sparsity in deep neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2074–2082.
- [167] H. Bagherinezhad, M. Rastegari, A. Farhadi, Lcnn: Lookup-based convolutional neural network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [168] M. Egmont-Petersen, D. de Ridder, H. Handels, Image processing with neural networks - a review, Pattern Recognition 35 (10) (2002) 2279–2301.
- [169] K. Nogueira, O. A. Penatti, J. A. dos Santos, Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognition 61 (2017) 539–556.
- [170] Z. Zuo, G. Wang, B. Shuai, L. Zhao, Q. Yang, Exemplar based deep discriminative and shareable feature learning for scene image classification, Pattern Recognition 48 (10) (2015) 3004–3015.
- [171] A. T. Lopes, E. de Aguiar, A. F. De Souza, T. Oliveira-Santos, Facial expression recognition with convolutional neural networks: Coping with few data and the training sample order, Pattern Recognition 61 (2017) 610–628.
- [172] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, A. Zisserman, The pascal visual object classes challenge: A retrospective, International Journal of Computer Vision (IJCV) 111 (1) (2015) 98–136.
- [173] A.-M. Tousch, S. Herbin, J.-Y. Audibert, Semantic hierarchies for image annotation: A survey, Pattern Recognition 45 (1) (2012) 333–345.
- [174] N. Srivastava, R. R. Salakhutdinov, Discriminative transfer learning with tree-based priors, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2094–2102.
- [175] Z. Wang, X. Wang, G. Wang, Learning fine-grained features via a cnn tree for large-scale classification, CoRR abs/1511.04534.
- [176] T. Xiao, J. Zhang, K. Yang, Y. Peng, Z. Zhang, Error-driven incremental learning in deep convolutional neural network for large-scale image classification, in: Proceedings of the ACM Multimedia Conference, 2014, pp. 177–186.
- [177] Z. Yan, V. Jagadeesh, D. DeCoste, W. Di, R. Piramuthu, Hd-cnn: Hierarchical deep convolutional neural network for image classification, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2740–2748.
- [178] T. Berg, J. Liu, S. W. Lee, M. L. Alexander, D. W. Jacobs, P. N. Belhumeur, Birdsnap: Large-scale fine-grained visual categorization of birds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2019–2026.
- [179] A. Khosla, N. Jayadevaprakash, B. Yao, F.-F. Li, Novel dataset for fine-grained image categorization: Stanford dogs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Vol. 2, 2011, p. 1.
- [180] L. Yang, P. Luo, C. C. Loy, X. Tang, A large-scale car dataset for fine-grained categorization and verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3973–3981.
- [181] M. Minervini, A. Fischbach, H. Scharr, S. A. Tsaftaris, Finely-grained annotated datasets for image-based plant phenotyping, Pattern Recognition Letters 81 (2016) 80–89.
- [182] G.-S. Xie, X.-Y. Zhang, W. Yang, M.-L. Xu, S. Yan, C.-L. Liu, Lg-cnn: From local parts to global discrimination for fine-grained recognition, Pattern Recognition 71 (2017) 118–131.
- [183] S. Branson, G. Van Horn, P. Perona, S. Belongie, Improved bird species recognition using pose normalized deep convolutional nets, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
- [184] N. Zhang, J. Donahue, R. Girshick, T. Darrell, Part-based r-cnns for fine-grained category detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 834–849.
- [185] J. R. Uijlings, K. E. van de Sande, T. Gevers, A. W. Smeulders, Selective search for object recognition, International Journal of Computer Vision (IJCV) 104 (2) (2013) 154–171.
- [186] D. Lin, X. Shen, C. Lu, J. Jia, Deep lac: Deep localization, alignment and classification for fine-grained recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1666–1674.
- [187] J. P. Pluim, J. A. Maintz, M. Viergever, et al., Mutual-information-based registration of medical images: a survey, IEEE Trans. Med. Imaging 22 (8) (2003) 986–1004.
- [188] J. Krause, T. Gebru, J. Deng, L.-J. Li, L. Fei-Fei, Learning features and parts for fine-grained recognition, in: Proceedings of the International Conference on Pattern Recognition (ICPR), 2014, pp. 26–33.
- [189] J. Krause, H. Jin, J. Yang, L. Fei-Fei, Fine-grained recognition without part annotations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5546–5555.
- [190] Y. Zhang, X.-S. Wei, J. Wu, J. Cai, J. Lu, V.-A. Nguyen, M. N. Do, Weakly supervised fine-grained categorization with part-based image representation, IEEE Transactions on Image Processing 25 (4) (2016) 1713–1725.
- [191] T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng, Z. Zhang, The application of two-level attention models in deep convolutional neural network for fine-grained image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 842–850.
- [192] T.-Y. Lin, A. RoyChowdhury, S. Maji, Bilinear cnn models for fine-grained visual recognition, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1449–1457.
- [193] D. T. Nguyen, W. Li, P. O. Ogunbona, Human detection from images and videos: A survey, Pattern Recognition 51 (2016) 148–175.
- [194] Y. Li, S. Wang, Q. Tian, X. Ding, Feature representation for statistical-learning-based object detection: A review, Pattern Recognition 48 (11) (2015) 3542–3559.
- [195] M. Pedersoli, A. Vedaldi, J. Gonzalez, X. Roca, A coarse-to-fine approach for fast deformable object detection, Pattern Recognition 48 (5) (2015) 1844–1853.
- [196] S. J. Nowlan, J. C. Platt, A convolutional neural network hand tracker, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1994, pp. 901–908.
- [197] R. Girshick, F. Iandola, T. Darrell, J. Malik, Deformable part models are convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 437–446.
- [198] R. Vaillant, C. Monrocq, Y. Le Cun, Original approach for the localisation of objects in images, IEE Proceedings-Vision, Image and Signal Processing 141 (4) (1994) 245–250.
- [199] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, Microsoft coco: Common objects in context, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
- [200] I. Endres, D. Hoiem, Category independent object proposals, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 36 (2) (2014) 222–234.
- [201] L. Gomez, D. Karatzas, Textproposals: a text-specific selective search algorithm for word spotting in the wild, Pattern Recognition 70 (2017) 60–74.
- [202] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
- [203] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 37 (9) (2015) 1904–1916.
- [204] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (6) (2017) 1137–1149.
- [205] S. Gidaris, N. Komodakis, Object detection via a multi-region and semantic segmentation-aware cnn model, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1134–1142.
- [206] D. Yoo, S. Park, J.-Y. Lee, A. S. Paek, I. So Kweon, Attentionnet: Aggregating weak directions for accurate object detection, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2659–2667.
- [207] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 32 (9) (2010) 1627–1645.
- [208] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, F. Moreno-Noguer, Fracking deep convolutional image descriptors, CoRR abs/1412.6537.
- [209] A. Shrivastava, A. Gupta, R. Girshick, Training region-based object detectors with online hard example mining, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 761–769.
- [210] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
- [211] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, Ssd: Single shot multibox detector, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 21–37.
- [212] Y. Lu, T. Javidi, S. Lazebnik, Adaptive object detection using adjacency and zoom prediction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2351–2359.
- [213] K. Zhang, H. Song, Real-time visual tracking via online weighted multiple instance learning, Pattern Recognition 46 (1) (2013) 397–411.
- [214] S. Zhang, H. Yao, X. Sun, X. Lu, Sparse coding based visual tracking: Review and experimental comparison, Pattern Recognition 46 (7) (2013) 1772–1788.
- [215] S. Zhang, J. Wang, Z. Wang, Y. Gong, Y. Liu, Multi-target tracking by learning local-to-global trajectory models, Pattern Recognition 48 (2) (2015) 580–590.
- [216] J. Fan, W. Xu, Y. Wu, Y. Gong, Human tracking using convolutional neural networks, IEEE Trans. Neural Networks (TNN) 21 (10) (2010) 1610–1623.
- [217] H. Li, Y. Li, F. Porikli, Deeptrack: Learning discriminative feature representations by convolutional neural networks for visual tracking, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
- [218] Y. Chen, X. Yang, B. Zhong, S. Pan, D. Chen, H. Zhang, Cnntracker: Online discriminative object tracking via deep convolutional neural network, Appl. Soft Comput. 38 (2016) 1088–1098.
- [219] S. Hong, T. You, S. Kwak, B. Han, Online tracking by learning discriminative saliency map with convolutional neural network, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 597–606.
- [220] M. Patacchiola, A. Cangelosi, Head pose estimation in the wild using convolutional neural networks and adaptive gradient methods, Pattern Recognition 71 (2017) 132–143.
- [221] K. Nishi, J. Miura, Generation of human depth images with body part labels for complex human pose recognition, Pattern Recognition.
- [222] A. Toshev, C. Szegedy, Deeppose: Human pose estimation via deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1653–1660.
- [223] A. Jain, J. Tompson, M. Andriluka, G. W. Taylor, C. Bregler, Learning human pose estimation features with convolutional networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [224] J. J. Tompson, A. Jain, Y. LeCun, C. Bregler, Joint training of a convolutional network and a graphical model for human pose estimation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1799–1807.
- [225] X. Chen, A. L. Yuille, Articulated pose estimation by a graphical model with image dependent pairwise relations, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1736–1744.
- [226] X. Chen, A. Yuille, Parsing occluded people by flexible compositions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3945–3954.
- [227] X. Fan, K. Zheng, Y. Lin, S. Wang, Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1347–1355.
- [228] A. Jain, J. Tompson, Y. LeCun, C. Bregler, Modeep: A deep learning framework using motion features for human pose estimation, in: Proceedings of the Asian Conference on Computer Vision (ACCV), 2014, pp. 302–315.
- [229] Y. Y. Tang, S.-W. Lee, C. Y. Suen, Automatic document processing: a survey, Pattern Recognition 29 (12) (1996) 1931–1952.
- [230] A. Vinciarelli, A survey on off-line cursive word recognition, Pattern Recognition 35 (7) (2002) 1433–1446.
- [231] K. Jung, K. I. Kim, A. K. Jain, Text information extraction in images and video: a survey, Pattern Recognition 37 (5) (2004) 977–997.
- [232] S. Eskenazi, P. Gomez-Kramer, J.-M. Ogier, A comprehensive survey of mostly textual document segmentation algorithms since 2008, Pattern Recognition 64 (2017) 1–14.
- [233] X. Bai, B. Shi, C. Zhang, X. Cai, L. Qi, Text/non-text image classification in the wild with convolutional neural networks, Pattern Recognition 66 (2017) 437–446.
- [234] L. Gomez, A. Nicolaou, D. Karatzas, Improving patch-based scene text script identification with ensembles of conjoined networks, Pattern Recognition 67 (2017) 85–96.
- [235] M. Delakis, C. Garcia, Text detection with convolutional neural networks, in: Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2008, pp. 290–294.
- [236] H. Xu, F. Su, Robust seed localization and growing with deep convolutional features for scene text detection, in: Proceedings of the International Conference on Multimedia Retrieval (ICMR), 2015, pp. 387–394.
- [237] W. Huang, Y. Qiao, X. Tang, Robust scene text detection with convolution neural network induced mser trees, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 497–511.
- [238] C. Zhang, C. Yao, B. Shi, X. Bai, Automatic discrimination of text and non-text natural images, in: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 886–890.
- [239] I. J. Goodfellow, J. Ibarz, S. Arnoud, V. Shet, Multi-digit number recognition from street view imagery using deep convolutional neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [240] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Deep structured output learning for unconstrained text recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- [241] P. He, W. Huang, Y. Qiao, C. C. Loy, X. Tang, Reading scene text in deep convolutional sequences, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 3501–3508.
- [242] F. A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: Continual prediction with lstm, Neural Computation 12 (10) (2000) 2451–2471.
- [243] B. Shi, X. Bai, C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, CoRR abs/1507.05717.
- [244] M. Jaderberg, A. Vedaldi, A. Zisserman, Deep features for text spotting, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 512–528.
- [245] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Reading text in the wild with convolutional neural networks, International Journal of Computer Vision (IJCV) 116 (1) (2016) 1–20.
- [246] L. Wang, H. Lu, X. Ruan, M.-H. Yang, Deep networks for saliency detection via local estimation and global search, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3183–3192.
- [247] R. Zhao, W. Ouyang, H. Li, X. Wang, Saliency detection by multi-context deep learning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1265–1274.
- [248] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5455–5463.
- [249] N. Liu, J. Han, D. Zhang, S. Wen, T. Liu, Predicting eye fixations using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 362–370.
- [250] S. He, R. W. Lau, W. Liu, Z. Huang, Q. Yang, Supercnn: A superpixelwise convolutional neural network for salient object detection, International Journal of Computer Vision (IJCV) 115 (3) (2015) 330–344.
- [251] E. Vig, M. Dorr, D. Cox, Large-scale optimization of hierarchical features for saliency prediction in natural images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2798–2805.
- [252] M. Kümmerer, L. Theis, M. Bethge, Deep gaze i: Boosting saliency prediction with feature maps trained on imagenet, in: Proceedings of the International Conference on Learning Representations (ICLR) Workshops, 2015.
- [253] J. Pan, X. Giró-i-Nieto, End-to-end convolutional network for saliency prediction, CoRR abs/1507.01422.
- [254] G. Guo, A. Lai, A survey on still image based human action recognition, Pattern Recognition 47 (10) (2014) 3343–3361.
- [255] L. L. Presti, M. La Cascia, 3d skeleton-based human action classification: A survey, Pattern Recognition 53 (2016) 130–147.
- [256] J. Zhang, W. Li, P. O. Ogunbona, P. Wang, C. Tang, Rgb-d-based action recognition datasets: A survey, Pattern Recognition 60 (2016) 86–105.
- [257] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, Decaf: A deep convolutional activation feature for generic visual recognition (2014).
- [258] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1717–1724.
- [259] G. Gkioxari, R. Girshick, J. Malik, Actions and attributes from wholes and parts, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2470–2478.
- [260] L. Pishchulin, M. Andriluka, P. Gehler, B. Schiele, Poselet conditioned pictorial structures, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 588–595.
- [261] G. Gkioxari, R. B. Girshick, J. Malik, Contextual action recognition with r*cnn, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1080–1088.
- [262] G. Gkioxari, R. Girshick, J. Malik, Actions and attributes from wholes and parts, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2470–2478.
- [263] Y. Zhang, L. Cheng, J. Wu, J. Cai, M. N. Do, J. Lu, Action recognition in still images with minimum annotation efforts, IEEE Transactions on Image Processing 25 (11) (2016) 5479–5490.
- [264] L. Wang, L. Ge, R. Li, Y. Fang, Three-stream cnns for action recognition, Pattern Recognition Letters 92 (2017) 33–40.
- [265] S. Ji, W. Xu, M. Yang, K. Yu, 3d convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 35 (1) (2013) 221–231.
- [266] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning spatiotemporal features with 3d convolutional networks, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 4489–4497.
- [267] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725–1732.
- [268] K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, in: Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Weinberger (Eds.), Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 568–576.
- [269] G. Cheron, I. Laptev, C. Schmid, P-CNN: pose-based CNN features for action recognition, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 3218–3226.
- [270] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (4) (2017) 677–691.
- [271] K.-S. Fu, J. Mui, A survey on image segmentation, Pattern Recognition 13 (1) (1981) 3–16.
- [272] Q. Zhou, B. Zheng, W. Zhu, L. J. Latecki, Multi-scale context for scene labeling via flexible segmentation graph, Pattern Recognition 59 (2016) 312–324.
- [273] F. Liu, G. Lin, C. Shen, Crf learning with cnn features for image segmentation, Pattern Recognition 48 (10) (2015) 2983–2992.
- [274] S. Bu, P. Han, Z. Liu, J. Han, Scene parsing using inference embedded deep networks, Pattern Recognition 59 (2016) 188–198.
- [275] B. Peng, L. Zhang, D. Zhang, A survey of graph theoretical approaches to image segmentation, Pattern Recognition 46 (3) (2013) 1020–1038.
- [276] C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 35 (8) (2013) 1915–1929.
- [277] C. Couprie, C. Farabet, L. Najman, Y. LeCun, Indoor semantic segmentation using depth information, in: Proceedings of the International Conference on Learning Representations (ICLR), 2013.
- [278] P. Pinheiro, R. Collobert, Recurrent convolutional neural networks for scene labeling, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 82–90.
- [279] B. Shuai, G. Wang, Z. Zuo, B. Wang, L. Zhao, Integrating parametric and non-parametric models for scene labeling, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4249–4258.
- [280] B. Shuai, Z. Zuo, G. Wang, Quaddirectional 2d-recurrent neural networks for image labeling, IEEE Signal Processing Letters 22 (11) (2015) 1990–1994.
- [281] B. Shuai, Z. Zuo, G. Wang, B. Wang, Dag-recurrent neural networks for scene labeling, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3620–3629.
- [282] M. Mostajabi, P. Yadollahpour, G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3376–3385.
- [283] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- [284] M. El Ayadi, M. S. Kamel, F. Karray, Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition 44 (3) (2011) 572–587.
- [285] L. Deng, P. Kenny, M. Lennig, V. Gupta, F. Seitz, P. Mermelstein, Phonemic hidden markov models with continuous mixture output densities for large vocabulary word recognition, IEEE Trans. Signal Processing 39 (7) (1991) 1677–1681.
- [286] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag. 29 (6) (2012) 82–97.
- [287] L. Deng, J. Li, J.-T. Huang, K. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams, et al., Recent advances in deep learning for speech research at microsoft, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 8604–8608.
- [288] K. Yao, D. Yu, F. Seide, H. Su, L. Deng, Y. Gong, Adaptation of context-dependent deep neural networks for automatic speech recognition, in: Proceedings of the Spoken Language Technology (SLT), 2012, pp. 366–369.
- [289] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, G. Penn, Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, pp. 4277–4280.
- [290] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, L. Deng, G. Penn, D. Yu, Convolutional neural networks for speech recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
- [291] D. Palaz, R. Collobert, M. M. Doss, Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2013, pp. 1766–1770.
- [292] Y. Hoshen, R. J. Weiss, K. W. Wilson, Speech acoustic modeling from raw multichannel waveforms, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4624–4628.
- [293] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al., Deep speech 2: End-to-end speech recognition in english and mandarin, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 173–182.
- [294] T. Sercu, V. Goel, Advances in very deep convolutional neural networks for lvcsr, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2016, pp. 3429–3433.
- [295] L. Toth, Convolutional deep maxout networks for phone recognition, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2014, pp. 1078–1082.
- [296] T. N. Sainath, B. Kingsbury, A. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, B. Ramabhadran, Improvements to deep convolutional neural networks for LVCSR, in: Proceedings of the Automatic Speech Recognition and Understanding (ASRU) Workshops, 2013, pp. 315–320.
- [297] D. Yu, W. Xiong, J. Droppo, A. Stolcke, G. Ye, J. Li, G. Zweig, Deep convolutional neural networks with layer-wise context expansion and attention, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2016, pp. 17–21.
- [298] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang, Phoneme recognition using time-delay neural networks, IEEE Trans. Acoustics, Speech, and Signal Processing 37 (3) (1989) 328–339.
- [299] L.-H. Chen, T. Raitio, C. Valentini-Botinhao, J. Yamagishi, Z.-H. Ling, DNN-based stochastic postfilter for HMM-based speech synthesis, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2014, pp. 1954–1958.
- [300] B. Uria, I. Murray, S. Renals, C. Valentini-Botinhao, J. Bridle, Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4465–4469.
- [301] Z. Huang, S. M. Siniscalchi, C.-H. Lee, Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation, Pattern Recognition Letters.
- [302] A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu, Pixel recurrent neural networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 1747–1756.
- [303] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, Y. Wu, Exploring the limits of language modeling, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- [304] Y. Kim, Y. Jernite, D. Sontag, A. M. Rush, Character-aware neural language models, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2016, pp. 2741–2749.
- [305] J. Gu, J. Cai, G. Wang, T. Chen, Stack-captioning: Coarse-to-fine learning for image captioning, CoRR abs/1709.03376.
- [306] M. Wang, Z. Lu, H. Li, W. Jiang, Q. Liu, genCNN: A convolutional architecture for word sequence prediction, in: Proceedings of the Association for Computational Linguistics (ACL), 2015, pp. 1567–1576.
- [307] J. Gu, G. Wang, J. Cai, T. Chen, An empirical study of language CNN for image captioning, in: Proceedings of the International Conference on Computer Vision (ICCV), 2017.
- [308] Y. N. Dauphin, A. Fan, M. Auli, D. Grangier, Language modeling with gated convolutional networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2017, pp. 933–941.
- [309] R. Collobert, J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2008, pp. 160–167.
- [310] L. Yu, K. M. Hermann, P. Blunsom, S. Pulman, Deep learning for answer sentence selection, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS) Workshop, 2014.
- [311] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of the Association for Computational Linguistics (ACL), 2014, pp. 655–665.
- [312] Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751.
- [313] W. Yin, H. Schütze, Multichannel variable-size convolution for sentence classification, in: Proceedings of the Conference on Natural Language Learning (CoNLL), 2015, pp. 204–214.
- [314] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research (JMLR) 12 (2011) 2493–2537.
- [315] A. Conneau, H. Schwenk, L. Barrault, Y. LeCun, Very deep convolutional networks for natural language processing, CoRR abs/1606.01781.
- [316] G. Huang, Y. Sun, Z. Liu, D. Sedra, K. Weinberger, Deep networks with stochastic depth, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 646–661.