Recent advances in convolutional neural networks

Pattern Recognition, pp. 354-377, 2018.

DOI: https://doi.org/10.1016/j.patcog.2017.10.013

Abstract:

We give an overview of the basic components of CNN. We discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization an...

Introduction
  • Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures.
  • LeCun et al. [3] published the seminal paper establishing the modern framework of CNN, and later improved it in [4].
  • They developed a multi-layer artificial neural network called LeNet-5 which could classify handwritten digits.
  • LeNet-5 has multiple layers and can be trained with the backpropagation algorithm [5].
  • It can obtain effective representations of the original image, which makes it possible to recognize visual patterns directly from raw pixels with little to no preprocessing.
  • Due to the lack of large training data and computing power at that time, their networks could not perform well on more complex problems, e.g., large-scale image and video classification.
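To make the "multiple layers" of LeNet-5 concrete, the following is a minimal sketch that traces feature-map sizes through its convolution and subsampling stages. The layer table (32 x 32 input, 5 x 5 convolutions, 2 x 2 subsampling, 6/16/120 feature maps) follows the commonly described LeNet-5 layout and is an assumption here, since this summary does not list it; the names `conv_out`, `LAYERS` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after sliding a square window (convolution or pooling)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, out_channels). Assumed from the commonly described
# LeNet-5 layout; these numbers are not taken from this summary.
LAYERS = [
    ("C1", 5, 1, 6),    # convolution
    ("S2", 2, 2, 6),    # subsampling (pooling)
    ("C3", 5, 1, 16),   # convolution
    ("S4", 2, 2, 16),   # subsampling (pooling)
    ("C5", 5, 1, 120),  # convolution that collapses the map to 1 x 1
]


def trace_shapes(input_size=32):
    """Return (name, channels, height, width) for each stage."""
    size = input_size
    shapes = []
    for name, kernel, stride, channels in LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((name, channels, size, size))
    return shapes


if __name__ == "__main__":
    for name, c, h, w in trace_shapes():
        print(f"{name}: {c} x {h} x {w}")
```

In this layout the convolutional stages are followed by fully connected layers that produce the ten digit scores; those are omitted here because they do not change spatial size.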
Highlights
  • Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures
  • Kunihiko Fukushima proposed the neocognitron in 1980 [2], which could be regarded as the predecessor of Convolutional Neural Network
  • XNOR-Net [148] applies convolutional Binarized Neural Networks on the ImageNet dataset with topologies inspired by AlexNet, Residual Nets and GoogLeNet, reporting top-1 accuracies of up to 51.2% for full binarization and 65.5% for partial binarization
  • Beyond surveying the advances in each aspect of Convolutional Neural Networks, we have introduced their application to many tasks, including image classification, object detection, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing
  • To speed up training, some asynchronous Stochastic Gradient Descent (SGD) algorithms have already shown promising results on CPU and GPU clusters, but it is still worthwhile to develop effective and scalable parallel training algorithms. These deep models are also highly memory-demanding and time-consuming, which makes them unsuitable for deployment on mobile platforms with limited resources
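The summary does not spell out how weights are binarized in XNOR-Net. As a minimal sketch, assume the commonly described scheme in which each real-valued filter W is approximated by alpha * B, where B = sign(W) and alpha is the mean absolute weight of that filter. The function names `binarize_filters` and `reconstruct` are hypothetical, and NumPy stands in for a real deep learning framework.

```python
import numpy as np


def binarize_filters(weights):
    """Approximate each filter W by alpha * B with B in {-1, +1}.

    weights: array of shape (out_channels, in_channels, kh, kw).
    Returns (B, alpha), with one scale alpha per output filter.
    """
    # np.sign(0) is 0, so zeros are mapped to +1 to keep B strictly binary.
    B = np.where(weights >= 0, 1.0, -1.0)
    # For B = sign(W), the least-squares optimal scale is mean(|W|).
    alpha = np.abs(weights).mean(axis=(1, 2, 3))
    return B, alpha


def reconstruct(B, alpha):
    """Rebuild the real-valued approximation alpha * B, filter by filter."""
    return B * alpha[:, None, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 3, 3, 3))  # hypothetical filter bank
    B, alpha = binarize_filters(W)
    err = np.linalg.norm(W - reconstruct(B, alpha)) / np.linalg.norm(W)
    print(f"relative approximation error: {err:.3f}")
```

Storing B needs one bit per weight plus one float per filter, which is where the memory saving comes from; the accuracy figures quoted above come from training with such constraints, not from binarizing a finished model as this sketch does.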
Results
  • Xue et al. [135] apply singular value decomposition to each layer of a deep CNN, reducing the model size by 71% with less than 1% relative accuracy loss.
  • XNOR-Net [148] applies convolutional binarized neural networks (BNNs) to the ImageNet dataset with topologies inspired by AlexNet, ResNet and GoogLeNet, reporting top-1 accuracies of up to 51.2% for full binarization and 65.5% for partial binarization.
  • The authors introduce recent works that apply CNNs to achieve state-of-the-art performance on tasks including image classification, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech and natural language processing.
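The SVD-based compression mentioned above can be illustrated with a truncated SVD of a single weight matrix. This is a sketch under stated assumptions, not the procedure of [135]: the rank, the matrix sizes and the names `low_rank_factorize` and `param_reduction` are hypothetical, and any fine-tuning after factorization is left out.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Replace an (m x n) weight matrix by factors A (m x rank) and B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale each kept column by its singular value
    B = Vt[:rank, :]
    return A, B


def param_reduction(W, rank):
    """Fraction of parameters saved by storing A and B instead of W."""
    m, n = W.shape
    return 1.0 - rank * (m + n) / (m * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 128))  # hypothetical layer weights
    A, B = low_rank_factorize(W, rank=32)
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"parameters saved: {param_reduction(W, 32):.1%}")
    print(f"relative reconstruction error: {rel_err:.3f}")
```

A layer computing `x @ W` is then replaced by two smaller layers computing `(x @ A) @ B`. How much rank can be dropped before accuracy suffers depends on how quickly the singular values of the trained weights decay, which a random matrix like the one above does not reflect.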
Conclusion
  • Deep CNNs have made breakthroughs in processing image, video, speech and text.
  • To speed up training, some asynchronous SGD algorithms have already shown promising results on CPU and GPU clusters, but it is still worthwhile to develop effective and scalable parallel training algorithms.
  • At testing time, these deep models are highly memory-demanding and time-consuming, which makes them unsuitable for deployment on mobile platforms with limited resources.
  • It is important to investigate how to reduce the complexity and obtain fast-to-execute models without loss of accuracy.
Funding
  • The ROSE Lab is supported by the Infocomm Media Development Authority, Singapore
Reference
  • D. H. Hubel, T. N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of physiology (1968) 215–243.
  • K. Fukushima, S. Miyake, Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition, in: Competition and cooperation in neural nets, 1982, pp. 267–285.
  • Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Handwritten digit recognition with a back-propagation network, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1989, pp. 396–404.
  • Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of IEEE 86 (11) (1998) 2278–2324.
  • R. Hecht-Nielsen, Theory of the backpropagation neural network, Neural Networks 1 (Supplement-1) (1988) 445–448.
  • W. Zhang, K. Itoh, J. Tanida, Y. Ichioka, Parallel distributed processing model with local space-invariant interconnections and its optical architecture, Applied optics 29 (32) (1990) 4790–4797.
  • X.-X. Niu, C. Y. Suen, A novel hybrid cnn–svm classifier for recognizing handwritten digits, Pattern Recognition 45 (4) (2012) 1318–1325.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision (IJCV) 115 (3) (2015) 211–252.
  • K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
  • M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 818–833.
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • Y. A. LeCun, L. Bottou, G. B. Orr, K.-R. Muller, Efficient backprop, in: Neural Networks: Tricks of the Trade - Second Edition, 2012, pp. 9–48.
  • V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 807–814.
  • T. Wang, D. J. Wu, A. Coates, A. Y. Ng, End-to-end text recognition with convolutional neural networks, in: Proceedings of the International Conference on Pattern Recognition (ICPR), 2012, pp. 3304–3308.
  • Y. Boureau, J. Ponce, Y. LeCun, A theoretical analysis of feature pooling in visual recognition, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 111–118.
  • G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, CoRR abs/1207.0580.
  • M. Lin, Q. Chen, S. Yan, Network in network, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Y. Tang, Deep learning using linear support vector machines, in: Proceedings of the International Conference on Machine Learning (ICML) Workshops, 2013.
  • G. Madjarov, D. Kocev, D. Gjorgjevikj, S. Dzeroski, An extensive experimental comparison of methods for multi-label learning, Pattern Recognition 45 (9) (2012) 3084–3104.
  • R. G. J. Wijnhoven, P. H. N. de With, Fast training of object detection using stochastic gradient descent, in: International Conference on Pattern Recognition (ICPR), 2010, pp. 424–427.
  • M. Zinkevich, M. Weimer, L. Li, A. J. Smola, Parallelized stochastic gradient descent, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2010, pp. 2595–2603.
  • J. Ngiam, Z. Chen, D. Chia, P. W. Koh, Q. V. Le, A. Y. Ng, Tiled convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2010, pp. 1279–1287.
  • Z. Wang, T. Oates, Encoding time series as images for visual inspection and classification using tiled convolutional neural networks, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Workshops, 2015.
  • Y. Zheng, Q. Liu, E. Chen, Y. Ge, J. L. Zhao, Time series classification using multi-channels deep convolutional neural networks, in: Proceedings of the International Conference on Web-Age Information Management (WAIM), 2014, pp. 298–310.
  • M. D. Zeiler, D. Krishnan, G. W. Taylor, R. Fergus, Deconvolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2528–2535.
  • M. D. Zeiler, G. W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: Proceedings of the International Conference on Computer Vision (ICCV), 2011, pp. 2018–2025.
  • J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (4) (2017) 640–651.
  • F. Visin, K. Kastner, A. Courville, Y. Bengio, M. Matteucci, K. Cho, Reseg: A recurrent neural network for object segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015.
  • H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1520–1528.
  • C. Cao, X. Liu, Y. Yang, Y. Yu, J. Wang, Z. Wang, Y. Huang, L. Wang, C. Huang, W. Xu, et al., Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2956–2964.
  • J. Zhang, Z. Lin, J. Brandt, X. Shen, S. Sclaroff, Top-down neural attention by excitation backprop, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 543–559.
  • Y. Zhang, K. Lee, H. Lee, Augmenting supervised neural networks with unsupervised objectives for large-scale image classification, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 612–621.
  • B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2921–2929.
  • A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, D. Batra, Human attention in visual question answering: Do humans and deep networks look at the same regions?, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016, pp. 932–937.
  • C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 38 (2) (2016) 295–307.
  • F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • N. Kalchbrenner, L. Espeholt, K. Simonyan, A. v. d. Oord, A. Graves, K. Kavukcuoglu, Neural machine translation in linear time, CoRR abs/1610.10099.
  • [40] T. Sercu, V. Goel, Dense prediction on sequences with time-dilated convolutions for speech recognition, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS) Workshops, 2016.
  • [41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
  • [42] C. Szegedy, S. Ioffe, V. Vanhoucke, Inception-v4, inception-resnet and the impact of residual connections on learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2017, pp. 4278–4284.
  • [43] A. Hyvarinen, U. Koster, Complex cell pooling and the statistics of natural images, Network: Computation in Neural Systems 18 (2) (2007) 81–100.
  • [44] J. B. Estrach, A. Szlam, Y. Lecun, Signal recovery from pooling representations, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 307–315.
  • [45] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, R. Fergus, Regularization of neural networks using dropconnect, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1058–1066.
  • [46] D. Yu, H. Wang, P. Chen, Z. Wei, Mixed pooling for convolutional neural networks, in: Proceedings of the Rough Sets and Knowledge Technology (RSKT), 2014, pp. 364–375.
  • [47] M. D. Zeiler, R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2013.
  • [48] O. Rippel, J. Snoek, R. P. Adams, Spectral representations for convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2449–2457.
  • [49] M. Mathieu, M. Henaff, Y. LeCun, Fast training of convolutional networks through ffts, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [50] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 37 (9) (2015) 1904–1916.
  • [51] S. Singh, A. Gupta, A. Efros, Unsupervised discovery of mid-level discriminative patches, in: Proceedings of the European Conference on Computer Vision (ECCV), 2012, pp. 73–86.
  • [52] Y. Gong, L. Wang, R. Guo, S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 392–407.
  • [53] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, C. Schmid, Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 34 (9) (2012) 1704–1716.
  • [54] A. L. Maas, A. Y. Hannun, A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proceedings of the International Conference on Machine Learning (ICML), Vol. 30, 2013.
  • [55] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, et al., On rectified linear units for speech processing, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 3517–3521.
  • [56] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
  • [57] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, in: Proceedings of the International Conference on Machine Learning (ICML) Workshop, 2015.
  • [58] D.-A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (elus), in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [59] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio, Maxout networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1319–1327.
  • [60] J. T. Springenberg, M. Riedmiller, Improving deep neural networks with probabilistic maxout units, CoRR abs/1312.6116.
  • [61] T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, in: Proceedings of the International Conference on Machine Learning (ICML), 2004.
  • [62] L. Deng, The mnist database of handwritten digit images for machine learning research, IEEE Signal Processing Magazine 29 (6) (2012) 141–142.
  • [63] W. Liu, Y. Wen, Z. Yu, M. Yang, Large-margin softmax loss for convolutional neural networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 507–516.
  • [64] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Sackinger, R. Shah, Signature verification using a siamese time delay neural network, International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) 7 (4) (1993) 669–688.
  • [65] S. Chopra, R. Hadsell, Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 539–546.
  • [66] R. Hadsell, S. Chopra, Y. LeCun, Dimensionality reduction by learning an invariant mapping, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006, pp. 1735–1742.
  • [67] U. Shaham, R. R. Lederman, Learning by coincidence: Siamese networks and common variable learning, Pattern Recognition.
  • [68] J. Lin, O. Morere, V. Chandrasekhar, A. Veillard, H. Goh, Deephash: Getting regularization, depth and fine-tuning right, in: Proceedings of the International Conference on Multimedia Retrieval (ICMR), 2017, pp. 133–141.
  • [69] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
  • [70] H. Liu, Y. Tian, Y. Yang, L. Pang, T. Huang, Deep relative distance learning: Tell the difference between similar vehicles, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2167–2175.
  • [71] S. Ding, L. Lin, G. Wang, H. Chao, Deep feature learning with relative distance comparison for person re-identification, Pattern Recognition 48 (10) (2015) 2993–3003.
  • [72] Z. Liu, P. Luo, S. Qiu, X. Wang, X. Tang, Deepfashion: Powering robust clothes recognition and retrieval with rich annotations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1096–1104.
  • [73] D. P. Kingma, M. Welling, Auto-encoding variational bayes, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [74] D. J. Im, S. Ahn, R. Memisevic, Y. Bengio, Denoising criterion for variational auto-encoding framework, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 2059–2065.
  • [75] D. P. Kingma, S. Mohamed, D. J. Rezende, M. Welling, Semi-supervised learning with deep generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3581–3589.
  • [76] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
  • [77] M. Mirza, S. Osindero, Conditional generative adversarial nets, CoRR abs/1411.1784.
  • [78] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the International Conference on Machine Learning (ICML), 2008, pp. 1096–1103.
  • [79] W. W. Ng, G. Zeng, J. Zhang, D. S. Yeung, W. Pedrycz, Dual autoencoders features for imbalance classification problem, Pattern Recognition 60 (2016) 875–889.
  • [80] J. Mehta, A. Majumdar, Rodeo: robust de-aliasing autoencoder for real-time medical image reconstruction, Pattern Recognition 63 (2017) 499–510.
  • [81] B. A. Olshausen, et al., Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (6583) (1996) 607.
  • [82] H. Lee, A. Battle, R. Raina, A. Y. Ng, Efficient sparse coding algorithms, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2006, pp. 801–808.
  • [83] S. Eslami, N. Heess, T. Weber, Y. Tassa, K. Kavukcuoglu, G. E. Hinton, Attend, infer, repeat: Fast scene understanding with generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 3225–3233.
  • [84] K. Sohn, H. Lee, X. Yan, Learning structured output representation using deep conditional generative models, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 3483–3491.
  • [85] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 1060–1069.
  • [86] E. L. Denton, S. Chintala, R. Fergus, et al., Deep generative image models using a laplacian pyramid of adversarial networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 1486–1494.
  • [87] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2226–2234.
  • [88] A. Dosovitskiy, T. Brox, Generating images with perceptual similarity metrics based on deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 658–666.
  • [89] A. N. Tikhonov, On the stability of inverse problems, in: Dokl. Akad. Nauk SSSR, Vol. 39, 1943, pp. 195–198.
  • [90] S. Wang, C. Manning, Fast dropout training, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 118–126.
  • [91] J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 3084–3092.
  • [92] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, C. Bregler, Efficient object localization using convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 648–656.
  • [93] H. Yang, I. Patras, Mirror, mirror on the wall, tell me, is the error small?, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4685–4693.
  • [94] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1395–1403.
  • [95] J. Salamon, J. P. Bello, Deep convolutional neural networks and data augmentation for environmental sound classification, Signal Processing Letters (SPL) 24 (3) (2017) 279–283.
  • [96] D. Eigen, R. Fergus, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2650–2658.
  • [97] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, C. Schmid, Transformation pursuit for image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3646–3653.
  • [98] S. Hauberg, O. Freifeld, A. B. L. Larsen, J. W. Fisher III, L. K. Hansen, Dreaming more data: Class-dependent distributions over diffeomorphisms for learned data augmentation, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, pp. 342–350.
  • [99] S. Xie, T. Yang, X. Wang, Y. Lin, Hyper-class augmented and regularized deep learning for fine-grained image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 2645–2654.
  • [100] Z. Xu, S. Huang, Y. Zhang, D. Tao, Augmenting strong supervision using web data for fine-grained categorization, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2524–2532.
  • [101] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, Y. LeCun, The loss surfaces of multilayer networks, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
  • [102] D. Mishkin, J. Matas, All you need is a good init, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [103] I. Sutskever, J. Martens, G. Dahl, G. Hinton, On the importance of initialization and momentum in deep learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 1139–1147.
  • [104] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 249–256.
  • [105] A. M. Saxe, J. L. McClelland, S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [106] C. Doersch, A. Gupta, A. A. Efros, Unsupervised visual representation learning by context prediction, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1422–1430.
  • [107] P. Agrawal, J. Carreira, J. Malik, Learning to see by moving, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 37–45.
  • [108] N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks 12 (1) (1999) 145–151.
  • [109] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [110] I. Loshchilov, F. Hutter, Sgdr: Stochastic gradient descent with warm restarts, in: Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [111] T. Schaul, S. Zhang, Y. LeCun, No more pesky learning rates, in: Proceedings of the International Conference on Machine Learning (ICML), 2013, pp. 343–351.
  • [112] S. Zhang, A. E. Choromanska, Y. LeCun, Deep learning with elastic averaging sgd, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 685–693.
  • [113] B. Recht, C. Re, S. Wright, F. Niu, Hogwild: A lock-free approach to parallelizing stochastic gradient descent, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2011, pp. 693–701.
  • [114] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al., Large scale distributed deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1232–1240.
  • [115] T. Paine, H. Jin, J. Yang, Z. Lin, T. Huang, Gpu asynchronous stochastic gradient descent to speed up neural network training, CoRR abs/1107.2490.
  • [116] Y. Zhuang, W.-S. Chin, Y.-C. Juan, C.-J. Lin, A fast parallel sgd for matrix factorization in shared memory systems, in: Proceedings of the ACM Conference on Recommender Systems (RecSys), 2013, pp. 249–256.
  • [117] Y. Yao, L. Rosasco, A. Caponnetto, On early stopping in gradient descent learning, Constructive Approximation 26 (2) (2007) 289–315.
  • [118] L. Prechelt, Early stopping - but when?, in: Neural Networks: Tricks of the Trade - Second Edition, 2012, pp. 53–67.
  • [119] C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning requires rethinking generalization, in: Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • [120] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 448–456.
  • [121] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
  • [122] R. K. Srivastava, K. Greff, J. Schmidhuber, Training very deep networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2377–2385.
  • [123] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 630–645.
  • [124] F. Shen, R. Gan, G. Zeng, Weighted residuals for very deep networks, in: Proceedings of the International Conference on Systems and Informatics (ICSAI), 2016, pp. 936–941.
  • [125] S. Zagoruyko, N. Komodakis, Wide residual networks, in: Proceedings of the British Machine Vision Conference (BMVC), 2016, pp. 87.1–87.12.
  • [126] S. Singh, D. Hoiem, D. Forsyth, Swapout: Learning an ensemble of deep architectures, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 28–36.
  • [127] S. Targ, D. Almeida, K. Lyman, Resnet in resnet: Generalizing residual architectures, CoRR abs/1603.08029.
  • [128] K. Zhang, M. Sun, T. X. Han, X. Yuan, L. Guo, T. Liu, Residual networks of residual networks: Multilevel residual networks, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) PP (99) (2016) 1–1.
  • [129] G. Huang, Z. Liu, K. Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
  • [130] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, E. Shelhamer, cudnn: Efficient primitives for deep learning, CoRR abs/1410.0759.
  • [131] N. Vasilache, J. Johnson, M. Mathieu, S. Chintala, S. Piantino, Y. LeCun, Fast convolutional nets with fbfft: A gpu performance evaluation, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [132] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, Overfeat: Integrated recognition, localization and detection using convolutional networks.
  • [133] A. Lavin, Fast algorithms for convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4013–4021.
  • [134] T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, B. Ramabhadran, Low-rank matrix factorization for deep neural network training with high-dimensional output targets, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 6655–6659.
  • [135] J. Xue, J. Li, Y. Gong, Restructuring of deep neural network acoustic models with singular value decomposition, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2013, pp. 2365–2369.
  • [136] M. Denil, B. Shakibi, L. Dinh, N. de Freitas, et al., Predicting parameters in deep learning, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2148–2156.
  • [137] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, R. Fergus, Exploiting linear structure within convolutional networks for efficient evaluation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1269–1277.
  • [138] M. Jaderberg, A. Vedaldi, A. Zisserman, Speeding up convolutional neural networks with low rank expansions, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
  • [139] A. Novikov, D. Podoprikhin, A. Osokin, D. P. Vetrov, Tensorizing neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 442–450.
  • [140] I. V. Oseledets, Tensor-train decomposition, SIAM J. Scientific Computing 33 (5) (2011) 2295–2317.
  • [141] Q. Le, T. Sarlos, A. Smola, Fastfood-approximating kernel expansions in loglinear time, in: Proceedings of the International Conference on Machine Learning (ICML), Vol. 85, 2013.
  • [142] A. Dasgupta, R. Kumar, T. Sarlos, Fast locality-sensitive hashing, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2011, pp. 1073–1081.
  • [143] F. X. Yu, S. Kumar, Y. Gong, S.-F. Chang, Circulant binary embedding, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 946–954.
  • [144] Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, S.-F. Chang, An exploration of parameter redundancy in deep networks with circulant projections, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2857–2865.
  • [145] M. Moczulski, M. Denil, J. Appleyard, N. de Freitas, Acdc: A structured efficient linear layer, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [146] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [147] M. Kim, P. Smaragdis, Bitwise neural networks, in: Proceedings of the International Conference on Machine Learning (ICML) Workshops, 2016.
  • [148] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 525–542.
  • [149] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, Y. Zou, Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, CoRR abs/1606.06160.
  • [150] M. Courbariaux, Y. Bengio, Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016.
  • [151] G. J. Sullivan, Efficient scalar quantization of exponential and laplacian random variables, IEEE Trans. Information Theory 42 (5) (1996) 1365–1374.
  • [152] Y. Gong, L. Liu, M. Yang, L. Bourdev, Compressing deep convolutional networks using vector quantization, CoRR abs/1412.6115, 2014.
  • [153] Y. Chen, T. Guan, C. Wang, Approximate nearest neighbor search by residual vector quantization, Sensors 10 (12) (2010) 11259–11273.
  • [154] W. Zhou, Y. Lu, H. Li, Q. Tian, Scalar quantization for large scale image search, in: Proceedings of the 20th ACM international conference on Multimedia, 2012, pp. 169–178.
  • [155] L. Y. Pratt, Comparing biases for minimal network construction with back-propagation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1988, pp. 177–185.
  • [156] S. Han, J. Pool, J. Tran, W. Dally, Learning both weights and connections for efficient neural network, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2015, pp. 1135–1143.
  • [157] Y. Guo, A. Yao, Y. Chen, Dynamic network surgery for efficient dnns, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 1379–1387.
  • [158] T.-J. Yang, Y.-H. Chen, V. Sze, Designing energy-efficient convolutional neural networks using energy-aware pruning, CoRR abs/1611.05128.
  • [159] H. Hu, R. Peng, Y.-W. Tai, C.-K. Tang, Network trimming: A data-driven neuron pruning approach towards efficient deep architectures, CoRR abs/1607.03250, 2016.
  • [160] S. Srinivas, R. V. Babu, Data-free parameter pruning for deep neural networks, in: Proceedings of the British Machine Vision Conference (BMVC), 2015.
  • [161] Z. Mariet, S. Sra, Diversity networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [162] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, Y. Chen, Compressing neural networks with the hashing trick, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 2285–2294.
  • [163] Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, S. Vishwanathan, Hash kernels for structured data, Journal of Machine Learning Research (JMLR) 10 (2009) 2615–2637.
  • [164] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, J. Attenberg, Feature hashing for large scale multitask learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2009, pp. 1113–1120.
  • [165] B. Liu, M. Wang, H. Foroosh, M. Tappen, M. Pensky, Sparse convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 806–814.
  • [166] W. Wen, C. Wu, Y. Wang, Y. Chen, H. Li, Learning structured sparsity in deep neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2074–2082.
  • [167] H. Bagherinezhad, M. Rastegari, A. Farhadi, Lcnn: Lookup-based convolutional neural network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [168] M. Egmont-Petersen, D. de Ridder, H. Handels, Image processing with neural networks - a review, Pattern Recognition 35 (10) (2002) 2279–2301.
  • [169] K. Nogueira, O. A. Penatti, J. A. dos Santos, Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognition 61 (2017) 539–556.
  • [170] Z. Zuo, G. Wang, B. Shuai, L. Zhao, Q. Yang, Exemplar based deep discriminative and shareable feature learning for scene image classification, Pattern Recognition 48 (10) (2015) 3004–3015.
  • [171] A. T. Lopes, E. de Aguiar, A. F. De Souza, T. Oliveira-Santos, Facial expression recognition with convolutional neural networks: Coping with few data and the training sample order, Pattern Recognition 61 (2017) 610–628.
  • [172] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, A. Zisserman, The pascal visual object classes challenge: A retrospective, International Journal of Computer Vision (IJCV) 111 (1) (2015) 98–136.
  • [173] A.-M. Tousch, S. Herbin, J.-Y. Audibert, Semantic hierarchies for image annotation: A survey, Pattern Recognition 45 (1) (2012) 333–345.
  • [174] N. Srivastava, R. R. Salakhutdinov, Discriminative transfer learning with tree-based priors, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2094–2102.
  • [175] Z. Wang, X. Wang, G. Wang, Learning fine-grained features via a cnn tree for large-scale classification, CoRR abs/1511.04534.
  • [176] T. Xiao, J. Zhang, K. Yang, Y. Peng, Z. Zhang, Error-driven incremental learning in deep convolutional neural network for large-scale image classification, in: Proceedings of the ACM Multimedia Conference, 2014, pp. 177–186.
  • [177] Z. Yan, V. Jagadeesh, D. DeCoste, W. Di, R. Piramuthu, Hd-cnn: Hierarchical deep convolutional neural network for image classification, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2740–2748.
  • [178] T. Berg, J. Liu, S. W. Lee, M. L. Alexander, D. W. Jacobs, P. N. Belhumeur, Birdsnap: Large-scale fine-grained visual categorization of birds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2019–2026.
  • [179] A. Khosla, N. Jayadevaprakash, B. Yao, F.-F. Li, Novel dataset for fine-grained image categorization: Stanford dogs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Vol. 2, 2011, p. 1.
  • [180] L. Yang, P. Luo, C. C. Loy, X. Tang, A large-scale car dataset for fine-grained categorization and verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3973–3981.
  • [181] M. Minervini, A. Fischbach, H. Scharr, S. A. Tsaftaris, Finely-grained annotated datasets for image-based plant phenotyping, Pattern Recognition Letters 81 (2016) 80–89.
  • [182] G.-S. Xie, X.-Y. Zhang, W. Yang, M.-L. Xu, S. Yan, C.-L. Liu, Lg-cnn: From local parts to global discrimination for fine-grained recognition, Pattern Recognition 71 (2017) 118–131.
  • [183] S. Branson, G. Van Horn, P. Perona, S. Belongie, Improved bird species recognition using pose normalized deep convolutional nets, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
  • [184] N. Zhang, J. Donahue, R. Girshick, T. Darrell, Part-based r-cnns for fine-grained category detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 834–849.
  • [185] J. R. Uijlings, K. E. van de Sande, T. Gevers, A. W. Smeulders, Selective search for object recognition, International Journal of Computer Vision (IJCV) 104 (2) (2013) 154–171.
  • [186] D. Lin, X. Shen, C. Lu, J. Jia, Deep lac: Deep localization, alignment and classification for fine-grained recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1666–1674.
  • [187] J. P. Pluim, J. A. Maintz, M. Viergever, et al., Mutual-information-based registration of medical images: a survey, IEEE Trans. Med. Imaging 22 (8) (2003) 986–1004.
  • [188] J. Krause, T. Gebru, J. Deng, L.-J. Li, L. Fei-Fei, Learning features and parts for fine-grained recognition, in: Proceedings of the International Conference on Pattern Recognition (ICPR), 2014, pp. 26–33.
  • [189] J. Krause, H. Jin, J. Yang, L. Fei-Fei, Fine-grained recognition without part annotations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5546–5555.
  • [190] Y. Zhang, X.-S. Wei, J. Wu, J. Cai, J. Lu, V.-A. Nguyen, M. N. Do, Weakly supervised fine-grained categorization with part-based image representation, IEEE Transactions on Image Processing 25 (4) (2016) 1713–1725.
  • [191] T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng, Z. Zhang, The application of two-level attention models in deep convolutional neural network for fine-grained image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 842–850.
  • [192] T.-Y. Lin, A. RoyChowdhury, S. Maji, Bilinear cnn models for fine-grained visual recognition, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1449–1457.
  • [193] D. T. Nguyen, W. Li, P. O. Ogunbona, Human detection from images and videos: A survey, Pattern Recognition 51 (2016) 148–175.
  • [194] Y. Li, S. Wang, Q. Tian, X. Ding, Feature representation for statistical-learning-based object detection: A review, Pattern Recognition 48 (11) (2015) 3542–3559.
  • [195] M. Pedersoli, A. Vedaldi, J. Gonzalez, X. Roca, A coarse-to-fine approach for fast deformable object detection, Pattern Recognition 48 (5) (2015) 1844–1853.
  • [196] S. J. Nowlan, J. C. Platt, A convolutional neural network hand tracker, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1994, pp. 901–908.
  • [197] R. Girshick, F. Iandola, T. Darrell, J. Malik, Deformable part models are convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 437–446.
  • [198] R. Vaillant, C. Monrocq, Y. Le Cun, Original approach for the localisation of objects in images, IEE Proceedings-Vision, Image and Signal Processing 141 (4) (1994) 245–250.
  • [199] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, Microsoft coco: Common objects in context, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
  • [200] I. Endres, D. Hoiem, Category independent object proposals, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 36 (2) (2014) 222–234.
  • [201] L. Gomez, D. Karatzas, Textproposals: a text-specific selective search algorithm for word spotting in the wild, Pattern Recognition 70 (2017) 60–74.
  • [202] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
  • [203] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 37 (9) (2015) 1904–1916.
  • [204] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (6) (2017) 1137–1149.
  • [205] S. Gidaris, N. Komodakis, Object detection via a multi-region and semantic segmentation-aware cnn model, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1134–1142.
  • [206] D. Yoo, S. Park, J.-Y. Lee, A. S. Paek, I. So Kweon, Attentionnet: Aggregating weak directions for accurate object detection, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2659–2667.
  • [207] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 32 (9) (2010) 1627–1645.
  • [208] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, F. Moreno-Noguer, Fracking deep convolutional image descriptors, CoRR abs/1412.6537.
  • [209] A. Shrivastava, A. Gupta, R. Girshick, Training region-based object detectors with online hard example mining, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 761–769.
  • [210] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
  • [211] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, Ssd: Single shot multibox detector, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 21–37.
  • [212] Y. Lu, T. Javidi, S. Lazebnik, Adaptive object detection using adjacency and zoom prediction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2351–2359.
  • [213] K. Zhang, H. Song, Real-time visual tracking via online weighted multiple instance learning, Pattern Recognition 46 (1) (2013) 397–411.
  • [214] S. Zhang, H. Yao, X. Sun, X. Lu, Sparse coding based visual tracking: Review and experimental comparison, Pattern Recognition 46 (7) (2013) 1772–1788.
  • [215] S. Zhang, J. Wang, Z. Wang, Y. Gong, Y. Liu, Multi-target tracking by learning local-to-global trajectory models, Pattern Recognition 48 (2) (2015) 580–590.
  • [216] J. Fan, W. Xu, Y. Wu, Y. Gong, Human tracking using convolutional neural networks, IEEE Trans. Neural Networks (TNN) 21 (10) (2010) 1610–1623.
  • [217] H. Li, Y. Li, F. Porikli, Deeptrack: Learning discriminative feature representations by convolutional neural networks for visual tracking, in: Proceedings of the British Machine Vision Conference (BMVC), 2014.
  • [218] Y. Chen, X. Yang, B. Zhong, S. Pan, D. Chen, H. Zhang, Cnntracker: Online discriminative object tracking via deep convolutional neural network, Appl. Soft Comput. 38 (2016) 1088–1098.
  • [219] S. Hong, T. You, S. Kwak, B. Han, Online tracking by learning discriminative saliency map with convolutional neural network, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 597–606.
  • [220] M. Patacchiola, A. Cangelosi, Head pose estimation in the wild using convolutional neural networks and adaptive gradient methods, Pattern Recognition 71 (2017) 132–143.
  • [221] K. Nishi, J. Miura, Generation of human depth images with body part labels for complex human pose recognition, Pattern Recognition.
  • [222] A. Toshev, C. Szegedy, Deeppose: Human pose estimation via deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1653–1660.
  • [223] A. Jain, J. Tompson, M. Andriluka, G. W. Taylor, C. Bregler, Learning human pose estimation features with convolutional networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [224] J. J. Tompson, A. Jain, Y. LeCun, C. Bregler, Joint training of a convolutional network and a graphical model for human pose estimation, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1799–1807.
  • [225] X. Chen, A. L. Yuille, Articulated pose estimation by a graphical model with image dependent pairwise relations, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 1736–1744.
  • [226] X. Chen, A. Yuille, Parsing occluded people by flexible compositions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3945–3954.
  • [227] X. Fan, K. Zheng, Y. Lin, S. Wang, Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1347–1355.
  • [228] A. Jain, J. Tompson, Y. LeCun, C. Bregler, Modeep: A deep learning framework using motion features for human pose estimation, in: Proceedings of the Asian Conference on Computer Vision (ACCV), 2014, pp. 302–315.
  • [229] Y. Y. Tang, S.-W. Lee, C. Y. Suen, Automatic document processing: a survey, Pattern Recognition 29 (12) (1996) 1931–1952.
  • [230] A. Vinciarelli, A survey on off-line cursive word recognition, Pattern Recognition 35 (7) (2002) 1433–1446.
  • [231] K. Jung, K. I. Kim, A. K. Jain, Text information extraction in images and video: a survey, Pattern Recognition 37 (5) (2004) 977–997.
  • [232] S. Eskenazi, P. Gomez-Kramer, J.-M. Ogier, A comprehensive survey of mostly textual document segmentation algorithms since 2008, Pattern Recognition 64 (2017) 1–14.
  • [233] X. Bai, B. Shi, C. Zhang, X. Cai, L. Qi, Text/non-text image classification in the wild with convolutional neural networks, Pattern Recognition 66 (2017) 437–446.
  • [234] L. Gomez, A. Nicolaou, D. Karatzas, Improving patch-based scene text script identification with ensembles of conjoined networks, Pattern Recognition 67 (2017) 85–96.
  • [235] M. Delakis, C. Garcia, Text detection with convolutional neural networks, in: Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2008, pp. 290–294.
  • [236] H. Xu, F. Su, Robust seed localization and growing with deep convolutional features for scene text detection, in: Proceedings of the International Conference on Multimedia Retrieval (ICMR), 2015, pp. 387–394.
  • [237] W. Huang, Y. Qiao, X. Tang, Robust scene text detection with convolution neural network induced mser trees, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 497–511.
  • [238] C. Zhang, C. Yao, B. Shi, X. Bai, Automatic discrimination of text and non-text natural images, in: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 886–890.
  • [239] I. J. Goodfellow, J. Ibarz, S. Arnoud, V. Shet, Multi-digit number recognition from street view imagery using deep convolutional neural networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [240] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Deep structured output learning for unconstrained text recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [241] P. He, W. Huang, Y. Qiao, C. C. Loy, X. Tang, Reading scene text in deep convolutional sequences, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 3501–3508.
  • [242] F. A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: Continual prediction with lstm, Neural Computation 12 (10) (2000) 2451–2471.
  • [243] B. Shi, X. Bai, C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, CoRR abs/1507.05717.
  • [244] M. Jaderberg, A. Vedaldi, A. Zisserman, Deep features for text spotting, in: Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 512–528.
  • [245] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Reading text in the wild with convolutional neural networks, International Journal of Computer Vision (IJCV) 116 (1) (2016) 1–20.
  • [246] L. Wang, H. Lu, X. Ruan, M.-H. Yang, Deep networks for saliency detection via local estimation and global search, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3183–3192.
  • [247] R. Zhao, W. Ouyang, H. Li, X. Wang, Saliency detection by multi-context deep learning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1265–1274.
  • [248] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5455–5463.
  • [249] N. Liu, J. Han, D. Zhang, S. Wen, T. Liu, Predicting eye fixations using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 362–370.
  • [250] S. He, R. W. Lau, W. Liu, Z. Huang, Q. Yang, Supercnn: A superpixelwise convolutional neural network for salient object detection, International Journal of Computer Vision (IJCV) 115 (3) (2015) 330–344.
  • [251] E. Vig, M. Dorr, D. Cox, Large-scale optimization of hierarchical features for saliency prediction in natural images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2798–2805.
  • [252] M. Kümmerer, L. Theis, M. Bethge, Deep gaze i: Boosting saliency prediction with feature maps trained on imagenet, in: Proceedings of the International Conference on Learning Representations (ICLR) Workshops, 2015.
  • [253] J. Pan, X. Giró-i-Nieto, End-to-end convolutional network for saliency prediction, CoRR abs/1507.01422.
  • [254] G. Guo, A. Lai, A survey on still image based human action recognition, Pattern Recognition 47 (10) (2014) 3343–3361.
  • [255] L. L. Presti, M. La Cascia, 3d skeleton-based human action classification: A survey, Pattern Recognition 53 (2016) 130–147.
  • [256] J. Zhang, W. Li, P. O. Ogunbona, P. Wang, C. Tang, Rgb-d-based action recognition datasets: A survey, Pattern Recognition 60 (2016) 86–105.
  • [257] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, Decaf: A deep convolutional activation feature for generic visual recognition (2014).
  • [258] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1717–1724.
  • [259] G. Gkioxari, R. Girshick, J. Malik, Actions and attributes from wholes and parts, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 2470–2478.
  • [260] L. Pishchulin, M. Andriluka, P. Gehler, B. Schiele, Poselet conditioned pictorial structures, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 588–595.
  • [261] G. Gkioxari, R. B. Girshick, J. Malik, Contextual action recognition with r*cnn, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 1080–1088.
  • [262] G. Gkioxari, R. Girshick, J. Malik, Actions and attributes from wholes and parts, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2470–2478.
  • [263] Y. Zhang, L. Cheng, J. Wu, J. Cai, M. N. Do, J. Lu, Action recognition in still images with minimum annotation efforts, IEEE Transactions on Image Processing 25 (11) (2016) 5479–5490.
  • [264] L. Wang, L. Ge, R. Li, Y. Fang, Three-stream cnns for action recognition, Pattern Recognition Letters 92 (2017) 33–40.
  • [265] S. Ji, W. Xu, M. Yang, K. Yu, 3d convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 35 (1) (2013) 221–231.
  • [266] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning spatiotemporal features with 3d convolutional networks, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 4489–4497.
  • [267] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725–1732.
  • [268] K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, in: Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Weinberger (Eds.), Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2014, pp. 568–576.
  • [269] G. Chéron, I. Laptev, C. Schmid, P-CNN: pose-based CNN features for action recognition, in: Proceedings of the International Conference on Computer Vision (ICCV), 2015, pp. 3218–3226.
  • [270] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 39 (4) (2017) 677–691.
  • [271] K.-S. Fu, J. Mui, A survey on image segmentation, Pattern Recognition 13 (1) (1981) 3–16.
  • [272] Q. Zhou, B. Zheng, W. Zhu, L. J. Latecki, Multi-scale context for scene labeling via flexible segmentation graph, Pattern Recognition 59 (2016) 312–324.
  • [273] F. Liu, G. Lin, C. Shen, Crf learning with cnn features for image segmentation, Pattern Recognition 48 (10) (2015) 2983–2992.
  • [274] S. Bu, P. Han, Z. Liu, J. Han, Scene parsing using inference embedded deep networks, Pattern Recognition 59 (2016) 188–198.
  • [275] B. Peng, L. Zhang, D. Zhang, A survey of graph theoretical approaches to image segmentation, Pattern Recognition 46 (3) (2013) 1020–1038.
  • [276] C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 35 (8) (2013) 1915–1929.
  • [277] C. Couprie, C. Farabet, L. Najman, Y. LeCun, Indoor semantic segmentation using depth information, in: Proceedings of the International Conference on Learning Representations (ICLR), 2013.
  • [278] P. Pinheiro, R. Collobert, Recurrent convolutional neural networks for scene labeling, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 82–90.
  • [279] B. Shuai, G. Wang, Z. Zuo, B. Wang, L. Zhao, Integrating parametric and non-parametric models for scene labeling, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4249–4258.
  • [280] B. Shuai, Z. Zuo, G. Wang, Quaddirectional 2d-recurrent neural networks for image labeling, IEEE Signal Processing Letters 22 (11) (2015) 1990–1994.
  • [281] B. Shuai, Z. Zuo, G. Wang, B. Wang, Dag-recurrent neural networks for scene labeling, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3620–3629.
  • [282] M. Mostajabi, P. Yadollahpour, G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3376–3385.
  • [283] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • [284] M. El Ayadi, M. S. Kamel, F. Karray, Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition 44 (3) (2011) 572–587.
  • [285] L. Deng, P. Kenny, M. Lennig, V. Gupta, F. Seitz, P. Mermelstein, Phonemic hidden markov models with continuous mixture output densities for large vocabulary word recognition, IEEE Trans. Signal Processing 39 (7) (1991) 1677–1681.
  • [286] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag. 29 (6) (2012) 82–97.
  • [287] L. Deng, J. Li, J.-T. Huang, K. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams, et al., Recent advances in deep learning for speech research at microsoft, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 8604–8608.
  • [288] K. Yao, D. Yu, F. Seide, H. Su, L. Deng, Y. Gong, Adaptation of context-dependent deep neural networks for automatic speech recognition, in: Proceedings of the Spoken Language Technology (SLT), 2012, pp. 366–369.
  • [289] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, G. Penn, Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, pp. 4277–4280.
  • [290] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, L. Deng, G. Penn, D. Yu, Convolutional neural networks for speech recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • [291] D. Palaz, R. Collobert, M. M. Doss, Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2013, pp. 1766–1770.
  • [292] Y. Hoshen, R. J. Weiss, K. W. Wilson, Speech acoustic modeling from raw multichannel waveforms, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4624–4628.
  • [293] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al., Deep speech 2: End-to-end speech recognition in english and mandarin, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 173–182.
  • [294] T. Sercu, V. Goel, Advances in very deep convolutional neural networks for lvcsr, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2016, pp. 3429–3433.
  • [295] L. Tóth, Convolutional deep maxout networks for phone recognition, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2014, pp. 1078–1082.
  • [296] T. N. Sainath, B. Kingsbury, A. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, B. Ramabhadran, Improvements to deep convolutional neural networks for LVCSR, in: Proceedings of the Automatic Speech Recognition and Understanding (ASRU) Workshops, 2013, pp. 315–320.
  • [297] D. Yu, W. Xiong, J. Droppo, A. Stolcke, G. Ye, J. Li, G. Zweig, Deep convolutional neural networks with layer-wise context expansion and attention, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2016, pp. 17–21.
  • [298] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang, Phoneme recognition using time-delay neural networks, IEEE Trans. Acoustics, Speech, and Signal Processing 37 (3) (1989) 328–339.
  • [299] L.-H. Chen, T. Raitio, C. Valentini-Botinhao, J. Yamagishi, Z.-H. Ling, Dnn-based stochastic postfilter for hmm-based speech synthesis, in: Proceedings of the International Speech Communication Association (INTERSPEECH), 2014, pp. 1954–1958.
  • [300] B. Uria, I. Murray, S. Renals, C. Valentini-Botinhao, J. Bridle, Modelling acoustic feature dependencies with artificial neural networks: Trajectory-rnade, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4465–4469.
  • [301] Z. Huang, S. M. Siniscalchi, C.-H. Lee, Hierarchical bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation, Pattern Recognition Letters.
  • [302] A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu, Pixel recurrent neural networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2016, pp. 1747–1756.
  • [303] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, Y. Wu, Exploring the limits of language modeling, in: Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • [304] Y. Kim, Y. Jernite, D. Sontag, A. M. Rush, Character-aware neural language models, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2016, pp. 2741–2749.
  • [305] J. Gu, J. Cai, G. Wang, T. Chen, Stack-captioning: Coarse-to-fine learning for image captioning, CoRR abs/1709.03376.
  • [306] M. Wang, Z. Lu, H. Li, W. Jiang, Q. Liu, gen cnn: A convolutional architecture for word sequence prediction, in: Proceedings of the Association for Computational Linguistics (ACL), 2015, pp. 1567–1576.
  • [307] J. Gu, G. Wang, J. Cai, T. Chen, An empirical study of language cnn for image captioning, in: Proceedings of the International Conference on Computer Vision (ICCV), 2017.
  • [308] Y. N. Dauphin, A. Fan, M. Auli, D. Grangier, Language modeling with gated convolutional networks, in: Proceedings of the International Conference on Machine Learning (ICML), 2017, pp. 933–941.
  • [309] R. Collobert, J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, in: Proceedings of the International Conference on Machine Learning (ICML), 2008, pp. 160–167.
  • [310] L. Yu, K. M. Hermann, P. Blunsom, S. Pulman, Deep learning for answer sentence selection, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS) Workshop, 2014.
  • [311] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of the Association for Computational Linguistics (ACL), 2014, pp. 655–665.
  • [312] Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751.
  • [313] W. Yin, H. Schütze, Multichannel variable-size convolution for sentence classification, in: Proceedings of the Conference on Natural Language Learning (CoNLL), 2015, pp. 204–214.
  • [314] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research (JMLR) 12 (2011) 2493–2537.
  • [315] A. Conneau, H. Schwenk, L. Barrault, Y. LeCun, Very deep convolutional networks for natural language processing, CoRR abs/1606.01781.
  • [316] G. Huang, Y. Sun, Z. Liu, D. Sedra, K. Weinberger, Deep networks with stochastic depth, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 646–661.