A convolutional neural network cascade for face detection

IEEE Conference on Computer Vision and Pattern Recognition, 2015.

We present a convolutional neural networks cascade for fast face detection

Abstract:

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we p...

Introduction
  • Face detection is a well studied problem in computer vision. Modern face detectors can detect near frontal faces.
  • The difficulties in face detection mainly come from two aspects: 1) the large visual variations of human faces in the cluttered backgrounds; 2) the large search space of possible face positions and face sizes.
  • The former requires the face detector to solve a binary classification problem accurately, while the latter further imposes a time-efficiency requirement.
  • Because the Haar feature is so simple, it is relatively weak in uncontrolled environments, where faces appear in varied poses and expressions under unexpected lighting.
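To give a feel for the "large search space" of positions and sizes, here is a rough Python sketch that counts sliding windows over an image pyramid. The window size, stride, minimum face size, and scale step are all illustrative assumptions rather than values taken from the text above, and `count_windows` is a hypothetical helper name.

```python
def count_windows(img_w, img_h, win=12, stride=4, min_face=40, scale_step=1.414):
    """Count sliding windows over an image pyramid (illustrative parameters)."""
    total = 0
    face = float(min_face)  # face size (in pixels) handled at the current level
    while face <= min(img_w, img_h):
        scale = win / face  # shrink the image so a `face`-sized face fits the window
        w, h = int(img_w * scale), int(img_h * scale)
        if w < win or h < win:
            break
        # number of window positions along x times number along y
        total += ((w - win) // stride + 1) * ((h - win) // stride + 1)
        face *= scale_step  # move to the next, coarser pyramid level
    return total


if __name__ == "__main__":
    # Window count for a VGA-sized image under the assumed settings.
    print(count_windows(640, 480))
```

Halving the stride or lowering the minimum face size increases the count quickly, which is why a detector that scans every position and scale needs a cheap way to discard most windows.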
Highlights
  • Face detection is a well studied problem in computer vision
  • Modern face detectors can detect near frontal faces. Recent research in this area focuses more on the uncontrolled face detection problem, where a number of factors such as pose changes, exaggerated expressions and extreme illuminations can lead to large visual variations in face appearance, and can severely degrade the robustness of the face detector
  • We propose to apply the Convolutional Neural Network (CNN) [13] to face detection
  • On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], our detector outperforms the state-of-the-art methods in the discontinuous score evaluation
  • To achieve fast face detection, we present a CNN cascade, which rejects false detections quickly in the early, low-resolution stages and carefully verifies the detections in the later, high-resolution stages. We show that this intuitive solution can outperform the state-of-the-art methods in face detection
  • Sharing the advantages of CNN, the proposed face detector is robust to large visual variations
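The early-rejection idea behind the cascade can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the scoring functions below are toy stand-ins for the CNN stages, and the names `run_cascade`, `Window`, and `Stage` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]               # (x, y, w, h)
Stage = Tuple[Callable[[Window], float], float]  # (scoring function, threshold)


def run_cascade(windows: Sequence[Window], stages: Sequence[Stage]):
    """Pass windows through the stages in order, dropping rejected ones early.

    Returns the surviving windows and the window count after each stage.
    """
    survivors: List[Window] = list(windows)
    counts = [len(survivors)]
    for score, threshold in stages:
        # Later (more expensive) stages only see what earlier stages kept.
        survivors = [w for w in survivors if score(w) >= threshold]
        counts.append(len(survivors))
    return survivors, counts


if __name__ == "__main__":
    # Toy scorers standing in for a cheap low-resolution net and a costly
    # high-resolution net; real stages would run CNNs on image crops.
    cheap = lambda w: w[2] / 40.0
    costly = lambda w: w[2] / 40.0
    candidates = [(0, 0, 10, 10), (0, 0, 20, 20), (0, 0, 40, 40)]
    kept, counts = run_cascade(candidates, [(cheap, 0.5), (costly, 1.0)])
    print(kept, counts)
```

Because each stage filters the list before the next one runs, the cost of the expensive stages scales with the number of survivors rather than with the full search space.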
Methods
  • The authors verify the proposed detector on two public face detection benchmarks.
  • On the Annotated Faces in the Wild (AFW) [33] test set, the detector is comparable to the state-of-the-art.
  • This small-scale test set is almost saturated, and the authors observe that the evaluation is biased due to mismatched face annotations.
  • On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], the detector outperforms the state-of-the-art methods in the discontinuous score evaluation.
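For readers unfamiliar with the "discontinuous score", the sketch below shows the usual idea: a detection counts as a hit (score 1) when its intersection-over-union with an annotated face exceeds 0.5, and as a miss (score 0) otherwise. This is a simplified illustration under stated assumptions: it uses axis-aligned boxes, whereas FDDB annotates faces as ellipses, and it omits the step that matches detections to annotations. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def discontinuous_score(detection, annotation, threshold=0.5):
    """Return 1 if the overlap ratio exceeds the threshold, else 0."""
    return 1 if iou(detection, annotation) > threshold else 0


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(discontinuous_score((1, 0, 10, 10), gt))  # heavily overlapping box
    print(discontinuous_score((5, 0, 10, 10), gt))  # half-shifted box
```

Under a thresholded score like this, a loosely placed box can turn a correct detection into a miss, which is one reason bounding box quality matters for the reported numbers.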
Results
  • To achieve fast face detection, the authors present a CNN cascade, which rejects false detections quickly in the early, low-resolution stages and carefully verifies the detections in the later, high-resolution stages.
  • The authors show that this intuitive solution can outperform the state-of-the-art methods in face detection.
  • On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], the detector outperforms the state-of-the-art methods in the discontinuous score evaluation
Conclusion
  • The authors present a CNN cascade for fast face detection. The detector evaluates the input image at low resolution to quickly reject non-face regions and carefully processes the challenging regions at higher resolution for accurate detection.
  • Calibration nets are introduced in the cascade to accelerate detection and improve bounding box quality.
  • Sharing the advantages of CNN, the proposed face detector is robust to large visual variations.
  • On the public face detection benchmark FDDB, the proposed detector outperforms the state-of-the-art methods.
  • The proposed detector is very fast: it achieves 14 FPS on typical VGA images on a CPU and can be accelerated to 100 FPS on a GPU.
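One way to picture what a calibration net does is as a classifier over a fixed set of box adjustments (a scale change plus horizontal and vertical offsets), whose confident predictions are averaged and applied to the detection window. The sketch below follows that idea. The specific pattern values, the confidence threshold, and the `calibrate` helper are assumptions made for illustration; none of them is stated in the summary above.

```python
from itertools import product

# Assumed calibration patterns: every combination of a scale change and
# relative x/y offsets (5 * 3 * 3 = 45 patterns). Illustrative values only.
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)
PATTERNS = list(product(SCALES, OFFSETS, OFFSETS))  # (s, dx, dy) triples


def calibrate(window, scores, threshold=0.5):
    """Adjust (x, y, w, h) using the patterns whose score exceeds the threshold.

    `scores` holds one confidence per entry of PATTERNS, as a calibration
    net's output layer might produce (hypothetical interface).
    """
    x, y, w, h = window
    picked = [p for p, s in zip(PATTERNS, scores) if s > threshold]
    if not picked:
        return window  # nothing confident enough: leave the box unchanged
    # Average the confident patterns into a single adjustment.
    s = sum(p[0] for p in picked) / len(picked)
    dx = sum(p[1] for p in picked) / len(picked)
    dy = sum(p[2] for p in picked) / len(picked)
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)


if __name__ == "__main__":
    scores = [0.0] * len(PATTERNS)
    scores[PATTERNS.index((1.10, 0.17, 0.0))] = 0.9
    print(calibrate((100, 100, 48, 48), scores))
```

Better-aligned boxes overlap more with each other, so non-maximum suppression can merge more of them, which is consistent with the claim that calibration both speeds up detection and improves bounding box quality.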
Tables
  • Table 1: Performance statistics of the cascade on FDDB: the average number of detection windows per image after each stage and the overall recall rate. The number of detection windows decreases quickly, and the calibration nets help further reduce the detection windows and improve the recall.
Related work
  • 2.1. Neural network based face detection

    As early as 1994, Vaillant et al. [26] applied neural networks to face detection. They proposed training a convolutional neural network to detect the presence or absence of a face in an image window and scanning the whole image with the network at all possible locations. In 1996, Rowley et al. [22] presented a retinally connected neural network for upright frontal face detection. The method was extended to rotation-invariant face detection in 1998 [23] with a “router” network that estimates the orientation and applies the proper detector network.

    In 2002, Garcia et al. [5] developed a neural network to detect semi-frontal human faces in complex images; in 2005, Osadchy et al. [20] trained a convolutional network for simultaneous face detection and pose estimation.
Funding
  • Research reported in this publication was partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371.
  • This work is also partly supported by US National Science Foundation Grant IIS 1350763 and GH’s start-up funds from Stevens Institute of Technology.
Reference
  • [2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Computer Vision–ECCV 2014, 2014.
  • [3] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge, 2009.
  • [4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
  • [5] C. Garcia and M. Delakis. A neural architecture for fast and robust face detection. In 16th International Conference on Pattern Recognition, 2002.
  • [6] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013.
  • [7] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical Report UMCS-2010-009, University of Massachusetts, Amherst, 2010.
  • [8] V. Jain and E. Learned-Miller. Online domain adaptation of a pre-trained cascade of classifiers. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
  • [9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • [10] M. Jones and P. Viola. Fast multi-view face detection. Mitsubishi Electric Research Lab TR-20003-96, 2003.
  • [11] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
  • [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
  • [13] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 1995.
  • [14] H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang. Probabilistic elastic part model for unsupervised face detector adaptation. In Proc. IEEE International Conference on Computer Vision, 2013.
  • [15] H. Li, Z. Lin, J. Brandt, X. Shen, and G. Hua. Efficient boosted exemplar-based face detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  • [16] J. Li, T. Wang, and Y. Zhang. Face detection using SURF cascade. In IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011.
  • [17] R. Lienhart and J. Maydt. An extended set of Haar-like features for rapid object detection. In International Conference on Image Processing, 2002.
  • [18] N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg, and R. Forchheimer. A method for object detection based on pixel intensity comparisons organized in decision trees. arXiv preprint arXiv:1305.4537, 2013.
  • [19] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Computer Vision–ECCV 2014, 2014.
  • [20] M. Osadchy, Y. L. Cun, M. L. Miller, and P. Perona. Synergistic face detection and pose estimation with energy-based model. In Advances in Neural Information Processing Systems (NIPS), 2005.
  • [21] D. Park, D. Ramanan, and C. Fowlkes. Multiresolution models for object detection. In Computer Vision–ECCV 2010, 2010.
  • [22] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In Computer Vision and Pattern Recognition, 1996.
  • [23] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, 1998.
  • [24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.
  • [25] X. Shen, Z. Lin, J. Brandt, and Y. Wu. Detecting and aligning faces by image retrieval. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013.
  • [26] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEE Proceedings - Vision, Image and Signal Processing, 1994.
  • [27] P. A. Viola and M. J. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001.
  • [28] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. Image and Vision Computing, 2013.
  • [29] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. arXiv preprint arXiv:1407.4023, 2014.
  • [30] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical Report MSR-TR-2010-66, 2010.
  • [31] J. Zhang, S. Shan, M. Kan, and X. Chen. Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In Computer Vision–ECCV 2014, 2014.
  • [32] W. Zhang, G. Zelinsky, and D. Samaras. Real-time accurate object detection using multiple resolutions. In Proc. IEEE International Conference on Computer Vision, 2007.
  • [33] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2012.