A convolutional neural network cascade for face detection
IEEE Conference on Computer Vision and Pattern Recognition, 2015.
Abstract:
In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we p...
Introduction
- Face detection is a well-studied problem in computer vision. Modern face detectors can detect near-frontal faces.
- The difficulties in face detection come mainly from two aspects: 1) the large visual variations of human faces against cluttered backgrounds; 2) the large search space of possible face positions and face sizes.
- The former requires the face detector to solve a binary classification problem accurately, while the latter additionally imposes a time-efficiency requirement.
- Because the Haar feature is simple, detection based on it is relatively weak in uncontrolled environments, where faces appear in varied poses and expressions under unexpected lighting.
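To make the "large search space" point concrete, the sketch below counts how many candidate windows a sliding-window detector would have to classify on one image. The window size, stride, minimum face size and pyramid scale step are illustrative assumptions, not values taken from this summary.

```python
def count_windows(img_w, img_h, win=12, stride=4, min_face=40, scale_step=2 ** 0.5):
    """Count sliding windows over an image pyramid.

    All defaults are hypothetical. The image is first shrunk so that a
    face of min_face pixels maps onto a win-pixel window, then shrunk
    repeatedly by scale_step to cover larger faces.
    """
    total = 0
    scale = win / min_face
    while True:
        w = int(img_w * scale)
        h = int(img_h * scale)
        if w < win or h < win:
            break  # the window no longer fits at this pyramid level
        nx = (w - win) // stride + 1  # window positions along x
        ny = (h - win) // stride + 1  # window positions along y
        total += nx * ny
        scale /= scale_step
    return total


if __name__ == "__main__":
    # A VGA frame; halving the smallest face of interest enlarges the search.
    print(count_windows(640, 480, min_face=80))
    print(count_windows(640, 480, min_face=40))
```

Every one of these windows needs a face/non-face decision, which is why the classifier's cost per window matters as much as its accuracy.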
Highlights
- Recent research in this area focuses more on the uncontrolled face detection problem, where factors such as pose changes, exaggerated expressions and extreme illumination lead to large visual variations in face appearance and can severely degrade the robustness of the face detector
- We propose to apply the Convolutional Neural Network (CNN) [13] to face detection
- On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], our detector outperforms the state-of-the-art methods in the discontinuous score evaluation
- To achieve fast face detection, we present a CNN cascade, which rejects false detections quickly in the early, low-resolution stages and carefully verifies the detections in the later, high-resolution stages. We show that this intuitive solution can outperform the state-of-the-art methods in face detection
- Sharing the advantages of CNN, the proposed face detector is robust to large visual variations
Methods
- The authors verify the proposed detector on two public face detection benchmarks.
- On the Annotated Faces in the Wild (AFW) [33] test set, the detector is comparable to the state-of-the-art.
- This small-scale test set is almost saturated, and the authors observe that the evaluation is biased by mismatched face annotations.
- On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], the detector outperforms the state-of-the-art methods in the discontinuous score evaluation.
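For readers unfamiliar with the FDDB "discontinuous score", the sketch below shows the idea: a detection matched to an annotated face scores 1 if their intersection-over-union exceeds 0.5 and 0 otherwise, while the continuous score is the overlap ratio itself. As a simplifying assumption, both regions are axis-aligned boxes here (FDDB annotates faces as ellipses), and the matching of detections to annotations is omitted.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def discontinuous_score(det, gt, thresh=0.5):
    """1 if the matched detection overlaps the annotation enough, else 0."""
    return 1 if iou(det, gt) > thresh else 0


def continuous_score(det, gt):
    """The overlap ratio itself."""
    return iou(det, gt)


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(discontinuous_score((1, 1, 11, 11), gt), continuous_score((1, 1, 11, 11), gt))
```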
Results
- To achieve fast face detection, the authors present a CNN cascade, which rejects false detections quickly in the early, low-resolution stages and carefully verifies the detections in the later, high-resolution stages.
- The authors show that this intuitive solution can outperform the state-of-the-art methods in face detection.
- On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], the detector outperforms the state-of-the-art methods in the discontinuous score evaluation
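The early-rejection idea behind the cascade can be sketched as follows. This is a minimal illustration, not the authors' implementation: each stage is a hypothetical `(scorer, threshold)` pair standing in for a CNN, ordered from cheap (low resolution) to expensive (high resolution), and only windows that pass one stage reach the next.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]                 # (x, y, w, h)
Stage = Tuple[Callable[[Box], float], float]    # (scorer, threshold), both hypothetical


def run_cascade(windows: Sequence[Box], stages: Sequence[Stage]) -> Tuple[List[Box], List[int]]:
    """Filter windows through successive stages.

    Returns the surviving windows and the window count before the first
    stage and after each stage, so the shrinkage can be inspected.
    """
    survivors = list(windows)
    counts = [len(survivors)]
    for score, threshold in stages:
        # The costly later scorers only ever see what earlier stages kept.
        survivors = [w for w in survivors if score(w) >= threshold]
        counts.append(len(survivors))
    return survivors, counts


if __name__ == "__main__":
    # Toy scorers: they rate a box by its width only, purely for demonstration.
    toy_windows = [(0, 0, s, s) for s in (10, 20, 30, 40)]
    toy_stages = [(lambda b: b[2] / 40, 0.5), (lambda b: b[2] / 40, 0.9)]
    print(run_cascade(toy_windows, toy_stages))
```

Because most windows in a typical image are background, the total cost is dominated by the first, cheapest stage.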
Conclusion
- The authors present a CNN cascade for fast face detection. The detector evaluates the input image at low resolution to quickly reject non-face regions and carefully processes the challenging regions at higher resolution for accurate detection.
- Calibration nets are introduced in the cascade to accelerate detection and improve bounding box quality.
- Sharing the advantages of CNN, the proposed face detector is robust to large visual variations.
- On the public face detection benchmark FDDB, the proposed detector outperforms the state-of-the-art methods.
- The proposed detector is fast: it achieves 14 FPS on typical VGA images on a CPU and can be accelerated to 100 FPS on a GPU.
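One way a calibration step can improve bounding box quality is sketched below. It assumes the calibration net outputs one confidence per discrete adjustment pattern (a scale change plus x/y offsets) and that the confident patterns are averaged before being applied. The pattern values, the threshold and the averaging rule are assumptions for illustration and are not stated in this summary.

```python
from itertools import product

# Hypothetical adjustment patterns: (scale, x offset, y offset).
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)
PATTERNS = list(product(SCALES, OFFSETS, OFFSETS))  # 5 * 3 * 3 = 45 patterns


def calibrate(box, scores, threshold=0.5):
    """Adjust box = (x, y, w, h) using per-pattern confidences.

    scores holds one confidence per entry of PATTERNS, standing in for
    the output of a calibration net. Patterns above the threshold are
    averaged; if none qualifies, the box is returned unchanged.
    """
    chosen = [p for p, c in zip(PATTERNS, scores) if c > threshold]
    if not chosen:
        return box
    s = sum(p[0] for p in chosen) / len(chosen)
    dx = sum(p[1] for p in chosen) / len(chosen)
    dy = sum(p[2] for p in chosen) / len(chosen)
    x, y, w, h = box
    # Undo the estimated shift and rescale the window.
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)


if __name__ == "__main__":
    scores = [0.0] * len(PATTERNS)
    scores[PATTERNS.index((1.0, 0.17, 0.0))] = 0.9  # "window sits too far right"
    print(calibrate((10, 20, 40, 40), scores))
```

Better-aligned boxes overlap more with neighbouring detections of the same face, so later merging can discard more windows before the expensive stages run.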
Tables
- Table 1: Performance statistics of the cascade on FDDB: the average number of detection windows per image after each stage and the overall recall rate. The number of detection windows decreases quickly, and the calibration nets help to further reduce the detection windows and improve the recall.
Related work
- Neural network based face detection
As early as 1994, Vaillant et al. [26] applied neural networks to face detection. They proposed training a convolutional neural network to detect the presence or absence of a face in an image window and scanning the whole image with the network at all possible locations. In 1996, Rowley et al. [22] presented a retinally connected neural network for upright frontal face detection. The method was extended to rotation-invariant face detection in 1998 [23] with a “router” network that estimates the orientation and applies the proper detector network.
In 2002, Garcia et al. [5] developed a neural network to detect semi-frontal human faces in complex images; in 2005, Osadchy et al. [20] trained a convolutional network for simultaneous face detection and pose estimation.
Funding
- Research reported in this publication was partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371
- This work is also partly supported by US National Science Foundation Grant IIS 1350763 and GH’s start-up funds from Stevens Institute of Technology
Reference
- [2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Computer Vision–ECCV 2014. 2014. 2, 7, 8
- [3] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge, 2009. 2
- [4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2010. 2, 6
- [5] C. Garcia and M. Delakis. A neural architecture for fast and robust face detection. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, 2002. 2
- [6] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013. 2
- [7] V. Jain and E. Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. Technical Report UMCS-2010-009, University of Massachusetts, Amherst, 2010. 2, 6, 7
- [8] V. Jain and E. Learned-Miller. Online domain adaptation of a pre-trained cascade of classifiers. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011. 8
- [9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014. 2
- [10] M. Jones and P. Viola. Fast multi-view face detection. Mitsubishi Electric Research Lab TR-20003-96, 2003. 2
- [11] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011. 6
- [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 2012. 3
- [13] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 1995. 1, 2
- [14] H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang. Probabilistic elastic part model for unsupervised face detector adaptation. In Proc. IEEE International Conference on Computer Vision, 2013. 8
- [15] H. Li, Z. Lin, J. Brandt, X. Shen, and G. Hua. Efficient boosted exemplar-based face detection. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 2014. 2, 8
- [16] J. Li, T. Wang, and Y. Zhang. Face detection using surf cascade. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, 2011. 8
- [17] R. Lienhart and J. Maydt. An extended set of haar-like features for rapid object detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, 2002. 2
- [18] N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg, and R. Forchheimer. A method for object detection based on pixel intensity comparisons organized in decision trees. arXiv preprint arXiv:1305.4537, 2013. 8
- [19] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Computer Vision–ECCV 2014. 2014. 2, 6, 7, 8
- [20] M. Osadchy, Y. L. Cun, M. L. Miller, and P. Perona. Synergistic face detection and pose estimation with energy-based model. In Advances in Neural Information Processing Systems (NIPS), 2005. 2
- [21] D. Park, D. Ramanan, and C. Fowlkes. Multiresolution models for object detection. In Computer Vision–ECCV 2010. 2010. 2
- [22] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In Computer Vision and Pattern Recognition, 1996. 2
- [23] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, 1998. 2
- [24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014. 2
- [25] X. Shen, Z. Lin, J. Brandt, and Y. Wu. Detecting and aligning faces by image retrieval. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013. 2, 6, 8
- [26] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEE Proceedings - Vision, Image and Signal Processing, 1994. 2
- [27] P. A. Viola and M. J. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001. 1, 2
- [28] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. Image and Vision Computing, 2013. 2, 6
- [29] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. arXiv preprint arXiv:1407.4023, 2014. 2, 8
- [30] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical Report MSR-TR-2010-66, 2010. 1, 2
- [31] J. Zhang, S. Shan, M. Kan, and X. Chen. Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment. In Computer Vision–ECCV 2014. 2014. 2
- [32] W. Zhang, G. Zelinsky, and D. Samaras. Real-time accurate object detection using multiple resolutions. In Proc. IEEE International Conference on Computer Vision, 2007. 2
- [33] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2012. 2, 6, 8