WIDER FACE: A Face Detection Benchmark
Computer Vision and Pattern Recognition (CVPR), 2016.
Abstract:
Face detection is one of the most studied topics in the computer vision community. Much of the progress has been made possible by the availability of face detection benchmark datasets. We show that there is a gap between current face detection performance and real-world requirements. To facilitate future face detection research, we introd...
Introduction
- Face detection is a critical step to all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing.
- The goal of face detection is to determine the presence of faces in the image and, if present, return the image location and extent of each face [27].
- While this appears to be an effortless task for humans, it is a very difficult task for computers.
- Modern face detectors can detect near frontal faces and are widely used in real world applications, such as digital camera and electronic photo al-
Highlights
- Face detection is a critical step to all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing
- We show that the WIDER FACE dataset is an effective training source for face detection
- We show an example of using WIDER FACE by proposing a multi-scale two-stage cascade framework, which uses a divide-and-conquer strategy to deal with large scale variations
- We have proposed a large, richly annotated WIDER FACE dataset for training and evaluating face detection algorithms
- We wish to encourage the community to focus on some inherent challenges of face detection: small scale, occlusion, and extreme poses. These factors are ubiquitous in many real-world applications
- As mentioned in Sec. 3.3, we classify faces into three categories: un-occluded, partially occluded (1%-30% area occluded) and heavily occluded
- Faces captured by surveillance cameras in public spaces or events are typically small, occluded, and with atypical poses
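The three occlusion categories named in the highlights can be read as a simple threshold rule. The sketch below is only an illustration: the function name is hypothetical, and the boundaries outside the stated 1%-30% range (below 1% counted as un-occluded, above 30% counted as heavily occluded) are assumptions, since the text above gives only the partial-occlusion range.

```python
def occlusion_category(occluded_fraction: float) -> str:
    """Map the occluded share of a face's area (0.0 to 1.0) to a category.

    Only the 1%-30% range for "partially occluded" comes from the text;
    the handling of values outside that range is an assumption.
    """
    if not 0.0 <= occluded_fraction <= 1.0:
        raise ValueError("occluded_fraction must lie in [0, 1]")
    if occluded_fraction < 0.01:
        return "un-occluded"
    if occluded_fraction <= 0.30:
        return "partially occluded"
    return "heavily occluded"
```

For example, under these assumptions a face with 20% of its area hidden would fall into the "partially occluded" category.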
Methods
- The WIDER FACE dataset is a subset of the WIDER dataset [22]. The images in WIDER were collected in the following three steps: 1) Event categories were defined and chosen following the Large Scale Ontology for Multimedia (LSCOM) [18], which provides around 1,000 concepts relevant to video event analysis. 2) Images were retrieved using search engines like Google and Bing.
- The authors label the bounding boxes for all the recognizable faces in the WIDER FACE dataset.
- If a face is occluded, the authors still label it with a bounding box, together with an estimate of the degree of occlusion.
- Similar to the PASCAL VOC dataset [5], the authors assign an ‘Ignore’ flag to any face
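The annotation scheme described above (a bounding box per recognizable face, an occlusion estimate, and an 'Ignore' flag) could be held in a small record type. This is a minimal sketch, not the dataset's actual file format: the class name, field names, and the choice to drop ignored faces before evaluation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FaceAnnotation:
    """Hypothetical in-memory record for one labeled face."""
    x: float                           # left edge of the bounding box, in pixels
    y: float                           # top edge of the bounding box, in pixels
    width: float
    height: float
    occlusion: str = "un-occluded"     # annotator's occlusion estimate
    ignore: bool = False               # 'Ignore' flag, in the spirit of PASCAL VOC


def faces_for_evaluation(annotations: List[FaceAnnotation]) -> List[FaceAnnotation]:
    """Keep only faces without the 'Ignore' flag (assumed evaluation policy)."""
    return [a for a in annotations if not a.ignore]
```

Keeping the flag on the record rather than deleting flagged faces lets the same annotations serve both training and evaluation, where they may be treated differently.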
Results
- The authors select VJ [21], ACF [24], DPM [17], and Faceness [28] as baselines.
- The VJ [21], DPM [17], and Faceness [28] detectors are obtained either from the authors or from an open-source library (OpenCV).
- The ACF [24] detector is reimplemented using the open-source code.
- Following previous work [17], the authors apply a linear transformation to the output of each method to fit the annotation style of WIDER FACE
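The linear transformation mentioned in the last point can be pictured as a per-detector rescaling and shifting of each predicted box so that it matches the dataset's annotation style. The sketch below assumes one plausible parameterization (scale factors for width and height, plus center shifts expressed as fractions of the box size); the names `Box`, `LinearBoxTransform`, and `fit_to_annotation` are hypothetical, and the parameters actually used for each baseline are not given here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


class LinearBoxTransform(NamedTuple):
    """Hypothetical per-detector parameters, e.g. fitted on held-out data."""
    scale_w: float  # multiplier applied to the box width
    scale_h: float  # multiplier applied to the box height
    shift_x: float  # horizontal center shift, as a fraction of the width
    shift_y: float  # vertical center shift, as a fraction of the height


def fit_to_annotation(box: Box, t: LinearBoxTransform) -> Box:
    """Rescale and shift a detector's box around its center."""
    new_w = box.w * t.scale_w
    new_h = box.h * t.scale_h
    cx = box.x + box.w / 2 + t.shift_x * box.w
    cy = box.y + box.h / 2 + t.shift_y * box.h
    return Box(cx - new_w / 2, cy - new_h / 2, new_w, new_h)
```

A detector that tends to output square boxes around the inner face might, for instance, need `scale_h > 1` and a negative `shift_y` to cover the forehead the way the annotators did.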
Conclusion
- The authors have proposed a large, richly annotated WIDER FACE dataset for training and evaluating face detection algorithms.
- Even considering an easy subset, existing state-of-the-art algorithms reach only around 70% AP, as shown in Fig. 8
- With this new dataset, the authors wish to encourage the community to focus on some inherent challenges of face detection: small scale, occlusion, and extreme poses.
- Faces captured by surveillance cameras in public spaces or events are typically small, occluded, and with atypical poses
- These faces are arguably the most interesting yet crucial to detect for further investigation
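For readers unfamiliar with the AP figure quoted above, the following sketch shows one common way average precision is computed for a detector. It assumes a PASCAL VOC-style protocol (greedy matching by descending score, an intersection-over-union threshold of 0.5, all-point interpolation) and a single image; these choices and the function names are assumptions for illustration, not the benchmark's official evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (score, box); ground_truth: list of boxes."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truth]
        best = max(range(len(ground_truth)), key=lambda i: overlaps[i])
        # A detection is a true positive only if it hits an unmatched face.
        hit = overlaps[best] >= iou_threshold and not matched[best]
        if hit:
            matched[best] = True
        tp_flags.append(1 if hit else 0)
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing in recall before integrating.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```

Under this protocol, a missed small or occluded face lowers the maximum recall, and a false alarm ranked above a correct detection lowers the precision at every later recall level.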
Tables
- Table 1: Comparison of face detection datasets
- Table 2: Summary of face scale for multi-scale proposal networks
- Table 3: Comparison of per-class AP. To save space, we only show abbreviations of category names here. The event categories are ordered by their rank in Fig. 4 (from hard to easy events based on the scale measure). We compare the accuracy of Faceness and ACF models retrained on the WIDER FACE training set with the baseline Faceness and ACF. With the help of the WIDER FACE dataset, accuracy has improved on 56 out of 60 categories. The retrained Faceness model wins 30 out of 60 classes, followed by the retrained ACF model with 26 classes. The baseline Faceness wins the remaining 4 classes (1 medium class and 3 easy classes)
Related work
- Brief review of recent face detection methods: Face detection has been studied for decades in the computer vision literature. Modern face detection algorithms fall into four categories: cascade-based methods [2, 10, 15, 16, 21], part-based methods [19, 23, 30], channel-feature-based methods [25, 24], and neural-network-based methods [6, 14, 25, 28]. Here we highlight a few notable studies. A detailed survey can be found in [27, 29]. The seminal work by Viola and Jones [21] introduces the integral image to compute Haar-like features in constant time. These features are then used to learn an AdaBoost classifier with a cascade structure for face detection. Various later studies follow a similar pipeline. Among those variants, the SURF cascade [15] achieves competitive performance. Chen et al. [2] learn face detection and alignment jointly in the same cascade framework and obtain promising detection performance.
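To illustrate why the integral image makes Haar-like features cheap, the sketch below builds the summed-area table once and then evaluates any rectangle sum with four lookups, regardless of rectangle size. It is a plain-Python illustration of the general technique, not code from Viola and Jones [21]; the function names and the particular two-rectangle feature (left half minus right half) are chosen here for the example.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle at (x, y), in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Building the table costs one pass over the image; after that, each feature evaluation is a fixed handful of additions, which is what lets a cascade scan many windows and scales quickly.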
Funding
- This work is partially supported by SenseTime Group Limited, the Hong Kong Innovation and Technology Support Programme, the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (CUHK 416312), and the National Natural Science Foundation of China (61503366, 91320101, 61472410; Corresponding author: Ping Luo)
Reference
- [1] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. In CVPR, 2014.
- [2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In ECCV, 2014.
- [3] P. Dollar, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC, 2009.
- [4] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In CVPR, 2009.
- [5] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. IJCV, 2010.
- [6] S. S. Farfade, M. Saberian, and L. Li. Multi-view face detection using deep convolutional neural networks. In ICMR, 2015.
- [7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
- [8] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
- [9] J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? In BMVC, 2014.
- [10] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. TPAMI, 2007.
- [11] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst, 2010.
- [12] B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, M. Burge, and A. K. Jain. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In CVPR, 2015.
- [13] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
- [14] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua. A convolutional neural network cascade for face detection. In CVPR, 2015.
- [15] J. Li and Y. Zhang. Learning SURF cascade for fast and accurate object detection. In CVPR, 2013.
- [16] S. Liao, A. K. Jain, and S. Z. Li. A fast and accurate unconstrained face detector. TPAMI, 2015.
- [17] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In ECCV, 2014.
- [18] M. Naphade, J. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis. Large-scale concept ontology for multimedia. MultiMedia, 2006.
- [19] R. Ranjan, V. M. Patel, and R. Chellappa. A deep pyramid deformable part model for face detection. CoRR, 2015.
- [20] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. IJCV, 2013.
- [21] P. Viola and M. J. Jones. Robust real-time face detection. IJCV, 2004.
- [22] Y. Xiong, K. Zhu, D. Lin, and X. Tang. Recognize complex events from static images by fusing deep channels. In CVPR, 2015.
- [23] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. IVC, 2014.
- [24] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. CoRR, 2014.
- [25] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Convolutional channel features. In ICCV, 2015.
- [26] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Fine-grained evaluation on face detection in the wild. In FG, 2015.
- [27] M.-H. Yang, D. Kriegman, and N. Ahuja. Detecting faces in images: A survey. TPAMI, 2002.
- [28] S. Yang, P. Luo, C. C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. In ICCV, 2015.
- [29] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical report, Microsoft Research, 2010.
- [30] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.
- [31] C. Zitnick and P. Dollar. Edge boxes: Locating object proposals from edges. In ECCV, 2014.