WIDER FACE: A Face Detection Benchmark

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

DOI: https://doi.org/10.1109/CVPR.2016.596

Abstract:

Face detection is one of the most studied topics in the computer vision community. Much of this progress has been driven by the availability of face detection benchmark datasets. We show that there is a gap between current face detection performance and real-world requirements. To facilitate future face detection research, we introduce the WIDER FACE dataset.

Introduction
  • Face detection is a critical step in all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing.
  • The goal of face detection is to determine the presence of faces in the image and, if present, return the image location and extent of each face [27].
  • While this appears to be an effortless task for humans, it is very difficult for computers.
  • Modern face detectors can detect near-frontal faces and are widely used in real-world applications, such as digital cameras and electronic photo albums.
Highlights
  • Face detection is a critical step in all facial analysis algorithms, including face alignment, face recognition, face verification, and face parsing
  • We show that WIDER FACE dataset is an effective training source for face detection
  • (2) We show an example of using WIDER FACE by proposing a multi-scale two-stage cascade framework, which uses a divide-and-conquer strategy to deal with large scale variations
  • We have proposed a large, richly annotated WIDER FACE dataset for training and evaluating face detection algorithms
  • We wish to encourage the community to focus on some inherent challenges of face detection – small scale, occlusion, and extreme poses. These factors are ubiquitous in many real-world applications.
  • As mentioned in Sec. 3.3, we classify faces into three categories: un-occluded, partially occluded (1%–30% of the face area occluded), and heavily occluded (more than 30%); see the sketch after this list
  • Faces captured by surveillance cameras in public spaces or events are typically small, occluded, and with atypical poses
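To make the occlusion categories above concrete, here is a minimal sketch (not the authors' annotation or evaluation code) that maps an annotator's estimate of the occluded-area fraction to the three levels; the function name and the exact boundary handling are illustrative assumptions.

```python
def occlusion_category(occluded_fraction):
    """Map a face's occluded-area fraction (in [0, 1]) to the three
    WIDER FACE occlusion levels described above.  The thresholds follow
    the 1%-30% split quoted in the Highlights; the name and boundary
    handling are illustrative, not the authors' code."""
    if occluded_fraction < 0.01:
        return "un-occluded"
    elif occluded_fraction <= 0.30:
        return "partially occluded"   # roughly 1%-30% of the face area
    else:
        return "heavily occluded"     # more than ~30% of the face area


# Example: a face with about 40% of its area covered falls in the hardest bin.
print(occlusion_category(0.4))  # -> "heavily occluded"
```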
Methods
  • The WIDER FACE dataset is a subset of the WIDER dataset [22]. The images in WIDER were collected in the following three steps: 1) Event categories were defined and chosen following the Large-Scale Concept Ontology for Multimedia (LSCOM) [18], which provides around 1,000 concepts relevant to video event analysis. 2) Images were retrieved using search engines such as Google and Bing.
  • The authors label the bounding boxes for all the recognizable faces in the WIDER FACE dataset.
  • If a face is occluded, the authors still label it with a bounding box but also provide an estimate of the extent of occlusion.
  • Similar to the PASCAL VOC dataset [5], the authors assign an ‘Ignore’ flag to any face that is too difficult to recognize reliably.
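The ‘Ignore’ flag matters mainly at evaluation time. Below is a minimal sketch of one common PASCAL-VOC-style convention, assuming greedy IoU matching of score-sorted detections: a detection whose best match is an ignored face is dropped from the precision/recall bookkeeping, and unmatched ignored faces are not counted as misses. The function names, box format, and 0.5 IoU threshold are assumptions, not the benchmark's official evaluation tool.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def match_detections(dets, gts, ignore_flags, iou_thr=0.5):
    """Label detections as 'tp', 'fp', or 'ignore' (in descending-score order).

    dets: list of (x1, y1, x2, y2, score); gts: list of (x1, y1, x2, y2);
    ignore_flags[j] is True when gts[j] carries the 'Ignore' flag."""
    order = np.argsort([-d[4] for d in dets])
    used = [False] * len(gts)
    labels = []
    for i in order:
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts):
            if used[j]:
                continue
            o = iou(dets[i][:4], g)
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j < 0:
            labels.append("fp")        # no ground truth matched
        elif ignore_flags[best_j]:
            labels.append("ignore")    # matched an 'Ignore' face: drop it
        else:
            used[best_j] = True
            labels.append("tp")        # matched a regular annotated face
    return labels
```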
Results
  • The authors select VJ [21], ACF [24], DPM [17], and Faceness [28] as baselines.
  • The VJ [21], DPM [17], and Faceness [28] detectors are either obtained from the original authors or taken from an open-source library (OpenCV).
  • The ACF [24] detector is re-implemented using the publicly available open-source code.
  • Following previous work [17], the authors apply a linear transformation to each method's output boxes to fit the WIDER FACE annotation style.
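The form of this linear transformation is not spelled out here, so the sketch below shows one plausible version: fit a per-coordinate least-squares map from a detector's (cx, cy, w, h) boxes to the matched WIDER FACE annotations on a held-out set, then apply it to new detections. All names and the (cx, cy, w, h) parameterisation are assumptions.

```python
import numpy as np

def fit_linear_box_transform(pred_boxes, gt_boxes):
    """Fit y = a * x + b independently for each box coordinate, using
    matched prediction/ground-truth pairs (both arrays of shape (N, 4)
    in (cx, cy, w, h) form).  Returns a list of four (a, b) pairs."""
    P = np.asarray(pred_boxes, dtype=float)
    G = np.asarray(gt_boxes, dtype=float)
    coeffs = []
    for k in range(4):
        A = np.stack([P[:, k], np.ones(len(P))], axis=1)
        a, b = np.linalg.lstsq(A, G[:, k], rcond=None)[0]
        coeffs.append((a, b))
    return coeffs

def apply_linear_box_transform(boxes, coeffs):
    """Calibrate a detector's output boxes to the annotation style."""
    B = np.asarray(boxes, dtype=float).copy()
    for k, (a, b) in enumerate(coeffs):
        B[:, k] = a * B[:, k] + b
    return B
```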
Conclusion
  • The authors have proposed a large, richly annotated WIDER FACE dataset for training and evaluating face detection algorithms.
  • Even on the easy subset, existing state-of-the-art algorithms reach only around 70% AP (average precision), as shown in Fig. 8; the sketch after this list makes the metric concrete
  • With this new dataset, the authors wish to encourage the community to focus on some inherent challenges of face detection – small scale, occlusion, and extreme poses.
  • Faces captured by surveillance cameras in public spaces or events are typically small, occluded, and with atypical poses
  • These faces are arguably the most interesting, yet also the most crucial to detect for further investigation
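For reference, the AP figure quoted above is the area under the precision-recall curve. The sketch below shows the standard computation from detections already labelled as true or false positives; it is meant only to make the metric concrete, not to reproduce the benchmark's evaluation tool.

```python
import numpy as np

def average_precision(tp_labels, scores, num_gt):
    """Area under the precision-recall curve.

    tp_labels[i] is 1 if detection i is a true positive and 0 otherwise,
    scores[i] is its confidence, and num_gt is the number of (non-ignored)
    ground-truth faces."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(tp_labels, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Make precision monotonically non-increasing, then integrate over recall.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```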
Tables
  • Table 1: Comparison of face detection datasets
  • Table 2: Summary of face scales for the multi-scale proposal networks (see the scale-binning sketch after this list)
  • Table 3: Comparison of per-class AP. To save space, only abbreviations of the category names are shown. The event categories are ordered as in Fig. 4 (from hard to easy events based on the scale measure). We compare the accuracy of Faceness and ACF models retrained on the WIDER FACE training set against the baseline Faceness and ACF models. With the help of the WIDER FACE dataset, accuracy improves on 56 out of 60 categories. The retrained Faceness model wins 30 of the 60 classes, followed by the retrained ACF model with 26 classes. The baseline Faceness wins the remaining 1 medium class and 3 easy classes
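Table 2 summarises the face-scale ranges handled by the multi-scale proposal networks of the two-stage cascade mentioned in the Highlights. The sketch below illustrates the divide-and-conquer routing step with hypothetical bin boundaries; the actual ranges are those listed in Table 2 and are not reproduced here.

```python
# Hypothetical scale bins in pixels of face height; the real ranges used for
# the multi-scale proposal networks are given in Table 2 of the paper.
SCALE_BINS = [(10, 30), (30, 120), (120, 240), (240, 10**6)]

def assign_scale_bin(face_height):
    """Route a training face to the proposal network responsible for its
    scale range (divide-and-conquer over scale)."""
    for idx, (lo, hi) in enumerate(SCALE_BINS):
        if lo <= face_height < hi:
            return idx
    return None  # faces outside every bin are not used for training


# Example: a 45-pixel-tall face goes to the second proposal network (index 1).
print(assign_scale_bin(45))
```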
Related work
  • Brief review of recent face detection methods: Face detection has been studied for decades in the computer vision literature. Modern face detection algorithms can be grouped into four categories: cascade-based methods [2, 10, 15, 16, 21], part-based methods [19, 23, 30], channel-feature-based methods [25, 24], and neural-network-based methods [6, 14, 25, 28]. Here we highlight a few notable studies; a detailed survey can be found in [27, 29]. The seminal work by Viola and Jones [21] introduces the integral image to compute Haar-like features in constant time. These features are then used to learn an AdaBoost classifier with a cascade structure for face detection. Various later studies follow a similar pipeline. Among these variants, the SURF cascade [15] achieves competitive performance. Chen et al. [2] learn face detection and alignment jointly in the same cascade framework and obtain promising detection performance.
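As a brief illustration of the integral-image idea credited to Viola and Jones above, the sketch below precomputes cumulative sums so that the sum over any rectangular window, and hence a two-rectangle Haar-like feature, can be evaluated with four array lookups regardless of window size. The padding convention and the example feature are illustrative choices, not code from [21].

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns, padded with a leading zero
    row and column so rectangle sums below need no boundary checks."""
    ii = np.cumsum(np.cumsum(np.asarray(img, dtype=np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of pixels in the half-open window [y0, y1) x [x0, x1),
    using four lookups regardless of window size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

# A vertical two-rectangle Haar-like feature: left half minus right half.
img = np.arange(36).reshape(6, 6)
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 6, 3) - rect_sum(ii, 0, 3, 6, 6))  # -> -54
```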
Funding
  • This work is partially supported by SenseTime Group Limited, the Hong Kong Innovation and Technology Support Programme, the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (CUHK 416312), and the National Natural Science Foundation of China (61503366, 91320101, 61472410). Corresponding author: Ping Luo.
References
  • [1] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. In CVPR, 2014.
  • [2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In ECCV, 2014.
  • [3] P. Dollar, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC, 2009.
  • [4] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In CVPR, 2009.
  • [5] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
  • [6] S. S. Farfade, M. Saberian, and L. Li. Multi-view face detection using deep convolutional neural networks. In ICMR, 2015.
  • [7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
  • [8] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
  • [9] J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? In BMVC, 2014.
  • [10] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. TPAMI, 2007.
  • [11] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst, 2010.
  • [12] B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, M. Burge, and A. K. Jain. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In CVPR, 2015.
  • [13] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
  • [14] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua. A convolutional neural network cascade for face detection. In CVPR, 2015.
  • [15] J. Li and Y. Zhang. Learning SURF cascade for fast and accurate object detection. In CVPR, 2013.
  • [16] S. Liao, A. K. Jain, and S. Z. Li. A fast and accurate unconstrained face detector. TPAMI, 2015.
  • [17] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In ECCV, 2014.
  • [18] M. Naphade, J. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis. Large-scale concept ontology for multimedia. IEEE MultiMedia, 2006.
  • [19] R. Ranjan, V. M. Patel, and R. Chellappa. A deep pyramid deformable part model for face detection. CoRR, 2015.
  • [20] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. IJCV, 2013.
  • [21] P. Viola and M. J. Jones. Robust real-time face detection. IJCV, 2004.
  • [22] Y. Xiong, K. Zhu, D. Lin, and X. Tang. Recognize complex events from static images by fusing deep channels. In CVPR, 2015.
  • [23] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. IVC, 2014.
  • [24] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. CoRR, 2014.
  • [25] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Convolutional channel features. In ICCV, 2015.
  • [26] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Fine-grained evaluation on face detection in the wild. In FG, 2015.
  • [27] M.-H. Yang, D. Kriegman, and N. Ahuja. Detecting faces in images: A survey. TPAMI, 2002.
  • [28] S. Yang, P. Luo, C. C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. In ICCV, 2015.
  • [29] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical report, Microsoft Research, 2010.
  • [30] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.
  • [31] C. Zitnick and P. Dollar. Edge boxes: Locating object proposals from edges. In ECCV, 2014.