Selective Refinement Network for High Performance Face Detection

AAAI Conference on Artificial Intelligence (AAAI), 2019.


Abstract:

High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce ...

Introduction
  • Face detection is a long-standing problem in computer vision with extensive applications including face alignment, face analysis, face recognition, etc.
  • Further improving the performance of face detection has become a challenging issue.
  • There remains room for improvement in two aspects: (a) recall efficiency: the number of false positives needs to be reduced at high recall rates; (b) location accuracy: the accuracy of bounding box locations needs to be improved.
  • These two problems are elaborated as follows
Highlights
  • Face detection is a long-standing problem in computer vision with extensive applications including face alignment, face analysis, face recognition, etc
  • We investigate the effects of two-step classification and regression on different levels of detection layers and propose a novel face detection framework, named Selective Refinement Network (SRN), which selectively applies two-step classification and regression to specific levels of detection layers
  • The network structure of SRN is shown in Figure 2; it consists of two key modules, namely the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module (a minimal sketch of this selective wiring follows this list)
  • We present the precision-recall curves of the proposed SRN and six state-of-the-art methods as well as three commercial face detectors (i.e., SkyBiometry, Face++, and Picasa) in Figure 4(b)
  • We have presented SRN, a novel single-shot face detector, which consists of two key modules, i.e., the STC and the STR
  • We achieve state-of-the-art results on the AFW, PASCAL Face, FDDB, and WIDER FACE datasets
  • Extensive experiments on the AFW, PASCAL Face, FDDB and WIDER FACE datasets demonstrate that SRN achieves state-of-the-art detection performance
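
The highlights above note that STC and STR are attached only to specific pyramid levels: two-step classification on the low-level detection layers and two-step regression on the high-level ones. Below is a minimal PyTorch-style sketch of that selective wiring, assuming a six-level pyramid, 256-channel features, simple 3x3 convolutional heads, and two anchors per location; the module internals, channel counts, and level indices are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of SRN's selective wiring (assumptions noted above):
# first-step heads are attached only to selected pyramid levels, while
# second-step heads cover all levels.
import torch
import torch.nn as nn

NUM_LEVELS = 6           # pyramid levels, ordered from high to low resolution
STC_LEVELS = (0, 1, 2)   # low-level layers: Selective Two-step Classification
STR_LEVELS = (3, 4, 5)   # high-level layers: Selective Two-step Regression
NUM_ANCHORS = 2          # anchors per spatial location (assumed)
CHANNELS = 256           # feature channels per pyramid level (assumed)


def head(outputs_per_anchor: int) -> nn.Module:
    """3x3 conv head predicting `outputs_per_anchor` values per anchor."""
    return nn.Conv2d(CHANNELS, NUM_ANCHORS * outputs_per_anchor, 3, padding=1)


class SelectiveHeads(nn.Module):
    """First-step heads on selected levels, second-step heads on all levels."""

    def __init__(self) -> None:
        super().__init__()
        self.first_cls = nn.ModuleList([head(1) for _ in STC_LEVELS])   # face score
        self.first_reg = nn.ModuleList([head(4) for _ in STR_LEVELS])   # box offsets
        self.second_cls = nn.ModuleList([head(1) for _ in range(NUM_LEVELS)])
        self.second_reg = nn.ModuleList([head(4) for _ in range(NUM_LEVELS)])

    def forward(self, feats):
        out = {"first_cls": {}, "first_reg": {}, "second_cls": [], "second_reg": []}
        for i, lvl in enumerate(STC_LEVELS):
            out["first_cls"][lvl] = self.first_cls[i](feats[lvl])
        for i, lvl in enumerate(STR_LEVELS):
            out["first_reg"][lvl] = self.first_reg[i](feats[lvl])
        for lvl in range(NUM_LEVELS):
            out["second_cls"].append(self.second_cls[lvl](feats[lvl]))
            out["second_reg"].append(self.second_reg[lvl](feats[lvl]))
        return out


if __name__ == "__main__":
    # Dummy 640x640 input pyramid with strides 4..128 (shapes are assumptions).
    feats = [torch.randn(1, CHANNELS, 640 // s, 640 // s) for s in (4, 8, 16, 32, 64, 128)]
    preds = SelectiveHeads()(feats)
    print([p.shape for p in preds["second_cls"]])
```

The selectivity mirrors the conclusion's description: STC targets the low-level layers, which contribute most of the easy negative anchors, while STR targets the high-level layers, whose larger anchors benefit most from a two-step coarse-to-fine regression.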
Methods
  • The authors first analyze the proposed method in detail to verify the effectiveness of the contributions.
  • The authors conduct a set of ablation experiments on the WIDER FACE dataset to analyze the model in detail.
  • The authors use the same parameter settings for all the experiments, except for specified changes to the components.
  • All models are trained on the WIDER FACE training set and evaluated on the validation set.
  • The results of ablation experiments are listed in Table 1 and some promising conclusions can be drawn as follows
Results
  • AFW Dataset
  • It consists of 205 images with 473 labeled faces.
  • The images in the dataset contain cluttered backgrounds with large variations in both face viewpoint and appearance.
  • The authors compare SRN against seven state-of-the-art methods and three commercial face detectors (i.e., Face.com, Face++ and Picasa).
  • PASCAL Face Dataset
  • It has 1,335 labeled faces in 851 images with large variations in face appearance and pose.
  • The authors present the precision-recall curves of the proposed SRN and six state-of-the-art methods as well as three commercial face detectors (i.e., SkyBiometry, Face++ and Picasa) in Figure 4(b); a generic sketch of the AP computation behind such curves follows this list.
  • SRN achieves state-of-the-art results, improving the AP score by 4.99% over the second-best method STN (Chen et al. 2016)
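
Since the results above are reported as AP scores derived from precision-recall curves, the following is a generic sketch of computing AP as the area under a PR curve from ranked detections. It is not the benchmarks' official evaluation code, and the interpolation each benchmark uses may differ.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_gt):
    """AP as area under the precision-recall curve for one detector.

    scores: detection confidences; is_true_positive: 1 if a detection matched
    an unclaimed ground-truth face, else 0; num_gt: number of labeled faces.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Prepend a (recall=0, precision=1) point and integrate stepwise.
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([1.0], precision))
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))


# Toy usage: 4 ranked detections over 3 ground-truth faces -> AP ~= 0.917.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], num_gt=3))
```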
Conclusion
  • The authors have presented SRN, a novel single-shot face detector, which consists of two key modules, i.e., the STC and the STR.
  • The STC uses the first-step classifier to filter out most of the simple negative anchors from the low-level detection layers, reducing the search space for the second-step classifier and thereby reducing false positives.
  • The STR applies the first-step regressor to coarsely adjust the locations and sizes of anchors from the high-level detection layers, providing better initialization for the second-step regressor and improving the location accuracy of bounding boxes (a minimal sketch of both steps at inference follows this list).
  • Extensive experiments on the AFW, PASCAL Face, FDDB and WIDER FACE datasets demonstrate that SRN achieves state-of-the-art detection performance
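
The two conclusion items above summarize what STC and STR do at inference time: discard anchors the first-step classifier already rejects, and let the second-step regressor start from first-step-refined boxes. Below is a minimal sketch of both behaviours, assuming sigmoid face scores, Faster R-CNN style (dx, dy, dw, dh) box deltas, and a RefineDet-style background-confidence threshold; the threshold value and helper names are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of STC filtering and STR cascaded regression at inference.
import torch


def decode(anchors: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """Apply (dx, dy, dw, dh) deltas to (cx, cy, w, h) anchors."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * torch.exp(deltas[:, 2])
    h = anchors[:, 3] * torch.exp(deltas[:, 3])
    return torch.stack([cx, cy, w, h], dim=1)


def stc_keep_mask(first_step_logits: torch.Tensor, theta: float = 0.99) -> torch.Tensor:
    """Selective Two-step Classification: drop anchors the first-step classifier
    already marks as easy negatives (background confidence above theta), so the
    second-step classifier only scores the remaining anchors."""
    background_conf = 1.0 - torch.sigmoid(first_step_logits)
    return background_conf <= theta


def str_refined_anchors(anchors: torch.Tensor, first_step_deltas: torch.Tensor) -> torch.Tensor:
    """Selective Two-step Regression: coarsely adjust anchors with the first-step
    regressor; the second-step regressor then predicts deltas relative to these
    refined boxes instead of the original anchors."""
    return decode(anchors, first_step_deltas)


if __name__ == "__main__":
    anchors = torch.tensor([[32.0, 32.0, 16.0, 16.0], [64.0, 64.0, 32.0, 32.0]])
    keep = stc_keep_mask(torch.tensor([-5.0, 2.0]))  # the first anchor is an easy negative
    refined = str_refined_anchors(anchors, torch.tensor([[0.1, 0.1, 0.2, 0.2]] * 2))
    print(keep, refined)
```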
Tables
  • Table 1: Effectiveness of various designs on the AP performance
  • Table 2: AP performance of the two-step classification applied to each pyramid level
  • Table 3: Number of false positives at different recall rates
  • Table 4: AP performance of the two-step regression applied to each pyramid level
  • Table 5: AP at different IoU thresholds on the WIDER FACE Hard subset
Related work
  • Face detection has been a challenging research field since its emergence in the 1990s. Viola and Jones pioneered the use of Haar features and AdaBoost to train a face detector with promising accuracy and efficiency (Viola and Jones 2004), which inspired several different approaches afterwards (Liao, Jain, and Li 2016; Brubaker et al. 2008). Apart from those, another important line of work is the introduction of the Deformable Part Model (DPM) (Mathias et al. 2014; Yan et al. 2014a; Zhu and Ramanan 2012).

    Recently, face detection has been dominated by CNN-based methods. CascadeCNN (Li et al. 2015) improves detection accuracy by training a series of interleaved CNN models, and follow-up work (Qin et al. 2016) proposes to jointly train the cascaded CNNs to realize end-to-end optimization. MTCNN (Zhang et al. 2016) proposes a joint face detection and alignment method using multi-task cascaded CNNs. Faceness (Yang et al. 2015) formulates face detection as scoring facial parts responses to detect faces under severe occlusion. UnitBox (Yu et al. 2016) introduces an IoU loss for bounding box prediction. EMO (Zhu et al. 2018) proposes an Expected Max Overlapping score to evaluate the quality of anchor matching. SAFD (Hao et al. 2017) develops a scale proposal stage which automatically normalizes face sizes prior to detection. S2AP (Song et al. 2018) pays attention to specific scales in the image pyramid and valid locations in each scale layer. PCN (Shi et al. 2018) proposes a cascade-style structure to rotate faces in a coarse-to-fine manner. Recent work (Bai et al. 2018) designs a novel network to directly generate a clear super-resolution face from a blurry small one.
Funding
  • The work is supported by the Young Thousand Talents Program, the Natural Science Foundation of China (Grant No. 61672519, No. 61876178), the research project from Huawei Inc. (Grant No. YBN2018065193), and the independent research project of the National Laboratory of Pattern Recognition.
Reference
  • Bai, Y.; Zhang, Y.; Ding, M.; and Ghanem, B. 2018. Finding tiny faces in the wild with generative adversarial network. In CVPR.
  • Brubaker, S. C.; Wu, J.; Sun, J.; Mullin, M. D.; and Rehg, J. M. 2008. On the design of cascades of boosted ensembles for face detection. IJCV.
  • Cai, Z., and Vasconcelos, N. 2018. Cascade R-CNN: delving into high quality object detection. In CVPR.
  • Chen, D.; Hua, G.; Wen, F.; and Sun, J. 2016. Supervised transformer network for efficient face detection. In ECCV.
  • Gidaris, S., and Komodakis, N. 2015. Object detection via a multi-region and semantic segmentation-aware CNN model. In ICCV.
  • Girshick, R. B. 2015. Fast R-CNN. In ICCV.
  • Hao, Z.; Liu, Y.; Qin, H.; Yan, J.; Li, X.; and Hu, X. 2017. Scale-aware face detection. In CVPR.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
  • Howard, A. G. 2013. Some improvements on deep convolutional neural network based image classification. CoRR.
  • Hu, P., and Ramanan, D. 2017. Finding tiny faces. In CVPR.
  • Jain, V., and Learned-Miller, E. 2010. FDDB: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst.
  • Li, H.; Lin, Z.; Shen, X.; Brandt, J.; and Hua, G. 2015. A convolutional neural network cascade for face detection. In CVPR.
  • Liao, S.; Jain, A. K.; and Li, S. Z. 2016. A fast and accurate unconstrained face detector. TPAMI.
  • Lin, T.; Maire, M.; Belongie, S. J.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; and Zitnick, C. L. 2014. Microsoft COCO: common objects in context. In ECCV.
  • Lin, T.; Dollar, P.; Girshick, R. B.; He, K.; Hariharan, B.; and Belongie, S. J. 2017a. Feature pyramid networks for object detection. In CVPR.
  • Lin, T.; Goyal, P.; Girshick, R. B.; He, K.; and Dollar, P. 2017b. Focal loss for dense object detection. In ICCV.
  • Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. E.; Fu, C.; and Berg, A. C. 2016. SSD: single shot multibox detector. In ECCV.
  • Mathias, M.; Benenson, R.; Pedersoli, M.; and Gool, L. J. V. 2014. Face detection without bells and whistles. In ECCV.
  • Najibi, M.; Samangouei, P.; Chellappa, R.; and Davis, L. S. 2017. SSH: single stage headless face detector. In ICCV.
  • Paszke, A.; Gross, S.; Chintala, S.; and Chanan, G. 2017. Pytorch.
  • Qin, H.; Yan, J.; Li, X.; and Hu, X. 2016. Joint training of cascaded CNN for face detection. In CVPR.
  • Ren, S.; He, K.; Girshick, R. B.; and Sun, J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. TPAMI.
  • Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. S.; Berg, A. C.; and Li, F. 2015. Imagenet large scale visual recognition challenge. IJCV.
  • Shi, X.; Shan, S.; Kan, M.; Wu, S.; and Chen, X. 2018. Real-time rotation-invariant face detection with progressive calibration networks. In CVPR.
  • Song, G.; Liu, Y.; Jiang, M.; Wang, Y.; Yan, J.; and Leng, B. 2018. Beyond trade-off: Accelerate fcn-based face detector with higher accuracy. In CVPR.
  • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S. E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
  • Tang, X.; Du, D. K.; He, Z.; and Liu, J. 2018. Pyramidbox: A context-assisted single shot face detector. In ECCV.
  • Viola, P. A., and Jones, M. J. 2004. Robust real-time face detection. IJCV.
  • Wang, H.; Li, Z.; Ji, X.; and Wang, Y. 2017a. Face r-cnn. CoRR.
  • Wang, Y.; Ji, X.; Zhou, Z.; Wang, H.; and Li, Z. 2017b. Detecting faces using region-based fully convolutional networks. CoRR.
  • Wang, J.; Yuan, Y.; and Yu, G. 2017. Face attention network: An effective face detector for the occluded faces. CoRR.
  • Yan, J.; Lei, Z.; Wen, L.; and Li, S. Z. 2014a. The fastest deformable part model for object detection. In CVPR.
  • Yan, J.; Zhang, X.; Lei, Z.; and Li, S. Z. 2014b. Face detection by structural models. IVC.
  • Yang, S.; Luo, P.; Loy, C. C.; and Tang, X. 2015. From facial parts responses to face detection: A deep learning approach. In ICCV.
  • Yang, S.; Luo, P.; Loy, C. C.; and Tang, X. 2016. WIDER FACE: A face detection benchmark. In CVPR.
  • Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; and Huang, T. S. 2016. Unitbox: An advanced object detection network. In ACMMM.
  • Zhang, K.; Zhang, Z.; Li, Z.; and Qiao, Y. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. SPL.
  • Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; and Li, S. Z. 2017a. Faceboxes: A CPU real-time face detector with high accuracy. In IJCB.
  • Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; and Li, S. Z. 2017b. S3FD: Single shot scale-invariant face detector. In ICCV.
  • Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; and Li, S. Z. 2018. Single-shot refinement neural network for object detection. In CVPR.
  • Zhu, X., and Ramanan, D. 2012. Face detection, pose estimation, and landmark localization in the wild. In CVPR.
  • Zhu, C.; Tao, R.; Luu, K.; and Savvides, M. 2018. Seeing small faces from robust anchor’s perspective. In CVPR.
  • Zitnick, C. L., and Dollar, P. 2014. Edge boxes: Locating object proposals from edges. In ECCV.