Selective Refinement Network for High Performance Face Detection
AAAI Conference on Artificial Intelligence (AAAI), 2019.
Abstract:
High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce...
Introduction
- Face detection is a long-standing problem in computer vision with extensive applications including face alignment, face analysis, face recognition, etc.
- Further improving the performance of face detection has become a challenging issue.
- There remains room for improvement in two aspects: (a) recall efficiency: the number of false positives needs to be reduced at high recall rates; (b) location accuracy: the accuracy of bounding box locations needs to be improved
- These two problems are elaborated as follows
Highlights
- Face detection is a long-standing problem in computer vision with extensive applications including face alignment, face analysis, face recognition, etc
- We investigate the effects of two-step classification and regression on different levels of detection layers and propose a novel face detection framework, named Selective Refinement Network (SRN), which selectively applies two-step classification and regression to specific levels of detection layers
- The network structure of SRN is shown in Figure 2; it consists of two key modules, named the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module (a minimal sketch of this structure follows this list)
- We present the precision-recall curves of the proposed SRN method and six state-of-the-art methods and three commercial face detectors (i.e., SkyBiometry, Face++ and Picasa) in Figure 4(b)
- We have presented SRN, a novel single shot face detector, which consists of two key modules, i.e., the STC and the STR
- We achieve state-of-the-art results on AFW, PASCAL face, FDDB, and WIDER FACE datasets
- Extensive experiments on the AFW, PASCAL face, FDDB and WIDER FACE datasets demonstrate that SRN achieves state-of-the-art detection performance
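The following is a minimal, hypothetical PyTorch sketch (not the authors' released code) of how selective two-step prediction could be wired over a feature pyramid. The level split, channel width, anchor count, and the use of a single shared feature map per level are illustrative assumptions; in the paper the first and second steps operate on different (backbone vs. pyramid) features.

```python
import torch.nn as nn

class SelectiveRefinementHead(nn.Module):
    """Toy head: two-step classification on low pyramid levels, two-step
    regression on high pyramid levels (assumed split of 3 low / 3 high)."""

    def __init__(self, channels=256, num_anchors=2,
                 stc_levels=(0, 1, 2), str_levels=(3, 4, 5)):
        super().__init__()
        self.stc_levels = set(stc_levels)  # levels using two-step classification
        self.str_levels = set(str_levels)  # levels using two-step regression
        # first-step (coarse) and second-step (final) heads, shared across levels
        self.cls1 = nn.Conv2d(channels, num_anchors, kernel_size=3, padding=1)
        self.reg1 = nn.Conv2d(channels, num_anchors * 4, kernel_size=3, padding=1)
        self.cls2 = nn.Conv2d(channels, num_anchors, kernel_size=3, padding=1)
        self.reg2 = nn.Conv2d(channels, num_anchors * 4, kernel_size=3, padding=1)

    def forward(self, pyramid_feats):
        # pyramid_feats: list of [B, C, H_l, W_l] tensors ordered low to high level
        outputs = []
        for level, feat in enumerate(pyramid_feats):
            out = {"cls2": self.cls2(feat),   # final face scores
                   "reg2": self.reg2(feat)}   # final box offsets
            if level in self.stc_levels:
                # first-step scores, used only to filter easy negative anchors
                out["cls1"] = self.cls1(feat)
            if level in self.str_levels:
                # first-step offsets, used only to refine anchors before step two
                out["reg1"] = self.reg1(feat)
            outputs.append(out)
        return outputs
```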
Methods
- The authors first analyze the proposed method in detail to verify the effectiveness of the contributions.
- The authors conduct a set of ablation experiments on the WIDER FACE dataset to analyze the model in detail.
- The authors use the same parameter settings for all the experiments, except for specified changes to the components.
- All models are trained on the WIDER FACE training set and evaluated on the validation set.
- The results of ablation experiments are listed in Table 1 and some promising conclusions can be drawn as follows
Results
- AFW Dataset
- It consists of 205 images with 473 labeled faces.
- The images in the dataset contain cluttered backgrounds with large variations in both face viewpoint and appearance.
- The authors compare SRN against seven state-of-the-art methods and three commercial face detectors (i.e., Face.com, Face++ and Picasa).
- PASCAL Face Dataset
- It has 1,335 labeled faces in 851 images with large variations in face appearance and pose.
- The authors present the precision-recall curves of the proposed SRN method and six state-of-the-art methods and three commercial face detectors (i.e., SkyBiometry, Face++ and Picasa) in Figure 4(b).
- SRN achieves state-of-the-art results, improving the AP score by 4.99% over the second-best method STN (Chen et al 2016); a sketch of how AP summarizes the precision-recall curve follows this list
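For reference, AP is the area under the precision-recall curve. Below is a minimal sketch of the standard all-point-interpolation computation; the official WIDER FACE / PASCAL evaluation scripts may differ in details.

```python
import numpy as np

def average_precision(recall, precision):
    """recall, precision: 1-D arrays of operating points, sorted by increasing recall."""
    r = np.concatenate(([0.0], np.asarray(recall), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precision), [0.0]))
    # make the precision envelope monotonically non-increasing (right to left)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # accumulate rectangle areas where recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```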
Conclusion
- The authors have presented SRN, a novel single shot face detector, which consists of two key modules, i.e., the STC and the STR.
- The STC uses the first-step classifier to filter out most of the simple negative anchors from low-level detection layers, reducing the search space for the second-step classifier and thereby reducing false positives.
- The STR applies the first-step regressor to coarsely adjust the locations and sizes of anchors from high-level detection layers, providing better initialization for the second-step regressor and improving the location accuracy of bounding boxes (a minimal sketch of both steps follows this list).
- Extensive experiments on the AFW, PASCAL face, FDDB and WIDER FACE datasets demonstrate that SRN achieves state-of-the-art detection performance
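A minimal sketch of these two steps at inference time (assumed details, not the released code): STC discards anchors whose first-step background confidence is very high (0.99 is an assumed threshold), and STR decodes the first-step offsets into refined anchors that initialize the second-step regression, using the common Faster R-CNN box parameterization.

```python
import torch

def stc_keep_mask(cls1_logits, neg_thresh=0.99):
    """cls1_logits: [N] first-step face logits. Keep anchors that are not
    confidently background (threshold value is an assumption)."""
    p_face = torch.sigmoid(cls1_logits)
    return (1.0 - p_face) <= neg_thresh

def str_refine_anchors(anchors, reg1):
    """anchors: [N, 4] boxes as (cx, cy, w, h); reg1: [N, 4] first-step offsets
    (dx, dy, dw, dh). Returns refined anchors fed to the second-step regressor."""
    cx = anchors[:, 0] + reg1[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + reg1[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * torch.exp(reg1[:, 2])
    h = anchors[:, 3] * torch.exp(reg1[:, 3])
    return torch.stack([cx, cy, w, h], dim=1)
```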
Tables
- Table1: Effectiveness of various designs on the AP performance
- Table2: AP performance of the two-step classification applied to each pyramid level
- Table3: Number of false positives at different recall rates
- Table4: AP performance of the two-step regression applied to each pyramid level
- Table5: AP at different IoU thresholds on the WIDER FACE Hard subset
Related work
- Face detection has been a challenging research field since its emergence in the 1990s. Viola and Jones pioneered the use of Haar features and AdaBoost to train a face detector with promising accuracy and efficiency (Viola and Jones 2004), which inspired several different approaches afterwards (Liao, Jain, and Li 2016; Brubaker et al 2008). Apart from those, another important line of work is the Deformable Part Model (DPM) (Mathias et al 2014; Yan et al 2014a; Zhu and Ramanan 2012).
Recently, face detection has been dominated by CNN-based methods. CascadeCNN (Li et al 2015) improves detection accuracy by training a series of interleaved CNN models, and follow-up work (Qin et al 2016) proposes to jointly train the cascaded CNNs to realize end-to-end optimization. MTCNN (Zhang et al 2016) proposes a joint face detection and alignment method using multi-task cascaded CNNs. Faceness (Yang et al 2015) formulates face detection as scoring facial parts responses to detect faces under severe occlusion. UnitBox (Yu et al 2016) introduces an IoU loss for bounding box prediction. EMO (Zhu et al 2018) proposes an Expected Max Overlapping score to evaluate the quality of anchor matching. SAFD (Hao et al 2017) develops a scale proposal stage that automatically normalizes face sizes prior to detection. S2AP (Song et al 2018) pays attention to specific scales in the image pyramid and valid locations in each scale's layer. PCN (Shi et al 2018) proposes a cascade-style structure to rotate faces in a coarse-to-fine manner. Recent work (Bai et al 2018) designs a novel network to directly generate a clear super-resolution face from a blurry small one.
Funding
- This work is supported by the Young Thousand Talents Program, the Natural Science Foundation of China (Grant No. 61672519 and No. 61876178), a research project from Huawei Inc. (Grant No. YBN2018065193), and an independent research project of the National Laboratory of Pattern Recognition
Reference
- Bai, Y.; Zhang, Y.; Ding, M.; and Ghanem, B. 2018. Finding tiny faces in the wild with generative adversarial network. In CVPR.
- Brubaker, S. C.; Wu, J.; Sun, J.; Mullin, M. D.; and Rehg, J. M. 2008. On the design of cascades of boosted ensembles for face detection. IJCV.
- Cai, Z., and Vasconcelos, N. 2018. Cascade R-CNN: delving into high quality object detection. In CVPR.
- Chen, D.; Hua, G.; Wen, F.; and Sun, J. 2016. Supervised transformer network for efficient face detection. In ECCV.
- Gidaris, S., and Komodakis, N. 2015. Object detection via a multi-region and semantic segmentation-aware CNN model. In ICCV.
- Girshick, R. B. 2015. Fast R-CNN. In ICCV.
- Hao, Z.; Liu, Y.; Qin, H.; Yan, J.; Li, X.; and Hu, X. 2017. Scale-aware face detection. In CVPR.
- He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
- Howard, A. G. 2013. Some improvements on deep convolutional neural network based image classification. CoRR.
- Hu, P., and Ramanan, D. 2017. Finding tiny faces. In CVPR.
- Jain, V., and Learned-Miller, E. 2010. Fddb: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst.
- Li, H.; Lin, Z.; Shen, X.; Brandt, J.; and Hua, G. 2015. A convolutional neural network cascade for face detection. In CVPR.
- Liao, S.; Jain, A. K.; and Li, S. Z. 2016. A fast and accurate unconstrained face detector. TPAMI.
- Lin, T.; Maire, M.; Belongie, S. J.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common objects in context. In ECCV.
- Lin, T.; Dollar, P.; Girshick, R. B.; He, K.; Hariharan, B.; and Belongie, S. J. 2017a. Feature pyramid networks for object detection. In CVPR.
- Lin, T.; Goyal, P.; Girshick, R. B.; He, K.; and Dollar, P. 2017b. Focal loss for dense object detection. In ICCV.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. E.; Fu, C.; and Berg, A. C. 2016. SSD: single shot multibox detector. In ECCV.
- Mathias, M.; Benenson, R.; Pedersoli, M.; and Gool, L. J. V. 2014. Face detection without bells and whistles. In ECCV.
- Najibi, M.; Samangouei, P.; Chellappa, R.; and Davis, L. S. 2017. SSH: single stage headless face detector. In ICCV.
- Paszke, A.; Gross, S.; Chintala, S.; and Chanan, G. 2017. PyTorch.
- Qin, H.; Yan, J.; Li, X.; and Hu, X. 2016. Joint training of cascaded CNN for face detection. In CVPR.
- Ren, S.; He, K.; Girshick, R. B.; and Sun, J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. TPAMI.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. S.; Berg, A. C.; and Li, F. 2015. Imagenet large scale visual recognition challenge. IJCV.
- Shi, X.; Shan, S.; Kan, M.; Wu, S.; and Chen, X. 2018. Real-time rotation-invariant face detection with progressive calibration networks. In CVPR.
- Song, G.; Liu, Y.; Jiang, M.; Wang, Y.; Yan, J.; and Leng, B. 2018. Beyond trade-off: Accelerate fcn-based face detector with higher accuracy. In CVPR.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S. E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
- Tang, X.; Du, D. K.; He, Z.; and Liu, J. 2018. Pyramidbox: A context-assisted single shot face detector. In ECCV.
- Viola, P. A., and Jones, M. J. 2004. Robust real-time face detection. IJCV.
- Wang, H.; Li, Z.; Ji, X.; and Wang, Y. 2017a. Face R-CNN. CoRR.
- Wang, Y.; Ji, X.; Zhou, Z.; Wang, H.; and Li, Z. 2017b. Detecting faces using region-based fully convolutional networks. CoRR.
- Wang, J.; Yuan, Y.; and Yu, G. 2017. Face attention network: An effective face detector for the occluded faces. CoRR.
- Yan, J.; Lei, Z.; Wen, L.; and Li, S. Z. 2014a. The fastest deformable part model for object detection. In CVPR.
- Yan, J.; Zhang, X.; Lei, Z.; and Li, S. Z. 2014b. Face detection by structural models. IVC.
- Yang, S.; Luo, P.; Loy, C. C.; and Tang, X. 2015. From facial parts responses to face detection: A deep learning approach. In ICCV.
- Yang, S.; Luo, P.; Loy, C. C.; and Tang, X. 2016. WIDER FACE: A face detection benchmark. In CVPR.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; and Huang, T. S. 2016. Unitbox: An advanced object detection network. In ACMMM.
- Zhang, K.; Zhang, Z.; Li, Z.; and Qiao, Y. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. SPL.
- Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; and Li, S. Z. 2017a. Faceboxes: A CPU real-time face detector with high accuracy. In IJCB.
- Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; and Li, S. Z. 2017b. S3FD: Single shot scale-invariant face detector. In ICCV.
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; and Li, S. Z. 2018. Single-shot refinement neural network for object detection. In CVPR.
- Zhu, X., and Ramanan, D. 2012. Face detection, pose estimation, and landmark localization in the wild. In CVPR.
- Zhu, C.; Tao, R.; Luu, K.; and Savvides, M. 2018. Seeing small faces from robust anchor’s perspective. In CVPR.
- Zitnick, C. L., and Dollar, P. 2014. Edge boxes: Locating object proposals from edges. In ECCV.