Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Benyuan Sun
Wen Gao 0001

ECCV, 2016.


Abstract:

This paper presents a method for face detection in the wild, which integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal c…

Introduction
  • 1.1 Motivation and Objective

    Face detection has been used as a core module in a wide spectrum of applications such as surveillance, mobile communication and human-computer interaction.
  • Face detection in the wild continues to play an important role in the era of visual big data.
  • It remains a challenging problem in computer vision due to large appearance variations caused by nuisance factors such as viewpoint, occlusion, facial expression, resolution, illumination and cosmetics.
Highlights
  • 1.1 Motivation and Objective

    Face detection has been used as a core module in a wide spectrum of applications such as surveillance, mobile communication and human-computer interaction
  • To address the two above issues in learning ConvNets for face detection, we propose to integrate a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework
  • FDDB is a challenging benchmark for face detection in unconstrained environments, containing annotations for 5171 faces in a set of 2845 images.
  • We have presented a method of end-to-end integration of a ConvNet and a 3D model for face detection in the wild
  • Our method is a clean and straightforward solution when taking into account a 3D model in face detection
  • We expect to extend this into a unified model for faces and cars that achieves state-of-the-art performance, which would be useful in many practical applications such as surveillance and driverless cars.
Methods
  • Method Overview

    Figure 2 illustrates the proposed method. We use 10 facial key-points in this paper, including “LeftEyeLeftCorner”, “RightEyeRightCorner”, “LeftEar”, “NoseLeft”, “NoseRight”, “RightEar”, “MouthLeftCorner”, “MouthRightCorner”, “ChinCenter”, “CenterBetweenEyes” (see an example image in the top-left of Figure 2).
  • The authors assume a 3D mean face model is available and facial key-points are annotated in the training dataset.
  • The objective function involves multiple types of losses: a Softmax classification loss, a smooth L1 loss [14] on facial key-point locations, and a smooth L1 loss on face bounding-box locations; the authors therefore formulate learning of the proposed ConvNet under a multi-task discriminative deep learning framework (a minimal sketch of such a combined loss follows this list).
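To make the training objective concrete, here is a minimal sketch of how such a multi-task loss can be assembled from a Softmax classification term and smooth L1 terms [14] for key-point and bounding-box regression. The function names, tensor shapes and loss weights (w_kp, w_box) are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 loss as in Fast R-CNN [14]: quadratic near zero, linear elsewhere."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def softmax_cross_entropy(logits, label):
    """Softmax classification loss for a single example."""
    logits = logits - logits.max()                      # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def multitask_loss(cls_logits, cls_label, kp_pred, kp_target, box_pred, box_target,
                   w_kp=1.0, w_box=1.0):
    """Illustrative combination of the three terms described above.

    cls_logits: (num_classes,) classification scores (key-point type / background)
    kp_pred, kp_target: (num_keypoints, 2) predicted and annotated key-point offsets
    box_pred, box_target: (4,) predicted and ground-truth face box parameters
    w_kp, w_box are assumed hyper-parameters balancing the terms.
    """
    loss_cls = softmax_cross_entropy(cls_logits, cls_label)
    loss_kp = smooth_l1(kp_pred, kp_target)
    loss_box = smooth_l1(box_pred, box_target)
    return loss_cls + w_kp * loss_kp + w_box * loss_box
```

In practice the three terms are accumulated over a mini-batch and back-propagated jointly through the shared ConvNet trunk.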
Results
  • To show the effectiveness of the method, the authors test the model on two popular face detection benchmarks: FDDB [19] and AFW [41].
  • The results on the FDDB dataset are shown in Figure 4.
  • It is worth noting that the authors beat all other methods on the continuous score; this is partly attributable to the predefined 3D mean face model (a simplified sketch of FDDB's overlap-based matching criterion follows this list).
  • The AFW dataset contains 205 images with faces in various poses and viewpoints.
  • More detection results on both datasets are shown in Figure 7 and Figure 8
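As a reminder of how the FDDB discrete score is computed, the sketch below matches detections to annotated faces by intersection-over-union and counts a true positive when the best overlap reaches 0.5. It is a simplified, assumed version of the benchmark's matching rule (FDDB annotations are actually ellipses; axis-aligned boxes are used here for brevity), not the official evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def discrete_score(detections, annotations, thresh=0.5):
    """Count detections whose best overlap with an unmatched annotation reaches thresh."""
    matched, true_positives = set(), 0
    for det in detections:
        best_j, best_iou = None, 0.0
        for j, ann in enumerate(annotations):
            overlap = iou(det, ann)
            if j not in matched and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None and best_iou >= thresh:
            matched.add(best_j)
            true_positives += 1
    return true_positives
```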
Conclusion
  • The authors have presented a method of end-to-end integration of a ConvNet and a 3D model for face detection in the wild.
  • The authors' method is a clean and straightforward way to take a 3D model into account in face detection.
  • It addresses two issues in state-of-the-art generic object detection ConvNets: it eliminates the heuristic design of anchor boxes by leveraging a 3D model, and it replaces generic, predefined RoI pooling with configuration pooling, which exploits the underlying object configuration (a sketch of the 3D-model-based proposal idea follows this list).
  • The authors expect to extend this into a unified model for faces and cars that achieves state-of-the-art performance, which would be useful in many practical applications such as surveillance and driverless cars.
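To make the anchor-free proposal idea concrete, the sketch below projects the key-points of a 3D mean face model into the image under an assumed weak-perspective camera (rotation, scale and translation) and takes the tight bounding box of the projections as a face proposal. The parameterization and function names are simplifying assumptions rather than the paper's exact formulation; the projected key-point locations are also where configuration pooling would sample features, in place of a fixed RoI grid.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Rotation matrix from Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Rz @ Ry @ Rx

def propose_face_box(mean_face_3d, yaw, pitch, roll, scale, tx, ty):
    """Project the key-points of the 3D mean face model into the image and return
    the tight bounding box of the projections as a face proposal.

    mean_face_3d: (10, 3) array of 3D key-point coordinates (e.g., from AFLW).
    yaw, pitch, roll, scale, tx, ty: assumed weak-perspective pose parameters,
    standing in for the 3D transformation the network estimates per key-point.
    """
    R = euler_to_rotation(yaw, pitch, roll)
    pts_2d = scale * (mean_face_3d @ R.T)[:, :2] + np.array([tx, ty])
    x1, y1 = pts_2d.min(axis=0)
    x2, y2 = pts_2d.max(axis=0)
    return (x1, y1, x2, y2), pts_2d  # proposal box and projected key-points
```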
Tables
  • Table 1: Classification accuracy of the key-points on the AFLW validation set at the end of training
Related work
  • There is a tremendous number of existing works on face detection and generic object detection; we refer to [40] for a more thorough survey on face detection and discuss some of the most relevant ones in this section. We use the open-source deep learning package MXNet [5] in our implementation; the full source code is released at https://github.com/tfwu/FaceDetectionConvNet-3D

    In human/animal vision, how the brain distills a representation of objects from retinal input is one of the central challenges for systems neuroscience, and much of this work has focused on an ecologically important class of objects: faces. fMRI studies in the macaque reveal that faces are represented by a system of six discrete, strongly interconnected regions, illustrating hierarchical information processing in the brain [12], along with related findings [34]. These results provide biologically plausible evidence supporting the use of deep learning based approaches for face detection and analysis.
Funding
  • Wang was supported in part by the China 973 Program under Grant No. 2015CB351800 and by NSFC Grants 61231010, 61527804, 61421062 and 61210005
  • Wu was supported by the ECE startup fund 201473-02119 at NCSU
  • Wu also gratefully acknowledges the support of NVIDIA Corporation with the donation of one GPU
References
  • 1. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: Application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
  • 2. Barbu, A., Gramajo, G.: Face detection using a 3D model on face keypoints. CoRR abs/1404.3596 (2014)
  • 3. Bay, H., Ess, A., Tuytelaars, T., Gool, L.J.V.: Speeded-up robust features (SURF). Computer Vision and Image Understanding 110(3), 346–359 (2008)
  • 4. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. In: British Machine Vision Conference (2014)
  • 5. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang, Z.: MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274 (2015)
  • 6. Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: CVPR (2016)
  • 7. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  • 8. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: A large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
  • 9. Dollar, P., Appel, R., Belongie, S.J., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1532–1545 (2014)
  • 10. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.A., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
  • 11. Fleuret, F., Geman, D.: Coarse-to-fine face detection. International Journal of Computer Vision 41(1/2), 85–107 (2001)
  • 12. Freiwald, W.A., Tsao, D.Y.: Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330(6005), 845–851 (2010)
  • 13. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: A statistical view of boosting. Annals of Statistics 28(2), 337–407 (2000)
  • 14. Girshick, R.: Fast R-CNN. In: ICCV (2015)
  • 15. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)
  • 16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • 17. Hu, W., Zhu, S.: Learning 3D object templates by quantizing geometry and appearance spaces. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1190–1205 (2015)
  • 18. Huang, C., Ai, H., Li, Y., Lao, S.: High-performance rotation invariant multiview face detection. IEEE Trans. Pattern Anal. Mach. Intell. 29(4), 671–686 (2007)
  • 19. Jain, V., Learned-Miller, E.: FDDB: A benchmark for face detection in unconstrained settings. Tech. Rep. UM-CS-2010-009, University of Massachusetts, Amherst (2010)
  • 20. Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In: First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (2011)
  • 21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
  • 22. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  • 23. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: CVPR (2015)
  • 24. Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: CVPR (2010)
  • 25. Liu, C., Shum, H.: Kullback-Leibler boosting. In: CVPR (2003)
  • 26. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. arXiv preprint arXiv:1512.02325 (2015)
  • 27. Mathias, M., Benenson, R., Pedersoli, M., Gool, L.V.: Face detection without bells and whistles. In: ECCV (2014)
  • 28. Mita, T., Kaneko, T., Hori, O.: Joint Haar-like features for face detection. In: ICCV (2005)
  • 29. Payet, N., Todorovic, S.: From contours to 3D object detection and pose estimation. In: ICCV (2011)
  • 30. Ranjan, R., Patel, V.M., Chellappa, R.: A deep pyramid deformable part model for face detection. In: IEEE 7th International Conference on Biometrics Theory, Applications and Systems (2015)
  • 31. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: CVPR (2016)
  • 32. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NIPS (2015)
  • 33. Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3), 297–336 (1999)
  • 34. Sinha, P., Balas, B., Ostrovsky, Y., Russell, R.: Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE 94(11), 1948–1962 (2006)
  • 35. Su, H., Sun, M., Li, F., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: ICCV (2009)
  • 36. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: Closing the gap to human-level performance in face verification. In: CVPR (2014)
  • 37. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. International Journal of Computer Vision 104(2), 154–171 (2013)
  • 38. Viola, P.A., Jones, M.J.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
  • 39. Yang, S., Luo, P., Loy, C.C., Tang, X.: From facial parts responses to face detection: A deep learning approach. In: ICCV (2015)
  • 40. Zafeiriou, S., Zhang, C., Zhang, Z.: A survey on face detection in the wild. Computer Vision and Image Understanding 138, 1–24 (2015)
  • 41. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR (2012)