Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks

IEEE Signal Processing Letters, pp. 1499-1503, 2016.

DOI: https://doi.org/10.1109/LSP.2016.2603342

Abstract:

Face detection and alignment in unconstrained environments are challenging due to various poses, illuminations, and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this letter, we propose a deep cascaded multitask framework that exploits the inherent correlation between …

Introduction
  • Face detection and alignment are essential to many face applications, such as face recognition and facial expression analysis.
  • Besides the cascade structure, [5, 6, 7] introduce deformable part models (DPM) for face detection and achieve remarkable performance.
  • However, these DPM-based detectors are computationally expensive and usually require costly annotation in the training stage.
Highlights
  • The major contributions of this paper are summarized as follows: (1) We propose a new framework based on cascaded convolutional neural networks (CNNs) for joint face detection and alignment, and carefully …
  • To evaluate the contribution of joint detection and alignment, we compare the performance of two different O-Nets on the Face Detection Data Set and Benchmark (FDDB).
  • We have proposed a multi-task cascaded CNN-based framework for joint face detection and alignment.
  • Experimental results demonstrate that our methods consistently outperform the state-of-the-art methods across several challenging benchmarks while keeping real-time performance.
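The highlights describe one framework trained jointly for detection and alignment. The sketch below shows one way to assemble such a multi-task objective in NumPy. It is a weighted sum of per-task losses, in which 0/1 indicators switch off any task a sample has no label for. The loss forms are cross-entropy for face classification and squared Euclidean distance for box and landmark regression. These forms and the default weights (1, 0.5, 0.5) reflect our reading of the original letter rather than anything on this page, and `multitask_loss` and its arguments are hypothetical names. Treat all of it as an assumption.

```python
import numpy as np


def multitask_loss(face_prob, face_label, box_pred, box_target,
                   lmk_pred, lmk_target, det_mask, box_mask, lmk_mask,
                   alpha=(1.0, 0.5, 0.5)):
    """Hypothetical joint detection/alignment loss (illustrative sketch).

    face_prob:  (N,) predicted probability that each sample is a face
    face_label: (N,) 1 for face, 0 for non-face
    box_pred, box_target: (N, 4) bounding-box regression offsets
    lmk_pred, lmk_target: (N, 10) five (x, y) landmark coordinates
    *_mask: (N,) 0/1 sample-type indicators saying which task losses
            apply to each sample, since not every sample has every label.
    alpha: task weights (detection, box, landmark); assumed values.
    """
    eps = 1e-12
    p = np.clip(face_prob, eps, 1.0 - eps)
    # Cross-entropy for face / non-face classification.
    l_det = -(face_label * np.log(p) + (1 - face_label) * np.log(1 - p))
    # Squared Euclidean distance for the two regression tasks.
    l_box = np.sum((box_pred - box_target) ** 2, axis=1)
    l_lmk = np.sum((lmk_pred - lmk_target) ** 2, axis=1)
    # Weighted sum over tasks, gated per sample, then summed over samples.
    total = (alpha[0] * det_mask * l_det
             + alpha[1] * box_mask * l_box
             + alpha[2] * lmk_mask * l_lmk)
    return float(np.sum(total))
```

Setting an indicator to 0 removes that task's term for that sample. For example, a non-face crop would typically carry only the classification term.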
Methods
  • The authors first evaluate the effectiveness of the proposed hard sample mining strategy.
  • The authors compare their face detector and face alignment method against state-of-the-art methods on the Face Detection Data Set and Benchmark (FDDB) [25], WIDER FACE [24], and the Annotated Facial Landmarks in the Wild (AFLW) benchmark [8].
  • The FDDB dataset contains annotations for 5,171 faces in 2,845 images.
  • AFLW contains facial landmark annotations for 24,386 faces; the authors use the same test subset as [22].
  • The authors evaluate the computational efficiency of the face detector.
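The Methods list names the benchmarks but not the scoring rules. FDDB detection is commonly scored by matching each detection to an annotation through an overlap ratio (intersection over union). AFLW alignment results are commonly reported as the mean landmark error normalized by the inter-ocular distance. The following NumPy sketch computes those two generic quantities. It assumes axis-aligned boxes, which is a simplification because FDDB annotates faces as ellipses. The function names and the eye-landmark indices are hypothetical.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Simplification: FDDB itself annotates faces as ellipses, not boxes.
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def normalized_mean_error(pred, gt, left_eye=0, right_eye=1):
    """Mean point-to-point landmark error divided by inter-ocular distance.

    pred, gt: (K, 2) arrays of landmark coordinates for one face.
    left_eye, right_eye: hypothetical indices of the eye landmarks in gt.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error of each landmark, averaged over all landmarks.
    per_point = np.linalg.norm(pred - gt, axis=1)
    inter_ocular = np.linalg.norm(gt[left_eye] - gt[right_eye])
    return float(per_point.mean() / inter_ocular)
```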
Results
  • The authors propose a new online hard sample mining strategy that can improve the performance automatically without manual sample selection.
  • The authors' method achieves higher accuracy than state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection and on the AFLW benchmark for face alignment, while maintaining real-time performance.
  • Experimental results demonstrate that the methods consistently outperform state-of-the-art methods across several challenging benchmarks while keeping real-time performance.
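The online hard sample mining mentioned above selects training samples automatically instead of by hand. The following is a minimal sketch. It assumes a rule we recall from the original letter, which this page does not state: within each mini-batch, sort the per-sample losses from the forward pass and back-propagate only the top 70%. `hard_sample_mask`, `mined_loss`, and the `keep_ratio` default are illustrative assumptions.

```python
import numpy as np


def hard_sample_mask(per_sample_loss, keep_ratio=0.7):
    """Return a 0/1 mask selecting the hardest samples in a mini-batch.

    per_sample_loss: (N,) losses computed in the forward pass.
    keep_ratio: fraction of samples to keep (assumed default 0.7).
    """
    losses = np.asarray(per_sample_loss, dtype=float)
    n_keep = max(1, int(round(keep_ratio * losses.size)))
    # Indices of the n_keep largest losses, i.e. the hardest samples.
    hard_idx = np.argsort(losses)[::-1][:n_keep]
    mask = np.zeros(losses.size, dtype=float)
    mask[hard_idx] = 1.0
    return mask


def mined_loss(per_sample_loss, keep_ratio=0.7):
    """Average loss over the selected hard samples only."""
    losses = np.asarray(per_sample_loss, dtype=float)
    mask = hard_sample_mask(losses, keep_ratio)
    return float(np.sum(losses * mask) / np.sum(mask))
```

In an autodiff framework, the same mask would multiply the per-sample losses before averaging, so the easy samples in that batch contribute no gradient.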
Conclusion
  • The authors have proposed a multi-task cascaded CNN-based framework for joint face detection and alignment.
  • Experimental results demonstrate that the methods consistently outperform state-of-the-art methods across several challenging benchmarks while keeping real-time performance.
  • In future work, the authors plan to exploit the inherent correlation between face detection and other face analysis tasks to further improve performance.
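In a cascaded detector, each stage rejects some candidate windows, refines the surviving boxes by regression, and merges overlapping boxes with non-maximum suppression (NMS). The following sketch shows only that glue logic. The stage networks are replaced by hypothetical callables, and `run_cascade`, the thresholds, and the offset convention (offsets scaled by box width and height) are assumptions rather than the authors' settings.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; boxes are (N, 4) as x1, y1, x2, y2."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop remaining candidates that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return keep


def calibrate(boxes, offsets):
    """Shift box corners by offsets scaled by box size (assumed convention)."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    scale = np.stack([w, h, w, h], axis=1)
    return boxes + offsets * scale


def run_cascade(candidates, stages, thresholds, iou_thresholds):
    """Filter, refine, and merge candidate boxes stage by stage.

    stages: hypothetical callables mapping (N, 4) boxes to
            (scores (N,), offsets (N, 4)); stand-ins for the CNN stages.
    """
    boxes = np.asarray(candidates, dtype=float)
    for stage, thr, iou_thr in zip(stages, thresholds, iou_thresholds):
        if boxes.shape[0] == 0:
            break
        scores, offsets = stage(boxes)
        # Reject low-scoring windows, refine the rest, then merge overlaps.
        keep = scores > thr
        boxes, scores, offsets = boxes[keep], scores[keep], offsets[keep]
        boxes = calibrate(boxes, offsets)
        boxes = boxes[nms(boxes, scores, iou_thr)]
    return boxes
```

The sketch omits the image pyramid, the resizing of crops between stages, and the landmark outputs of the final stage. It only shows how rejection, box regression, and NMS compose across stages.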
Summary
  • Objectives:

    Stage 3: This stage is similar to the second stage, but here the authors aim to describe the face in more detail.
Tables
  • Table 1: COMPARISON OF SPEED AND VALIDATION ACCURACY OF OUR CNNS AND …
Funding
  • In the learning process, we propose a new online hard sample mining strategy that can improve the performance automatically without manual sample selection.
  • Our method achieves superior accuracy over state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection and on the AFLW benchmark for face alignment, while maintaining real-time performance.
  • Experimental results demonstrate that our methods consistently outperform state-of-the-art methods across several challenging benchmarks (including the FDDB and WIDER FACE benchmarks for face detection, and the AFLW benchmark for face alignment) while keeping real-time performance.
Reference
  • [1] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Aggregate channel features for multi-view face detection,” in IEEE International Joint Conference on Biometrics, 2014, pp. 1-8.
  • [2] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
  • [3] M. T. Pham, Y. Gao, V. D. D. Hoang, and T. J. Cham, “Fast polygonal integration and its application in extending haar-like features to improve …” … Recognition, 2010, pp. 942-949.
  • [4] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, “Fast human detection …” … Conference on Computer Vision and Pattern Recognition, 2006, pp.
  • [5] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, “Face detection without bells and whistles,” in European Conference on Computer Vision, 2014, pp. 720-735.
  • [6] J. Yan, Z. Lei, L. Wen, and S. Li, “The fastest deformable part model for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2497-2504.
  • [7] X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2879-2886.
  • [8] M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, “Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2011, pp. 2144-2151.
  • [9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
  • [10] Y. Sun, Y. Chen, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” in Advances in Neural Information Processing Systems, 2014, pp. 1988-1996.
  • [11] S. Yang, P. Luo, C. C. Loy, and X. Tang, “From facial parts responses to face detection: A deep learning approach,” in IEEE International Conference on Computer Vision, 2015, pp. 3676-3684.
  • [12] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, “Robust face landmark estimation under occlusion,” in IEEE International Conference on Computer Vision, 2013, pp. 1513-1520.
  • [13] X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” International Journal of Computer Vision, vol. 107, no. 2, pp.
  • [14] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
  • [15] X. Yu, J. Huang, S. Zhang, W. Yan, and D. Metaxas, “Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model,” in IEEE International Conference on Computer Vision, 2013, pp. 1944-1951.
  • [16] J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment,” in European Conference on Computer Vision, 2014, pp. 1-16.
  • [17] Luxand Incorporated: Luxand face SDK, http://www.luxand.com/
  • [18] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun, “Joint cascade face detection and alignment,” in European Conference on Computer Vision, 2014, pp.
  • [19] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
  • [20] C. Zhang and Z. Zhang, “Improving multiview face detection with multi-task deep convolutional neural networks,” in IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1036-1041.
  • [21] X. Xiong and F. Torre, “Supervised descent method and its applications to face alignment,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532-539.
  • [22] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Facial landmark detection by deep multi-task learning,” in European Conference on Computer Vision, 2014, pp. 94-108.
  • [23] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in IEEE International Conference on Computer Vision, 2015, pp.
  • [24] S. Yang, P. Luo, C. C. Loy, and X. Tang, “WIDER FACE: A face detection benchmark,” arXiv preprint arXiv:1511.06523.
  • [25] V. Jain and E. G. Learned-Miller, “FDDB: A benchmark for face detection in unconstrained settings,” Technical Report UMCS-2010-009, University of Massachusetts, Amherst, 2010.
  • [26] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Convolutional channel features,” in IEEE International Conference on Computer Vision, 2015, pp. 82-90.
  • [27] R. Ranjan, V. M. Patel, and R. Chellappa, “A deep pyramid deformable part model for face detection,” in IEEE International Conference on Biometrics Theory, Applications and Systems, 2015, pp. 1-8.
  • [28] G. Ghiasi and C. C. Fowlkes, “Occlusion Coherence: Detecting and …”
  • [29] S. S. Farfade, M. J. Saberian, and L. J. Li, “Multi-view face detection using …” … on Multimedia Retrieval, 2015, pp. 643-650.