Multi-view Face Detection Using Deep Convolutional Neural Networks

ICMR, 2015.

DOI: https://doi.org/10.1145/2671188.2749408

Abstract:

In this paper, we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in ...

Introduction
  • With the widespread use of smartphones and fast mobile networks, millions of photos are uploaded every day to cloud storage services such as Dropbox and to social networks such as Facebook, Twitter, Instagram, Google+, and Flickr.
  • Users commonly look for photos that were taken at a particular location, at a particular time, or with a particular friend.
  • The last query, i.e. a contextual query, is more challenging as there is no explicit signal about the identities of people in the photos.
  • The key for this identification is the detection of human faces.
  • This has made low-complexity, rapid, and accurate face detection an essential component of cloud-based photo sharing/storage platforms.
Highlights
  • We propose a method based on deep learning, called Deep Dense Face Detector (DDFD), that does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model.
  • DDFD is independent of common modules in recent deep learning object detection methods such as bounding-box regression, SVM, or image segmentation.
  • We show that DDFD can achieve similar or better performance even without using pose annotation or information about facial landmarks.
Methods
  • The authors provide details of the algorithm and training process of the proposed face detector, called Deep Dense Face Detector (DDFD).
  • The authors start by fine-tuning AlexNet [19] for face detection.
  • For this, the authors extracted training examples from the AFLW dataset [21], which consists of 21K images with 24K face annotations.
  • The authors randomly flipped these training examples.
  • This resulted in a total of 200K positive and 20 million negative training examples.
  • For fine-tuning, the authors used 50K iterations and a batch size of 128 images, where each batch contained 32 positive and 96 negative examples.
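
The batch composition described above (32 positive and 96 negative examples per batch of 128, with random flipping) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `make_batch`, the representation of a patch as a list of pixel rows, the choice of a horizontal flip, and the flip probability of 0.5 are all assumptions that the summary does not specify.

```python
import random


def make_batch(positives, negatives, n_pos=32, n_neg=96, flip_prob=0.5, rng=None):
    """Draw one fine-tuning batch with a fixed positive/negative ratio.

    `positives` and `negatives` are lists of patches; each patch is assumed
    (hypothetically) to be a list of pixel rows.
    """
    rng = rng or random.Random()
    batch = []
    for label, pool, count in ((1, positives, n_pos), (0, negatives, n_neg)):
        # Sample without replacement from the pool for this class.
        for patch in rng.sample(pool, count):
            if rng.random() < flip_prob:
                # Assumed augmentation: mirror each row (horizontal flip).
                patch = [row[::-1] for row in patch]
            batch.append((patch, label))
    # Mix positives and negatives so the order carries no class signal.
    rng.shuffle(batch)
    return batch
```

With the default arguments, every batch holds 128 examples at a 1:3 positive-to-negative ratio, which matches the figures quoted in the list above.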
Results
  • To increase the number of positive examples, the authors randomly sampled sub-windows of the images and used them as positive examples if they had more than a 50% IOU (intersection over union) with the ground truth.
  • Within each cluster, the authors then removed all windows with a score below 90% of the maximum score of that cluster.
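
The two rules above, the 50% IOU test for positive sub-windows and the 90% score cut within a cluster, can be illustrated with a short sketch. It rests on assumptions not stated in the summary: boxes are taken to be `(x1, y1, x2, y2)` tuples, scores are taken to be non-negative, the clusters are assumed to be given already (how they are formed is not described here), and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes produce an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_positive(window, ground_truth_boxes, threshold=0.5):
    """A sampled sub-window counts as positive if its IOU with any
    ground-truth face box exceeds the threshold (50% in the text)."""
    return any(iou(window, gt) > threshold for gt in ground_truth_boxes)


def filter_cluster(detections, keep_ratio=0.9):
    """Keep only the windows in one cluster whose score is at least
    keep_ratio times the cluster's maximum score.

    `detections` is a list of (box, score) pairs belonging to one cluster.
    """
    if not detections:
        return []
    top = max(score for _, score in detections)
    return [(box, score) for box, score in detections if score >= keep_ratio * top]
```

For example, a window covering `(0, 0, 10, 10)` against a ground-truth box `(0, 0, 10, 8)` has an IOU of 0.8 and would be kept as a positive, while a window shifted halfway to `(5, 0, 15, 10)` has an IOU of one third against `(0, 0, 10, 10)` and would not.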
Conclusion
  • The authors proposed a face detection method based on deep learning, called Deep Dense Face Detector (DDFD).
  • The proposed method does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model.
  • DDFD is independent of common modules in recent deep learning object detection methods such as bounding-box regression, SVM, or image segmentation.
  • The authors showed that the detector is able to achieve similar or better results even without using pose annotation or information about facial landmarks.
  • In the future, the authors plan to use better sampling strategies and more sophisticated data augmentation techniques to further improve the performance of the proposed method for detecting occluded and rotated faces.
Reference
  • [1] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In Proceedings of CVPR, 2005.
  • [2] P. Dollar, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. IEEE
  • [3] P. Dollar, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In Proceedings of the British Machine Vision Conference, 2009.
  • [4] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. 2014.
  • [5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proceedings of CVPR, 2008.
  • [6] C. Garcia and M. Delakis. A Neural Architecture for Fast and Robust Face Detection. In Proceedings of IEEE-IAPR International Conference on Pattern Recognition, Aug. 2002.
  • [7] C. Garcia and M. Delakis. Training Convolutional Filters for Robust Face Detection. In Proceedings of IEEE International Workshop of Neural Networks for Signal Processing, Sept. 2003.
  • [8] C. Garcia and M. Delakis. Convolutional face finder: a neural architecture for fast and robust face detection.
  • [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of CVPR, 2014.
  • [10] R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik. Deformable part models are convolutional neural networks. CoRR, 2014.
  • [11] P. E. Hadjidoukas, V. V. Dimakopoulos, M. Delakis, and C. Garcia. A high-performance face detection system using OpenMP. Concurrency and Computation: Practice and Experience, 2009.
  • [12] C. Huang, H. Ai, Y. Li, and S. Lao. Vector boosting for rotation invariant multi-view face detection. In Proceedings of ICCV, 2005.
  • [13] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
  • [14] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
  • [15] V. Jain and E. Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.
  • [16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • [17] J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Proceedings of NIPS, 2014.
  • [18] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of ECCV, 2014.
  • [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of NIPS, 2012.
  • [20] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of CVPR, 2006.
  • [22] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Proceedings of ECCV, 2014.
  • [23] M. Osadchy, Y. LeCun, and M. L. Miller. Synergistic face detection and pose estimation with energy-based model. In Proceedings of NIPS, 2005.
  • [24] R. Osadchy, M. Miller, and Y. LeCun. Synergistic face detection and pose estimation with energy-based model. In Proceedings of NIPS, 2004.
  • [26] S. Roux, F. Mamalet, and C. Garcia. Embedded convolutional face finder. In Proceedings of IEEE International Conference on Multimedia and Expo, 2006.
  • [27] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In Proceedings of CVPR, 1996.
  • [28] M. Saberian and N. Vasconcelos. Multi-resolution cascades for multiclass object detection. In Proceedings of NIPS, 2014.
  • [29] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of International Conference on Learning Representations, 2014.
  • [30] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Proceedings of NIPS, 2014.
  • [31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, 2014.
  • [32] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, 2014.
  • [33] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, 2014.
  • [34] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of CVPR, 2014.
  • [35] A. Torralba, K. Murphy, and W. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
  • [36] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 2013.
  • [37] R. Vaillant, C. Monrocq, and Y. LeCun. An original approach for the localisation of objects in images. In Proceedings of International Conference on Artificial Neural Networks, 1993.
  • [38] R. Vaillant, C. Monrocq, and Y. LeCun. Original approach for the localisation of objects in images. IEE Proc on Vision, Image, and Signal Processing, 1994.
  • [39] M. Jones and P. Viola. Fast multi-view face detection. In Proceedings of CVPR, 2003.
  • [40] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 2004.
  • [41] B. Wu, H. Ai, C. Huang, and S. Lao. Fast rotation invariant multi-view face detection based on real adaboost. In IEEE International Conference on Automatic Face and Gesture Recognition, 2004.
  • [42] J. Yan, X. Zhang, Z. Lei, and S. Li. Face detection by structural models.