Multi-view Face Detection Using Deep Convolutional Neural Networks
ICMR, 2015.
Abstract:
In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in ...
Introduction
- With the widespread use of smartphones and fast mobile networks, millions of photos are uploaded every day to cloud storage services such as Dropbox or social networks such as Facebook, Twitter, Instagram, Google+, and Flickr.
- Users commonly look for photos that were taken at a particular location, at a particular time, or with a particular friend.
- The last query, i.e. contextual query, is more challenging as there is no explicit signal about the identities of people in the photos.
- The key for this identification is the detection of human faces.
- This has made low-complexity, rapid, and accurate face detection an essential component of cloud-based photo sharing/storage platforms.
Highlights
- We propose a method based on deep learning, called Deep Dense Face Detector (DDFD), that does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model
- We show that DDFD can achieve similar or better performance even without using pose annotation or information about facial landmarks
- DDFD is independent of common modules in recent deep learning object detection methods such as bounding-box regression, SVM, or image segmentation
Methods
- The authors provide details of the algorithm and training process of the proposed face detector, called Deep Dense Face Detector (DDFD).
- The authors start by fine-tuning AlexNet [19] for face detection.
- For this, the authors extracted training examples from the AFLW dataset [21], which consists of 21K images with 24K face annotations.
- For data augmentation, the authors randomly flipped these training examples.
- Combined with sub-window sampling, this resulted in a total of 200K positive and 20 million negative training examples.
- For fine-tuning, the authors used 50K iterations and a batch size of 128 images, where each batch contained 32 positive and 96 negative examples.
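The fixed batch composition described above (32 positive and 96 negative examples per batch of 128) can be illustrated with a minimal sketch. This is not the authors' training code: the function names `make_batch` and `maybe_flip` are hypothetical, the flip is assumed to be horizontal with probability 0.5 (the summary only says "randomly flipped"), and images are represented as plain nested lists to keep the example self-contained.

```python
import random


def maybe_flip(image, rng, p=0.5):
    """Mirror an image (a list of pixel rows) left to right with probability p.

    Assumption: the flip is horizontal and p is 0.5; the source does not say.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def make_batch(positives, negatives, n_pos=32, n_neg=96, rng=None):
    """Draw one batch with a fixed positive/negative split.

    Returns a shuffled list of (example, label) pairs, where label 1 marks
    a face window and label 0 marks a non-face window.
    """
    if rng is None:
        rng = random.Random()
    batch = [(maybe_flip(x, rng), 1) for x in rng.sample(positives, n_pos)]
    batch += [(maybe_flip(x, rng), 0) for x in rng.sample(negatives, n_neg)]
    rng.shuffle(batch)
    return batch
```

Sampling the two classes separately keeps the 1:3 ratio constant in every batch, even though negatives outnumber positives by roughly 100 to 1 in the full training set.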
Results
- To increase the number of positive examples, the authors randomly sampled sub-windows of the images and used them as positive examples if they had more than a 50% IOU (intersection over union) with the ground truth.
- Within each cluster of detections, the authors removed all windows with a score less than 90% of the maximum score of that cluster.
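Both rules above are simple enough to sketch in a few lines. The following is an illustrative sketch, not the authors' implementation: the names `iou`, `is_positive`, and `filter_cluster` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, scores are assumed to be positive, and the clusters are assumed to have been formed already (this summary does not describe how).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_positive(window, ground_truth, threshold=0.5):
    """A sampled sub-window counts as positive if its IOU exceeds 50%."""
    return iou(window, ground_truth) > threshold


def filter_cluster(cluster, ratio=0.9):
    """Drop windows scoring below ratio * (maximum score in the cluster).

    cluster is a list of (box, score) pairs belonging to one cluster.
    """
    if not cluster:
        return []
    best = max(score for _, score in cluster)
    return [(box, score) for box, score in cluster if score >= ratio * best]
```

For example, a 10x10 window compared against a 10x6 ground-truth box inside it has an IOU of 0.6 and would be kept as a positive example.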
Conclusion
- The authors proposed a face detection method based on deep learning, called Deep Dense Face Detector (DDFD).
- The proposed method does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model.
- DDFD is independent of common modules in recent deep learning object detection methods such as bounding-box regression, SVM, or image segmentation.
- The authors showed that the detector is able to achieve similar or better results even without using pose annotation or information about facial landmarks.
- In the future, the authors plan to use better sampling strategies and more sophisticated data augmentation techniques to further improve the performance of the proposed method for detecting occluded and rotated faces.
References
- [1] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In Proceedings of CVPR, 2005.
- [2] P. Dollar, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. IEEE
- [3] P. Dollar, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In Proceedings of the British Machine Vision Conference, 2009.
- [4] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. 2014.
- [5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proceedings of CVPR, 2008.
- [6] C. Garcia and M. Delakis. A Neural Architecture for Fast and Robust Face Detection. In Proceedings of IEEE-IAPR International Conference on Pattern Recognition, Aug. 2002.
- [7] C. Garcia and M. Delakis. Training Convolutional Filters for Robust Face Detection. In Proceedings of IEEE International Workshop of Neural Networks for Signal Processing, Sept. 2003.
- [8] C. Garcia and M. Delakis. Convolutional face finder: a neural architecture for fast and robust face detection.
- [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of CVPR, 2014.
- [10] R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik. Deformable part models are convolutional neural networks. CoRR, 2014.
- [11] P. E. Hadjidoukas, V. V. Dimakopoulos, M. Delakis, and C. Garcia. A high-performance face detection system using OpenMP. Concurrency and Computation: Practice and Experience, 2009.
- [12] C. Huang, H. Ai, Y. Li, and S. Lao. Vector boosting for rotation invariant multi-view face detection. In Proceedings of ICCV, 2005.
- [13] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
- [14] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
- [15] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.
- [16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
- [17] J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Proceedings of NIPS, 2014.
- [18] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of ECCV, 2014.
- [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of NIPS, 2012.
- [20] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of CVPR, 2006.
- [22] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Proceedings of ECCV, 2014.
- [23] M. Osadchy, Y. LeCun, and M. L. Miller. Synergistic face detection and pose estimation with energy-based model. In Proceedings of NIPS, 2005.
- [24] R. Osadchy, M. Miller, and Y. LeCun. Synergistic face detection and pose estimation with energy-based model. In Proceedings of NIPS, 2004.
- [26] S. Roux, F. Mamalet, and C. Garcia. Embedded convolutional face finder. In Proceedings of IEEE International Conference on Multimedia and Expo, 2006.
- [27] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In Proceedings of CVPR, 1996.
- [28] M. Saberian and N. Vasconcelos. Multi-resolution cascades for multiclass object detection. In Proceedings of NIPS, 2014.
- [29] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of International Conference on Learning Representations, 2014.
- [30] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Proceedings of NIPS, 2014.
- [31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, 2014.
- [32] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, 2014.
- [33] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, 2014.
- [34] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of CVPR, 2014.
- [35] A. Torralba, K. Murphy, and W. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
- [36] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 2013.
- [37] R. Vaillant, C. Monrocq, and Y. LeCun. An original approach for the localisation of objects in images. In Proceedings of International Conference on Artificial Neural Networks, 1993.
- [38] R. Vaillant, C. Monrocq, and Y. LeCun. Original approach for the localisation of objects in images. IEE Proc on Vision, Image, and Signal Processing, 1994.
- [39] M. Jones and P. Viola. Fast multi-view face detection. In Proceedings of CVPR, 2003.
- [40] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 2004.
- [41] B. Wu, H. Ai, C. Huang, and S. Lao. Fast rotation invariant multi-view face detection based on real adaboost. In IEEE International Conference on Automatic Face and Gesture Recognition, 2004.
- [42] J. Yan, X. Zhang, Z. Lei, and S. Li. Face detection by structural models.