Joint Training of Cascaded CNN for Face Detection

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pp. 3456-3465, 2016.

DOI: https://doi.org/10.1109/CVPR.2016.376
We show that the back-propagation algorithm used to train CNNs can naturally be used to train a CNN cascade.

Abstract:

Cascades have been widely used in face detection, where a classifier with low computational cost is applied first to reject most of the background while preserving recall. The cascade in detection was popularized by the seminal Viola-Jones framework and then widely used in other pipelines, such as DPM and CNN. However, to the best of our knowledge, mos...

Introduction
  • Face detection plays an important role in face-based image analysis and is one of the fundamental problems in computer vision.
  • Recent work in face detection focuses on faces in uncontrolled settings, which is challenging because of variations at the subject level, category level and image level.
  • The number of detected faces N varies from image to image.
  • Considering that each box $b_i$ can appear at any scale and position, the face detection problem for an image of width w and height h has an output space of size $(w * h)^2 / 2$.
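
As a rough illustration of why this output space is large, the sketch below counts candidate boxes for a single image. It assumes a box is defined by an unordered pair of pixel positions read as opposite corners, which gives C(w*h, 2), approximately (w*h)^2 / 2. The function name and the 640 x 480 image size are illustrative choices, not values from the paper.

```python
from math import comb

def candidate_box_count(w: int, h: int) -> int:
    """Count unordered pairs of pixel positions in a w x h image.

    Assumption: each pair is read as two opposite corners of a box.
    """
    return comb(w * h, 2)

w, h = 640, 480  # illustrative image size, not from the paper
exact = candidate_box_count(w, h)
approx = (w * h) ** 2 / 2
print(f"pixel pairs: {exact:.3e}, (w*h)^2/2: {approx:.3e}")
```

Even for a modest image the count is in the tens of billions, which is one way to see why a cheap first stage that discards most candidates is attractive.
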
Highlights
  • In training joint cascaded CNNs, we use Annotated Facial Landmarks in the Wild (AFLW) [15] and our dataset called 3R. 3R contains about 26000 images that have faces and 27000 images that have no faces.
  • We have presented joint training as a novel way of training cascaded CNNs
  • We evaluate joint training on face detection datasets
  • Joint training can extend to general cascaded CNNs, and we show how to jointly train region proposal network and fast R-CNN as an example
Methods
  • The authors carry out experiments on face detection datasets to evaluate the joint training pipeline.
  • In training joint cascaded CNNs, the authors use Annotated Facial Landmarks in the Wild (AFLW) [15] and a dataset called 3R.
  • The authors use images in PASCAL VOC2012 [4] that do not contain persons as background images.
  • The dataset contains 47211 images with 82987 faces and about 32000 background images.
Results
  • Joint training achieves a recall of 88.2% at 1000 false positives, which is comparable to the state of the art.
  • This is better than the Cascaded CNN result (85.7%) reported in [16].
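
To make the "recall at a fixed number of false positives" figure concrete, here is a minimal sketch of how such an operating point can be computed from scored detections. It assumes detections have already been matched to annotated faces upstream (for example by an overlap criterion), and it ignores score ties. The function and argument names are hypothetical, and this is not the official evaluation code of any benchmark.

```python
def recall_at_false_positives(scores, is_true_positive, num_faces, max_fp):
    """Recall reached before the false-positive count would exceed max_fp.

    scores: detection confidences pooled over the whole test set.
    is_true_positive: parallel booleans, True if the detection matches a face.
    num_faces: total number of annotated faces.
    """
    # Walk detections from most to least confident, as if lowering a threshold.
    ranked = sorted(zip(scores, is_true_positive), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    for _, matched in ranked:
        if matched:
            tp += 1
        else:
            if fp == max_fp:
                break  # accepting this detection would exceed the budget
            fp += 1
    return tp / num_faces

# Toy example: five detections, four annotated faces.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_matches = [True, False, True, False, True]
print(recall_at_false_positives(toy_scores, toy_matches, num_faces=4, max_fp=1))
```
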
Conclusion
  • The authors present joint training as a novel way of training cascaded CNNs. With joint training, a CNN cascade can be optimized end to end.
  • By jointly optimizing the cascaded stages, the whole network achieves better performance with a smaller model, because the stages share convolutions.
  • The authors evaluate joint training on face detection datasets.
  • Joint training can extend to general cascaded CNNs, and the authors show how to jointly train RPN and Fast R-CNN as an example.
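
The idea of optimizing cascaded stages together through a shared feature can be illustrated with a toy sketch. The code below is not the paper's architecture or loss: it assumes a single shared scalar feature, two logistic "stage" heads, and a joint loss that is simply the sum of the two cross-entropy losses. All parameter names and the 1-D toy data are hypothetical. The point is only that back propagation sends gradients from both stages into the shared parameters.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def joint_step(params, batch, lr=0.1):
    """One gradient step on the summed loss of two stages sharing one feature."""
    grads = {k: 0.0 for k in params}
    total = 0.0
    eps = 1e-12  # keeps log() finite
    for x, y in batch:
        # Forward: shared feature, then one logistic head per stage.
        h = math.tanh(params["a"] * x + params["c"])
        p1 = sigmoid(params["u1"] * h + params["v1"])
        p2 = sigmoid(params["u2"] * h + params["v2"])
        total += -(y * math.log(p1 + eps) + (1 - y) * math.log(1 - p1 + eps))
        total += -(y * math.log(p2 + eps) + (1 - y) * math.log(1 - p2 + eps))
        # Backward: for sigmoid + cross-entropy, d(loss)/d(logit) = p - y.
        d1, d2 = p1 - y, p2 - y
        grads["u1"] += d1 * h
        grads["v1"] += d1
        grads["u2"] += d2 * h
        grads["v2"] += d2
        # Both stages contribute gradient to the shared feature.
        dh = d1 * params["u1"] + d2 * params["u2"]
        dpre = dh * (1.0 - h * h)  # derivative of tanh
        grads["a"] += dpre * x
        grads["c"] += dpre
    n = len(batch)
    for k in params:
        params[k] -= lr * grads[k] / n
    return total / n

random.seed(0)
# Toy 1-D data: label 1 ("face") when x > 0, else 0 ("background").
data = [(x, 1.0 if x > 0 else 0.0)
        for x in (random.uniform(-2, 2) for _ in range(200))]
params = {"a": 0.5, "c": 0.0, "u1": 0.1, "v1": 0.0, "u2": -0.1, "v2": 0.0}
losses = [joint_step(params, data) for _ in range(300)]
print(f"joint loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a real cascade the shared feature would be convolutional layers and each head would be a CNN branch; a framework with automatic differentiation would compute the same kind of summed gradient without the manual backward pass.
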
Tables
  • Table 1: Comparison of training methods of RPN + F-RCNN
Related work
  • Numerous methods have been proposed for face detection, and some of them have been deployed in real applications. As in many other computer vision tasks, the leading face detection algorithms were based on convolutional neural networks in the 1990s, then on hand-crafted features and models, and recently on convolutional neural networks again. In this part, we briefly review the three kinds of methods and refer readers to [33, 37, 35] for more detailed surveys.

    2.1. Early CNN based methods

    Face detection and MNIST OCR are two tasks in which CNN-based approaches achieved success in the 1990s. In [26], a CNN is applied in a sliding-window manner across locations and scales to classify faces against the background. In [22], a CNN is used for frontal face detection and shows quite good performance. In [23], CNNs trained on faces in different poses are used for rotation-invariant face detection. These methods are quite similar to modern CNN methods and achieve relatively good performance on easy datasets.
Funding
  • This work was partly supported by the National Natural Science Foundation of China (Grant No. 71171121), the National 863 High Technology Research and Development Program of China (Grant No. 2012AA09A408), and the Shenzhen Science and Technology Project (Grant No. JCYJ20151117173236192 and No. CXZZ20140902110505864).
Reference
  • [1] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 236–243. IEEE, 2005.
  • [2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Computer Vision–ECCV 2014, pages 109–122. Springer, 2014.
  • [3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
  • [4] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
  • [5] S. S. Farfade, M. Saberian, and L.-J. Li. Multi-view face detection using deep convolutional neural networks. arXiv preprint arXiv:1502.02766, 2015.
  • [6] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Cascade object detection with deformable part models. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 2241–2248. IEEE, 2010.
  • [7] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645, 2010.
  • [8] G. Ghiasi and C. C. Fowlkes. Occlusion coherence: Detecting and localizing occluded faces. arXiv preprint arXiv:1506.08347, 2015.
  • [9] R. Girshick. Fast r-cnn. arXiv preprint arXiv:1504.08083, 2015.
  • [10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 580–587. IEEE, 2014.
  • [11] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(4):671–686, 2007.
  • [12] L. Huang, Y. Yang, Y. Deng, and Y. Yu. Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874, 2015.
  • [13] V. Jain and E. G. Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. UMass Amherst Technical Report, 2010.
  • [14] M. Jones and P. Viola. Fast multi-view face detection. Mitsubishi Electric Research Lab TR-20003-96, 3:14, 2003.
  • [15] M. Kostinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 2144–2151. IEEE, 2011.
  • [16] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua. A convolutional neural network cascade for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5325–5334, 2015.
  • [17] S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum. Statistical learning of multi-view face detection. In Computer Vision–ECCV 2002, pages 67–81. Springer, 2002.
  • [18] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Computer Vision–ECCV 2014, pages 720–735. Springer, 2014.
  • [19] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on, pages 130–136. IEEE, 1997.
  • [20] R. Ranjan, V. M. Patel, and R. Chellappa. A deep pyramid deformable part model for face detection. arXiv preprint arXiv:1508.04389, 2015.
  • [21] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015.
  • [22] H. Rowley, S. Baluja, T. Kanade, et al. Neural network-based face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(1):23–38, 1998.
  • [23] H. Rowley, S. Baluja, T. Kanade, et al. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, pages 38–44. IEEE, 1998.
  • [24] H. Schneiderman and T. Kanade. A statistical method for 3d object detection applied to faces and cars. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 1, pages 746–751. IEEE, 2000.
  • [25] X. Shen, Z. Lin, J. Brandt, and Y. Wu. Detecting and aligning faces by image retrieval. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3460–3467. IEEE, 2013.
  • [26] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEE Proceedings-Vision, Image and Signal Processing, 141(4):245–250, 1994.
  • [27] P. Viola and M. J. Jones. Robust real-time face detection. International journal of computer vision, 57(2):137–154, 2004.
  • [28] R. Xiao, L. Zhu, and H.-J. Zhang. Boosting chain learning for object detection. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 709–715. IEEE, 2003.
  • [29] J. Yan, Z. Lei, L. Wen, and S. Z. Li. The fastest deformable part model for object detection. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2497–2504. IEEE, 2014.
  • [30] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. Image and Vision Computing, 32(10):790–799, 2014.
  • [31] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. In Biometrics (IJCB), 2014 IEEE International Joint Conference on, pages 1–8. IEEE, 2014.
  • [32] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Convolutional channel features for pedestrian, face and edge detection. arXiv preprint arXiv:1504.07339, 2015.
  • [33] M.-H. Yang, D. J. Kriegman, and N. Ahuja. Detecting faces in images: A survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(1):34–58, 2002.
  • [34] S. Yang, P. Luo, C. C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. arXiv preprint arXiv:1509.06451, 2015.
  • [35] S. Zafeiriou, C. Zhang, and Z. Zhang. A survey on face detection in the wild: past, present and future. Computer Vision and Image Understanding, 2015.
  • [36] C. Zhang, J. C. Platt, and P. A. Viola. Multiple instance boosting for object detection. In Advances in neural information processing systems, pages 1417–1424, 2005.
  • [37] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical report, Microsoft Research, 2010.
  • [38] L. Zhang, R. Chu, S. Xiang, S. Liao, and S. Z. Li. Face detection based on multi-block lbp representation. In Advances in biometrics, pages 11–18. Springer, 2007.
  • [39] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2879–2886. IEEE, 2012.