Do We Really Need to Collect Millions of Faces for Effective Face Recognition?

ECCV, 2016.


Abstract:

Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes – huge numbers of face images downloaded and labeled for identity – it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training set sizes for face recognition systems: domain specific data augmentation.

Introduction
  • The recent impact of deep Convolutional Neural Network (CNN) based methods on machine face recognition capabilities has been nothing short of revolutionary.
  • The conditions under which faces can be recognized, and the number of faces systems can learn to identify, have improved to the point where some consider machines better than humans at this task.
  • This remarkable advancement is partially due to the gradual refinement of network designs, which offer ever better performance.
  • Some time later, [24] proposed the VGG-Face representation, trained on 2.6 million faces, and Face++ proposed its Megvii Face Recognition System [45].
Highlights
  • The recent impact of deep Convolutional Neural Network (CNN) based methods on machine face recognition capabilities has been nothing short of revolutionary
  • Alongside developments in network architectures, it is the underlying ability of CNNs to learn from massive training sets that allows these techniques to be so effective
  • In [35], a standard CNN was trained by Facebook using 4.4 million labeled faces and shown to achieve what was, at the time, state of the art performance on the Labeled Faces in the Wild (LFW) benchmark [11]
  • In order to provide fair comparisons, our CNNs were fine-tuned on CASIA subjects that are not included in Janus (Sec. 4.1)
  • The results reported in [4] with fine-tuning on the training sets include system components not evaluated without fine-tuning
  • Beyond faces, there may be other domains where such an approach is relevant and where synthetically generated training data can help mitigate the many problems of data collection for CNN training
Methods
  • Table (b), results for methods trained on millions of images, compares methods by the number of real training images, synthesized images, and networks used, plus accuracy (%) and 100% − EER: Fisher Vector Faces [23], DeepFace [35], Fusion [36], FaceNet [29], FaceNet + Alignment [29] (200M real images, one network, 99.63), VGG Face [24] (2.6M real images, one network, 98.95), and the authors' method with no augmentation and with pose, shape, and expression augmentation.
Results
  • IJB-A is a new publicly available benchmark released by NIST to raise the difficulty of unconstrained face identification and verification.
  • Both IJB-A and the Janus CS2 benchmark share the same subject identities, represented by images captured under extreme conditions, including pose, expression, and illumination variations; IJB-A splits are generally considered more difficult than those in CS2.
  • The authors follow the standard protocol for unrestricted, labeled outside data and report the mean classification accuracy as well as 100% − EER, i.e., the equal error rate subtracted from 100% (a minimal sketch of the EER computation follows this list).
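For context, the equal error rate (EER) is the operating point on the ROC curve where the false accept rate equals the false reject rate, so 100% − EER grows as verification improves. The following is a minimal, illustrative sketch of that computation, assuming raw pair scores and binary same/not-same labels; it is not the benchmark's official evaluation code.

```python
import numpy as np

def acc_100_minus_eer(scores, labels):
    """Return 100% - EER for a verification benchmark.

    scores: similarity score for each evaluated pair.
    labels: 1 for same-subject pairs, 0 for different-subject pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = labels == 1, labels == 0
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a decision threshold and find the
    # point where false accept and false reject rates cross.
    for t in np.unique(scores):
        accept = scores >= t
        far = accept[neg].mean()      # false accept rate
        frr = (~accept[pos]).mean()   # false reject rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return 100.0 * (1.0 - eer)
```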
Conclusion
  • The authors show how domain specific data augmentation can be used to generate valuable additional data for training effective face recognition systems, as an alternative to expensive data collection and labeling (the pose-synthesis idea at its core is sketched after this list).
  • The underlying idea of domain specific data augmentation can be extended in further ways to provide additional intra-subject appearance variations.
  • Beyond faces, there may be other domains where such an approach is relevant and where the introduction of synthetically generated training data can help mitigate the many problems of data collection for CNN training.
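This summary does not spell the augmentation out, but the paper's pose synthesis renders each face from novel 3D viewpoints using generic 3D face shapes (cf. [9,10,25]). Below is a minimal sketch of the geometric core of such a step, assuming detected 2D landmarks, their corresponding points on a generic 3D shape, and the shape's dense vertices; the crude intrinsics and every function name here are illustrative assumptions, and the paper's actual pipeline (e.g., soft symmetry, multiple generic shapes) is considerably more involved.

```python
import cv2
import numpy as np
from scipy.interpolate import griddata

def synthesize_pose(image, landmarks_2d, model_points_3d, model_vertices, yaw_deg):
    """Illustrative novel-pose rendering of a face with a generic 3D shape."""
    h, w = image.shape[:2]
    K = np.array([[w, 0, w / 2.0],   # crude pinhole intrinsics:
                  [0, w, h / 2.0],   # focal length ~ image width
                  [0, 0, 1.0]])
    # 1. Camera pose from 2D-3D landmark correspondences.
    _, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float64),
                                 landmarks_2d.astype(np.float64), K, None)
    # 2. Project the dense generic shape under the estimated pose (source
    #    pixel coordinates) and under that pose composed with an extra yaw
    #    rotation (target pixel coordinates).
    src, _ = cv2.projectPoints(model_vertices.astype(np.float64), rvec, tvec, K, None)
    R, _ = cv2.Rodrigues(rvec)
    a = np.deg2rad(yaw_deg)
    R_yaw = np.array([[np.cos(a), 0, np.sin(a)],
                      [0, 1, 0],
                      [-np.sin(a), 0, np.cos(a)]])
    rvec_new, _ = cv2.Rodrigues(R_yaw @ R)
    dst, _ = cv2.projectPoints(model_vertices.astype(np.float64), rvec_new, tvec, K, None)
    src, dst = src.reshape(-1, 2), dst.reshape(-1, 2)
    # 3. Interpolate a dense backward map (target pixel -> source pixel)
    #    from the scattered vertex correspondences, then warp.
    ys, xs = np.mgrid[0:h, 0:w]
    map_x = griddata(dst, src[:, 0], (xs, ys), method='linear')
    map_y = griddata(dst, src[:, 1], (xs, ys), method='linear')
    map_x = np.nan_to_num(map_x, nan=-1).astype(np.float32)  # -1: outside face
    map_y = np.nan_to_num(map_y, nan=-1).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

Shape and expression augmentation follow the same pattern: rather than moving the camera, the generic 3D shape itself is swapped or deformed before re-rendering, producing additional appearance variations for the same subject.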
Tables
  • Table 1: SoftMax template fusion for score pooling vs. other standard fusion techniques on the IJB-A benchmark, for verification (ROC) and identification (CMC) respectively (a minimal sketch of softmax score pooling follows this list)
  • Table 2: Effect of each augmentation on IJB-A performance, for verification (ROC) and identification (CMC) respectively; only in-plane aligned images are used in these tests
  • Table 3: Effect of in-plane alignment and pose synthesis at test time (matching) on the IJB-A dataset, for verification (ROC) and identification (CMC) respectively
  • Table 4: Comparative performance analysis on Janus CS2 and IJB-A, for verification (ROC) and identification (CMC) respectively; f.t. denotes fine-tuning a deep network multiple times, once for each training split. A network trained once with our augmented data achieves mostly superior results, without this effort
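The SoftMax fusion of Table 1 pools the set of pairwise image-to-image scores between two templates into a single template-to-template score, weighting high scores more heavily than a plain average but less aggressively than a max. Here is a minimal sketch of such softmax-weighted score pooling, assuming the pairwise similarities are already computed; the temperature beta is an assumed hyper-parameter, not a value from the paper.

```python
import numpy as np

def softmax_pool(pairwise_scores, beta=20.0):
    """Pool the pairwise image scores of two templates into one score."""
    s = np.asarray(pairwise_scores, dtype=float).ravel()
    w = np.exp(beta * (s - s.max()))       # shift by max for numerical stability
    return float(np.dot(w / w.sum(), s))   # softmax-weighted average of scores
```

With beta = 0 this reduces to mean pooling, and as beta grows it approaches max pooling, so a single knob interpolates between the standard fusion baselines the table compares against.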
Related work
  • Face recognition: Face recognition is one of the central problems in computer vision and, as such, work on this problem is extensive. As with many other computer vision problems, face recognition performance skyrocketed with the introduction of deep learning techniques, and CNNs in particular. Though CNNs have been used for face recognition as far back as [17], their performance soared only once massive amounts of training data became available. This was originally demonstrated by the Facebook DeepFace system [35], which used an architecture not unlike the one in [17] but trained it on over 4 million images, obtaining far more impressive results.

    Since then, CNN-based recognition systems have continuously crossed performance barriers, with notable examples including the DeepID 1-3 systems [34,32,33]. These systems, and many others since, were developed and trained using far fewer training images, at the cost of somewhat more elaborate network architectures.
Funding
  • This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA 2014-14071600011
  • Moreover, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the NVIDIA Titan X GPU used for this research
Reference
  1. W. AbdAlmageed, Y. Wu, S. Rawls, S. Harel, T. Hassner, I. Masi, J. Choi, J. Leksut, J. Kim, P. Natarajan, R. Nevatia, and G. Medioni. Face recognition using deep multi-pose representations. In Winter Conf. on App. of Comput. Vision, 2016.
  2. T. Baltrusaitis, P. Robinson, and L.-P. Morency. Constrained local neural fields for robust facial landmark detection in the wild. In Proc. Int. Conf. Comput. Vision Workshops, 2013.
  3. … Conf., 2014.
  4. J.-C. Chen, V. M. Patel, and R. Chellappa. Unconstrained face verification using deep CNN features. In Winter Conf. on App. of Comput. Vision, 2016.
  5. J.-C. Chen, S. Sankaranarayanan, V. M. Patel, and R. Chellappa. Unconstrained face verification using fisher vectors computed from frontalized faces. In Int. Conf. on Biometrics: Theory, Applications and Systems, 2015.
  6. … Image Processing, 24(3):980–993, 2015.
  7. D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proc. Int. Conf. Comput. Vision, pages 2650–2658, 2015.
  8. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
  9. T. Hassner. Viewing real-world faces in 3D. In Proc. Int. Conf. Comput. Vision, pages 3607–3614, 2013.
  10. T. Hassner, S. Harel, E. Paz, and R. Enbar. Effective face frontalization in unconstrained images. In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
  11. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, UMass, Amherst, October 2007.
  12. I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In Proc. Conf. Comput. Vision Pattern Recognition, 2016.
  13. I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz. Illumination-aware age progression. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3334–3341. IEEE, 2014.
  14. B. F. Klare et al. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In Proc. Conf. Comput. Vision Pattern Recognition, pages 1931–1939, 2015.
  15. J. Klontz, B. Klare, S. Klum, E. Taborsky, M. Burge, and A. K. Jain. Open source biometric recognition. In Int. Conf. on Biometrics: Theory, Applications and Systems, 2013.
  16. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Neural Inform. Process. Syst., pages 1097–1105, 2012.
  17. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back. Face recognition: A convolutional neural-network approach. Trans. Neural Networks, 8(1):98–113, 1997.
  18. …
  19. … Theory of Blendshape Facial Models. In Eurographics, 2014.
  20. H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang. Probabilistic elastic matching for pose variant face verification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3499–3506, 2013.
  21. N. McLaughlin, J. Martinez Del Rincon, and P. Miller. Data-augmentation for reducing dataset bias in person re-identification. In Int. Conf. Advanced Video and Signal Based Surveillance. IEEE, 2015.
  22. M. H. Nguyen, J.-F. Lalonde, A. A. Efros, and F. De la Torre. Image-based shaving. Computer Graphics Forum, 27(2):627–635, 2008.
  23. O. M. Parkhi, K. Simonyan, A. Vedaldi, and A. Zisserman. A compact and discriminative face track descriptor. In Proc. Conf. Comput. Vision Pattern Recognition, 2014.
  24. O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In Proc. British Mach. Vision Conf., 2015.
  25. P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter. A 3D face model for pose and illumination invariant face recognition. In Int. Conf. Advanced Video and Signal Based Surveillance, pages 296–301, 2009.
  26. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision, pages 1–42, 2014.
  27. J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. Int. J. Comput. Vision, 105(3):222–245, 2013.
  28. S. Sankaranarayanan, A. Alavi, and R. Chellappa. Triplet similarity embedding for face verification. arXiv preprint arXiv:1602.03418, 2016.
  29. F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proc. Conf. Comput. Vision Pattern Recognition, pages 815–823, 2015.
  30. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Int. Conf. on Learning Representations, 2015.
  31. H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proc. Int. Conf. Comput. Vision, pages 945–953, 2015.
  32. Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Neural Inform. Process. Syst., pages 1988–1996, 2014.
  33. Y. Sun, D. Liang, X. Wang, and X. Tang. DeepID3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873, 2015.
  34. Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In Proc. Conf. Comput. Vision Pattern Recognition. IEEE, 2014.
  35. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 1701–1708. IEEE, 2014.
  36. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Web-scale training for face identification. In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
  37. D. Wang, C. Otto, and A. K. Jain. Face search at scale: 80 million gallery. arXiv preprint arXiv:1507.07242, 2015.
  38. L. Wolf, T. Hassner, and Y. Taigman. Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. Trans. Pattern Anal. Mach. Intell., 33(10):1978–1990, 2011.
  39. S. Xie and Z. Tu. Holistically-nested edge detection. In Proc. Int. Conf. Comput. Vision, 2015.
  40. S. Xie, T. Yang, X. Wang, and Y. Lin. Hyper-class augmented and regularized deep learning for fine-grained image classification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 2645–2654, 2015.
  41. Z. Xu, S. Huang, Y. Zhang, and D. Tao. Augmenting strong supervision using web data for fine-grained categorization. In Proc. Int. Conf. Comput. Vision, pages 2524–2532, 2015.
  42. H. Yang and I. Patras. Mirror, mirror on the wall, tell me, is the error small? In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
  43. D. Yi, Z. Lei, and S. Li. Towards pose robust face recognition. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3539–3545, 2013.
  44. D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint arXiv:1411.7923, 2014. Available: http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
  45. E. Zhou, Z. Cao, and Q. Yin. Naive-deep face recognition: Touching the limit of LFW benchmark or not? arXiv preprint arXiv:1501.04690, 2015.