Do We Really Need to Collect Millions of Faces for Effective Face Recognition?
ECCV, 2016.
Abstract:
Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes – huge numbers of face images downloaded and labeled for identity – it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training set sizes for face recognition systems: domain specific data augmentation, which enriches an existing dataset with important facial appearance variations by synthesizing them from the faces it already contains.
Introduction
- The recent impact of deep Convolutional Neural Network (CNN) based methods on machine face recognition capabilities has been nothing short of revolutionary.
- The conditions under which faces can be recognized, and the number of faces systems can learn to identify, have improved to the point where some consider machines better than humans at this task.
- This remarkable advancement is partially due to gradual improvements in network design.
- Some time later, [24] proposed the VGG-Face representation, trained on 2.6 million faces, and Face++ proposed its Megvii system [45], trained on 5 million faces.
Highlights
- The recent impact of deep Convolutional Neural Network (CNN) based methods on machine face recognition capabilities has been nothing short of revolutionary
- Alongside developments in network architectures, it is the underlying ability of CNNs to learn from massive training sets that allows these techniques to be so effective
- In [35], a standard CNN was trained by Facebook using 4.4 million labeled faces and shown to achieve what was, at the time, state of the art performance on the Labeled Faces in the Wild (LFW) benchmark [11]
- In order to provide fair comparisons, our CNNs were fine-tuned on CASIA subjects that are not included in Janus (Sec. 4.1)
- The results reported in [4] with fine-tuning on the training sets include system components not evaluated without fine-tuning
- Beyond faces, there may be other domains where such an approach is relevant and where synthetically generated training data can help mitigate the many problems of data collection for CNN training.
Methods
- (b) Results for methods trained on millions of images, reported on the LFW benchmark. Columns give the number of real training images, the number of synthesized training images, the number of networks, accuracy (%), and 100% − EER.
- Methods compared: Fisher Vector Faces [23], DeepFace [35], Fusion [36], and FaceNet [29].
- FaceNet + Alignment [29]: 200M real images, no synthesized images, 1 network, 99.63% accuracy.
- VGG Face [24]: 2.6M real images, no synthesized images, 1 network, 98.95% accuracy.
- The authors' own entries appear without augmentation ("Us, no aug.") and with pose, shape, and expression augmentation; a sketch of that augmentation follows below.
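The augmented rows above rely on expanding each real training image into several synthesized ones. The following is a minimal Python sketch of that idea, not the paper's implementation: render_at_yaw is a hypothetical stand-in for the paper's 3D pose synthesis (which re-renders each face at new yaw angles using a 3D face model), and the yaw set is an assumed example.

```python
import numpy as np

# Example target yaw angles; assumed for illustration, not the paper's exact set.
YAW_ANGLES = [0, 40, 75]

def render_at_yaw(image: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Hypothetical stand-in for 3D pose synthesis. The actual method fits a
    3D face shape and re-renders the image at the target yaw; the copy here
    is a placeholder that keeps the sketch runnable."""
    return image.copy()

def augment_face(image: np.ndarray) -> list:
    """Expand one real face image into a set of synthesized training images."""
    views = [render_at_yaw(image, yaw) for yaw in YAW_ANGLES]
    views += [v[:, ::-1].copy() for v in views]  # add horizontally mirrored versions
    return views

# One real image yields len(YAW_ANGLES) * 2 = 6 training images, so an existing
# labeled set grows severalfold with no new collection or labeling effort.
```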
Results
- IJB-A is a new publicly available benchmark released by NIST to raise the difficulty of unconstrained face identification and verification.
- Both IJB-A and the Janus CS2 benchmark share the same subject identities, represented by images viewed in extreme conditions, including pose, expression and illumination variations, with IJB-A splits generally considered more difficult than those in CS2.
- The authors follow the standard protocol for unrestricted, labeled outside data and report the mean classification accuracy as well as the 100% − EER (100% minus the equal error rate).
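For concreteness, the 100% − EER figure can be computed from raw pair labels and similarity scores as sketched below; this uses scikit-learn's roc_curve and is only an approximation of the benchmark protocol, which may differ in details such as threshold interpolation.

```python
import numpy as np
from sklearn.metrics import roc_curve

def hundred_minus_eer(labels, scores):
    """labels: 1 for same-subject pairs, 0 for different-subject pairs.
    scores: similarity values, higher meaning more likely the same subject."""
    fpr, tpr, _ = roc_curve(labels, scores)
    frr = 1.0 - tpr                           # false rejection rate
    i = int(np.nanargmin(np.abs(fpr - frr)))  # operating point where FAR ~= FRR
    eer = (fpr[i] + frr[i]) / 2.0             # equal error rate
    return 100.0 * (1.0 - eer)

# Perfectly separable toy scores give 100.0:
print(hundred_minus_eer([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]))
```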
Conclusion
- The authors show how domain specific data augmentation can be used to generate valuable additional data to train effective face recognition systems, as an alternative to expensive data collection and labeling.
- The underlying idea of domain specific data augmentation can be extended in additional ways to provide further intra-subject appearance variations.
- Beyond faces, there may be other domains where such an approach is relevant and where synthetically generated training data can help mitigate the many problems of data collection for CNN training.
Tables
- Table 1: SoftMax template fusion for score pooling vs. other standard fusion techniques on the IJB-A benchmark, for verification (ROC) and identification (CMC) respectively (a sketch of this pooling follows the list).
- Table 2: Effect of each augmentation on IJB-A performance, for verification (ROC) and identification (CMC) respectively. Only in-plane aligned images were used in these tests.
- Table 3: Effect of in-plane alignment and pose synthesis at test time (matching) on the IJB-A dataset, for verification (ROC) and identification (CMC) respectively.
- Table 4: Comparative performance analysis on Janus CS2 and IJB-A, for verification (ROC) and identification (CMC) respectively. "f.t." denotes fine-tuning a deep network separately for each training split. A network trained once with our augmented data achieves mostly superior results, without this effort.
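Regarding Table 1, SoftMax fusion pools a template's per-image match scores with softmax weights, so strong matches dominate without entirely discarding weaker ones. Below is a minimal sketch under that reading; the sharpness parameter beta is an assumed hyper-parameter, not a value taken from the paper.

```python
import numpy as np

def softmax_pool(scores: np.ndarray, beta: float = 10.0) -> float:
    """Fuse per-image similarity scores for one template into a single score.
    beta -> 0 recovers mean pooling; large beta approaches max pooling."""
    w = np.exp(beta * (scores - scores.max()))  # shifted for numerical stability
    return float(np.sum(w * scores) / np.sum(w))

# Example: one strong match among weak ones dominates the pooled score.
print(softmax_pool(np.array([0.1, 0.2, 0.9])))  # close to 0.9
```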
Related work
- Face recognition: Face recognition is one of the central problems in computer vision and, as such, work on it is extensive. As with many other computer vision problems, face recognition performance skyrocketed with the introduction of deep learning techniques, in particular CNNs. Though CNNs have been used for face recognition as far back as [17], their performance soared only when massive amounts of training data became available. This was originally demonstrated by the Facebook DeepFace system [35], which used an architecture not unlike the one in [17] but trained with over 4 million images, obtaining far more impressive results.
Since then, CNN based recognition systems have continuously crossed performance barriers, with notable examples including the DeepID 1-3 systems [34,32,33]. They, and many others since, developed and trained their systems using far fewer training images, at the cost of somewhat more elaborate network architectures.
Funding
- This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA 2014-14071600011
- Moreover, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the NVIDIA Titan X GPU used for this research
References
- 1. W. AbdAlmageed, Y. Wu, S. Rawls, S. Harel, T. Hassner, I. Masi, J. Choi, J. Leksut, J. Kim, P. Natarajan, R. Nevatia, and G. Medioni. Face recognition using deep multi-pose representations. In Winter Conf. on App. of Comput. Vision, 2016.
- 2. T. Baltrusaitis, P. Robinson, and L.-P. Morency. Constrained local neural fields for robust facial landmark detection in the wild. In Proc. Int. Conf. Comput. Vision Workshops, 2013.
- 4. J.-C. Chen, V. M. Patel, and R. Chellappa. Unconstrained face verification using deep CNN features. In Winter Conf. on App. of Comput. Vision, 2016.
- 5. J.-C. Chen, S. Sankaranarayanan, V. M. Patel, and R. Chellappa. Unconstrained face verification using fisher vectors computed from frontalized faces. In Int. Conf. on Biometrics: Theory, Applications and Systems, 2015.
- 6. Trans. Image Processing, 24(3):980–993, 2015.
- 7. D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proc. Int. Conf. Comput. Vision, pages 2650–2658, 2015.
- 8. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
- 9. T. Hassner. Viewing real-world faces in 3D. In Proc. Int. Conf. Comput. Vision, pages 3607–3614, 2013.
- 10. T. Hassner, S. Harel, E. Paz, and R. Enbar. Effective face frontalization in unconstrained images. In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
- 11. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, UMass, Amherst, October 2007.
- 12. I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In Proc. Conf. Comput. Vision Pattern Recognition, 2016.
- 13. I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz. Illumination-aware age progression. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3334–3341. IEEE, 2014.
- 14. B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, and A. K. Jain. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In Proc. Conf. Comput. Vision Pattern Recognition, pages 1931–1939, 2015.
- 15. J. Klontz, B. Klare, S. Klum, E. Taborsky, M. Burge, and A. K. Jain. Open source biometric recognition. In Int. Conf. on Biometrics: Theory, Applications and Systems, 2013.
- 16. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Neural Inform. Process. Syst., pages 1097–1105, 2012.
- 17. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back. Face recognition: A convolutional neural-network approach. Trans. Neural Networks, 8(1):98–113, 1997.
- 19. J. P. Lewis, K. Anjyo, T. Rhee, M. Zhang, F. Pighin, and Z. Deng. Practice and Theory of Blendshape Facial Models. In Eurographics (State of the Art Reports), 2014.
- 20. H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang. Probabilistic elastic matching for pose variant face verification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3499–3506, 2013.
- 21. N. McLaughlin, J. Martinez Del Rincon, and P. Miller. Data-augmentation for reducing dataset bias in person re-identification. In Int. Conf. Advanced Video and Signal Based Surveillance. IEEE, 2015.
- 22. M. H. Nguyen, J.-F. Lalonde, A. A. Efros, and F. De la Torre. Image-based shaving. Computer Graphics Forum, 27(2):627–635, 2008.
- 23. O. M. Parkhi, K. Simonyan, A. Vedaldi, and A. Zisserman. A compact and discriminative face track descriptor. In Proc. Conf. Comput. Vision Pattern Recognition, 2014.
- 24. O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In Proc. British Mach. Vision Conf., 2015.
- 25. P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter. A 3D face model for pose and illumination invariant face recognition. In Int. Conf. Advanced Video and Signal Based Surveillance, pages 296–301, 2009.
- 26. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision, pages 1–42, 2014.
- 27. J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the fisher vector: Theory and practice. Int. J. Comput. Vision, 105(3):222–245, 2013.
- 28. S. Sankaranarayanan, A. Alavi, and R. Chellappa. Triplet similarity embedding for face verification. arXiv preprint, arXiv:1602.03418, 2016.
- 29. F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proc. Conf. Comput. Vision Pattern Recognition, pages 815–823, 2015.
- 30. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Int. Conf. on Learning Representations, 2015.
- 31. H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proc. Int. Conf. Comput. Vision, pages 945–953, 2015.
- 32. Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Neural Inform. Process. Syst., pages 1988–1996, 2014.
- 33. Y. Sun, D. Liang, X. Wang, and X. Tang. DeepID3: Face recognition with very deep neural networks. arXiv preprint, arXiv:1502.00873, 2015.
- 34. Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In Proc. Conf. Comput. Vision Pattern Recognition. IEEE, 2014.
- 35. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 1701–1708. IEEE, 2014.
- 36. Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Web-scale training for face identification. In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
- 37. D. Wang, C. Otto, and A. K. Jain. Face search at scale: 80 million gallery. arXiv preprint, arXiv:1507.07242, 2015.
- 38. L. Wolf, T. Hassner, and Y. Taigman. Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. Trans. Pattern Anal. Mach. Intell., 33(10):1978–1990, 2011.
- 39. S. Xie and Z. Tu. Holistically-nested edge detection. In Proc. Int. Conf. Comput. Vision, 2015.
- 40. S. Xie, T. Yang, X. Wang, and Y. Lin. Hyper-class augmented and regularized deep learning for fine-grained image classification. In Proc. Conf. Comput. Vision Pattern Recognition, pages 2645–2654, 2015.
- 41. Z. Xu, S. Huang, Y. Zhang, and D. Tao. Augmenting strong supervision using web data for fine-grained categorization. In Proc. Int. Conf. Comput. Vision, pages 2524–2532, 2015.
- 42. H. Yang and I. Patras. Mirror, mirror on the wall, tell me, is the error small? In Proc. Conf. Comput. Vision Pattern Recognition, 2015.
- 43. D. Yi, Z. Lei, and S. Li. Towards pose robust face recognition. In Proc. Conf. Comput. Vision Pattern Recognition, pages 3539–3545, 2013.
- 44. D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint, arXiv:1411.7923, 2014. Available: http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
- 45. E. Zhou, Z. Cao, and Q. Yin. Naive-deep face recognition: Touching the limit of LFW benchmark or not? arXiv preprint, arXiv:1501.04690, 2015.