SphereFace: Deep Hypersphere Embedding for Face Recognition

CVPR, pp. 6738-6746, 2017.

DOI: https://doi.org/10.1109/CVPR.2017.713
We propose the angular softmax loss for convolutional neural networks to learn discriminative face features with angular margin

Abstract:

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have a smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features.

Introduction
  • Recent years have witnessed the great success of convolutional neural networks (CNNs) in face recognition (FR).
  • Face recognition can be categorized as face identification and face verification [8, 11]
  • The former classifies a face to a specific identity, while the latter determines whether a pair of faces belongs to the same identity.
  • Under the closed-set protocol, it is natural to classify testing face images into the given identities
  • In this scenario, face verification is equivalent to performing identification on a pair of faces
  • Under the open-set protocol, the testing identities are usually disjoint from the training set
Highlights
  • Recent years have witnessed the great success of convolutional neural networks (CNNs) in face recognition (FR)
  • Face recognition can be categorized as face identification and face verification [8, 11]
  • Our major contributions can be summarized as follows: (1) We propose A-Softmax loss for CNNs to learn discriminative face features with clear and novel geometric interpretation
  • This paper presents a novel deep hypersphere embedding approach for face recognition
  • We propose the angular softmax loss for CNNs to learn discriminative face features (SphereFace) with angular margin
  • A-Softmax loss renders nice geometric interpretation by constraining learned features to be discriminative on a hypersphere manifold, which intrinsically matches the prior that faces lie on a non-linear manifold
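The angular-margin idea behind A-Softmax can be sketched numerically. The following is a minimal NumPy sketch, not the paper's implementation: it assumes unit-norm class weights and zero biases (as the paper prescribes), and the helper names `psi` and `a_softmax_loss` are illustrative. `psi` is the piecewise monotonic extension of cos(mθ) that the paper uses so the loss stays decreasing over [0, π].

```python
import numpy as np

def psi(theta, m):
    """Piecewise monotonic extension of cos(m*theta) used by A-Softmax:
    psi(theta) = (-1)^k * cos(m*theta) - 2k  for theta in [k*pi/m, (k+1)*pi/m]."""
    k = np.floor(theta * m / np.pi)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def a_softmax_loss(x, W, y, m=4):
    """A-Softmax loss for one feature vector x of shape (d,), weight matrix W
    of shape (d, C), integer label y, and integer angular margin m.
    Columns of W are normalized to unit norm; biases are assumed zero."""
    W = W / np.linalg.norm(W, axis=0, keepdims=True)  # enforce ||W_j|| = 1
    xnorm = np.linalg.norm(x)
    cos = W.T @ x / xnorm                        # cos(theta_j) for each class j
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = xnorm * cos                         # modified-softmax logits
    logits[y] = xnorm * psi(theta[y], m)         # margin applied to target class only
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[y])
```

With m = 1, `psi` reduces to cos(θ) and the loss coincides with the modified softmax (normalized weights, zero bias); for m ≥ 2, psi(θ, m) ≤ cos(θ), so the target-class logit shrinks and the loss enforces an angular margin.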
Results
  • A-Softmax loss greatly improves the verification accuracy from 98.20% to 99.42% on LFW, and from 93.4% to 95.0% on YTF.
  • Compared to models trained on large datasets (500 million images for Google and 18 million for NTechLAB), the method still performs better (by 0.64% for identification rate and 1.4% for verification rate)
Conclusion
  • Concluding Remarks

    This paper presents a novel deep hypersphere embedding approach for face recognition.
  • A-Softmax loss renders nice geometric interpretation by constraining learned features to be discriminative on a hypersphere manifold, which intrinsically matches the prior that faces lie on a non-linear manifold.
  • This connection makes A-Softmax very effective for learning face representation.
  • Competitive results on several popular face benchmarks demonstrate the superiority and great potential of the approach
Tables
  • Table1: Comparison of decision boundaries in the binary case. Note that θi is the angle between Wi and x
  • Table2: Our CNN architectures with different convolutional layers. Conv1.x, Conv2.x and Conv3.x denote convolution units that may contain multiple convolution layers and residual units are shown in double-column brackets. E.g., [3×3, 64]×4 denotes 4 cascaded convolution layers with 64 filters of size 3×3, and S2 denotes stride 2. FC1 is the fully connected layer
  • Table3: Accuracy(%) comparison of different m (A-Softmax loss) and original softmax loss on LFW and YTF dataset
  • Table4: Accuracy (%) on LFW and YTF dataset. * denotes the outside data is private (not publicly available). For fair comparison, all loss functions (including ours) we implemented use 64-layer CNN architecture in Table 2
  • Table5: Performance (%) on MegaFace challenge. “Rank-1 Acc.” indicates rank-1 identification accuracy with 1M distractors, and “Ver.” indicates verification TAR at 10⁻⁶ FAR. TAR and FAR denote True Accept Rate and False Accept Rate respectively. For fair comparison, all loss functions (including ours) we implemented use the 64-layer CNN architecture in Table 2
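Table 1's comparison can be written out explicitly. Reconstructed for the binary case (with weights normalized to unit norm and biases set to zero for the angular variants; a sketch from the setting the paper describes, where θ1, θ2 are the angles between x and W1, W2):

```latex
\begin{align*}
\text{Softmax:}              &\quad (W_1 - W_2)\,x + b_1 - b_2 = 0 \\
\text{Modified softmax:}     &\quad \|x\|\,(\cos\theta_1 - \cos\theta_2) = 0 \\
\text{A-Softmax (class 1):}  &\quad \|x\|\,(\cos m\theta_1 - \cos\theta_2) = 0 \\
\text{A-Softmax (class 2):}  &\quad \|x\|\,(\cos\theta_1 - \cos m\theta_2) = 0
\end{align*}
```

For m > 1, the two A-Softmax boundaries no longer coincide, which carves out an angular margin between the two classes on the hypersphere.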
Related work
  • Metric learning. Metric learning aims to learn a similarity (distance) function. Traditional metric learning [36, 33, 12, 38] usually learns a matrix A for a distance metric ‖x1 − x2‖_A = √((x1 − x2)ᵀ A (x1 − x2)) upon the given features x1, x2. Recently, prevailing deep metric learning [7, 17, 24, 30, 25, 22, 34] usually uses neural networks to automatically learn discriminative features x1, x2 followed by a simple distance metric such as the Euclidean distance ‖x1 − x2‖₂. The most widely used loss functions for deep metric learning are the contrastive loss [1, 3] and the triplet loss [32, 22, 6], and both impose a Euclidean margin on features.

    Deep face recognition. Deep face recognition is arguably one of the most active research areas in the past few years. [30, 26] address open-set FR using CNNs supervised by softmax loss, which essentially treats open-set FR as a multi-class classification problem. [25] combines contrastive loss and softmax loss to jointly supervise the CNN training, greatly boosting the performance. [22] uses triplet loss to learn a unified face embedding; trained on nearly 200 million face images, it achieves current state-of-the-art FR accuracy. Inspired by linear discriminant analysis, [34] proposes center loss for CNNs and also obtains promising performance. In general, current well-performing CNNs [28, 15] for FR are mostly built on either contrastive loss or triplet loss. One can notice that state-of-the-art FR methods usually adopt ideas (e.g., contrastive loss, triplet loss) from metric learning, showing that open-set FR can be well addressed by discriminative metric learning.
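The two distance notions above can be sketched in a few lines; this is a minimal illustration (the function names are mine, not from the paper), assuming A is positive semi-definite so the learned metric is valid:

```python
import numpy as np

def mahalanobis(x1, x2, A):
    """Learned metric ||x1 - x2||_A = sqrt((x1 - x2)^T A (x1 - x2)).
    A must be positive semi-definite for this to be a (pseudo-)metric."""
    d = x1 - x2
    return float(np.sqrt(d @ A @ d))

def euclidean(x1, x2):
    """Plain Euclidean distance ||x1 - x2||_2, as applied to deep features."""
    return float(np.linalg.norm(x1 - x2))
```

With A equal to the identity matrix, the learned metric reduces exactly to the Euclidean distance, which is why deep metric learning can shift all the learning effort into the feature extractor and keep the metric itself trivial.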
Funding
  • The work was funded by NSFC (61401524), NSFGD (2014A030313123), NSFGZ (201605121423270)
Study subjects and analysis
individuals: 6
To show that larger m leads to larger angular margin (i.e. more discriminative feature distribution on manifold), we perform a toy example with different m. We train A-Softmax loss with 6 individuals that have the most samples in CASIA-WebFace. We set the output feature dimension (FC1) as 3 and visualize the training samples in Fig. 5

face pairs: 6000
We follow the unrestricted with labeled outside data protocol [8] on both datasets. The performance of SphereFace is evaluated on 6,000 face pairs from LFW and 5,000 video pairs from YTF. The results are given in Table 4

References
  • S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
  • C. Ding and D. Tao. Robust face recognition via multimodal deep face representation. IEEE TMM, 17(11):2049–2058, 2015.
  • R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. TPAMI, 27(3):328–340, 2005.
  • E. Hoffer and N. Ailon. Deep metric learning using triplet network. arXiv preprint:1412.6622, 2014.
  • J. Hu, J. Lu, and Y.-P. Tan. Discriminative deep metric learning for face verification in the wild. In CVPR, 2014.
  • G. B. Huang and E. Learned-Miller. Labeled faces in the wild: Updates and new reporting procedures. Univ. Massachusetts Amherst, Tech. Rep. 14-003, 2014.
  • G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, 2007.
  • Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint:1408.5093, 2014.
  • I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In CVPR, 2016.
  • M. Köstinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof. Large scale metric learning from equivalence constraints. In CVPR, 2012.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In CVPR, 2003.
  • J. Liu, Y. Deng, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint:1506.07310, 2015.
  • W. Liu, Y. Wen, Z. Yu, and M. Yang. Large-margin softmax loss for convolutional neural networks. In ICML, 2016.
  • J. Lu, G. Wang, W. Deng, P. Moulin, and J. Zhou. Multi-manifold deep metric learning for image set classification. In CVPR, 2015.
  • D. Miller, E. Brossard, S. Seitz, and I. Kemelmacher-Shlizerman. MegaFace: A million faces for recognition at scale. arXiv preprint:1505.02108, 2015.
  • H.-W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In ICIP, 2014.
  • O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, 2015.
  • A. Ross and A. K. Jain. Multimodal biometrics: An overview. In Proc. 12th European Signal Processing Conference, pages 1221–1224, 2004.
  • F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint:1409.1556, 2014.
  • H. O. Song, Y. Xiang, S. Jegelka, and S. Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, 2016.
  • Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
  • Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.
  • Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In CVPR, 2015.
  • Y. Sun, X. Wang, and X. Tang. Sparsifying neural network connections for face recognition. In CVPR, 2016.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  • Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.
  • A. Talwalkar, S. Kumar, and H. Rowley. Large-scale manifold learning. In CVPR, 2008.
  • J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In CVPR, 2014.
  • K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10:207–244, 2009.
  • Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In ECCV, 2016.
  • L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR, 2011.
  • E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In NIPS, 2003.
  • D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint:1411.7923, 2014.
  • Y. Ying and P. Li. Distance metric learning with eigenvalue optimization. JMLR, 13:1–26, 2012.
  • K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multi-task cascaded convolutional networks. arXiv preprint:1604.02878, 2016.