A Discriminative Feature Learning Approach for Deep Face Recognition
ECCV, pp. 499-515, 2016.
Abstract:
Convolutional neural networks (CNNs) have been widely used in the computer vision community, significantly improving the state of the art. In most of the available CNNs, the softmax loss function is used as the supervision signal to train the deep model. In order to enhance the discriminative power of the deeply learned features, this paper p...
Introduction
- Convolutional neural networks (CNNs) have achieved great success in the computer vision community, significantly improving the state of the art in classification problems such as object [11,12,18,28,33], scene [41,42], and action [3,16,36] recognition.
- This success mainly benefits from large-scale training data [8,26] and the end-to-end learning framework.
Highlights
- We propose a new loss function, namely center loss, to efficiently enhance the discriminative power of the deeply learned features in neural networks.
- We propose a new loss function to minimize the intra-class distances of the deep features.
- We show that the proposed loss function is very easy to implement in convolutional neural networks.
- We have proposed a new loss function, referred to as center loss.
- By combining the center loss with the softmax loss to jointly supervise the learning of convolutional neural networks, the discriminative power of the deeply learned features can be highly enhanced for robust face recognition.
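The joint supervision described in the highlights can be illustrated with a minimal sketch. It assumes the center loss is half the summed squared Euclidean distance between each deep feature and the center of its class, and that the two losses are combined with a scalar weight. The names `softmax_loss`, `center_loss`, `joint_loss` and the weight `lam` are illustrative, not taken from the authors' code, and the class centers are treated as given rather than learned.

```python
import math

def softmax_loss(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def center_loss(features, labels, centers):
    """Half the summed squared distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return 0.5 * total

def joint_loss(logits_batch, features, labels, centers, lam=0.5):
    """Softmax loss plus lam times center loss (lam is an assumed weight)."""
    softmax_term = sum(softmax_loss(z, y) for z, y in zip(logits_batch, labels))
    return softmax_term + lam * center_loss(features, labels, centers)

# Toy batch: two 2-D features, one per class, with both centers at the origin.
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = [[0.0, 0.0], [0.0, 0.0]]
logits_batch = [[0.0, 0.0], [0.0, 0.0]]

print(center_loss(features, labels, centers))
print(joint_loss(logits_batch, features, labels, centers, lam=0.5))
```

The softmax term only asks that features of different classes be separable, while the center term pulls features of the same class together, which is the intra-class compactness the highlights refer to.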
Methods
- Model C achieves better performance than model B (99.28% vs. 99.10% on LFW and 94.9% vs. 93.8% on YTF).
- This shows the advantage of the center loss over the contrastive loss in the designed CNNs. Finally, compared to the state-of-the-art results on the two databases, the results of the proposed model C are consistently among the top-ranked approaches, outperforming most of the existing results in Table 2.
Conclusion
- The authors have proposed a new loss function, referred to as center loss.
- By combining the center loss with the softmax loss to jointly supervise the learning of CNNs, the discriminative power of the deeply learned features can be highly enhanced for robust face recognition.
- Extensive experiments on several large-scale face benchmarks have convincingly demonstrated the effectiveness of the proposed approach.
Tables
- Table 1: The CNN architecture we use in the toy example, called LeNets++. Some of the convolution layers are followed by max pooling. (5, 32)/1,2 × 2 denotes 2 cascaded convolution layers with 32 filters of size 5 × 5, where the stride and padding are 1 and 2 respectively. 2/2,0 denotes the max-pooling layers with grid of 2 × 2, where the stride and padding are 2 and 0 respectively. In LeNets++, we use the Parametric Rectified
- Table 2: Verification performance of different methods on LFW and YTF datasets
- Table 3: Identification rates of different methods on MegaFace with 1M distractors
- Table 4: Verification TAR of different methods at 10^-6 FAR on MegaFace with 1M distractors
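The layer notation in the Table 1 caption can be checked with the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below is only an illustration: the helper name `out_size` is hypothetical, and the 28 × 28 input size is an assumption (the size of MNIST digits), not something stated in the caption.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28  # assumed input width/height (MNIST-sized image)

# (5, 32)/1,2 x 2: two cascaded 5x5 convolutions, stride 1, padding 2.
for _ in range(2):
    size = out_size(size, kernel=5, stride=1, padding=2)
after_convs = size

# 2/2,0: 2x2 max pooling, stride 2, padding 0.
after_pool = out_size(after_convs, kernel=2, stride=2, padding=0)

print(after_convs, after_pool)
```

With a 5 × 5 kernel, stride 1 and padding 2 the spatial size is preserved, so only the pooling layers reduce resolution, halving it each time.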
Related work
- Face recognition via deep learning has achieved a series of breakthroughs in recent years [25,27,29,30,34,37]. The idea of mapping a pair of face images to a distance starts from [6], which trains siamese networks to drive the similarity metric to be small for positive pairs and large for negative pairs. Hu et al. [13] learn nonlinear transformations and yield a discriminative deep metric with a margin between positive and negative face image pairs. These approaches require image pairs as input.
Very recently, [31,34] supervise the learning process in CNNs with a challenging identification signal (the softmax loss function), which brings richer identity-related information to the deeply learned features. After that, a joint identification-verification supervision signal is adopted in [29,37], leading to more discriminative features. [32] enhances the supervision by adding a fully connected layer and loss functions to each convolutional layer. The effectiveness of the triplet loss has been demonstrated in [21,25,27]. With the deep embedding, the distance between an anchor and a positive is minimized, while the distance between an anchor and a negative is maximized until the margin is met. These methods achieve state-of-the-art performance on the LFW and YTF datasets.
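The triplet criterion described above, in which the anchor-positive distance is reduced and the anchor-negative distance is increased until a margin is met, is commonly written as a hinge. The following is a minimal sketch under that assumption. It uses plain Euclidean distance (some formulations use the squared distance instead), and the function names and margin values are illustrative rather than taken from the cited works.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)

anchor = [0.0, 0.0]
positive = [3.0, 4.0]      # distance 5 from the anchor
far_negative = [6.0, 8.0]  # distance 10: margin already met
near_negative = [0.0, 5.0] # distance 5: margin violated

print(triplet_loss(anchor, positive, far_negative, margin=1.0))
print(triplet_loss(anchor, positive, near_negative, margin=1.0))
```

Once the negative is farther from the anchor than the positive by at least the margin, the loss is zero and the triplet no longer contributes to training, which is why such methods depend on how triplets are selected.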
Funding
- This work was funded by External Cooperation Program of BIC, Chinese Academy of Sciences (172644KYSB20160033, 172644KYSB20150019), Shenzhen Research Program (KQCX2015033117354153, JSGG20150925164740726, CXZZ20150930104115529 and JCYJ20150925163005055), Guangdong Research Program (2014B050505017 and 2015B010129013), Natural Science Foundation of Guangdong Province (2014A030313688) and the Key Laboratory of Human-Machine Intelligence-Synergy Systems through the Chinese Academy of Sciences
Reference
- Fg-net aging database. In: (2010). http://www.fgnet.rsunit.com/
- Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
- Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep learning for human action recognition. In: Salah, A.A., Lepri, B. (eds.) HBU 2011. LNCS, vol. 7065, pp. 29–39. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25446-8_4
- Chen, B.C., Chen, C.S., Hsu, W.H.: Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset. IEEE Trans. Multimedia 17(6), 804–815 (2015)
- Chen, X., Li, Q., Song, Y., Jin, X., Zhao, Q.: Supervised geodesic propagation for semantic label transfer. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 553–565. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33712-3_40
- Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 539–546. IEEE (2005)
- Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (1967)
- Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)
- Fukunaga, K., Narendra, P.M.: A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 100(7), 750–753 (1975)
- Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735–1742. IEEE (2006)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint (2015). arXiv:1512.03385
- He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
- Hu, J., Lu, J., Tan, Y.P.: Discriminative deep metric learning for face verification in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1875–1882 (2014)
- Huang, G.B., Learned-Miller, E.: Labeled faces in the wild: updates and new reporting procedures. Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Technical report, pp. 14–003 (2014)
- Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, Technical Report 07–49, University of Massachusetts, Amherst (2007)
- Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
- Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
- Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
- LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)
- Liu, J., Deng, Y., Huang, C.: Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint (2015). arXiv:1506.07310
- Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738 (2015)
- Miller, D., Kemelmacher-Shlizerman, I., Seitz, S.M.: Megaface: a million faces for recognition at scale. arXiv preprint (2015). arXiv:1505.02108
- Ng, H.W., Winkler, S.: A data-driven approach to cleaning large face datasets. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 343–347. IEEE (2014)
- Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: Proceedings of the British Machine Vision, vol. 1, no. 3, p. 6 (2015)
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
- Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
- Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014). arXiv:1409.1556
- Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Advances in Neural Information Processing Systems, pp. 1988–1996 (2014)
- Sun, Y., Wang, X., Tang, X.: Hybrid deep learning for face verification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1489–1496 (2013)
- Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898 (2014)
- Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2892–2900 (2015)
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
- Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: Deepface: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
- Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., Li, L.J.: The new data and new challenges in multimedia research. arXiv preprint (2015). arXiv:1503.01817
- Wang, L., Qiao, Y., Tang, X.: Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4305–4314 (2015)
- Wen, Y., Li, Z., Qiao, Y.: Latent factor guided convolutional neural networks for age-invariant face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4893–4901 (2016)
- Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 529–534. IEEE (2011)
- Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv preprint (2014). arXiv:1411.7923
- Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks. arXiv preprint (2016). arXiv:1604.02878
- Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene cnns. arXiv preprint (2014). arXiv:1412.6856
- Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: Advances in neural information processing systems, pp. 487–495 (2014)