Deep Face Detector Adaptation Without Negative Transfer or Catastrophic Forgetting

CVPR, pp. 5608-5618, 2018.

Abstract:

Arguably, no single face detector fits all real-life scenarios. It is often desirable to have some built-in schemes for a face detector to automatically adapt, e.g., to a particular user's photo album (the target domain). We propose a novel face detector adaptation approach that works as long as there are representative images of the ...

Introduction
  • Recent literature [3, 4, 5, 6] demonstrates the effectiveness of deep learning for face detection.
  • As massively data-driven methods, deep learning based face detectors are inevitably biased toward the training data distribution.
  • To address the discrepancy between the data distribution seen in training and the one encountered when the face detector is deployed, it is highly desirable to have some adaptation mechanism built into the face detectors.
  • When there are labeled or unlabeled images available from a particular target domain, one can adapt the detectors to achieve better performance in the target domain than the original ones do
Highlights
  • Face detection is often the very first step in analyzing faces
  • We propose a novel face detector adaptation approach that is applicable whenever the target domain supplies many representative images, whether they are labeled or not
  • We argue that it is probably easier to deal with catastrophic forgetting in domain adaptation, which can be seen as a special case of sequential multi-task learning, because the source and target domains share the same semantic labels
  • When there are no labels available in the target domain, we find a robust target face detector that improves upon the source one under the worst-case scenario, by minimizing over the classifier offset u and the feature parameters θe with a λ/2 penalty on the offset (a schematic form of this objective follows this list)
  • We evaluate our method on the Caltech Occluded Faces in the Wild (COFW) dataset [56]
  • The approach we proposed offers three key properties which we contend are missing or not explicitly discussed in the existing face detector adaptation works
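  • The worst-case objective referenced above can be written schematically as below. This is a reconstruction from the fragment in the highlight and from the re-parameterization of the target classifier as the source classifier w_s plus an offset u described in the related work; it is not a verbatim copy of the paper's formulation, and the exact loss terms and constraints may differ:

        \min_{u,\,\theta_e} \; \max_{\hat{y} \in \mathcal{Y}} \;
        \frac{\lambda}{2}\,\lVert u \rVert_2^2
        + \sum_{i} \Big[ \ell\big((w_s + u)^{\top} \phi_{\theta_e}(x_i),\, \hat{y}_i\big)
        - \ell\big(w_s^{\top} \phi_{\theta_e}(x_i),\, \hat{y}_i\big) \Big]

    Here \phi_{\theta_e} is the feature extractor, \ell is a detection loss, and \hat{y} ranges over candidate labelings of the unlabeled target data. Keeping the bracketed difference non-positive under the worst-case labeling is what certifies that the adapted detector does no worse than the source one, i.e., little negative transfer.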
Methods
  • The authors' approach is model-agnostic, in the sense that it is readily applicable to different types of face detectors.
  • The authors experiment with two deep learning based face detectors: CascadeCNN [53] and Faster-RCNN [1, 2].
  • The CascadeCNN face detector is fast but extracts relatively weaker features, while the Faster-RCNN model runs slower due to its use of a bigger network and more discriminative features.
  • Per the comparison experiments in [2], the open-sourced Faster-RCNN face detector model is superior to 11 other top-performing detectors, all of which were published after 2015.
  • It is interesting to note that both AFLW and WIDER FACE strive to cover a wide spectrum of face appearance variations, making them effective sources to adapt from
Results
  • Both the WIDER FACE and FDDB datasets define standard evaluation metrics and have released code for computing them.
  • The Precision-Recall curve is used by WIDER FACE.
  • FDDB employs ROC curves of discrete and continuous scores, computed from a bipartite matching between detections and ground-truth annotations.
  • The authors use the official evaluation code in order to have a direct comparison with existing methods (a minimal, unofficial sketch of the precision-recall computation follows this list)
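  • For readers unfamiliar with the protocol, below is a minimal, illustrative Python sketch of how a precision-recall curve can be computed from scored detections using greedy IoU matching. It is not the official WIDER FACE or FDDB evaluation toolkit; the function names, the 0.5 IoU threshold, and the greedy matching rule are simplifying assumptions.

        import numpy as np

        def iou(box_a, box_b):
            # Boxes are [x1, y1, x2, y2]; returns intersection-over-union.
            x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
            x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            return inter / (area_a + area_b - inter + 1e-12)

        def precision_recall(detections, ground_truth, iou_thresh=0.5):
            # detections: list of (image_id, score, box); ground_truth: dict image_id -> list of boxes.
            detections = sorted(detections, key=lambda d: -d[1])      # rank by confidence
            num_gt = sum(len(boxes) for boxes in ground_truth.values())
            matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
            tp, fp = [], []
            for img, _score, box in detections:
                gts = ground_truth.get(img, [])
                best_j, best_iou = -1, 0.0
                for j, gt in enumerate(gts):
                    overlap = iou(box, gt)
                    if overlap > best_iou:
                        best_iou, best_j = overlap, j
                if best_iou >= iou_thresh and not matched[img][best_j]:
                    matched[img][best_j] = True                       # true positive
                    tp.append(1); fp.append(0)
                else:
                    tp.append(0); fp.append(1)                        # duplicate or miss
            tp, fp = np.cumsum(tp), np.cumsum(fp)
            recall = tp / max(num_gt, 1)
            precision = tp / np.maximum(tp + fp, 1)
            return precision, recall

    Sweeping the detector's confidence threshold corresponds to walking along the returned arrays; WIDER FACE summarizes the curve with average precision, while FDDB instead reports ROC-style curves against the number of false positives.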
Conclusion
  • The authors revisit the face detector adaptation problem under the new context of deep learning based face detectors.
  • The approach the authors proposed offers three key properties which the authors contend are missing or not explicitly discussed in the existing face detector adaptation works.
  • The adaptation of face detectors should be executed in the absence of the source domain's data, with little negative transfer, and without catastrophic forgetting of the source domain.
  • The authors demonstrated the effectiveness of the approach by adapting two face detectors from two large-scale source datasets to two smaller target datasets
Related work
  • Face detector adaptation. Jain and Learned-Miller use a Gaussian process to update low detection scores, assuming that detections vary smoothly and that high-scoring regions are more likely to be correct than the others [7]. Wang et al. [8] and Li et al. [9] make similar assumptions, yet use the high-scoring regions to re-train a new detector for the target domain using vocabulary trees and probabilistic elastic part models, respectively. When the target domain comprises video sequences, motion and tracking cues are usually very effective for adapting the detectors [24, 25, 26, 27, 28].
  • Domain adaptation. There has been a rich line of work on domain adaptation for generic visual recognition [13, 29], such as object recognition [14], action recognition [30], Webly-supervised learning [31, 32, 33], attribute detection [34], etc. These methods minimize the discrepancy between the source and target by exploring the data from both domains. However, modern face detectors are often trained on an extreme-scale training set, making it hard to carry the source data into the adaptation stage. Domain adaptation in the absence of the source data [35, 36] is the most relevant to ours. Such methods use the source models either for regularization [36] or to augment the features of the target data [35], while we consider a different problem, deep face detectors, and refer to the source model in both the cost function and the classifier of the target face detector.
  • Negative transfer is a notorious caveat in domain adaptation [37, 38, 39, 40]. Whereas existing works attempt to solve this problem by defining intuitive statistical measures, we directly tackle it with a novel cost function motivated by safe semi-supervised learning [41, 42, 43]. Moreover, we devise the cost function so that it integrates seamlessly with deep models, and we derive an analytic form for the unsupervised adaptation, avoiding cumbersome EM-style optimization.
  • Catastrophic forgetting or interference [17, 44, 45, 18] refers to the phenomenon that a pre-trained network can no longer perform well on its old tasks after it is fine-tuned for a new task. Recent years have witnessed an upsurge of interest in this problem, including the exploitation of a local winner-takes-all activation function [46], dropout [16, 47], a knowledge distillation loss [48, 49, 50], pathway connections [51], and progressive networks [52]. We argue that it is probably easier to deal with catastrophic forgetting in domain adaptation, which can be seen as a special case of sequential multi-task learning, because the source and target domains share the same semantic labels. We leverage exactly this idiosyncrasy to re-parameterize the target classifier as the source classifier plus an offset, as sketched below.
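  • The last point above lends itself to a compact implementation. The following is a minimal PyTorch-style sketch of re-parameterizing the target classifier as the frozen source classifier plus a learned offset, together with a 0.5·||u||^2 term that the training loss can scale by λ. The class and method names (SourcePlusOffsetHead, offset_l2) and the linear-head assumption are illustrative; this is not the authors' released code.

        import torch
        import torch.nn as nn

        class SourcePlusOffsetHead(nn.Module):
            """Target classifier parameterized as the frozen source classifier w_s plus an offset u."""
            def __init__(self, source_head: nn.Linear):
                super().__init__()
                self.source_head = source_head
                for p in self.source_head.parameters():
                    p.requires_grad = False      # keep w_s intact, so the source behavior is preserved
                self.offset_weight = nn.Parameter(torch.zeros_like(source_head.weight))
                self.offset_bias = nn.Parameter(torch.zeros_like(source_head.bias))

            def forward(self, features):
                # (w_s + u)^T phi(x): the source logits plus the offset's contribution.
                return self.source_head(features) + features @ self.offset_weight.t() + self.offset_bias

            def offset_l2(self):
                # 0.5 * ||u||^2; multiply by lambda in the training loss.
                return 0.5 * (self.offset_weight.pow(2).sum() + self.offset_bias.pow(2).sum())

        # Hypothetical usage: loss = detection_loss(head(phi(x)), labels) + lam * head.offset_l2()

    Initializing the offset at zero means the adapted detector starts out identical to the source detector, which is one simple way to reason about avoiding both negative transfer and catastrophic forgetting.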
Reference
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015. 2, 4
  • Huaizu Jiang and Erik G. Learned-Miller. Face detection with the faster R-CNN. CoRR, abs/1606.03473, 2016. 2, 3, 4
  • Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, and Xiaoou Tang. Recurrent scale approximation for object detection in cnn. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017. 1
  • Mahyar Najibi, Pouya Samangouei, Rama Chellappa, and Larry S. Davis. Ssh: Single stage headless face detector. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017. 1
  • Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, and Stan Z. Li. S3fd: Single shot scale-invariant face detector. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017. 1
  • Shuo Yang, Ping Luo, Chen-Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016. 4
  • Vidit Jain and Erik G. Learned-Miller. Online domain adaptation of a pre-trained cascade of classifiers. 2011. 1, 2, 4, 5
  • Xiaoyu Wang, Gang Hua, and T.X. Han. Detection by detections: Non-parametric detector adaptation for a video. 2012. 1, 2, 4
  • Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, and Jianchao Yang. Probabilistic elastic part model for unsupervised face detector adaptation. In The IEEE International Conference on Computer Vision (ICCV), December 2013. 1, 2, 4
  • Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. Sample selection bias correction theory. In International Conference on Algorithmic Learning Theory, pages 38–53. Springer, 2008. 1
  • Boqing Gong, Kristen Grauman, and Fei Sha. Connecting the dots with landmarks: Discriminatively learning domaininvariant features for unsupervised domain adaptation. In ICML, pages 222–230, 2013. 1
  • Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In International Conference on Machine Learning, pages 97–105, 2015. 1
  • Raghuraman Gopalan, Ruonan Li, Vishal M Patel, Rama Chellappa, et al. Domain adaptation for visual recognition. Foundations and Trends in Computer Graphics and Vision, 8(4):285–378, 2015. 1, 2
  • Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. Computer Vision–ECCV 2010, pages 213–226, 2010. 1, 2
  • Boqing Gong, Kristen Grauman, and Fei Sha. Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109(1-2):3–27, 2014. 1
  • Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013. 1, 3, 8
  • Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of learning and motivation, 24:109– 165, 1989. 1, 3
  • JL McClelland. A connectionist perspective on knowledge and development. 1995. 1, 3
  • Daniel Jurafsky and James H. Martin. Speech and language processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2017. 1
  • Ruslan Salakhutdinov and Geoffrey Hinton. Deep boltzmann machines. In Artificial Intelligence and Statistics, pages 448–455, 2009. 2, 5
  • Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014. 2
  • Yoav Freund, Robert E Schapire, et al. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156, 1996. 2
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 2
  • Kevin Tang, Vignesh Ramanathan, Li Fei-Fei, and Daphne Koller. Shifting weights: Adapting object detectors from image to video. In Advances in Neural Information Processing Systems, pages 638–646, 2012. 2
  • Pramod Sharma and Ram Nevatia. Efficient detector adaptation for object detection in a video. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013. 2
  • Meng Wang and Xiaogang Wang. Automatic adaptation of a generic pedestrian detector to a specific traffic scene. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3401–3408. IEEE, 2011. 2
  • Enver Sangineto. Statistical and spatial consensus collection for detector adaptation. In European Conference on Computer Vision, pages 456–471. Springer, 2014. 2
  • Peter M. Roth, Sabine Sternig, Helmut Grabner, and Horst Bischof. Classifier grids for robust adaptive object detection. In CVPR, 2009. 2
  • Gabriela Csurka. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv:1702.05374, 2017. 2
  • Ruonan Li and Todd Zickler. Discriminative virtual views for cross-view action recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2855–2862. IEEE, 2012. 2
  • Alessandro Bergamo and Lorenzo Torresani. Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In Advances in Neural Information Processing Systems, pages 181–189, 2010. 2
  • Lixin Duan, Dong Xu, Ivor Wai-Hung Tsang, and Jiebo Luo. Visual event recognition in videos by learning from web data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9):1667–1680, 2012. 2
  • Xinlei Chen and Abhinav Gupta. Webly supervised learning of convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1431–1439, 2015. 2
  • Chuang Gan, Tianbao Yang, and Boqing Gong. Learning attributes equals multi-source domain generalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 87–97, 2016. 2
  • Boris Chidlovskii, Stephane Clinchant, and Gabriela Csurka. Domain adaptation in the absence of source domain data. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 451–460. ACM, 2016. 2
  • Ilja Kuzborskij and Francesco Orabona. Stability and hypothesis transfer learning. In ICML (3), pages 942–950, 2013. 2, 5
  • Michael T Rosenstein, Zvika Marx, Leslie Pack Kaelbling, and Thomas G Dietterich. To transfer or not to transfer. In NIPS 2005 Workshop on Transfer Learning, volume 898, 2005. 3
  • Liang Ge, Jing Gao, Hung Ngo, Kang Li, and Aidong Zhang. On handling negative transfer and imbalanced distributions in multiple source transfer learning. Statistical Analysis and Data Mining, 7(4):254–271, 2014. 3
  • Chun-Wei Seah, Yew-Soon Ong, and Ivor W Tsang. Combating negative transfer from predictive distribution differences. IEEE transactions on cybernetics, 43(4):1153–1165, 2013. 3
  • Hao Shao, Bin Tong, and Einoshin Suzuki. Compact coding for hyperplane classifiers in heterogeneous environment. Machine Learning and Knowledge Discovery in Databases, pages 207–222, 2011. 3
  • Yu-Feng Li and Zhi-Hua Zhou. Towards making unlabeled data never hurt. In Lise Getoor and Tobias Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, pages 1081–1088, New York, NY, USA, June 2011. ACM. 3
  • Yu-Feng Li, James T Kwok, and Zhi-Hua Zhou. Towards safe semi-supervised learning for multivariate performance measures. In AAAI, pages 1816–1822, 2016. 3
  • Marco Loog. Contrastive pessimistic likelihood estimation for semi-supervised classification. IEEE transactions on pattern analysis and machine intelligence, 38(3):462–475, 2016. 3
  • Roger Ratcliff. Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological review, 97(2):285–308, 1990. 3
  • Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135, 1999. 3
  • Rupesh K Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, and Jurgen Schmidhuber. Compete to compute. In Advances in neural information processing systems, pages 2310–2318, 2013. 3
  • Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014. 3
  • Zhizhong Li and Derek Hoiem. Learning without forgetting. In European Conference on Computer Vision, pages 614– 629. Springer, 2016. 3, 8
  • Matthew Riemer, Elham Khabiri, and Richard Goodwin. Representation stability as a regularizer for improved text analytics transfer learning. arXiv preprint arXiv:1704.03617, 2017. 3
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. 3, 5
  • Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. Pathnet: Evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734, 2017. 3
  • Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016. 3
  • Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, and Gang Hua. A convolutional neural network cascade for face detection. In CVPR, pages 5325–5334. IEEE Computer Society, 2015. 4
  • Martin Koestinger, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011. 4
  • Vidit Jain and Erik Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. Technical report, 2010. 5
  • Xavier P. Burgos-Artizzu, Pietro Perona, and Piotr Dollar. Robust face landmark estimation under occlusion. In Proceedings of the 2013 IEEE International Conference on Computer Vision, ICCV ’13, pages 1513–1520, Washington, DC, USA, 2013. IEEE Computer Society. 5
  • Zhizhong Li and Derek Hoiem. Learning without forgetting. CoRR, abs/1606.09282, 2016. 5
  • Shuang Ao, Xiang Li, and Charles X Ling. Fast generalized distillation for semi-supervised domain adaptation. In AAAI, 2017. 5
  • David Lopez-Paz, Leon Bottou, Bernhard Scholkopf, and Vladimir Vapnik. Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643, 2015. 5
  • Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pages 1180–1189, 2015. 5