Partial Transfer Learning with Selective Adversarial Networks

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Keywords:
reproducing kernel Hilbert spaces, data distribution, strong motivation, target domain, source label space, … (22+ more)

Abstract:

Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce distribution discrepancy between the source and target domains. Existing domain adversarial networks assume fully shared label space across domains. In the presence of big data, there is strong motivation of transferring both …

Introduction
  • Deep networks have significantly improved the state of the art for a wide variety of machine learning problems and applications.
  • Since manually labeling sufficient training data for diverse application domains on the fly is often prohibitive, there is strong motivation, for problems short of labeled data, to establish effective algorithms that reduce labeling cost, typically by leveraging off-the-shelf labeled data from a different but related source domain.
  • This promising transfer learning paradigm, however, suffers from the shift in data distributions across domains, which poses a major obstacle to adapting classification models to target tasks [23].
  • The latest advances have been achieved by embedding transfer learning in the pipeline of deep feature learning to extract domain-invariant deep representations [30, 16, 7, 31, 18] (see the gradient-reversal sketch after this list).
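As a concrete illustration of the adversarial variant of these methods, the sketch below shows the gradient reversal layer popularized by domain-adversarial training such as RevGrad [7]: the forward pass is the identity, while the backward pass flips the gradient sign so the feature extractor learns to confuse a domain discriminator. This is a minimal PyTorch-style sketch, not code from the paper; the layer name and the lambd scaling factor are illustrative.

```python
# Gradient reversal layer (GRL), as used in domain-adversarial training [7].
# Minimal sketch for illustration; not the authors' released code.
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the gradient flowing back into the feature extractor,
        # so minimizing the discriminator loss w.r.t. features maximizes domain confusion.
        return -ctx.lambd * grad_output, None


def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)
```

Features are typically passed through grad_reverse before the domain discriminator, so a single backward pass trains the discriminator to separate domains while simultaneously training the feature extractor to mix them.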
Highlights
  • Deep networks have significantly improved the state of the art for a wide variety of machine learning problems and applications
  • We introduce a novel partial transfer learning problem, assuming that the target label space is a subspace of the source label space
  • This paper presents Selective Adversarial Networks (SAN), which largely extends the ability of deep adversarial adaptation [7] to address partial transfer learning from large-scale domains to small-scale domains.
  • It achieves considerable accuracy gains on tasks with a large-scale source domain and target domain, e.g. ImageNet-1000 → Caltech-84. These results suggest that the Selective Adversarial Network can learn transferable features for partial transfer learning in all tasks under the setting where the target label space is a subspace of the source label space.
  • (1) Previous deep transfer learning methods, including adversarial-network-based methods such as RevGrad and MMD-based methods such as Deep Adaptation Network (DAN), perform worse than standard AlexNet, which demonstrates the effect of negative transfer.
  • Unlike previous adversarial adaptation methods, which match the whole source and target domains under the shared-label-space assumption, the proposed approach simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space (a toy sketch of this selective weighting follows this list).
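The selective mechanism can be pictured with the following PyTorch-style sketch: one small domain discriminator per source class, with each discriminator's loss weighted by the label predictor's probability for that class, so discriminators of outlier source classes receive near-zero weight on target data. This is a simplified sketch under assumed layer sizes (AlexNet fc7 features, a 256-dimensional bottleneck, 31 source classes), not the authors' exact formulation.

```python
# Sketch of class-probability-weighted, per-class domain discriminators.
# Dimensions and architecture below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 31   # e.g. the Office-31 source label space (assumed)
FEAT_DIM = 256     # bottleneck feature dimension (assumed)

feature_extractor = nn.Sequential(nn.Linear(4096, FEAT_DIM), nn.ReLU())  # G_f on top of CNN features
label_predictor = nn.Linear(FEAT_DIM, NUM_CLASSES)                       # G_y
# One lightweight domain discriminator per source class.
discriminators = nn.ModuleList(
    [nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
     for _ in range(NUM_CLASSES)]
)


def selective_domain_loss(feat: torch.Tensor, domain_label: float) -> torch.Tensor:
    """Class-probability-weighted sum of per-class domain losses.

    feat:         features G_f(x) for a batch (source or target)
    domain_label: 1.0 for source samples, 0.0 for target samples
    """
    probs = F.softmax(label_predictor(feat), dim=1).detach()  # predicted class probabilities as weights
    target = torch.full((feat.size(0), 1), domain_label)
    loss = feat.new_zeros(())
    for k, disc in enumerate(discriminators):
        per_sample = F.binary_cross_entropy_with_logits(disc(feat), target, reduction="none")
        loss = loss + (probs[:, k:k + 1] * per_sample).mean()  # outlier classes get tiny weight
    return loss
```

In the full adversarial setup the features would first pass through a gradient reversal layer like the one sketched earlier, so distribution matching is only encouraged for classes the target data plausibly contains.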
Methods
  • Fooling the adversarial network into matching the distributions of outlier source data and target data makes the classifier more likely to assign target data to these outlier classes, which is prone to negative transfer.
  • These previous methods perform even worse than standard AlexNet; SAN outperforms them by large margins, indicating that SAN effectively avoids negative transfer by eliminating the outlier source classes irrelevant to the target domain (a sketch of such class-level weighting appears after this list).
  • As shown in Table 1, SAN performs 6.71% worse than the upper bound, while the best baseline, ADDA, performs 12.56% worse.
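One simple way to realize the "eliminating outlier source classes" behavior is a class-level weight obtained by averaging the label predictor's outputs over target data; the sketch below (including its normalization and threshold) is an assumption of this summary, not the paper's exact rule.

```python
# Class-level selection sketch: source classes that target samples almost never
# activate are treated as outlier classes and their loss terms are zeroed out.
import torch


def class_weights_from_target(target_probs: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """target_probs: (num_target_samples, num_source_classes) softmax outputs of the label predictor."""
    w = target_probs.mean(dim=0)   # average probability mass assigned to each source class
    w = w / w.max()                # normalize so the strongest shared class has weight 1
    # Zero out classes below the (assumed) threshold, i.e. suspected outlier classes.
    return torch.where(w >= threshold, w, torch.zeros_like(w))
```

Multiplying each class's adversarial and classification terms by such weights means outlier classes contribute almost nothing to domain matching, which is what shields SAN from the negative transfer that hurts RevGrad and DAN.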
Results
  • The classification results on the six tasks of Office-31, the three tasks of Caltech-Office, and the two tasks of ImageNet-Caltech are shown in Tables 1 and 2.
  • (1) Previous deep transfer learning methods, including adversarial-network-based methods such as RevGrad and MMD-based methods such as DAN, perform worse than standard AlexNet, which demonstrates the effect of negative transfer.
  • These methods try to transfer knowledge from all classes of the source domain to the target domain, but some source classes do not exist in the target domain, a.k.a. outlier source classes, which cause negative transfer.
Conclusion
  • This paper presented a novel selective adversarial network approach to partial transfer learning.
  • Unlike previous adversarial adaptation methods, which match the whole source and target domains under the shared-label-space assumption, the proposed approach simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space.
  • The authors' approach successfully tackles partial transfer learning, where the source label space subsumes the target label space, as verified by extensive experiments.
Summary
  • Objectives:

    The goal of this paper is to design a deep neural network that enables learning of transferable features f = G_f(x) and an adaptive classifier y = G_y(f) to bridge the cross-domain discrepancy, such that the target risk Pr_{(x,y)∼q}[G_y(G_f(x)) ≠ y] is minimized by leveraging the source-domain supervision (a minimal sketch follows).
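As a small sketch of this objective (layer sizes and class count are placeholders, not from the paper): the target risk cannot be evaluated directly because target labels are unavailable, so training combines supervised cross-entropy on the labeled source domain with a domain-matching term such as the selective adversarial loss sketched earlier.

```python
# Feature extractor G_f and classifier G_y with the supervised source-domain loss.
# The target risk Pr_{(x,y)~q}[G_y(G_f(x)) != y] is only minimized indirectly.
import torch
import torch.nn as nn
import torch.nn.functional as F

G_f = nn.Sequential(nn.Linear(4096, 256), nn.ReLU())  # f = G_f(x); 4096-d input assumed
G_y = nn.Linear(256, 31)                               # y = G_y(f); 31 source classes assumed


def source_classification_loss(x_s: torch.Tensor, y_s: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on labeled source samples: the supervised part of the objective."""
    return F.cross_entropy(G_y(G_f(x_s)), y_s)
```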
Tables
  • Table1: Classification Accuracy (%) of Partial Transfer Learning Tasks on Office-31 (AlexNet as Base Network)
  • Table2: Classification Accuracy (%) of Partial Transfer Learning Tasks on Caltech-Office and ImageNet-Caltech (AlexNet as Base Network)
  • Table3: Classification Accuracy (%) of Partial Transfer Learning Tasks on Office-31 (VGG-16 as Base Network)
Related work
  • Transfer learning [23] bridges different domains or tasks to mitigate the burden of manual labeling for machine learning [22, 6, 34, 32], computer vision [26, 9, 14] and natural language processing [4]. The main technical difficulty of transfer learning is to formally reduce the distribution discrepancy across different domains. Deep networks can learn abstract representations that disentangle different explanatory factors of variations behind data [2] and manifest invariant factors underlying different populations that transfer well from original tasks to similar novel tasks [33]. Thus deep networks have been explored for transfer learning [8, 21, 14], multimodal and multi-task learning [4, 20], where significant performance gains have been witnessed relative to prior shallow transfer learning methods.

    However, recent advances show that deep networks can learn abstract feature representations that can only reduce, but not remove, the cross-domain discrepancy [8, 30], resulting in unbounded risk for target tasks [19, 1]. Some recent work bridges deep learning and domain adaptation [30, 16, 7, 31, 18], which extends deep convolutional networks (CNNs) to domain adaptation by adding adaptation layers through which the mean embeddings of distributions are matched [30, 16, 18], or by adding a subnetwork as domain discriminator while the deep features are learned to confuse the discriminator in a domain-adversarial training paradigm [7, 31]. While performance was significantly improved, these state of the art methods may be restricted by the assumption that the source and target domains share the same label space. This assumption is violated in partial transfer learning, which transfers both representation and classification models from existing large-scale domains to unknown small-scale domains. To our knowledge, this is the first work that addresses partial transfer learning in adversarial networks.
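As an illustration of the "mean embeddings of distributions are matched" route mentioned above (used by MMD-based methods such as DAN [16]), the following sketch computes a biased estimate of the squared Maximum Mean Discrepancy between source and target feature batches with a Gaussian kernel; the bandwidth choice is an assumption, and this is not code from any of the cited papers.

```python
# Squared Maximum Mean Discrepancy (MMD^2) with a Gaussian RBF kernel:
# the distance between mean embeddings of two samples in an RKHS.
import torch


def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2)) for all pairs."""
    sq_dists = torch.cdist(a, b).pow(2)
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2(source_feats: torch.Tensor, target_feats: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimator: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    k_ss = gaussian_kernel(source_feats, source_feats, sigma).mean()
    k_tt = gaussian_kernel(target_feats, target_feats, sigma).mean()
    k_st = gaussian_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

Minimizing this quantity over an adaptation layer's features pulls the two domains' mean embeddings together across the whole label space, which is exactly the shared-label-space assumption that partial transfer learning relaxes.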
Funding
  • This work was supported by the National Key R&D Program of China (No. 2017YFC1502003), the National Natural Science Foundation of China (Nos. 61772299, 71690231, 61502265), and the Tsinghua TNList Laboratory Key Project.
References
  • [1] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. MLJ, 79(1-2):151–175, 2010.
  • [2] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. TPAMI, 35(8):1798–1828, 2013.
  • [3] P. P. Busto and J. Gall. Open set domain adaptation. In ICCV, 2017.
  • [4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. JMLR, 12:2493–2537, 2011.
  • [5] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
  • [6] L. Duan, I. W. Tsang, and D. Xu. Domain transfer multiple kernel learning. TPAMI, 34(3):465–479, 2012.
  • [7] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. S. Lempitsky. Domain-adversarial training of neural networks. JMLR, 17(59):1–35, 2016.
  • [8] X. Glorot, A. Bordes, and Y. Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, 2011.
  • [9] B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, 2012.
  • [10] Y. Grandvalet and Y. Bengio. Semi-supervised learning by entropy minimization. In NIPS, pages 529–536, 2004.
  • [11] A. Gretton, B. Sriperumbudur, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, and K. Fukumizu. Optimal kernel choice for large-scale two-sample tests. In NIPS, 2012.
  • [12] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical report, California Institute of Technology, 2007.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [14] J. Hoffman, S. Guadarrama, E. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In NIPS, 2014.
  • [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [16] M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. In ICML, 2015.
  • [17] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu. Transfer feature learning with joint distribution adaptation. In ICCV, 2013.
  • [18] M. Long, H. Zhu, J. Wang, and M. I. Jordan. Unsupervised domain adaptation with residual transfer networks. In NIPS, pages 136–144, 2016.
  • [19] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009.
  • [20] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In ICML, 2011.
  • [21] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In CVPR, 2014.
  • [22] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. TNNLS, 22(2):199–210, 2011.
  • [23] S. J. Pan and Q. Yang. A survey on transfer learning. TKDE, 22(10):1345–1359, 2010.
  • [24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge, 2014.
  • [25] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  • [26] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In ECCV, 2010.
  • [27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • [28] B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. In AAAI, 2016.
  • [29] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
  • [30] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint, 2014.
  • [31] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Simultaneous deep transfer across domains and tasks. In ICCV, 2015.
  • [32] X. Wang and J. Schneider. Flexible transfer learning under support and model shift. In NIPS, 2014.
  • [33] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In NIPS, 2014.
  • [34] K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In ICML, 2013.