Adaptation Regularization: A General Framework for Transfer Learning

IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076-1089, 2014


Abstract

Domain transfer learning, which learns a target classifier using labeled data from a different distribution, has shown promising value in knowledge discovery yet remains a challenging problem. Most previous works designed adaptive classifiers by exploring two learning strategies independently: distribution adaptation and label propagation […]

Introduction
  • It is very difficult, if not impossible, to induce a supervised classifier without any labeled data.
  • For the emerging domains where labeled data are sparse, to save the manual labeling efforts, one may expect to leverage abundant labeled data available in a related source domain for training an accurate classifier to be reused in the target domain.
  • One major computational issue of transfer learning is how to reduce the difference in distributions between the source and target data.
  • Recent works aim to discover a good feature representation across domains, which can simultaneously reduce the distribution difference and preserve the important properties of the original data [10].
Highlights
  • It is very difficult, if not impossible, to induce a supervised classifier without any labeled data
  • We propose a general transfer learning framework, referred to as Adaptation Regularization based Transfer Learning (ARTL), to model the joint distribution adaptation and manifold regularization in a unified way underpinned by the structural risk minimization principle and the regularization theory
  • We propose two methods using Regularized Least Squares and Support Vector Machines, and derive learning algorithms using the Representer theorem in Reproducing Kernel Hilbert Space
  • Spectral Feature Alignment is not compared since it cannot handle non-sparse image data, while Laplacian SVM and ARSVM are not compared since their original implementations cannot deal with multi-class problems
  • We proposed a general framework, referred to as Adaptation Regularization based Transfer Learning (ARTL), to address cross-domain learning problems
  • Adaptation Regularization based Transfer Learning is robust to the distribution difference between domains, and can significantly improve cross-domain text/image classification problems
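As highlighted above, under Regularized Least Squares the framework combines the structural risk, joint distribution adaptation, and manifold regularization into one kernelized objective with a closed-form solution via the Representer theorem. The following is a minimal illustrative sketch of such a solver, assuming an MMD-based adaptation matrix `M` and a graph Laplacian `L` have been precomputed; it is a reconstruction under those assumptions, not the authors' reference implementation.

```python
import numpy as np

def artl_rls(K, y, labeled_mask, M, L, sigma=0.1, lam=1.0, gamma=1.0):
    """Sketch of an ARTL-style regularized least squares solver.

    K            : (n, n) kernel matrix over source + target samples
    y            : (n,) labels (entries for unlabeled samples are ignored)
    labeled_mask : (n,) bool, True for labeled (source) samples
    M            : (n, n) MMD-based distribution adaptation matrix (assumed given)
    L            : (n, n) graph Laplacian for manifold regularization (assumed given)
    sigma, lam, gamma : shrinkage, adaptation, and manifold trade-off weights
    """
    n = K.shape[0]
    E = np.diag(labeled_mask.astype(float))  # selects labeled terms in the squared loss
    # Closed-form coefficients: alpha = ((E + lam*M + gamma*L) K + sigma*I)^{-1} E y
    A = (E + lam * M + gamma * L) @ K + sigma * np.eye(n)
    alpha = np.linalg.solve(A, E @ y)
    return K @ alpha  # decision values f(x_i) for all n samples
```

With `lam = gamma = 0` this reduces to plain kernel regularized least squares, which is a useful sanity check when experimenting with the adaptation terms.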
Methods
  • Shrinkage regularization σ: the authors run ARTL with varying values of σ.
  • σ controls the model complexity of the adaptive classifier.
  • When σ → 0, the classifier degenerates and overfitting occurs.
  • Conversely, when σ → ∞, ARTL is dominated by the shrinkage regularization and no longer fits the input data.
  • The authors plot classification accuracy w.r.t. different values of σ in Figure 6(b), and choose σ ∈ [0.001, 1].
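The trade-off described above can be reproduced on toy data with plain kernel ridge regression (an illustrative stand-in for ARTL's shrinkage term, not the paper's experiment): training error is near zero for tiny σ and grows as σ dominates the fit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

# RBF kernel matrix over the 1-D inputs
sq = (X - X.T) ** 2
K = np.exp(-sq / 2.0)

for sigma in (1e-6, 1e-2, 1e2):
    # Kernel ridge: alpha = (K + sigma*I)^{-1} y
    alpha = np.linalg.solve(K + sigma * np.eye(20), y)
    f = K @ alpha
    train_err = np.mean((f - y) ** 2)
    # tiny sigma -> near-zero training error (overfitting risk);
    # huge sigma -> f shrinks toward zero, underfitting the data
    print(f"sigma={sigma:g}  train MSE={train_err:.4f}")
```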
Results
  • The authors compare the ARTL with the eight baseline methods in terms of classification accuracy.
  • Footnotes: LIBLINEAR is available at http://www.csie.ntu.edu.tw/~cjlin/liblinear and LIBSVM at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • The average classification accuracy of ARTL and the six baseline methods on the four image datasets is illustrated in Figure 4(b).
  • The transfer subspace learning methods, e.g., CDSC, generally outperform standard LR and SVM.
  • This is an expected result, since subspace learning methods, e.g., PCA, are very effective for image representation.
  • The main reasons are twofold: 1) the MMD distance measure is not very suitable for image data, as exemplified by [12]; 2) the distribution difference is significantly large in the image datasets, resulting in overfitting issues
Conclusion
  • The authors proposed a general framework, referred to as Adaptation Regularization based Transfer Learning (ARTL), to address cross-domain learning problems.
  • ARTL unifies the structural risk functional, joint distribution adaptation of both the marginal and conditional distributions, and the manifold consistency.
  • ARTL is robust to the distribution difference between domains, and can significantly improve cross-domain text/image classification problems.
  • Extensive experiments on 219 text datasets and 4 image datasets validate that the proposed approach achieves performance superior to state-of-the-art adaptation methods.
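The manifold consistency mentioned above is typically enforced through a graph Laplacian built over all source and target samples. Below is a minimal sketch of a symmetrized kNN-graph Laplacian; the neighborhood size and binary weights are illustrative assumptions, not necessarily the paper's setting.

```python
import numpy as np

def knn_laplacian(X, k=3):
    """Unnormalized graph Laplacian L = D - W from a symmetrized kNN graph."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip self at position 0
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency
    return np.diag(W.sum(axis=1)) - W
```

The resulting `L` is symmetric with zero row sums, so the quadratic form f^T L f penalizes predictions that differ across neighboring samples.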
Tables
  • Table1: Notations and descriptions used in this paper
  • Table2: Comparison between Most Closely Related Works
  • Table3: Top categories and subcategories in 20-Newsgroups
  • Table4: Statistics of the 4 benchmark image datasets
  • Table5: Average classification accuracy (%) on the 6 cross-domain text dataset groups comprising 216 datasets
  • Table6: Time complexity of ARTL and the baseline methods
Related work
  • In this section, we discuss previous works on transfer learning that are most related to our work, and highlight their differences. According to literature survey [1], most previous methods can be roughly organized into two categories: instance reweighting [23], [24] and feature extraction. Our work belongs to the feature extraction category, which includes two subcategories: transfer subspace learning and transfer classifier induction.

    2.1 Transfer Subspace Learning

    These methods aim to extract a shared subspace in which the distributions of the source and target data are drawn close. Typical learning strategies include: 1) Correspondence Learning, which first identifies the correspondence among features and then explores this correspondence for transfer subspace learning [4], [5]; 2) Property Preservation, which extracts shared latent factors between domains by preserving the important properties of the original data, e.g., statistical property [25], [2], geometric structure [26], [27], [28], or both [3]; 3) Distribution Adaptation, which learns a shared subspace where the distribution difference is explicitly reduced by minimizing predefined distance measures, e.g., MMD or Bregman divergence [11], [12], [10], [29].
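The MMD distance mentioned above compares the kernel mean embeddings of the two domains, and its empirical estimate is a simple function of kernel matrix block means. A minimal sketch follows; the RBF kernel and bandwidth are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

def mmd2(Xs, Xt, bandwidth=1.0):
    """Squared empirical MMD between source Xs and target Xt with an RBF kernel.

    MMD^2 = mean(Kss) + mean(Ktt) - 2 * mean(Kst),
    i.e. the RKHS distance between the two empirical kernel means.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return rbf(Xs, Xs).mean() + rbf(Xt, Xt).mean() - 2.0 * rbf(Xs, Xt).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
shifted = mmd2(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 2.0)
# identically distributed samples give a small MMD; a shifted target gives a large one
```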

    2.2 Transfer Classifier Induction

    These methods aim to directly design an adaptive classifier by incorporating the adaptation of different distributions through model regularization. For easy discussion, the learning strategies of these methods are summarized as below. Our ARTL framework belongs to this subcategory, with substantial extensions.
Funding
  • This work is supported by National HGJ Key Project (2010ZX01042-002-002), National High-Tech Development Program (2012AA040911), National Basic Research Program (2009CB320700), and National Natural Science Foundation of China (61073005, 61271394)
  • Yu is supported in part by US NSF through grants OISE-1129076, CNS-1115234, DBI-0960443, and US Department of Army through grant W911NF-121-0066
References
  • [1] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 1345–1359, 2010.
  • [2] F. Zhuang, P. Luo, Z. Shen, Q. He, Y. Xiong, Z. Shi, and H. Xiong, “Mining distinction and commonality across multiple domains using generative model for text classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 11, 2011.
  • [3] M. Long, J. Wang, G. Ding, D. Shen, and Q. Yang, “Transfer learning with graph co-regularization,” in Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012.
  • [4] J. Blitzer, R. McDonald, and F. Pereira, “Domain adaptation with structural correspondence learning,” in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006.
  • [5] S. J. Pan, X. Ni, J.-T. Sun, Q. Yang, and Z. Chen, “Cross-domain sentiment classification via spectral feature alignment,” in Proceedings of the 19th International Conference on World Wide Web, 2010.
  • [6] Y. Zhu, Y. Chen, Z. Lu, S. J. Pan, G.-R. Xue, Y. Yu, and Q. Yang, “Heterogeneous transfer learning for image classification,” in Proceedings of the 25th AAAI Conference on Artificial Intelligence, 2011.
  • [7] M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele, “What helps where – and why? Semantic relatedness for knowledge transfer,” in Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, 2010.
  • [8] L. Li, K. Zhou, G.-R. Xue, H. Zha, and Y. Yu, “Video summarization via transferrable structured learning,” in Proceedings of the International Conference on World Wide Web, 2011.
  • [9] B. Li, Q. Yang, and X. Xue, “Transfer learning for collaborative filtering via a rating-matrix generative model,” in Proceedings of the 26th International Conference on Machine Learning, 2009.
  • [10] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
  • [11] S. J. Pan, J. T. Kwok, and Q. Yang, “Transfer learning via dimensionality reduction,” in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, 2008.
  • [12] S. Si, D. Tao, and B. Geng, “Bregman divergence-based regularization for transfer subspace learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 7, 2010.
  • [13] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Scholkopf, and A. J. Smola, “A kernel method for the two-sample problem,” in Advances in Neural Information Processing Systems, 2006.
  • [14] J. Yang, R. Yan, and A. G. Hauptmann, “Cross-domain video concept detection using adaptive SVMs,” in Proceedings of the 15th ACM International Conference on Multimedia, 2007.
  • [15] B. Quanz and J. Huan, “Large margin transductive transfer learning,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009.
  • [16] J. Tao, F.-L. Chung, and S. Wang, “On minimum distribution discrepancy support vector machine for domain adaptation,” Pattern Recognition, vol. 45, no. 11, 2012.
  • [17] L. Duan, I. W. Tsang, and D. Xu, “Domain transfer multiple kernel learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 465–479, 2012.
  • [18] E. Zhong, W. Fan, J. Peng, K. Zhang, J. Ren, D. Turaga, and O. Verscheure, “Cross domain distribution adaptation via kernel mapping,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
  • [19] L. Bruzzone and M. Marconcini, “Domain adaptation problems: A DASVM classification technique and a circular validation strategy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, 2010.
  • [20] M. T. Bahadori, Y. Liu, and D. Zhang, “Learning with minimum supervision: A general framework for transductive transfer learning,” in Proceedings of the 11th IEEE International Conference on Data Mining, 2011.
  • [21] M. Xiao and Y. Guo, “Semi-supervised kernel matching for domain adaptation,” in Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012.
  • [22] M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
  • [23] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, “Boosting for transfer learning,” in Proceedings of the 24th International Conference on Machine Learning, 2007.
  • [24] J. Jiang and C. Zhai, “Instance weighting for domain adaptation in NLP,” in Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007.
  • [25] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, “Co-clustering based classification for out-of-domain documents,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007.
  • [26] X. Ling, W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, “Spectral domain-transfer learning,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
  • [27] C. Wang and S. Mahadevan, “Heterogeneous domain adaptation using manifold alignment,” in Proceedings of the 25th AAAI Conference on Artificial Intelligence, 2011.
  • [28] X. Shi, Q. Liu, W. Fan, and P. S. Yu, “Transfer across completely different feature spaces via spectral embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, 2012.
  • [29] B. Quanz, J. Huan, and M. Mishra, “Knowledge transfer with low-quality data: A feature extraction issue,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, 2012.
  • [30] B. Chen, W. Lam, I. Tsang, and T.-L. Wong, “Extracting discriminative concepts for domain adaptation in text mining,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
  • [31] H. Daume III, A. Kumar, and A. Saha, “Co-regularization based semi-supervised domain adaptation,” in Advances in Neural Information Processing Systems, 2010.
  • [32] A. Argyriou and T. Evgeniou, “Multi-task feature learning,” in Advances in Neural Information Processing Systems, 2006.
  • [33] Q. Liu, X. Liao, and L. Carin, “Semi-supervised multitask learning,” in Advances in Neural Information Processing Systems, 2007.
  • [34] V. Vapnik, Statistical Learning Theory. John Wiley, 1998.
  • [35] B. Scholkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in Proceedings of the 14th Annual Conference on Computational Learning Theory, 2001.
  • [36] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [37] M. Long, J. Wang, G. Ding, W. Cheng, X. Zhang, and W. Wang, “Dual transfer learning,” in Proceedings of the 12th SIAM International Conference on Data Mining, 2012.
  • [38] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” in Advances in Neural Information Processing Systems, 2006.
  • [39] R. Johnson and T. Zhang, “Graph-based semi-supervised learning and spectral kernel design,” IEEE Transactions on Information Theory, 2008.
  • [40] J. Gao, W. Fan, J. Jiang, and J. Han, “Knowledge transfer via multiple model local structure mapping,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
  • [41] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, 2011.
  • [42] D. Cai, X. He, and J. Han, “Spectral regression: A unified approach for sparse subspace learning,” in Proceedings of the IEEE International Conference on Data Mining, 2007.
Authors
  • Jianmin Wang graduated from Peking University, China, in 1990, and received his M.E. and Ph.D. in computer software from Tsinghua University, China, in 1992 and 1995, respectively. He is now a professor at the School of Software, Tsinghua University. His research interests include unstructured data management, workflow and BPM technology, benchmarks for database systems, software watermarking, and mobile digital rights management. He has published over 100 DBLP-indexed papers in major journals (TKDE, DMKD, DKE, WWWJ, etc.) and conferences (SIGMOD, VLDB, ICDE, CVPR, AAAI, etc.). He led the development of a product data/lifecycle management system, which has been deployed in hundreds of enterprises in China, and leads the development of an unstructured data management system named LaUDMS.
  • Mingsheng Long received the BS degree in 2008 from the Department of Electrical Engineering, Tsinghua University, China. He is a PhD candidate in the Department of Computer Science and Technology, Tsinghua University. His research interests are transfer learning, feature learning, large-scale data mining, and unstructured data management.
  • Philip S. Yu received his Ph.D. degree in E.E. from Stanford University. He is a Distinguished Professor in Computer Science at the University of Illinois at Chicago and holds the Wexler Chair in Information Technology. Dr. Yu is a Fellow of the ACM and the IEEE. He is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004). He received a Research Contributions Award from the IEEE International Conference on Data Mining (2003).