# Knowledge transfer via multiple model local structure mapping

KDD, pp. 283-291 (2008)

Abstract

The effectiveness of knowledge transfer using classification algorithms depends on the difference between the distribution that generates the training examples and the one from which test examples are to be drawn. The task can be especially difficult when the training examples are from one or several domains different from the test domain...

Introduction

- The authors are interested in transfer learning scenarios in which models are learned from one or several training domains and predictions are made in a different but related test domain.
- Rather than relying on any single base model, it can be more effective to combine the knowledge from all the base models trained on these domains when transferring to the new domain.
- For this task, one would naturally consider model averaging that additively combines the predictions of multiple models.
- The existing model averaging methods in traditional supervised learning usually assign global weights to models; these weights are uniform, proportional to the training accuracy, or fixed to favor a certain model.
- Such a global weighting scheme may not perform well in transfer learning because different test examples may favor predictions from different base models.
- The focus of the paper is to approximate the optimal local weight assignment, that is, the best per-model weights for each individual test example.
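
The contrast between global and local weighting drawn in the list above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two toy base models, the one-dimensional input, and the hand-written `oracle_weight` function, which stands in for the optimal local weights that the paper only approximates.

```python
from typing import Callable, Sequence

# A base model maps an input x to an estimate of P(y = 1 | x).
Model = Callable[[float], float]
# A local weight function maps (model index, x) to a non-negative weight.
WeightFn = Callable[[int, float], float]


def global_average(models: Sequence[Model], weights: Sequence[float], x: float) -> float:
    """Combine predictions with one fixed weight per model (the same for every x)."""
    total = sum(weights)
    return sum(w * m(x) for w, m in zip(weights, models)) / total


def local_average(models: Sequence[Model], weight_fn: WeightFn, x: float) -> float:
    """Combine predictions with weights that depend on the test example x."""
    weights = [weight_fn(i, x) for i in range(len(models))]
    total = sum(weights)
    if total == 0:
        # No model is trusted near x: fall back to uniform weights.
        weights = [1.0] * len(models)
        total = float(len(models))
    return sum(w * m(x) for w, m in zip(weights, models)) / total


# Hypothetical toy setting: the true P(y = 1 | x) is 0.9 everywhere, but
# each base model is only reliable on one half of the input space.
def model_a(x: float) -> float:
    return 0.9 if x < 0 else 0.2


def model_b(x: float) -> float:
    return 0.2 if x < 0 else 0.9


def oracle_weight(i: int, x: float) -> float:
    """Hand-picked weights: trust model_a for x < 0 and model_b otherwise."""
    return 1.0 if (i == 0) == (x < 0) else 0.0


MODELS = [model_a, model_b]

for x in (-1.0, 2.0):
    print(x, round(global_average(MODELS, [1.0, 1.0], x), 3),
          round(local_average(MODELS, oracle_weight, x), 3))
```

In this toy case uniform global weights blur the two models into the same mediocre estimate everywhere, while per-example weights recover the reliable model on each side. In practice no oracle is available, which is why the weights have to be estimated from the test data.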

Highlights

- We are interested in transfer learning scenarios where we learn from one or several training domains and make predictions in a different but related test domain
- We propose a graph-based approach to approximate the optimal model weights where the local weight for a base model is computed by first mapping and measuring the similarity between the model and the test domain’s local structure around the test example
- Our experimental results show that the locally weighted ensemble framework significantly improved the performance over a number of baseline methods on all three data sets, which shows the effectiveness of the proposed framework for transfer learning
- We propose a locally weighted ensemble framework to transfer the combined knowledge to a new domain that is different from all the training domains
- Based on the “clustering” assumption that the local structure of the test set is related to P(y|x), we design an effective weighting scheme to approximate the optimal model weights
- In most of the experiments, the improvement in accuracy after utilizing weighted ensemble is over 10% and up to 30% for some problems
- These results indicate that: 1) the locally weighted ensemble could successfully identify the knowledge from each model that is useful to predict in the test domain and transfer such information from all available base models; and 2) the proposed graph-based weight estimation method makes the framework practical by effectively approximating the optimal model weights
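
The graph-based weighting idea summarized above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure: the test domain's local structure is taken from cluster assignments of the unlabeled test set, a model's local structure from its predicted labels, and the similarity around a test example is measured as the Jaccard overlap of the two neighbor sets. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Set


def neighbors(labels: Sequence[int], i: int) -> Set[int]:
    """Indices of other test examples sharing example i's label or cluster."""
    return {j for j, label in enumerate(labels) if j != i and label == labels[i]}


def local_weight(cluster_ids: Sequence[int], predicted: Sequence[int], i: int) -> float:
    """Similarity between the model's and the test set's structure around example i."""
    test_nbrs = neighbors(cluster_ids, i)   # neighbors in the clustering graph
    model_nbrs = neighbors(predicted, i)    # neighbors in the model's prediction graph
    union = test_nbrs | model_nbrs
    if not union:
        return 0.0
    return len(test_nbrs & model_nbrs) / len(union)


def normalized_weights(cluster_ids: Sequence[int],
                       all_predictions: Sequence[Sequence[int]],
                       i: int) -> List[float]:
    """Per-model weights for test example i, scaled to sum to 1."""
    raw = [local_weight(cluster_ids, preds, i) for preds in all_predictions]
    total = sum(raw)
    if total == 0:
        # No model agrees with the clustering near i: use uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Hypothetical toy data: six unlabeled test examples in two clusters.
CLUSTER_IDS = [0, 0, 0, 1, 1, 1]
PREDS_A = [1, 1, 1, 0, 0, 0]  # decision boundary agrees with the clusters
PREDS_B = [1, 0, 1, 0, 1, 0]  # decision boundary cuts through the clusters

print(normalized_weights(CLUSTER_IDS, [PREDS_A, PREDS_B], 0))
```

A model whose predictions respect the cluster boundaries near a test example receives most of the weight there, which matches the intuition behind the clustering assumption; the weights are computed without any test labels.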

Methods

- The authors compare the weighted ensemble framework with different learning algorithms. In particular, since most data sets are high-dimensional, the following commonly used algorithms are appropriate choices: 1) Winnow (WNN) from the learning package SNoW [6]; 2) Logistic Regression (LR) implemented in the BBR package [16]; and 3) Support Vector Machines (SVM) implemented in LibSVM [8].
- To demonstrate the effectiveness of both steps, the authors include the following three methods in the comparison: 1) a simple model averaging framework (SMA), where all model predictions are combined using uniform weights; 2) the locally weighted ensemble framework without the adjustment step, which adopts the weighted prediction for each test example; and 3) the complete locally weighted ensemble framework (LWE), which includes the adjustment step.

Results

- The authors' experimental results show that the locally weighted ensemble framework significantly improves performance over a number of baseline methods on all three data sets, which shows the effectiveness of the proposed framework for transfer learning. In most of the experiments, the improvement in accuracy from the weighted ensemble is over 10%, and up to 30% for some problems.
- The authors note that the worst single model’s accuracy is around 56%, and that the simple averaging method degrades even further, to 54% accuracy.
- Even with such weak base classifiers, the framework still raises the accuracy to 80%.

Conclusion

- Knowledge transfer across domains with different distributions is an important problem in data mining that has not been fully investigated.
- The experimental results on four real transfer learning data sets show that the proposed method improves over each base model by 10% to 30% in accuracy and is more accurate than both semi-supervised learning and simple model averaging.
- These results indicate that: 1) the locally weighted ensemble could successfully identify the knowledge from each model that is useful to predict in the test domain and transfer such information from all available base models; and 2) the proposed graph-based weight estimation method makes the framework practical by effectively approximating the optimal model weights.
- The authors plan to compare LWE with existing single-model based transfer learning algorithms, as well as to explore effective methods to set parameter values

- Table 1: Data Sets Description
- Table 2: Performance Comparison on a Series of Data Sets
- Table 3: Performance Comparison on Intrusion Detection Data Set

Related Work

The problem of differing training and test distributions has only recently begun to attract much attention. When it is assumed that the two distributions differ only in P(x) but not in P(y|x), the problem is referred to as covariate shift [25, 18] or sample selection bias [14]. The instance weighting approaches [25, 18, 5] try to re-weight each training example with $P_{\text{test}}(x) / P_{\text{train}}(x)$ and maximize the re-weighted log likelihood.
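
Written out, the instance-weighting objective described above takes the standard form of importance-weighted maximum likelihood. The symbols $\theta$ (model parameters), $n$ (number of training examples), and $(x_i, y_i)$ (the $i$-th training example) are notation assumed here for illustration; they do not appear in this summary.

```latex
\hat{\theta} \;=\; \arg\max_{\theta} \sum_{i=1}^{n}
  \frac{P_{\text{test}}(x_i)}{P_{\text{train}}(x_i)} \,
  \log P(y_i \mid x_i, \theta)
```

Training examples that are more likely under the test distribution than under the training distribution receive a ratio above 1 and therefore count more in the fit.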

Another line of work changes the representation of the observation x, hoping that the distributions of the training and test examples become very similar after the transformation [3, 24]. The approach in [22] transforms the model learned from the training examples into a Bayesian prior that is applied to the learning process on the test domain. The major difference between our work and these studies is that they depend on a single source of information and try to learn a single global model that adapts well to the test set.

Constructing a good ensemble of classifiers has been an active research area in supervised learning [12]. By combining decisions from individual classifiers, ensembles can usually reduce variance and achieve higher accuracy than individual classifiers. Such methods include Bayesian averaging [17], bagging, boosting and many variants of ensemble approaches [2, 27, 13, 15]. Some ensemble methods assign weights locally [1, 19], but such weights are determined based on training data only. There has not been much work on ensemble methods to address the transfer learning problem. In [11, 26], it is assumed that the training and the test examples are generated from a mixture of different models, and the test distribution has different mixture coefficients than the training distribution. In [23], a

Funding

- The work was supported in part by the U.S. National Science Foundation grants IIS-05-13678/06-42771 and BDI-0515813

References

- [1] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11–73, 1997.
- [2] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105–139, 2004.
- [3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Proc. of NIPS’ 07, pages 137–144, 2007.
- [4] P. N. Bennett, S. T. Dumais, and E. Horvitz. The combination of text classifiers using reliability indicators. Information Retrieval, 8(1):67–100, 2005.
- [5] S. Bickel, M. Bruckner, and T. Scheffer. Discriminative learning for differing training and test distributions. In Proc. of ICML’ 07, pages 81–88, 2007.
- [6] A. J. Carlson, C. M. Cumby, J. L. R. Nicholas D. Rizzolo, and D. Roth. Snow learning architecture. http://l2r.cs.uiuc.edu/~cogcomp/asoftware.php?skey=SNOW#projects.
- [7] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
- [8] C.-C. Chang and C.-J. Lin. Libsvm: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- [9] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu. Co-clustering based classification for out-of-domain documents. In Proc. of KDD’ 07, pages 210–219, 2007.
- [10] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Boosting for transfer learning. In Proc. of ICML’ 07, pages 193–200.
- [11] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006.
- [12] T. Dietterich. Ensemble methods in machine learning. In Proc. of MCS ’00, pages 1–15, 2000.
- [13] W. Fan. Systematic data selection to mine concept-drifting data streams. In Proc. KDD’ 04, pages 128–137, 2004.
- [14] W. Fan and I. Davidson. On sample selection bias and its efficient correction via model averaging and unlabeled examples. In Proc. of SDM’07.
- [15] J. Gao, W. Fan, and J. Han. On appropriate assumptions to mine data streams: Analysis and practice. In Proc. ICDM’ 07, pages 143–152, 2007.
- [16] A. Genkin, D. D. Lewis, and D. Madigan. Bbr: Bayesian logistic regression software. http://stat.rutgers.edu/~madigan/BBR/.
- [17] J. Hoeting, D. Madigan, A. Raftery, and C. Volinsky. Bayesian model averaging: a tutorial. Statist. Sci., 14:382–417, 1999.
- [18] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Scholkopf. Correcting sample selection bias by unlabeled data. In Proc. of NIPS’ 06, pages 601–608, 2007.
- [19] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
- [20] T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
- [21] G. Karypis. Cluto - family of data clustering software tools. http://glaros.dtc.umn.edu/gkhome/views/cluto.
- [22] X. Li and J. Bilmes. A Bayesian divergence prior for classifier adaptation. In Proc. of AISTATS’ 07, 2007.
- [23] D. M. Roy and L. P. Kaelbling. Efficient Bayesian task-level transfer learning. In Proc. of IJCAI ’07.
- [24] S. Satpal and S. Sarawagi. Domain adaptation of conditional probability models via feature subsetting. In Proc. of ECML/PKDD’ 07, pages 224–235, 2007.
- [25] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, 2000.
- [26] A. Storkey and M. Sugiyama. Mixture regression for covariate shift. In Proc. of NIPS’ 06, pages 1337–1344.
- [27] H. Wang, W. Fan, P. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. of KDD’03, pages 226–235, 2003.
- [28] X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.
