Impossibility Theorems for Domain Adaptation

AISTATS (2010)

Cited by 331 | Views 83
Abstract
The domain adaptation problem in machine learning occurs when the distribution generating the future test data differs from the one that generates the training data. Clearly, the success of such learning depends on similarities between the two data distributions. We study the assumptions about the relationship between the two distributions that one needs to postulate so that domain adaptation learning can succeed. We analyze these assumptions in an agnostic PAC-style learning model where both labeled training data and unlabeled test data are available to the learner. We focus on three assumptions: (i) small distance between the unlabeled distributions, (ii) existence of a classifier in the hypothesis class with low error on both the training and the testing distribution, and (iii) the so-called covariate shift assumption, i.e., the assumption that each data point is labeled the same way under both distributions. We show that without either assumption (i) or assumption (ii), the combination of the remaining assumptions is not sufficient to guarantee successful learning. Our negative results hold for any domain adaptation learning algorithm, as long as it does not have access to labeled target examples. An interesting consequence of our analysis is that the popular covariate shift assumption is rather weak and does not relieve the necessity of the other assumptions.
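Read formally, the three assumptions can be sketched in notation common to this line of work; the symbols $D_S$, $D_T$, $\mathcal{H}$, $\mathrm{err}$, and $\lambda$ are chosen here for illustration and are not taken from the abstract itself:

(i) small distance between the unlabeled (marginal) source and target distributions, e.g. $d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T) \le \epsilon$;
(ii) existence of a single jointly good hypothesis, $\lambda = \min_{h \in \mathcal{H}} \bigl(\mathrm{err}_S(h) + \mathrm{err}_T(h)\bigr) \le \epsilon$;
(iii) covariate shift, $\Pr_S[\,y \mid x\,] = \Pr_T[\,y \mid x\,]$ for all $x$, so that only the marginal distribution over $x$ changes.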
Keywords
adaptive learning, test data generation, machine learning