Reverse iterative volume sampling for linear regression

JOURNAL OF MACHINE LEARNING RESEARCH(2018)

引用 47|浏览52
暂无评分
摘要
We study the following basic machine learning task: Given a fixed set of d-dimensional input points for a linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset. The performance of the predictions is evaluated by the total square loss on all responses (the attained as well as the hidden ones). We show that a good approximate solution to this least squares problem can be obtained from just dimension d many responses by using a joint sampling technique called volume sampling. Moreover, the least squares solution obtained for the volume sampled subproblem is an unbiased estimator of optimal solution based on all n responses. This unbiasedness is a desirable property that is not shared by other common subset selection techniques. Motivated by these basic properties, we develop a theoretical framework for studying volume sampling, resulting in a number of new matrix expectation equalities and statistical guarantees which are of importance not only to least squares regression but also to numerical linear algebra in general. Our methods also lead to a regularized variant of volume sampling, and we propose the first efficient algorithms for volume sampling which make this technique a practical tool in the machine learning toolbox. Finally, we provide experimental evidence which confirms our theoretical findings.
更多
查看译文
关键词
volume sampling,linear regression,row sampling,active learning,optimal design
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要