A Sharper Generalization Bound for Divide-and-Conquer Ridge Regression

AAAI 2019

Abstract
We study the distributed machine learning problem in which n feature-response pairs are partitioned among m machines uniformly at random. The goal is to approximately solve an empirical risk minimization (ERM) problem with the minimum amount of communication. The divide-and-conquer (DC) method, proposed several years ago, lets every worker machine independently solve the same ERM problem on its local feature-response pairs, and the driver machine then combines the solutions. This approach is one-shot and thereby extremely communication-efficient. Although the DC method has been studied by many prior works, a reasonable generalization bound had not been established before this work. For the ridge regression problem, we show that the prediction error of the DC method on unseen test samples is at most epsilon times larger than the optimal error. Prior works established constant-factor bounds, but their sample complexities have a quadratic dependence on d, which does not match the setting of most real-world problems. In contrast, our bounds are much stronger. First, our 1 + epsilon error bound is much better than their constant-factor bounds. Second, our sample complexity is merely linear in d.
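The DC method described in the abstract admits a simple implementation: each worker fits ridge regression on its own random shard of the data, and the driver averages the local solutions. Below is a minimal NumPy sketch under assumed conventions; the function names (local_ridge, dc_ridge), the regularization scaling, and the parameter lam are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Ridge regression on one worker's shard: solve (X^T X / n + lam * I) w = X^T y / n.
    The 1/n scaling of the regularizer is an illustrative convention, not the paper's."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y / n)

def dc_ridge(X, y, m, lam, rng=None):
    """Divide-and-conquer ridge: partition the n rows uniformly at random across m
    workers, solve ridge locally on each shard, and average the m solutions."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    shards = np.array_split(rng.permutation(n), m)
    w_locals = [local_ridge(X[idx], y[idx], lam) for idx in shards]
    return np.mean(w_locals, axis=0)

# Usage on synthetic data: compare the one-shot DC solution to the single-machine solution.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 10000, 50, 10
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    w_dc = dc_ridge(X, y, m, lam=1e-3, rng=rng)
    w_full = local_ridge(X, y, lam=1e-3)  # centralized baseline on all n samples
    print("||w_dc - w_full|| =", np.linalg.norm(w_dc - w_full))
```

The only communication in this scheme is each worker sending its d-dimensional local solution to the driver once, which is what makes the method one-shot.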