Winnowing with Gradient Descent

COLT 2020

Abstract
The performance of multiplicative updates is typically logarithmic in the number of features when the targets are sparse. Strikingly, we show that the same property can also be achieved with gradient descent updates. We obtain this result by rewriting the non-negative weights $w_i$ of multiplicative updates as $u_i^2$ and then performing a gradient descent step w.r.t. the new $u_i$ parameters. We apply this method to the Winnow update, the Hedge update, and the unnormalized and normalized exponentiated gradient (EG) updates for linear regression. When the original weights $w_i$ are scaled to sum to one (as done for Hedge and normalized EG), then in the corresponding reparameterized update, the $u_i$ parameters are divided by $\Vert\mathbf{u}\Vert_2$ after the gradient descent step. We show that these reparameterizations closely track the original multiplicative updates by proving the same online regret bounds in each case (albeit, in some cases, with slightly different constants). As an aside, our work exhibits a simple two-layer linear neural network that, when trained with gradient descent, can experimentally solve a certain sparse linear problem (known as the Hadamard problem) with exponentially fewer examples than any kernel method.
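
To make the reparameterization concrete, below is a minimal NumPy sketch of one such update for linear regression. The function names, the step size `eta`, and the choice of the squared loss are illustrative assumptions rather than details taken from the paper; the sketch only mirrors the substitution $w_i = u_i^2$ and, for the normalized case, the division of $\mathbf{u}$ by $\Vert\mathbf{u}\Vert_2$ after the gradient step, as described in the abstract.

```python
import numpy as np

def reparameterized_gd_step(u, x, y, eta):
    """One gradient descent step on the u-parameters, with w_i = u_i^2.

    Illustrative sketch only: assumes the squared loss (w.x - y)^2 for
    linear regression; eta is a hypothetical step size.
    """
    w = u ** 2                       # non-negative weights of the original update
    residual = w @ x - y             # prediction error of the linear model
    grad_w = 2.0 * residual * x      # dL/dw for the squared loss
    grad_u = 2.0 * u * grad_w        # chain rule: dw_i/du_i = 2 u_i
    return u - eta * grad_u

def normalized_reparameterized_gd_step(u, x, y, eta):
    """Variant for weights that sum to one (Hedge / normalized EG case):
    divide u by its 2-norm after the gradient step, so that sum_i u_i^2 = 1.
    """
    u_new = reparameterized_gd_step(u, x, y, eta)
    return u_new / np.linalg.norm(u_new)

# Example: start from uniform weights w_i = 1/d and take one normalized step.
d = 8
u = np.ones(d) / np.sqrt(d)
u = normalized_reparameterized_gd_step(u, x=np.random.randn(d), y=1.0, eta=0.05)
print(u ** 2, (u ** 2).sum())  # recovered weights w = u^2 still sum to 1
```

The squared loss is used here only to keep the sketch self-contained; the same substitution pattern, with the appropriate loss in place of the squared loss, is what the abstract applies to the Winnow and Hedge updates.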