Automatic Differentiation Of Sketched Regression

International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 108, 2020

Abstract
Sketching for speeding up regression problems involves using a sketching matrix S to quickly find the approximate solution to a linear least squares regression (LLS) problem: given A of size n × d, with n ≫ d, along with b of size n × 1, we seek a vector y with minimal regression error ‖Ay − b‖₂. This approximation technique is now standard in data science, and many software systems use sketched regression internally, as a component. It is often useful to calculate derivatives (gradients for the purpose of optimization, for example) of such large systems, where sketched LLS is merely a component of a larger system whose derivatives are needed. To support Automatic Differentiation (AD) of systems containing sketched LLS, we consider propagating derivatives through LLS: both propagating perturbations (forward AD) and gradients (reverse AD). AD performs accurate differentiation and is efficient for problems with a huge number of independent variables. Since we use LLS^S (sketched LLS) instead of LLS for reasons of efficiency, propagation of derivatives also needs to trade accuracy for efficiency, presumably by sketching. There are two approaches for this: (a) use AD to transform the code that defines LLS^S, or (b) approximate exact derivative propagation through LLS using sketching methods. We provide strong bounds on the errors produced due to these two natural forms of sketching in the context of AD, giving the first dimensionality reduction analysis for calculating the derivatives of a sketched computation. Our results crucially depend on a novel analysis of the operator norm of a sketched inverse matrix product in this context. Extensive experiments on both synthetic and real-world data demonstrate the efficacy of our sketched gradients.
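The following is a minimal, hypothetical sketch of the setting described in the abstract, not the authors' code: it assumes a Gaussian sketching matrix S of size m × n with m ≪ n, solves the sketched LLS via its normal equations, and uses JAX to propagate derivatives through the solver both forward (perturbations) and in reverse (gradients). The dimensions n, d, m are illustrative.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# Gaussian sketch S, sketched LLS solved via normal equations, and
# JAX forward/reverse AD through the sketched solver.
import jax
import jax.numpy as jnp

def sketched_lls(A, b, S):
    # Approximate argmin_y ||Ay - b||_2 by solving min_y ||S(Ay - b)||_2.
    SA, Sb = S @ A, S @ b
    return jnp.linalg.solve(SA.T @ SA, SA.T @ Sb)

n, d, m = 1000, 20, 100                              # n >> d; sketch size m (assumed)
kA, kb, kS = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(kA, (n, d))
b = jax.random.normal(kb, (n,))
S = jax.random.normal(kS, (m, n)) / jnp.sqrt(m)      # Gaussian sketching matrix

# Reverse AD: gradients of a scalar loss built on top of the sketched solver.
loss = lambda A, b: jnp.sum(sketched_lls(A, b, S) ** 2)
gA, gb = jax.grad(loss, argnums=(0, 1))(A, b)

# Forward AD: propagate a perturbation of A through the sketched solver.
y, dy = jax.jvp(lambda A_: sketched_lls(A_, b, S), (A,), (jnp.ones_like(A),))
```

This corresponds to approach (a) above, differentiating the code that defines the sketched LLS; approach (b) would instead approximate the exact derivative propagation through the unsketched LLS using sketching.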
Keywords
automatic differentiation, regression