Kernel Distillation for Gaussian Processes.

arXiv: Machine Learning (2018)

Abstract
Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the use of GPs in real-world applications is limited by their high computational cost at inference time. In this paper, we introduce a new framework, \textit{kernel distillation}, for kernel matrix approximation. The idea is adapted from knowledge distillation in the deep learning community: we approximate a fully trained teacher kernel matrix of size $n \times n$ with a student kernel matrix. We combine the inducing points method with sparse low-rank approximation in the distillation procedure. The distilled student kernel matrix costs only $\mathcal{O}(m^2)$ storage, where $m$ is the number of inducing points and $m \ll n$. We also show that one application of kernel distillation is fast GP prediction, where we demonstrate empirically that our approximation provides a better balance between prediction time and predictive performance compared to the alternatives.
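As a rough illustration of the kind of inducing-point, low-rank kernel approximation the abstract refers to (not the authors' actual distillation procedure), the following minimal Python sketch forms a Nystrom-style student approximation $K \approx K_{nm} K_{mm}^{-1} K_{nm}^\top$ and uses it for fast approximate GP mean prediction. The RBF kernel, the random choice of inducing points, and all variable names are illustrative assumptions.

```python
# Hypothetical sketch: inducing-point (Nystrom-style) low-rank kernel
# approximation and fast approximate GP mean prediction.
# This is NOT the paper's kernel distillation algorithm, only an illustration
# of the inducing-points / low-rank ideas it builds on.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
n, m, d = 2000, 50, 3                      # n training points, m << n inducing points
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
Z = X[rng.choice(n, m, replace=False)]     # inducing points (random subset here)

K_nm = rbf_kernel(X, Z)                    # n x m cross-kernel
K_mm = rbf_kernel(Z, Z) + 1e-6 * np.eye(m) # m x m inducing kernel (with jitter)

# Low-rank "student" view: K ~= K_nm @ inv(K_mm) @ K_nm.T, so only the
# m x m block (O(m^2) storage) plus the cross-kernel need to be kept.

# Fast approximate predictive mean at test inputs, never forming an n x n matrix:
noise = 0.1**2
X_star = rng.standard_normal((5, d))
K_sm = rbf_kernel(X_star, Z)

A = noise * K_mm + K_nm.T @ K_nm           # m x m linear system
alpha = np.linalg.solve(A, K_nm.T @ y)     # m-dimensional weight vector
mean_star = K_sm @ alpha                   # approximate GP predictive mean
print(mean_star)
```

All linear algebra here involves at most $n \times m$ and $m \times m$ matrices, which is what makes prediction cheap once the small student representation has been fixed.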