Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy

NeurIPS (2019)

Abstract

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.
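
For orientation, the objects named in the abstract can be written out as follows. This is a minimal sketch in standard notation, not the paper's own statement: the symbols (training sample S of size n, learned weights W, step size \eta_t, inverse temperature \beta_t, mini-batch B_t, prior Q) are assumptions introduced here, and the data-dependent bounds of the paper refine these generic forms rather than coincide with them.

```latex
% SGLD update (standard form; notation assumed, not taken from the listing):
% a mini-batch gradient step plus isotropic Gaussian noise.
W_{t+1} = W_t - \eta_t \nabla \hat{L}_{B_t}(W_t)
          + \sqrt{\tfrac{2\eta_t}{\beta_t}}\,\varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, I_d).

% Mutual-information generalization bound in the style of
% Xu and Raginsky (2017), assuming the loss is \sigma-subgaussian:
\left| \mathbb{E}\!\left[ L_\mu(W) - \hat{L}_S(W) \right] \right|
  \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}.

% Variational characterization of mutual information:
% any fixed "prior" Q on the weights yields an upper bound,
% with equality when Q is the marginal distribution of W.
I(W; S) \le \mathbb{E}_S\!\left[ \mathrm{KL}\!\left( P_{W \mid S} \,\|\, Q \right) \right].
```

Read together, the last two lines suggest why the choice of prior matters: the closer Q tracks the conditional law of the weights, the smaller the KL term and the tighter the resulting bound. The abstract's data-dependent priors, which forecast the mini-batch gradient from a subset of the training samples, are one way of choosing such a Q.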

Keywords

stochastic gradient Langevin dynamics
