Partial Differential Equations for Training Deep Neural Networks
51st Asilomar Conference on Signals, Systems, and Computers (2017)
Abstract
This paper establishes a connection between non-convex optimization and nonlinear partial differential equations (PDEs). We interpret empirically successful relaxation techniques, motivated by statistical physics, for training deep neural networks as solutions of a viscous Hamilton-Jacobi (HJ) PDE. The underlying stochastic control interpretation allows us to prove that these techniques perform better than stochastic gradient descent. Our analysis provides insight into the geometry of the energy landscape and suggests new algorithms based on the non-viscous Hamilton-Jacobi PDE that can effectively tackle the high dimensionality of modern neural networks.
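For context on the PDEs the abstract invokes: in this line of work the training loss f(x) is relaxed into a function u(x, t) that solves a viscous HJ equation, and setting the viscosity to zero recovers the Hopf-Lax formula, i.e. the Moreau envelope of f. A minimal sketch of that reading, with β⁻¹ as the viscosity/temperature parameter (the exact scaling below is our assumption, not quoted from this abstract):

```latex
u_t = -\tfrac{1}{2}\,\lvert \nabla u \rvert^2 + \tfrac{\beta^{-1}}{2}\,\Delta u,
\qquad u(x,0) = f(x);
\qquad \beta^{-1} = 0 \;\Rightarrow\;
u(x,t) = \inf_{y} \Big\{ f(y) + \tfrac{1}{2t}\,\lvert x - y \rvert^2 \Big\}.
```

In the non-viscous case, ∇u(x, t) = (x − prox_{tf}(x)) / t, so gradient descent on the relaxed landscape u amounts to a proximal-point iteration on f. The following self-contained toy sketch illustrates this; the 1-D loss f, the inner-loop approximation of the prox, and all step sizes are hypothetical choices for illustration, not the paper's algorithm:

```python
import numpy as np

# Hypothetical 1-D non-convex loss, for illustration only.
def f(x):
    return np.sin(3 * x) + 0.1 * x ** 2

def grad_f(x):
    return 3 * np.cos(3 * x) + 0.2 * x

def prox(x, t, inner_steps=100, lr=0.01):
    """Approximate prox_{t f}(x) = argmin_y { f(y) + |y - x|^2 / (2 t) }
    by inner gradient descent (a crude stand-in for the exact prox)."""
    y = x
    for _ in range(inner_steps):
        y -= lr * (grad_f(y) + (y - x) / t)
    return y

# Gradient descent on the Hopf-Lax solution u(x, t):
# grad u(x, t) = (x - prox_{t f}(x)) / t, so each outer step
# is a (damped) proximal-point update on f.
x, t, eta = 2.0, 0.5, 0.5
for _ in range(50):
    x -= eta * (x - prox(x, t)) / t
print("smoothed minimizer estimate:", x, "f(x) =", f(x))
```

Larger t smooths the landscape more aggressively at the cost of a harder inner prox problem, which is the trade-off the relaxation exploits.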
Keywords
training deep neural networks, viscous Hamilton-Jacobi PDE, stochastic gradient descent, non-viscous Hamilton-Jacobi PDE, modern neural networks, non-convex optimization, nonlinear partial differential equations, relaxation techniques, statistical physics, stochastic control interpretation