A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network
arXiv (2024)
Abstract
In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for
solving least squares problems using a shallow ReLU neural network. The method
effectively takes advantage of both the least squares structure and the neural
network structure of the objective function. By categorizing the weights and
biases of the hidden and output layers of the network as nonlinear and linear
parameters, respectively, the method iterates back and forth between the
nonlinear and linear parameters. The nonlinear parameters are updated by a
damped Gauss-Newton method and the linear ones are updated by a linear solver.
Moreover, at the Gauss-Newton step, a special form of the Gauss-Newton matrix
is derived for the shallow ReLU neural network and is used for efficient
iterations. It is shown that the corresponding mass and Gauss-Newton matrices
in the respective linear and nonlinear steps are symmetric and positive
definite under reasonable assumptions. Thus, the SgGN method naturally produces
an effective search direction without the need for additional techniques, such as
the shifting used in the Levenberg-Marquardt method, to achieve invertibility of the
Gauss-Newton matrix. The convergence and accuracy of the method are
demonstrated numerically for several challenging function approximation
problems, especially those with discontinuities or sharp transition layers that
pose significant challenges for commonly used training algorithms in machine
learning.
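The alternating scheme described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: a one-dimensional target u sampled at discrete points (a discrete least squares loss), a network f(x) = c_0 + sum_i c_i ReLU(w_i x + b_i), a backtracking step-size rule standing in for the "damping", and np.linalg.lstsq in place of explicitly forming and inverting the mass and Gauss-Newton matrices. The hidden-layer Jacobian is factored as J = G diag(c), which follows from the chain rule because each output weight c_i scales the derivatives with respect to w_i and b_i. This shows one way to exploit the network structure, not necessarily the exact form derived in the paper. All function names (features, linear_step, gauss_newton_step, sggn_sketch) are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def features(x, w, b):
    """Design matrix A = [1, relu(w_i x_j + b_i)] for the linear (output) parameters."""
    z = np.outer(x, w) + b  # shape (m, n): pre-activations
    return np.hstack([np.ones((x.size, 1)), relu(z)])


def loss(x, u, w, b, c):
    """Discrete least squares loss 0.5 * ||f(x) - u||^2 (assumed discretization)."""
    r = features(x, w, b) @ c - u
    return 0.5 * np.dot(r, r)


def linear_step(x, u, w, b):
    """Optimal output weights c (c[0] is the output bias) for fixed hidden (w, b).

    Uses lstsq instead of solving the mass-matrix system A^T A c = A^T u directly.
    """
    c, *_ = np.linalg.lstsq(features(x, w, b), u, rcond=None)
    return c


def gauss_newton_step(x, u, w, b, c, eps=1e-10):
    """One damped Gauss-Newton update of the hidden-layer parameters (w, b)."""
    z = np.outer(x, w) + b
    H = (z > 0).astype(float)          # Heaviside function: derivative of ReLU
    r = features(x, w, b) @ c - u      # residual, shape (m,)
    # Chain rule: dr/dw_i = c_i * H_i * x and dr/db_i = c_i * H_i,
    # so J = G @ diag(cc), where G does not depend on the output weights.
    G = np.hstack([H * x[:, None], H])  # columns: [x*H_1..x*H_n, H_1..H_n]
    cc = np.concatenate([c[1:], c[1:]])
    # Solve min ||G zeta + r|| for zeta = diag(cc) @ delta, then undo the scaling.
    zeta, *_ = np.linalg.lstsq(G, -r, rcond=None)
    mask = np.abs(cc) > eps
    # Heuristic of this sketch: neurons with (near) zero output weight are not moved.
    delta = np.where(mask, zeta / np.where(mask, cc, 1.0), 0.0)
    dw, db = delta[: w.size], delta[w.size:]
    # Damping (assumed): halve the step until the loss with fixed c decreases.
    f0 = loss(x, u, w, b, c)
    step = 1.0
    while step > 1e-8:
        w_new, b_new = w + step * dw, b + step * db
        if loss(x, u, w_new, b_new, c) < f0:
            return w_new, b_new
        step *= 0.5
    return w, b  # no decrease found: keep the current parameters


def sggn_sketch(x, u, w, b, iters=50):
    """Alternate between the linear solve for c and a Gauss-Newton step for (w, b)."""
    for _ in range(iters):
        c = linear_step(x, u, w, b)
        w, b = gauss_newton_step(x, u, w, b, c)
    return w, b, linear_step(x, u, w, b)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 200)
    u = np.where(x < 0.3, 0.0, 1.0)   # step-function target with a discontinuity
    n = 10
    w = np.ones(n)
    b = -np.linspace(-0.9, 0.9, n)     # breakpoints spread over [-1, 1]
    print("initial loss:", loss(x, u, w, b, linear_step(x, u, w, b)))
    w, b, c = sggn_sketch(x, u, w, b)
    print("final loss:", loss(x, u, w, b, c))
```

Because the linear step returns the optimal output weights for fixed (w, b) and the nonlinear step only accepts a step that lowers the loss, the loss in this sketch is non-increasing across outer iterations. The paper's own argument that the mass and Gauss-Newton matrices are symmetric positive definite would allow direct solves; lstsq is used here only as a safe fallback when, for example, a neuron is inactive on every sample point.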