SGD with Partial Hessian for Deep Neural Networks Optimization
arXiv (2024)
Abstract
Due to the effectiveness of second-order algorithms in solving classical
optimization problems, designing second-order optimizers to train deep neural
networks (DNNs) has attracted much research interest in recent years. However,
because of the very high dimension of intermediate features in DNNs, it is
difficult to directly compute and store the Hessian matrix for network
optimization. Most of the previous second-order methods approximate the Hessian
information imprecisely, resulting in unstable performance. In this work, we
propose a compound optimizer, which is a combination of a second-order
optimizer with a precise partial Hessian matrix for updating channel-wise
parameters and the first-order stochastic gradient descent (SGD) optimizer for
updating the other parameters. We show that the associated Hessian matrices of
channel-wise parameters are diagonal and can be extracted directly and
precisely from Hessian-free methods. The proposed method, namely SGD with
Partial Hessian (SGD-PH), inherits the advantages of both first-order and
second-order optimizers. Compared with first-order optimizers, it adopts a
certain amount of information from the Hessian matrix to assist optimization,
while compared with the existing second-order optimizers, it keeps the good
generalization performance of first-order optimizers. Experiments on image
classification tasks demonstrate the effectiveness of our proposed optimizer
SGD-PH. The code is publicly available.
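The compound update described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' SGD-PH implementation. The model, the function names (`grads`, `hessian_diag_gamma`, `compound_step`), and the learning rates are hypothetical. The toy loss is L = 0.5 * sum_c (gamma_c * (w . x_c) - t_c)^2, where `gamma` plays the role of channel-wise parameters and `w` stands in for all other parameters. Because the Hessian block for `gamma` is diagonal in this toy, one Hessian-vector product with the all-ones vector recovers the diagonal. The product is approximated here by a central difference of the gradient, and the per-coordinate Newton-like step is a simplified stand-in for whatever second-order rule the paper uses.

```python
def grads(gamma, w, xs, ts):
    """Analytic gradients of L = 0.5 * sum_c (gamma_c * (w . x_c) - t_c)^2."""
    zs = [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]
    rs = [g * z - t for g, z, t in zip(gamma, zs, ts)]  # residuals
    g_gamma = [r * z for r, z in zip(rs, zs)]
    g_w = [sum(r * g * x[j] for r, g, x in zip(rs, gamma, xs))
           for j in range(len(w))]
    return g_gamma, g_w


def loss(gamma, w, xs, ts):
    zs = [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]
    return 0.5 * sum((g * z - t) ** 2 for g, z, t in zip(gamma, zs, ts))


def hessian_diag_gamma(gamma, w, xs, ts, eps=1e-3):
    """Hessian-free estimate of H_gamma @ 1 (central difference of the gradient).

    When the gamma block of the Hessian is diagonal, H_gamma @ 1 equals its
    diagonal, so no matrix is ever formed or stored.
    """
    g_plus, _ = grads([g + eps for g in gamma], w, xs, ts)
    g_minus, _ = grads([g - eps for g in gamma], w, xs, ts)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def compound_step(gamma, w, xs, ts, lr_gamma=1.0, lr_w=0.05, floor=1e-8):
    """One compound update: curvature-scaled step on gamma, plain SGD step on w.

    Assumption: the |h| scaling with a floor is an illustrative choice only.
    """
    g_gamma, g_w = grads(gamma, w, xs, ts)
    h = hessian_diag_gamma(gamma, w, xs, ts)
    new_gamma = [g - lr_gamma * gg / max(abs(hh), floor)
                 for g, gg, hh in zip(gamma, g_gamma, h)]
    new_w = [wj - lr_w * gj for wj, gj in zip(w, g_w)]
    return new_gamma, new_w


# Hypothetical toy data: three "channels", two shared weights.
xs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
ts = [1.0, 2.0, -1.0]
gamma0, w0 = [1.0, 1.0, 1.0], [1.0, 0.25]
gamma1, w1 = compound_step(gamma0, w0, xs, ts)
```

In a real network the gradient difference would be replaced by an exact Hessian-vector product from automatic differentiation. The sketch only shows how the two parameter groups receive different update rules.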