
Sharpness-Aware Minimization and the Edge of Stability

Journal of Machine Learning Research (2024)

Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size η, the operator norm of the Hessian of the loss grows until it approximately reaches 2/η, after which it fluctuates around this value. The quantity 2/η has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
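For context, the 2/η threshold comes from the local quadratic stability argument the abstract alludes to. A minimal worked version of that standard GD calculation (a sketch, not reproduced from the paper itself):

\[
  f(w) \;\approx\; f(w^\ast) + \tfrac{1}{2}\,(w - w^\ast)^\top H\,(w - w^\ast)
  \quad\Longrightarrow\quad
  w_{t+1} - w^\ast \;=\; (I - \eta H)\,(w_t - w^\ast).
\]

Along an eigenvector of H with eigenvalue \lambda, the error is multiplied by (1 - \eta\lambda) at each step, so GD remains stable only while |1 - \eta\lambda| \le 1, i.e. \lambda \le 2/\eta; the operator norm of H therefore cannot stably exceed 2/\eta. The standard SAM update, by contrast, takes the gradient at a point perturbed by \rho in the normalized gradient direction,

\[
  w_{t+1} \;=\; w_t - \eta\,\nabla L\!\left(w_t + \rho\,\frac{\nabla L(w_t)}{\lVert \nabla L(w_t)\rVert}\right),
\]

and repeating the same kind of calculation for this update yields a SAM-edge that, as the abstract notes, depends on the norm of the gradient rather than being the fixed value 2/η.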
Key words
Sharpness-aware minimization, edge of stability, optimization, deep learning, wide minima