High-dimensional SGD Aligns with Emerging Outlier Eigenspaces

Computing Research Repository (CoRR) (2024)


Abstract
We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
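
The alignment described in the abstract can be illustrated numerically on a toy problem. The sketch below is not the paper's setup or proof technique: it assumes a binary (not multi-class) Gaussian mixture, a single-layer logistic model, and hypothetical values for the dimension, noise level, step size, and sample counts. It runs online SGD and compares the weight vector with the top eigenvector of the empirical Hessian, found by power iteration, before and after training.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample(rng, mu, lam):
    # Draw (x, y): label y in {0, 1}, x = (2y - 1) * mu + N(0, I / lam) noise.
    y = rng.randint(0, 1)
    sign = 2 * y - 1
    x = [sign * m + rng.gauss(0.0, 1.0) / math.sqrt(lam) for m in mu]
    return x, y


def top_hessian_eigvec(w, data, rng, iters=50):
    # Power iteration on the empirical Hessian of the logistic loss:
    # H = (1/n) * sum_i s_i * x_i x_i^T, with s_i = p_i * (1 - p_i), p_i = sigmoid(w . x_i).
    weights = []
    for x, _ in data:
        p = sigmoid(dot(w, x))
        weights.append(p * (1.0 - p))
    v = [rng.gauss(0.0, 1.0) for _ in w]
    for _ in range(iters):
        hv = [0.0] * len(w)
        for s, (x, _) in zip(weights, data):
            coef = s * dot(x, v) / len(data)
            for j, xj in enumerate(x):
                hv[j] += coef * xj
        norm = math.sqrt(dot(hv, hv))
        v = [h / norm for h in hv]
    return v


def alignment(w, v):
    # Absolute cosine of the angle between w and the unit vector v.
    return abs(dot(w, v)) / math.sqrt(dot(w, w))


def run_demo(seed=0, d=30, lam=10.0, steps=600, n_hess=600):
    # All parameter values here are illustrative assumptions.
    rng = random.Random(seed)
    mu = [1.0] + [0.0] * (d - 1)  # unit-norm class mean
    lr = 1.0 / d                  # step size of order 1/d
    w = [rng.gauss(0.0, 0.1 / math.sqrt(d)) for _ in range(d)]  # small random init
    data = [sample(rng, mu, lam) for _ in range(n_hess)]  # separate batch for the Hessian
    before = alignment(w, top_hessian_eigvec(w, data, rng))
    for _ in range(steps):  # online SGD: one fresh sample per step
        x, y = sample(rng, mu, lam)
        g = sigmoid(dot(w, x)) - y  # derivative of the loss with respect to w . x
        w = [wj - lr * g * xj for wj, xj in zip(w, x)]
    after = alignment(w, top_hessian_eigvec(w, data, rng))
    return before, after


if __name__ == "__main__":
    before, after = run_demo()
    print(f"alignment with top Hessian eigenvector: before={before:.3f}, after={after:.3f}")
```

In this toy setting the Hessian has a single outlier direction close to the class mean, so the reported quantity is a one-dimensional stand-in for the outlier eigenspace alignment that the paper treats rigorously.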
Keywords
stochastic gradient descent, Hessian, multi-layer neural networks, high-dimensional classification, Gaussian mixture model, XOR problem
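[Key points]: This paper studies the joint evolution of stochastic gradient descent (SGD) training dynamics and the spectra of the empirical Hessian and gradient matrices in two canonical classification tasks for multi-class high-dimensional mixtures. It finds that the SGD trajectory rapidly aligns with the emerging low-rank outlier eigenspaces of the Hessian and gradient matrices, which provides a theoretical basis for understanding spectral behavior during the training of overparametrized networks.

[Methods]: Using rigorous mathematical proof, the paper analyzes how the SGD trajectory aligns with the outlier eigenspaces of the Hessian and gradient matrices over the course of training.

[Experiments]: Experimental details are not described. According to the paper, the authors use theoretical analysis and mathematical proof to study SGD training dynamics in multi-class high-dimensional mixture classification tasks with one- or two-layer neural networks. In multi-layer settings the alignment occurs in each layer; in particular, the final layer's outlier eigenspace evolves during training and exhibits rank deficiency when SGD converges to sub-optimal classifiers. The paper does not name any datasets.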