Residual Alignment: Uncovering the Mechanisms of Residual Networks
NeurIPS 2023
Abstract
The ResNet architecture has been widely adopted in deep learning due to its
significant boost to performance through the use of simple skip connections,
yet the underlying mechanisms leading to its success remain largely unknown. In
this paper, we conduct a thorough empirical study of the ResNet architecture in
classification tasks by linearizing its constituent residual blocks using
Residual Jacobians and measuring their singular value decompositions. Our
measurements reveal a process called Residual Alignment (RA) characterized by
four properties:
(RA1) intermediate representations of a given input are equispaced on a line,
embedded in high dimensional space, as observed by Gai and Zhang [2021];
(RA2) top left and right singular vectors of Residual Jacobians align with
each other and across different depths;
(RA3) Residual Jacobians are at most rank C for fully-connected ResNets,
where C is the number of classes; and
(RA4) top singular values of Residual Jacobians scale inversely with depth.
RA consistently occurs in models that generalize well, in both
fully-connected and convolutional architectures, across various depths and
widths, for varying numbers of classes, on all tested benchmark datasets, but
ceases to occur once the skip connections are removed. It also provably occurs
in a novel mathematical model we propose. This phenomenon reveals a strong
alignment between residual branches of a ResNet (RA2 + RA4), imparting a highly
rigid geometric structure to the intermediate representations as they progress
linearly through the network (RA1) up to the final layer, where they undergo
Neural Collapse.
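
The measurement pipeline the abstract describes can be made concrete. Below is a minimal, hypothetical sketch (not the authors' released code): it builds a small fully-connected ResNet, linearizes each residual branch with torch.autograd.functional.jacobian, takes singular value decompositions, and prints simple diagnostics for RA1 through RA4. The architecture, the sizes WIDTH, DEPTH, and NUM_CLASSES, and the specific diagnostics are illustrative assumptions; the properties are only expected to emerge in a trained model that generalizes well, not at random initialization.

import torch
import torch.nn as nn

WIDTH, DEPTH, NUM_CLASSES = 64, 8, 10  # hypothetical sizes, not from the paper

class ResidualBlock(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width))

    def forward(self, x):
        return x + self.branch(x)  # skip connection: h_{l+1} = h_l + f_l(h_l)

class FCResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(784, WIDTH)
        self.blocks = nn.ModuleList(ResidualBlock(WIDTH) for _ in range(DEPTH))
        self.head = nn.Linear(WIDTH, NUM_CLASSES)

    def forward(self, x):
        h = self.embed(x)
        for blk in self.blocks:
            h = blk(h)
        return self.head(h)

model = FCResNet().eval()  # assume weights come from ordinary training
x = torch.randn(784)       # a single input to linearize around

# Collect intermediate representations h_0, ..., h_L for RA1, and the
# Residual Jacobian J_l = d f_l / d h_l of each residual *branch* for RA2-RA4.
h = model.embed(x).detach()
reps, jacobians = [h], []
for blk in model.blocks:
    jacobians.append(torch.autograd.functional.jacobian(blk.branch, h))
    h = blk(h).detach()
    reps.append(h)

# RA1: intermediate representations should be (nearly) equispaced on a line,
# i.e. the steps h_{l+1} - h_l have similar norms and pairwise cosine near 1.
steps = [reps[i + 1] - reps[i] for i in range(len(reps) - 1)]
norms = torch.stack([s.norm() for s in steps])
cos = nn.functional.cosine_similarity(steps[0], steps[-1], dim=0)
print(f"RA1  step-norm spread: {norms.std() / norms.mean():.3f}, "
      f"first-vs-last step cosine: {cos:.3f}")

# RA2: top left singular vectors should align across depths (analogous
# checks apply to the right singular vectors).
svds = [torch.linalg.svd(J) for J in jacobians]
u0 = svds[0].U[:, 0]
for l, (U, S, Vh) in enumerate(svds):
    align = (u0 @ U[:, 0]).abs()
    print(f"RA2  block {l}: top singular vector alignment with block 0: {align:.3f}")

# RA3: singular values past index C should be near zero (rank at most C).
tail = svds[0].S[NUM_CLASSES:].max()
print(f"RA3  block 0: largest singular value past index C: {tail:.2e}")

# RA4: top singular values should scale inversely with depth, so that
# sigma_1 * L is roughly constant across blocks.
for l, (U, S, Vh) in enumerate(svds):
    print(f"RA4  block {l}: sigma_1 = {S[0]:.3f}, sigma_1 * L = {S[0] * DEPTH:.3f}")

On a model exhibiting RA, the RA2 alignments should approach 1, the tail singular values in RA3 should be small relative to the top C, and sigma_1 * L should be roughly constant across blocks; none of this is expected at random initialization or after removing the skip connections.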