A Generic Improvement to Deep Residual Networks Based on Gradient Flow.

IEEE Transactions on Neural Networks and Learning Systems (2020)

Abstract
Pre-activation ResNets consistently outperform the original post-activation ResNets on the CIFAR-10/100 classification benchmarks. Surprisingly, however, these results do not carry over to the standard ImageNet benchmark. First, we theoretically analyze this incongruity in terms of how the two variants differ in handling the propagation of gradients. Although identity shortcuts are critical in both variants for improving optimization and performance, we show that post-activation variants enable early layers to receive a diverse, dynamic composition of gradients from effectively deeper paths than pre-activation variants do, enabling the network to make maximal use of its representational capacity. Second, we show that downsampling projections, though only a few in number, have a significantly detrimental effect on performance. We show that by simply replacing downsampling projections with identity-like dense-reshape shortcuts, the classification results of standard residual architectures such as ResNets, ResNeXts, and SE-Nets improve by up to 1.2% on ImageNet, without any increase in computational complexity (FLOPs).
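The sketch below is an illustrative PyTorch rendering of the two ideas the abstract contrasts, not the paper's own code: a post-activation block (nonlinearity after the addition) versus a pre-activation block (untouched identity path), plus a parameter-free stride-2 shortcut in the spirit of the "identity-like dense-reshape" replacement for downsampling projections. The class and function names (PostActBlock, PreActBlock, reshape_shortcut) are hypothetical, and the assumption that the dense-reshape shortcut behaves like a space-to-depth rearrangement of 2x2 patches is ours.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PostActBlock(nn.Module):
    """Original (post-activation) ordering: conv -> BN -> ReLU, with ReLU applied after the add."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(x + out)          # nonlinearity acts on the summed signal

class PreActBlock(nn.Module):
    """Pre-activation ordering: BN -> ReLU -> conv; the identity path is left untouched."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out                  # shortcut adds directly, no post-add ReLU

def reshape_shortcut(x):
    """Assumed parameter-free stride-2 shortcut: fold each 2x2 spatial patch into
    the channel dimension (space-to-depth) instead of using a strided 1x1
    projection, so the shortcut stays identity-like and adds no learned weights."""
    return F.pixel_unshuffle(x, 2)

if __name__ == "__main__":
    x = torch.randn(2, 64, 56, 56)
    print(PostActBlock(64)(x).shape)    # torch.Size([2, 64, 56, 56])
    print(PreActBlock(64)(x).shape)     # torch.Size([2, 64, 56, 56])
    print(reshape_shortcut(x).shape)    # torch.Size([2, 256, 28, 28])

Under this assumed reshape, the shortcut quadruples the channel count while halving the spatial resolution, matching the shape change of a standard downsampling stage without spending parameters or FLOPs on a projection.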
Keywords
Residual neural networks, Computer architecture, Optimization, Convergence, Training, Standards, Road transportation