FractalNet: Ultra-Deep Neural Networks without Residuals.

International Conference on Learning Representations (ICLR), 2017


Abstract

We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but do not include any pass-through or residual connections.
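
The expansion rule itself is short to state: the base case is a single convolutional unit, and each expansion joins one unit (the shortest path) with two stacked copies of the previous structure (the longest path), where a join takes the element-wise mean of the paths terminating at it. The PyTorch-style sketch below only illustrates that recursion under assumed layer choices (3x3 convolution, batch norm, ReLU); it is not the authors' implementation and omits drop-path, pooling, and the exact layer configurations used in the paper.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Base case: a single conv -> batch-norm -> ReLU unit (assumed layer choice)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)


class FractalBlock(nn.Module):
    """Expansion rule: a C-column block joins one conv (the short path) with two
    stacked copies of the (C-1)-column block (the long path); joins average."""

    def __init__(self, in_ch, out_ch, num_columns):
        super().__init__()
        self.num_columns = num_columns
        self.shallow = ConvBlock(in_ch, out_ch)
        if num_columns > 1:
            self.deep_a = FractalBlock(in_ch, out_ch, num_columns - 1)
            self.deep_b = FractalBlock(out_ch, out_ch, num_columns - 1)

    def join_inputs(self, x):
        """Outputs of every column whose path terminates at this block's final join."""
        outs = [self.shallow(x)]
        if self.num_columns > 1:
            mid = self.deep_a(x)                       # inner join of the first copy
            outs.extend(self.deep_b.join_inputs(mid))  # second copy's paths end here too
        return outs

    def forward(self, x):
        # join = element-wise mean over all coincident columns
        return torch.stack(self.join_inputs(x)).mean(dim=0)


# Example: a block with four columns (its deepest column stacks 8 conv units).
block = FractalBlock(in_ch=3, out_ch=64, num_columns=4)
y = block(torch.randn(1, 3, 32, 32))   # -> shape (1, 64, 32, 32)
```

Stacking several such blocks with pooling between them yields a full network of the kind the paper describes.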

Introduction
  • Residual networks (He et al, 2016a), or ResNets, lead a recent and dramatic increase in both depth and accuracy of convolutional neural networks, facilitated by constraining the network to learn residuals.
  • ResNet variants (He et al, 2016a;b; Huang et al, 2016b) and related architectures (Srivastava et al, 2015) employ the common technique of initializing and anchoring, via a pass-through channel, a network to the identity function (a generic sketch of such a block follows this list).
  • Training then differs in two respects: first, the objective changes to learning residual outputs, rather than unreferenced absolute mappings.
  • Second, these networks exhibit a type of deep supervision (Lee et al, 2014), as near-identity layers effectively reduce distance to the loss.
  • He et al (2016a) speculate that the former, the residual formulation itself, is crucial.
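
For contrast with the fractal rule sketched after the abstract, the pass-through anchoring described above can be summarized in a few lines: a residual block outputs the identity input plus a learned correction F(x), so the block starts near the identity and only the residual must be learned. This is a generic sketch of a ResNet-style basic block, not part of FractalNet, and the specific layer choices are assumptions.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Generic basic residual block: output = ReLU(x + F(x)).

    The pass-through term x anchors the block to the identity function;
    only the residual mapping F has to be learned."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(          # F(x)
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.residual(x))  # identity + learned residual
```
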
Highlights
  • Residual networks (He et al, 2016a), or ResNets, lead a recent and dramatic increase in both depth and accuracy of convolutional neural networks, facilitated by constraining the network to learn residuals
  • The objective changes to learning residual outputs, rather than unreferenced absolute mappings. These networks exhibit a type of deep supervision (Lee et al, 2014), as near-identity layers effectively reduce distance to the loss
  • We develop drop-path, a novel regularization protocol for ultra-deep fractal networks (see the sketch after this list)
  • Since fractal networks contain additional macro-scale structure, we propose to complement these techniques with an analogous coarse-scale regularization scheme
  • Our experiments with fractal networks provide strong evidence that path length is fundamental for training ultra-deep neural networks; residuals are incidental
  • Our analysis connects the internal behavior of fractal networks with phenomena engineered into other networks
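
As a rough illustration of the drop-path idea referenced above, the function below implements only the "local" sampling mode: each input to a join is dropped independently with some probability, at least one input is always kept, and the survivors are averaged. The function name and the 15% default are illustrative rather than taken from the authors' code, and the paper's "global" mode, which routes the entire network through a single column, is omitted here.

```python
import torch


def drop_path_join(paths, drop_prob=0.15, training=True):
    """Local drop-path at a join layer (illustrative sketch).

    paths: list of same-shaped tensors feeding one join. During training,
    each input is dropped independently with probability drop_prob, but at
    least one input is always kept; survivors are averaged element-wise.
    At inference time all paths are kept, matching the plain mean join."""
    if not training or len(paths) == 1:
        return torch.stack(paths).mean(dim=0)
    keep = torch.rand(len(paths)) > drop_prob
    if not keep.any():
        keep[torch.randint(len(paths), (1,))] = True   # guarantee one survivor
    kept = [p for p, k in zip(paths, keep) if k]
    return torch.stack(kept).mean(dim=0)
```

Alternating local sampling with global (single-column) sampling is what allows individual columns to be extracted later as strong stand-alone predictors, as the table captions below note.
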
Methods
Results
  • Table 1 compares performance of FractalNet on CIFAR and SVHN with competing methods.
  • With neither augmentation nor regularization, FractalNet’s performance on CIFAR is superior to both ResNet and ResNet with stochastic depth, suggesting that FractalNet may be less prone to overfitting.
  • Most methods perform similarly on SVHN.
  • Increasing depth to 40, while borrowing some parameter reduction tricks (Iandola et al, 2016), reveals FractalNet’s performance to be consistent across a range of configuration choices
Conclusion
  • The authors' experiments with fractal networks provide strong evidence that path length is fundamental for training ultra-deep neural networks; residuals are incidental.
  • With drop-path, regularization of extremely deep fractal networks is intuitive and effective.
  • The authors' analysis connects the internal behavior of fractal networks with phenomena engineered into other networks.
  • Their substructure resembles hand-crafted modules used as components in prior work.
  • Their training evolution may emulate deep supervision and student-teacher learning (a sketch of extracting a single plain column from a fractal block follows this list).
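
To make the column-extraction point concrete, the sketch below shows one way the deepest plain column could be pulled out of the FractalBlock sketched earlier (after the abstract): follow the long path recursively and stack its conv units into a plain sequential network. This is a hypothetical helper written against that earlier sketch, not the authors' extraction procedure.

```python
import torch.nn as nn


def extract_deepest_column(block):
    """Collect the conv units along the deepest path of the FractalBlock
    sketched earlier and return them as a plain single-column nn.Sequential.

    Hypothetical helper: assumes the FractalBlock/ConvBlock classes from the
    earlier sketch, not the authors' implementation."""
    if block.num_columns == 1:
        return nn.Sequential(block.shallow)
    units = list(extract_deepest_column(block.deep_a))
    units += list(extract_deepest_column(block.deep_b))
    return nn.Sequential(*units)


# e.g. extract_deepest_column(FractalBlock(3, 64, num_columns=4)) stacks 8 conv units
```
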
Tables
  • Table1: CIFAR-100/CIFAR-10/SVHN. We compare test error (%) with other leading methods, trained with either no data augmentation, translation/mirroring (+), or more substantial augmentation (++). Our main point of comparison is ResNet. We closely match its benchmark results using data augmentation, and outperform it by large margins without data augmentation. Training with drop-path, we can extract from FractalNet single-column (plain) networks that are highly competitive
  • Table2: ImageNet (validation set, 10-crop)
  • Table3: Ultra-deep fractal networks (CIFAR-100++). Increasing depth greatly improves accuracy until eventual diminishing returns. Contrast with plain networks, which are not trainable if made too deep (Table 4)
  • Table4: Fractal structure as a training apparatus (CIFAR-100++). Plain networks perform well if moderately deep, but exhibit worse convergence during training if instantiated with great depth. However, as a column trained within, and then extracted from, a fractal network with mixed drop-path, we recover a plain network that overcomes such depth limitation (possibly due to a student-teacher effect)
Related work
  • Deepening feed-forward neural networks has generally returned dividends in performance. A striking example within the computer vision community is the improvement on the ImageNet (Deng et al, 2009) classification task when transitioning from AlexNet (Krizhevsky et al, 2012) to VGG (Simonyan & Zisserman, 2015) to GoogLeNet (Szegedy et al, 2015) to ResNet (He et al, 2016a). Unfortunately, greater depth also makes training more challenging, at least when employing a first-order optimization method with randomly initialized layers. As the network grows deeper and more non-linear, the linear approximation of a gradient step becomes increasingly inappropriate. Desire to overcome these difficulties drives research on both optimization techniques and network architectures.

    On the optimization side, much recent work yields improvements. To prevent vanishing gradients, ReLU activation functions now widely replace sigmoid and tanh units (Nair & Hinton, 2010). This subject remains an area of active inquiry, with various tweaks on ReLUs, e.g. PReLUs (He et al, 2015), and ELUs (Clevert et al, 2016). Even with ReLUs, employing batch normalization (Ioffe & Szegedy, 2015) speeds training by reducing internal covariate shift. Good initialization can also ameliorate this problem (Glorot & Bengio, 2010; Mishkin & Matas, 2016). Path-SGD (Neyshabur et al, 2015) offers an alternative normalization scheme. Progress in optimization is somewhat orthogonal to our architectural focus, with the expectation that advances in either are ripe for combination.
Funding
  • We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research
References
  • Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? NIPS, 2014.
  • Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). ICLR, 2016.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. CVPR, 2009.
  • Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, 2010.
  • Benjamin Graham. Fractional max-pooling. arXiv:1412.6071, 2014.
  • Klaus Greff, Rupesh Kumar Srivastava, and Jürgen Schmidhuber. Highway and residual networks learn unrolled iterative estimation. ICLR, 2017.
  • Bharath Hariharan, Pablo Arbelaez, Ross Girshick, and Jitendra Malik. Hypercolumns for object segmentation and fine-grained localization. CVPR, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. ICCV, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016a.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. ECCV, 2016b.
  • Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012.
  • Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. arXiv:1608.06993, 2016a.
  • Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Weinberger. Deep networks with stochastic depth. ECCV, 2016b.
  • Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv:1602.07360, 2016.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML, 2015.
  • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
  • Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.
  • Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, 2015.
  • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. NIPS Workshop on Deep Learning and Representation Learning, 2014.
  • Chen-Yu Lee, Patrick W. Gallagher, and Zhuowen Tu. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. AISTATS, 2016.
  • Ming Liang and Xiaolin Hu. Recurrent convolutional neural network for object recognition. CVPR, 2015.
  • Zhibin Liao and Gustavo Carneiro. Competitive multi-scale convolution. arXiv:1511.05635, 2015.
  • Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. ICLR, 2013.
  • Michael Maire, Stella X. Yu, and Pietro Perona. Reconstructive sparse code transfer for contour detection and semantic labeling. ACCV, 2014.
  • Dmytro Mishkin and Jiri Matas. All you need is a good init. ICLR, 2016.
  • Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. ICML, 2010.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • Behnam Neyshabur, Ruslan Salakhutdinov, and Nathan Srebro. Path-SGD: Path-normalized optimization in deep neural networks. NIPS, 2015.
  • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. ICLR, 2015.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
  • Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Ryan P. Adams, et al. Scalable Bayesian optimization using deep neural networks. ICML, 2015.
  • Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. ICLR (workshop track), 2014.
  • Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. ICML, 2015.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. CVPR, 2015.
  • Sasha Targ, Diogo Almeida, and Kevin Lyman. Resnet in Resnet: Generalizing residual architectures. arXiv:1603.08029, 2016.
  • Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, and Rich Caruana. Do deep convolutional nets really need to be deep and convolutional? ICLR, 2017.
  • Andreas Veit, Michael Wilber, and Serge Belongie. Residual networks behave like ensembles of relatively shallow networks. NIPS, 2016.
  • Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using dropconnect. ICML, 2013.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. BMVC, 2016.