Learning Activation Functions to Improve Deep Neural Networks.

International Conference on Learning Representations (ICLR), 2014

Cited by: 398
Abstract

Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architecture…
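For concreteness, the adaptive piecewise linear (APL) unit described in the paper has the hinge-shaped form h_i(x) = max(0, x) + Σ_{s=1..S} a_i^s · max(0, −x + b_i^s), where the slopes a_i^s and hinge locations b_i^s are learned per neuron and S is a hyperparameter. The snippet below is a minimal NumPy sketch of that form (our illustration, not the authors' code); the example parameter values are arbitrary.

```python
import numpy as np

def apl(x, a, b):
    """Adaptive piecewise linear unit:
    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])."""
    x = np.asarray(x, dtype=float)
    out = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        out += a_s * np.maximum(0.0, -x + b_s)
    return out

# Illustrative parameters for one neuron with S = 2 hinges.
x = np.linspace(-3.0, 3.0, 7)
print(apl(x, a=[0.2, -0.5], b=[0.0, 1.0]))
```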

Introduction
  • Deep learning with artificial neural networks has enabled rapid progress on applications in engineering (e.g., Krizhevsky et al., 2012; Hannun et al., 2014) and basic science (e.g., Di Lena et al., 2012; Lusci et al., 2013; Baldi et al., 2014).
  • The rectified linear activation function (Jarrett et al., 2009; Glorot et al., 2011), which does not saturate like sigmoidal functions, has made it easier to quickly train deep neural networks by alleviating the difficulties of weight initialization and vanishing gradients.
  • Another recent innovation is the “maxout” activation function, which has achieved state-of-the-art performance on multiple machine learning benchmarks (Goodfellow et al., 2013).
  • While the type of activation function can have a significant impact on learning, the space of possible functions has hardly been explored.
Highlights
  • Deep learning with artificial neural networks has enabled rapid progress on applications in engineering (e.g., Krizhevsky et al., 2012; Hannun et al., 2014) and basic science (e.g., Di Lena et al., 2012; Lusci et al., 2013; Baldi et al., 2014).
  • The rectified linear activation function (Jarrett et al., 2009; Glorot et al., 2011), which does not saturate like sigmoidal functions, has made it easier to quickly train deep neural networks by alleviating the difficulties of weight initialization and vanishing gradients.
  • We have introduced a novel neural network activation function in which each neuron computes an independent, piecewise linear function.
  • The parameters of each neuron-specific activation function are learned via gradient descent along with the network’s weight parameters (a minimal sketch of this joint training follows this list).
  • Our experiments demonstrate that learning the activation functions in this way can lead to significant performance improvements in deep neural networks without significantly increasing the number of parameters.
  • The networks learn a diverse set of activation functions, suggesting that the standard one-activation-function-fits-all approach may be suboptimal.
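To illustrate the joint training referred to above, here is a rough PyTorch sketch in which the per-neuron slopes and hinge locations are registered as learnable parameters, so the same optimizer that updates the weights also updates the activation functions. The class name `APLUnit`, the initialization scale, and the toy layer sizes are our own choices, not the authors' implementation; each neuron adds only 2·S extra scalars.

```python
import torch
import torch.nn as nn

class APLUnit(nn.Module):
    """Per-neuron adaptive piecewise linear activation:
    h_i(x) = max(0, x) + sum_s a_i[s] * max(0, -x + b_i[s])."""

    def __init__(self, num_neurons: int, S: int = 2):
        super().__init__()
        # Learnable slopes and hinge locations, one set per neuron (2*S each).
        self.a = nn.Parameter(0.01 * torch.randn(S, num_neurons))
        self.b = nn.Parameter(0.01 * torch.randn(S, num_neurons))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, num_neurons).
        out = torch.clamp(x, min=0.0)
        for s in range(self.a.shape[0]):
            out = out + self.a[s] * torch.clamp(-x + self.b[s], min=0.0)
        return out

# The optimizer sees both the weights and the activation parameters.
net = nn.Sequential(nn.Linear(10, 32), APLUnit(32, S=2), nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)

x, y = torch.randn(8, 10), torch.randn(8, 1)
opt.zero_grad()
loss = nn.functional.mse_loss(net(x), y)
loss.backward()
opt.step()
```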
Methods
  • Architectures compared (Table 1): CNN + ReLU (Srivastava et al., 2014); CNN (Ours) + ReLU, 12.56 (0.26)% error; CNN (Ours) + Leaky ReLU, 11.86 (0.04)% error; CNN (Ours) + APL units.
  • Section 3.3: effects of APL unit hyperparameters.
  • Section 3.4: visualization and analysis of the adaptive piecewise linear functions.
  • The diversity of the adaptive piecewise linear functions was visualized by plotting h_i(x) for sample neurons.
  • Figures 2 and 3 show adaptive piecewise linear functions for the CIFAR-100 and Higgs → τ+τ− experiments, along with the random initialization of each function (a small plotting sketch follows this list).
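For the kind of visualization mentioned above, a minimal matplotlib sketch of randomly initialized neuron-specific functions might look as follows; the initialization scale and the number of plotted neurons are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np
import matplotlib.pyplot as plt

def apl(x, a, b):
    # h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])
    out = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        out += a_s * np.maximum(0.0, -x + b_s)
    return out

rng = np.random.default_rng(0)
x = np.linspace(-4.0, 4.0, 400)
S = 2  # number of hinges (the paper reports S = 2 for CIFAR-10, S = 1 for CIFAR-100)

for _ in range(5):  # a few randomly initialized neuron-specific activations
    a = 0.5 * rng.standard_normal(S)  # illustrative initialization scale
    b = 0.5 * rng.standard_normal(S)
    plt.plot(x, apl(x, a, b))

plt.xlabel("x")
plt.ylabel("h(x)")
plt.title("Randomly initialized adaptive piecewise linear activations")
plt.show()
```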
Results
  • Table 1 shows that adding the APL units improved the baseline by over 1% in the case of CIFAR-10 and by almost 3% in the case of CIFAR-100.
  • The authors use S = 2 for CIFAR-10 and S = 1 for CIFAR-100.
  • They find that adding APL units improves performance on both datasets.
  • Their experiments demonstrate that learning the activation functions in this way can lead to significant performance improvements in deep neural networks without significantly increasing the number of parameters.
Conclusion
  • The authors have introduced a novel neural network activation function in which each neuron computes an independent, piecewise linear function.
  • The parameters of each neuron-specific activation function are learned via gradient descent along with the network’s weight parameters.
  • The authors' experiments demonstrate that learning the activation functions in this way can lead to significant performance improvements in deep neural networks without significantly increasing the number of parameters (a back-of-the-envelope parameter count follows this list).
  • The networks learn a diverse set of activation functions, suggesting that the standard one-activation-function-fits-all approach may be suboptimal.
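As a back-of-the-envelope check of the parameter-count claim, consider a hypothetical fully connected layer; the sizes below are illustrative and not taken from the paper. With 2·S activation parameters per neuron, the overhead is a small fraction of the layer's weight count.

```python
# Hypothetical layer sizes, chosen only for illustration.
n_in, n_out, S = 1024, 1024, 2

weight_params = n_in * n_out + n_out   # weights plus biases
apl_params = 2 * S * n_out             # one slope and one hinge location per hinge, per neuron

print(f"weights/biases:    {weight_params:,}")                 # 1,049,600
print(f"APL parameters:    {apl_params:,}")                    # 4,096
print(f"relative increase: {apl_params / weight_params:.2%}")  # ~0.39%
```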
Tables
  • Table 1: Error rates on CIFAR-10 and CIFAR-100 with and without data augmentation, for standard convolutional neural networks (CNNs) and the network-in-network (NIN) architecture (Lin et al., 2013). The networks were trained 5 times using different random initializations; we report the mean followed by the standard deviation in parentheses. The best results are in bold.
  • Table 2: Performance on the Higgs boson decay dataset in terms of both AUC and expected discovery significance. The networks were trained 4 times using different random initializations; we report the mean followed by the standard deviation in parentheses. The best results are in bold.
  • Table 3: Classification accuracy on CIFAR-10 for varying values of S. Shown are the mean and standard deviation over 5 trials.
Funding
  • Agostinelli was supported by the GEM Fellowship.
  • We also wish to acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research, NSF grant IIS-0513376, and a Google Faculty Research award to P. Baldi.
Reference
  • Baldi, P., Sadowski, P., and Whiteson, D. Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5, 2014.
  • Baldi, P., Sadowski, P., and Whiteson, D. Enhanced Higgs to τ+τ− searches with deep learning. Physical Review Letters, 2015. In press.
  • Cho, Y. and Saul, L. Large margin classification in infinite neural networks. Neural Computation, 22(10), 2010.
Author
Forest Agostinelli
Matthew Hoffman
Peter Sadowski
Pierre Baldi