Optimization for Neural Networks: Quest for Theoretical Understandings

2020

Abstract
Optimization is a key component of deep learning. Increasing depth, which is vital for reaching good performance with deep neural networks, has become feasible only through recent advanced optimization techniques, including batch normalization. Despite the substantial empirical benefits of these techniques, their inner mechanisms are not theoretically understood. This thesis contributes to the understanding of optimization for neural networks. In particular, we establish the following four contributions:

i. We study batch normalization (BN), a breakthrough in deep learning. Understanding BN can provide insights into the optimization of deep neural networks, since it is specifically developed for deep nets. Leveraging tools from Markov chain theory and ergodic theory, we prove that BN avoids rank collapse in deep neural networks. Through a set of extensive experiments, we highlight the important role of rank in the optimization of deep nets.

ii. We show how stochastic gradient descent (SGD) can gain from its stochastic approximation of the gradient in the context of neural networks. Although the noise of SGD slows convergence in convex optimization, this noise is advantageous in training neural networks, as it facilitates escaping saddle points.

iii. We introduce the framework of local saddle point optimization for neural networks and underline an important barrier for gradient-based saddle-point methods: the existence of undesired saddle points that are stable attractors of the gradient dynamics. We show how exploiting minimal second-order information allows gradient-based methods to escape these undesired saddles.

iv. We highlight …
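The rank-collapse phenomenon mentioned in contribution (i) can be illustrated numerically. The sketch below is not code from the thesis; the network width, depth, ReLU activations, and the use of stable rank as the rank measure are assumptions made here for illustration. It propagates a random batch through a deep random network and compares the stable rank of the final activations with and without a simple batch-normalization step.

```python
import numpy as np

# Illustrative sketch (not from the thesis): track the stable rank of hidden
# activations in a deep random ReLU network, with and without a simple
# batch-normalization step, to illustrate rank collapse.

rng = np.random.default_rng(0)
batch_size, width, depth = 256, 64, 50

def stable_rank(x: np.ndarray) -> float:
    """||X||_F^2 / sigma_max(X)^2 -- a smooth surrogate for matrix rank."""
    s = np.linalg.svd(x, compute_uv=False)
    return float((s ** 2).sum() / (s[0] ** 2))

def forward(use_bn: bool) -> float:
    x = rng.standard_normal((batch_size, width))
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        x = np.maximum(x @ w, 0.0)  # ReLU layer with random weights
        if use_bn:
            # Batch norm at initialization: per-feature standardization
            # over the batch (no learned scale or shift).
            x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return stable_rank(x)

print("stable rank without BN:", forward(use_bn=False))  # expected: close to 1
print("stable rank with BN:   ", forward(use_bn=True))   # expected: much larger
```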
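Similarly, the saddle-escape effect in contribution (ii) can be seen on a toy objective. The sketch below is not from the thesis; the objective, step size, and noise level are assumptions chosen for illustration. It compares full-batch gradient descent, started on the stable manifold of a strict saddle, with the same iteration perturbed by SGD-style gradient noise.

```python
import numpy as np

# Illustrative sketch (not from the thesis): on the strict-saddle objective
# f(w) = 0.5*w0^2 + 0.25*w1^4 - 0.5*w1^2, gradient descent started on the
# saddle's stable manifold converges to the saddle at the origin, while
# injecting small gradient noise (mimicking SGD) lets the iterate escape
# toward one of the minimizers at (0, +-1).

rng = np.random.default_rng(0)

def grad(w: np.ndarray) -> np.ndarray:
    """Gradient of f(w) = 0.5*w0^2 + 0.25*w1^4 - 0.5*w1^2."""
    return np.array([w[0], w[1] ** 3 - w[1]])

def run(noise_std: float, steps: int = 500, lr: float = 0.1) -> np.ndarray:
    w = np.array([1.0, 0.0])  # start on the stable manifold of the saddle
    for _ in range(steps):
        g = grad(w) + noise_std * rng.standard_normal(2)  # (noisy) gradient
        w = w - lr * g
    return w

print("GD  (no noise):", run(noise_std=0.0))   # converges to the saddle (0, 0)
print("SGD (noisy)   :", run(noise_std=0.01))  # typically ends near (0, +-1)
```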