Distributed Stochastic Gradient Descent and Convergence to Local Minima

arXiv (2020)

Abstract
In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points. However, similar guarantees are lacking for distributed first-order algorithms in nonconvex optimization. This paper studies distributed stochastic gradient descent (D-SGD), a simple network-based implementation of SGD, and establishes conditions under which D-SGD converges to local minima. In particular, it is shown that, for each fixed initialization, with probability 1: (i) D-SGD converges to critical points of the objective, and (ii) D-SGD avoids nondegenerate saddle points. To prove these results, we use ODE-based stochastic approximation techniques: the algorithm is approximated by a continuous-time ODE that is easier to study than the discrete-time algorithm. Results are first derived for the continuous-time process and then extended to the discrete-time algorithm. Consequently, the paper studies continuous-time distributed gradient descent (DGD) alongside D-SGD. Because the continuous-time process is easier to analyze, this approach allows for simplified proof techniques and builds important intuition that is obscured when studying the discrete-time process alone.
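To make the "simple network-based implementation of SGD" concrete, the sketch below shows one common form of a synchronous D-SGD iteration: each agent averages its neighbors' iterates through a doubly stochastic mixing matrix and then takes a local noisy-gradient step. This is an illustrative assumption based only on the abstract; the paper's exact mixing weights, step-size schedule, and gradient-noise model may differ, and the names (d_sgd_step, grad_fn, step_size) are hypothetical.

```python
# Minimal sketch of a synchronous D-SGD update (assumed form; not the paper's exact algorithm).
import numpy as np

def d_sgd_step(X, W, grad_fn, step_size, rng):
    """One D-SGD iteration for all agents.

    X         : (n_agents, dim) array of current iterates, one row per agent.
    W         : (n_agents, n_agents) doubly stochastic mixing matrix matching
                the communication graph.
    grad_fn   : callable returning the gradient of the common objective at a point.
    step_size : current (diminishing) step size alpha_k.
    rng       : numpy random generator used to model stochastic gradient noise.
    """
    # Consensus step: each agent averages its own and its neighbors' iterates.
    X_mixed = W @ X
    # Local stochastic gradient step: true gradient plus zero-mean noise.
    noisy_grads = np.array(
        [grad_fn(x) + rng.normal(scale=0.1, size=x.shape) for x in X]
    )
    return X_mixed - step_size * noisy_grads


# Toy usage: 4 agents on a ring graph minimizing f(x) = ||x||^2 / 2 (gradient is x).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Doubly stochastic weights for a ring: 1/2 self-weight, 1/4 per neighbor.
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
    X = rng.normal(size=(4, 2))
    for k in range(1, 501):
        X = d_sgd_step(X, W, lambda x: x, step_size=1.0 / k, rng=rng)
    print(X)  # all agents' iterates should be near the minimizer at the origin
```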
Keywords
stochastic gradient descent, local minima