Distributed Stochastic Gradient Descent and Convergence to Local Minima

Brian Swenson, Ryan Murray, Soummya Kar, H. Vincent Poor

arXiv (2020)

Abstract

In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points. However, similar guarantees are lacking for distributed first-order algorithms in nonconvex optimization. This paper studies distributed stochastic gradient descent (D-SGD), a simple network-based implementation of SGD, and the conditions under which it converges to local minima. In particular, it is shown that, for each fixed initialization, with probability 1: (i) D-SGD converges to critical points of the objective and (ii) D-SGD avoids nondegenerate saddle points. The proofs use ODE-based stochastic approximation techniques: the algorithm is approximated by a continuous-time ODE, results are first derived for the continuous-time process, and they are then extended to the discrete-time algorithm. Consequently, the paper studies continuous-time distributed gradient descent (DGD) alongside D-SGD. Because the continuous-time process is easier to study, this approach allows for simplified proof techniques and builds important intuition that is obscured when studying the discrete-time process alone.
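
To make the setting concrete, the following is a minimal sketch of a D-SGD-style iteration, not the paper's exact algorithm. The abstract does not state the update rule, so everything here is an illustrative assumption: the update form (a consensus term over neighbors plus a noisy local gradient step, each with its own decaying step size), the ring network, the step-size constants and exponents, and the toy local objectives f_n(x, y) = x^4/4 - x^2/2 + y^2/2 + b_n x. With the b_n summing to zero, the summed objective has a nondegenerate saddle at (0, 0) and local minima at (1, 0) and (-1, 0). All agents start exactly at the saddle.

```python
import random

# Hypothetical per-agent linear tilts; they sum to zero, so the summed
# objective keeps its saddle at (0, 0) and its minima at (+1, 0), (-1, 0).
BIASES = [0.3, -0.3, 0.2, -0.2]


def local_grad(x, y, b):
    """Gradient of the assumed local objective x^4/4 - x^2/2 + y^2/2 + b*x."""
    return x ** 3 - x + b, y


def run_dsgd(steps=5000, noise_std=0.1, seed=0):
    """Run the sketched D-SGD iteration on a ring and return final agent states."""
    rng = random.Random(seed)
    n = len(BIASES)
    # Every agent is initialized at the saddle point of the summed objective.
    states = [(0.0, 0.0) for _ in range(n)]
    for k in range(1, steps + 1):
        alpha = 0.5 / k ** 0.75  # gradient step size (assumed schedule)
        beta = 0.4 / k ** 0.25   # consensus step size (assumed, decays more slowly)
        new_states = []
        for i, (x, y) in enumerate(states):
            # Ring topology: each agent exchanges states with two neighbors.
            neighbors = (states[(i - 1) % n], states[(i + 1) % n])
            cx = sum(nx - x for nx, _ in neighbors)
            cy = sum(ny - y for _, ny in neighbors)
            gx, gy = local_grad(x, y, BIASES[i])
            # Zero-mean Gaussian noise models the stochastic gradient error.
            gx += rng.gauss(0.0, noise_std)
            gy += rng.gauss(0.0, noise_std)
            new_states.append((x + beta * cx - alpha * gx,
                               y + beta * cy - alpha * gy))
        # Synchronous update: all agents move based on the previous states.
        states = new_states
    return states


final_states = run_dsgd()
for agent, (x, y) in enumerate(final_states):
    print(f"agent {agent}: x = {x:+.3f}, y = {y:+.3f}")
```

In this toy run, the gradient noise is what moves the agents off the saddle, and the consensus term keeps them close to one another while they settle near one of the two minima. Which minimum they reach depends on the random seed.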

Keywords

stochastic gradient descent, local minima
