Communications in Mathematical Sciences
Volume 19 (2021)
Number 3
A sharp convergence rate for a model equation of the asynchronous stochastic gradient descent
Pages: 851 – 863
(Fast Communication)
DOI: https://dx.doi.org/10.4310/CMS.2021.v19.n3.a13
Authors
Abstract
We give a sharp convergence rate for asynchronous stochastic gradient descent (ASGD) when the loss function is a perturbed quadratic function, based on the stochastic modified equations introduced in [An et al., “Stochastic modified equations for the asynchronous stochastic gradient descent”, arXiv:1805.08244]. We prove that when the number of local workers is larger than the expected staleness, ASGD is more efficient than stochastic gradient descent. Our theoretical result also suggests that longer delays lead to a slower convergence rate. Moreover, the learning rate cannot be smaller than a threshold inversely proportional to the expected staleness.
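To make the setting concrete, the following is a minimal, illustrative simulation of ASGD on a simple quadratic loss, in which a server applies gradients computed by workers at stale (delayed) iterates. The loss, the staleness model, and the parameters `eta`, `num_workers`, and `sigma` are illustrative assumptions, not the paper's exact model equation or analysis.

```python
# Minimal sketch of asynchronous SGD with stale gradients on the
# quadratic loss f(x) = 0.5 * ||x||^2 (illustrative, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)
dim = 10            # problem dimension (illustrative)
eta = 0.05          # learning rate (illustrative)
num_workers = 4     # number of local workers (illustrative)
sigma = 0.1         # gradient-noise level (illustrative)
steps = 2000

def noisy_grad(x):
    """Stochastic gradient of the quadratic loss 0.5 * ||x||^2."""
    return x + sigma * rng.standard_normal(dim)

# Each worker holds a possibly stale copy of the parameters; the server
# applies whichever worker's gradient arrives next, so updates are based
# on delayed iterates.
x = rng.standard_normal(dim)
worker_copies = [x.copy() for _ in range(num_workers)]

for t in range(steps):
    w = rng.integers(num_workers)       # worker whose gradient arrives now
    g = noisy_grad(worker_copies[w])    # gradient evaluated at a stale iterate
    x = x - eta * g                     # server update with the stale gradient
    worker_copies[w] = x.copy()         # that worker re-reads the current iterate

print("final squared norm:", float(x @ x))
```

In this toy model the expected staleness grows with the number of workers, which is the trade-off the abstract refers to: more workers process gradients in parallel, but each gradient is computed at an older iterate.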
Keywords
asynchronous stochastic gradient descent, stochastic modified equations, distributed learning
2010 Mathematics Subject Classification
65K05, 68W15, 68W20, 90C15
Received 26 January 2020
Accepted 28 November 2020
Published 5 May 2021