Methods and Applications of Analysis

Volume 30 (2023)

Number 3

On uniform-in-time diffusion approximation for stochastic gradient descent

Pages: 95–112

DOI: https://dx.doi.org/10.4310/MAA.2023.v30.n3.a1

Authors

Lei Li (School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, China; Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai, China)

Yuliang Wang (School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, China)

Abstract

The diffusion approximation of stochastic gradient descent (SGD) in the current literature is valid only on a finite time interval. In this paper, we establish a uniform-in-time diffusion approximation of SGD, assuming only that the expected loss is strongly convex, together with some other mild conditions, and without assuming convexity of each random loss function. The main technique is to establish exponential decay rates for the derivatives of the solution to the backward Kolmogorov equation. The uniform-in-time approximation allows us to study the asymptotic behavior of SGD via the continuous stochastic differential equation (SDE), even when the random objective function $f(\cdot ; \xi)$ is not strongly convex.
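A minimal numerical sketch can make "uniform-in-time weak approximation" concrete. The toy model here is an illustrative assumption and does not come from the paper. Each random loss $f(x;\xi,\zeta) = (1+\xi)x^2/2 + \zeta x$, with $\xi = \pm 2$ and $\zeta = \pm 1$ equally likely and independent, is concave when $\xi = -2$, while the expected loss $F(x) = x^2/2$ is strongly convex. SGD with step size $\eta$ is compared against the commonly used first-order diffusion approximation $dX_t = -F'(X_t)\,dt + \sqrt{\eta\,\Sigma(X_t)}\,dW_t$, where $\Sigma(x) = 4x^2 + 1$ is the variance of the stochastic gradient, matching step $n$ with time $t = n\eta$. This SDE form is an assumption, and the paper's exact formulation and constants may differ. Because the model is linear, $\mathbb{E}[x_n^2]$ and $\mathbb{E}[X_t^2]$ have closed forms, so the weak error for the test function $x^2$ can be computed exactly. All function names are hypothetical.

```python
import math
import random

# Illustrative toy model (an assumption, not taken from the paper):
#   random loss   f(x; xi, zeta) = (1 + xi) * x**2 / 2 + zeta * x
#   xi   = +2 or -2 with probability 1/2 (f is concave when xi = -2)
#   zeta = +1 or -1 with probability 1/2, independent of xi
#   expected loss F(x) = x**2 / 2, which is strongly convex
# Stochastic gradient: (1 + xi) * x + zeta, whose variance is 4 * x**2 + 1.


def sgd_second_moment(eta: float, n: int, m0: float = 1.0) -> float:
    """Exact E[x_n^2] for SGD x_{k+1} = x_k - eta * ((1 + xi_k) * x_k + zeta_k)."""
    a = (1 - eta) ** 2 + 4 * eta ** 2  # E[(1 - eta * (1 + xi))^2]; a < 1 for eta < 0.4
    return a ** n * m0 + eta ** 2 * (1 - a ** n) / (1 - a)


def sde_second_moment(eta: float, t: float, m0: float = 1.0) -> float:
    """Exact E[X_t^2] for the assumed SDE dX = -X dt + sqrt(eta * (4 X^2 + 1)) dW.

    Ito's formula gives d E[X^2] / dt = -(2 - 4 * eta) * E[X^2] + eta.
    """
    r = 2 - 4 * eta
    decay = math.exp(-r * t)
    return decay * m0 + (eta / r) * (1 - decay)


def max_weak_error(eta: float, horizon: float, m0: float = 1.0) -> float:
    """Largest |E[x_n^2] - E[X_{n*eta}^2]| over all steps n with n * eta <= horizon."""
    n_steps = int(round(horizon / eta))
    return max(
        abs(sgd_second_moment(eta, n, m0) - sde_second_moment(eta, n * eta, m0))
        for n in range(n_steps + 1)
    )


def simulate_sgd_second_moment(
    eta: float, n: int, m0: float = 1.0, samples: int = 10_000, seed: int = 0
) -> float:
    """Monte Carlo estimate of E[x_n^2], used to sanity-check sgd_second_moment."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = math.sqrt(m0)
        for _ in range(n):
            xi = rng.choice((2.0, -2.0))
            zeta = rng.choice((1.0, -1.0))
            x -= eta * ((1 + xi) * x + zeta)
        total += x * x
    return total / samples


if __name__ == "__main__":
    for eta in (0.02, 0.01, 0.005):
        short = max_weak_error(eta, horizon=5.0)
        long_ = max_weak_error(eta, horizon=200.0)
        print(f"eta={eta}: max error up to t=5: {short:.2e}, up to t=200: {long_:.2e}")
```

In this toy model the largest gap between the two second moments appears early in the run, shrinks roughly in proportion to $\eta$, and does not grow when the horizon is extended from $t = 5$ to $t = 200$. This is consistent with, but does not prove, a uniform-in-time bound of order $\eta$. A single test function in a one-dimensional linear model is far simpler than the general setting treated in the paper.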

Keywords

stochastic differential equation, backward Kolmogorov equation, Stroock-Varadhan support theorem, semigroup expansion

2010 Mathematics Subject Classification

60J20, 65C20, 90C15

Received 13 March 2023

Accepted 13 October 2023

Published 7 August 2024