Communications in Mathematical Sciences

Volume 22 (2024)

Number 5

Revisiting the central limit theorems for the SGD-type methods

Pages: 1427 – 1454

DOI: https://dx.doi.org/10.4310/CMS.2024.v22.n5.a10

Authors

Tiejun Li (Laboratory of Mathematics and Its Applications (LMAM), School of Mathematical Sciences, and Center for Machine Learning Research, Peking University, Beijing, China)

Tiannan Xiao (LMAM and School of Mathematical Sciences, Peking University, Beijing, China)

Guoguo Yang (LMAM and School of Mathematical Sciences, Peking University, Beijing, China)

Abstract

We revisited the central limit theorem (CLT) for stochastic gradient descent (SGD) type methods, including the vanilla SGD, momentum SGD, and Nesterov accelerated SGD methods with constant or vanishing damping parameters. By taking advantage of the Lyapunov function technique and $L^p$ bound estimates, we established the CLT under more general conditions on the learning rates and for broader classes of SGD methods than in previous results. The CLT for the time average was also investigated, and we found that it holds in the linear case but is not generally true in the nonlinear situation. Numerical tests were carried out to verify our theoretical analysis.
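
To make the three iterations concrete, the following self-contained Python sketch runs vanilla SGD, momentum SGD, and Nesterov accelerated SGD on the one-dimensional quadratic $f(x) = x^2/2$ with additive Gaussian gradient noise and a vanishing learning rate $\alpha_n = a_0/n$, then empirically checks the CLT scaling for vanilla SGD. The objective, noise model, step-size schedule, and damping parameter beta are illustrative assumptions, not the paper's general setting; for this linear example, classical theory predicts $\sqrt{n}\, x_n \Rightarrow \mathcal{N}(0, a_0^2 \sigma^2/(2a_0 - 1))$ for $a_0 > 1/2$, which the paper's results recover as a special case.

    import numpy as np

    def noisy_grad(x, rng, sigma=1.0):
        # Gradient of f(x) = x**2 / 2, corrupted by zero-mean Gaussian noise.
        return x + sigma * rng.normal()

    def sgd(n_steps, a0, rng):
        # Vanilla SGD with vanishing learning rate alpha_n = a0 / n (illustrative schedule).
        x = rng.normal()
        for n in range(1, n_steps + 1):
            x -= (a0 / n) * noisy_grad(x, rng)
        return x

    def momentum_sgd(n_steps, a0, beta, rng):
        # Heavy-ball momentum SGD with a constant damping parameter beta (illustrative value).
        x, v = rng.normal(), 0.0
        for n in range(1, n_steps + 1):
            v = beta * v - (a0 / n) * noisy_grad(x, rng)
            x += v
        return x

    def nesterov_sgd(n_steps, a0, beta, rng):
        # Nesterov accelerated SGD: the gradient is evaluated at the look-ahead point x + beta * v.
        x, v = rng.normal(), 0.0
        for n in range(1, n_steps + 1):
            v = beta * v - (a0 / n) * noisy_grad(x + beta * v, rng)
            x += v
        return x

    # Empirical CLT check for vanilla SGD: with a0 = 1 and unit noise variance,
    # classical theory predicts sqrt(n) * x_n => N(0, a0**2 / (2*a0 - 1)) = N(0, 1).
    N, reps = 5_000, 1_000
    samples = np.array([np.sqrt(N) * sgd(N, 1.0, np.random.default_rng(s)) for s in range(reps)])
    print(f"sample mean (expect ~0): {samples.mean():.3f}")
    print(f"sample std  (expect ~1): {samples.std():.3f}")

The same rescaled-error experiment can be repeated with momentum_sgd or nesterov_sgd to observe the Gaussian limits for the accelerated variants; the specific function names and parameter values above are hypothetical choices for this sketch.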

Keywords

central limit theorem, SGD, momentum SGD, Nesterov acceleration

2010 Mathematics Subject Classification

60F05, 60J22


Received 5 June 2023

Received revised 27 September 2023

Accepted 18 December 2023

Published 15 July 2024