Communications in Mathematical Sciences

Volume 18 (2020)

Number 6

Rademacher complexity and the generalization error of residual networks

Pages: 1755 – 1774

DOI: https://dx.doi.org/10.4310/CMS.2020.v18.n6.a10

Authors

Weinan E (Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey, U.S.A.)

Chao Ma (Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey, U.S.A.)

Qingcan Wang (Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey, U.S.A.)

Abstract

Sharp bounds for the Rademacher complexity and the generalization error are derived for the residual network model. The Rademacher complexity bound has no explicit dependence on the depth of the network, while the generalization bounds are comparable to the Monte Carlo error rates, suggesting that they are nearly optimal in the high-dimensional setting. These estimates are achieved by constraining the hypothesis space with an appropriately defined path norm, such that the constrained space is large enough for the approximation error rates to be optimal and, at the same time, small enough for the estimation error rates to be optimal. Comparisons are made with other norm-based bounds.

Keywords

a priori estimate, residual network, weighted path norm

2010 Mathematics Subject Classification

41A46, 41A63, 62J02, 65D05

Received 3 June 2020

Accepted 23 August 2020

Published 4 November 2020