Communications in Mathematical Sciences

Volume 20 (2022)

Number 7

Reinforced optimal control

Pages: 1951 – 1978

DOI: https://dx.doi.org/10.4310/CMS.2022.v20.n7.a7

Authors

Christian Bayer (Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany)

Denis Belomestny (Faculty of Mathematics, Duisburg–Essen University, Essen, Germany; and National University Higher School of Economics, Moscow, Russia)

Paul Hager (Institut für Mathematik, Humboldt-Universität zu Berlin, Berlin, Germany)

Paolo Pigato (Department of Economics and Finance, University of Rome Tor Vergata, Rome, Italy)

John Schoenmakers (Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany)

Vladimir Spokoiny (Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany; Department of Mathematics, Humboldt-Universität zu Berlin, Germany; and IITP RAS & National University Higher School of Economics, Moscow, Russia)

Abstract

Least-squares Monte Carlo methods are popular numerical approximation methods for solving stochastic control problems. These methods are based on dynamic programming, and their key feature is the approximation of the conditional expectation of future rewards by linear least-squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109–121, 2020] proposes, for optimal stopping problems, to reinforce the basis functions with the value functions already computed for later times, thereby considerably improving the accuracy at limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, including Markov decision processes, while considerably improving the method’s efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.
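The reinforcement idea described above can be illustrated on a toy optimal stopping problem. At each exercise date, the regression basis is enlarged with a value function that was already fitted at a later date. The sketch below is not the authors' algorithm. It rests on several assumptions of its own:

- the Bermudan put under geometric Brownian motion,
- the fixed basis 1, s, s^2 with s = x / K,
- the Tsitsiklis–Van Roy-style regression targets,
- and all function names and parameter values, which are hypothetical.

```python
import numpy as np

# Hypothetical illustration of a reinforced regression basis for optimal
# stopping (Bermudan put); model, basis and parameters are assumptions.


def payoff(x, strike):
    """Put payoff g(x) = max(K - x, 0), vectorised over NumPy arrays."""
    return np.maximum(strike - x, 0.0)


def basis(x, strike, v_next):
    """Design matrix: fixed basis [1, s, s^2] with s = x / K (scaled for
    conditioning), reinforced by the already fitted value function v_next."""
    s = x / strike
    return np.column_stack([np.ones_like(s), s, s**2, v_next(x)])


def make_value_fn(beta, v_next, strike):
    """Return V_j(x) = max(g(x), C_j(x)) with C_j(x) = basis(x) @ beta."""
    def value(x):
        return np.maximum(payoff(x, strike), basis(x, strike, v_next) @ beta)
    return value


def reinforced_lsmc_put(s0=100.0, strike=100.0, r=0.05, sigma=0.2,
                        maturity=1.0, n_steps=10, n_paths=20_000, seed=0):
    """In-sample regression estimate of a Bermudan put price."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-r * dt)

    # Geometric Brownian motion paths; row j holds the states X_j at time j*dt.
    z = rng.standard_normal((n_steps, n_paths))
    log_incr = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    x = np.vstack([np.full(n_paths, s0),
                   s0 * np.exp(np.cumsum(log_incr, axis=0))])

    # At maturity the value function equals the payoff.
    def terminal_value(y):
        return payoff(y, strike)

    v_next = terminal_value
    for j in range(n_steps - 1, 0, -1):
        features = basis(x[j], strike, v_next)  # psi(X_j), incl. V_{j+1}(X_j)
        target = disc * v_next(x[j + 1])        # discounted V_{j+1}(X_{j+1})
        beta, *_ = np.linalg.lstsq(features, target, rcond=None)
        v_next = make_value_fn(beta, v_next, strike)  # v_next is now V_j

    # X_0 = s0 is deterministic, so the time-0 continuation value is an average.
    return float(max(payoff(s0, strike), disc * v_next(x[1]).mean()))


if __name__ == "__main__":
    print(round(reinforced_lsmc_put(), 3))
```

Every regression includes an intercept, so the fitted values have the same sample mean as the targets. As a result, this in-sample estimate can never fall below the plain Monte Carlo European put price computed on the same paths. It may, however, be biased high, because the same paths are used both for fitting and for evaluation. Evaluating the fitted rule on independent paths is a common remedy.

Evaluating V_j also calls V_{j+1}, V_{j+2}, and so on. The cost of this naive recursion therefore grows quadratically in the number of exercise dates.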

Keywords

Monte Carlo, optimal control, regression, reinforcement learning

2010 Mathematics Subject Classification

91G20, 93E24

C.B., P.H., P.P., J.S. and V.S. were supported by the MATH+ project AA4-2 "Optimal control in energy markets using rough analysis and deep networks". D.B. gratefully acknowledges the support of the German Science Foundation research grant (DFG Sachbeihilfe) 497300407. Results of Section 6 were obtained under the support of the RSF grant 19-71-30020 (HSE University).

Received 31 May 2021

Received revised 29 January 2022

Accepted 5 February 2022

Published 21 October 2022