Communications in Information and Systems

Volume 22 (2022)

Number 2

Physics-guided multiple regression analysis for calculating electrostatic free energies of proteins in different reference states

Pages: 187 – 221

DOI: https://dx.doi.org/10.4310/CIS.2022.v22.n2.a2

Authors

Tania Hazra (Department of Mathematics, Misericordia University, Dallas, Pennsylvania, U.S.A.)

Shan Zhao (Department of Mathematics, University of Alabama, Tuscaloosa, Alabama, U.S.A.)

Abstract

This work studies an implicit solvent modeling problem: given the electrostatic free energy calculated between water and a new reference state, how can the original solvation free energy between the water and vacuum states be recovered? Such a recovery is considered for the super-Gaussian Poisson–Boltzmann (PB) model [T. Hazra, S. Ahmed-Ullah, S. Wang, E. Alexov, and S. Zhao, Journal of Mathematical Biology, (2019) 79:631–672], a heterogeneous dielectric model that mimics the conformational changes of a macromolecule. In the vacuum state, the dielectric function should physically decrease as it leaves the macromolecular region; the super-Gaussian dielectric function, however, has an inflation over the narrow band of the solute-solvent boundary. To avoid this non-monotonicity issue, a new reference state with a large enough dielectric value is employed in the super-Gaussian PB model. Based on the electrostatic free energy calculated using this new reference state, a multiple regression model is developed in this paper to estimate the original free energy. The proposed regression model is physically motivated: it accounts explicitly for the contribution of each individual atom, which is modeled via the analytical result for the Kirkwood sphere. Moreover, a regression analysis is conducted for four simple physical descriptors related to electrostatic interactions between solute and solvent, namely the total number of atoms, the total charge, and the area and volume of the solvent excluded surface (SES). The dependence of these four descriptors is analyzed using a data set of 74 proteins. Numerical results indicate that the multiple regression model performs well in estimating the electrostatic free energies.
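To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest special case of a Kirkwood sphere, a single central charge in a sharp-boundary sphere (the Born formula), and it fits an ordinary least-squares multiple regression on four predictor columns standing in for the descriptors (number of atoms, total charge, SES area, SES volume). The function names, the synthetic data, and the coefficient values are all hypothetical; the abstract does not specify the form of the paper's regression model.

```python
import numpy as np

# Coulomb constant e^2 / (4*pi*eps0) in kcal * Angstrom / (mol * e^2).
COULOMB_KCAL = 332.06371


def born_energy(q, radius, eps_solvent, eps_ref):
    """Born energy (kcal/mol) for a central charge q (units of e) in a
    sphere of the given radius (Angstrom), transferred from a reference
    medium with dielectric eps_ref to a solvent with dielectric eps_solvent.

    Assumption: sharp dielectric boundary and a single central charge,
    i.e. the simplest special case of the Kirkwood sphere.
    """
    return 0.5 * COULOMB_KCAL * q**2 / radius * (1.0 / eps_solvent - 1.0 / eps_ref)


def fit_multiple_regression(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ b.

    X has one row per protein and one column per descriptor.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the fitted linear model on descriptor rows X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    return design @ coef


def make_synthetic_data(n_samples=74, seed=0):
    """Hypothetical stand-in data: 4 standardized descriptor columns and a
    response generated from made-up coefficients (no real protein data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 4))
    true_coef = np.array([1.0, 2.0, -3.0, 0.5, 0.25])  # made-up values
    y = true_coef[0] + X @ true_coef[1:]
    return X, y, true_coef
```

In this sketch, `born_energy(q, radius, 80.0, 1.0)` gives the water-versus-vacuum energy of a single sphere, and replacing `eps_ref=1.0` with a larger value gives the energy relative to a high-dielectric reference state. The regression helpers are generic, so any set of per-protein descriptors could be supplied as columns of `X`.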

Keywords

Poisson–Boltzmann equation, electrostatic free energy, regression analysis, Kirkwood sphere, molecular surface, Gaussian dielectric model

Hazra’s research is partially supported by the Summer Research Grant awarded by Misericordia University, Dallas, PA 18612, USA, Summer 2020.

Zhao’s research is partially supported by the National Science Foundation (NSF) of USA under grants DMS-1812930 and DMS-2110914.

Received 20 July 2021

Published 19 May 2022