Machine Learning Predictions of Spectroscopic Properties and Carbonyl Reactivity From A Database of Charge Density Descriptors
Date
2020
Authors
Donthula, Kiran Kumar
Publisher
Middle Tennessee State University
Abstract
Carbonyl compounds are important to study because of their biological and industrial
significance. A database of critical point descriptors for valence-shell charge concentrations
and depletions of carbon atoms in a range of aldehydes, ketones, imides, and amides has
been created. For each critical point, the database contains data related to the probability
distribution of electrons (the value of the total electron density, ρ, at bond critical points,
which has been correlated with bond strength). The database also includes data related to the
curvature of ρ at maxima and minima in carbon’s valence shell of charge concentration (VSCC)
(∇²ρ(r) and Hessian eigenvalues, which have been correlated with chemical reactivity). For both types
of critical points, radii from the enveloped carbon nucleus are included in the database.
Artificial neural networks (ANNs) are strong tools for predicting nonlinear functions,
and they are used in this study to both leverage charge density-based descriptors and learn
about their relative chemical significance. An ANN prediction scheme was developed for
the spectroscopic properties and interaction energies of carbonyl compounds, based on the
topological properties of the electron density obtained from the Quantum Theory of Atoms
in Molecules (QTAIM), which supplied the input data for training and testing the proposed
scheme. In 2009, Balabin and Lomakina [1] used three-layer feed-forward artificial neural
networks, with backpropagation, to predict density functional theory (DFT) energies
comparable to those obtained with large basis sets, using lower-level energy values as
training data. Studies such as this indicate that data-mining techniques, used
in conjunction with artificial neural networks, can be productively applied in the prediction
of properties that would otherwise be computationally expensive and time-consuming to
calculate.
For our study, we have selected 225 small systems of carbonyl group-containing
molecules as a training set, with each molecule characterized by 18 bond critical point descriptors
and 30 Laplacian critical point descriptors. These descriptors were used to train ANNs for
predicting C=O stretching frequencies and 13C chemical shifts. Additional properties, such
as intermolecular interaction energies with nucleophiles, are also estimated. Predictions are
made using the Laplacian critical point data, as well as the bond critical point data, both
separately and combined. The study was carried out using the leave-one-out
cross-validation method. Expected Mean Absolute Percent Errors (MAPE) and Mean Absolute
Errors (MAE) are compared among these three data sets. The calculated MAPEs for neural
network predictions of 13C shifts and C=O stretching frequencies are 1.38% and 0.53%,
respectively. MAEs for neural network predictions of covalent and van der Waals interaction
energies are 3.44 kcal/mol and 4.78 kcal/mol, respectively. All molecular wave functions were
generated using Gaussian 09 [2], and the electron density analysis was done using the programs
AIMAll [3] and DenProp [4].
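The leave-one-out procedure and the two error measures named above can be illustrated
with a minimal sketch. It is not the scheme used in this work: a one-descriptor
least-squares line stands in for the ANN so the example stays dependency-free, the
function names are hypothetical, and the descriptor and frequency values are made up
purely for illustration.

```python
# Hypothetical sketch: leave-one-out cross-validation scored with MAE and MAPE.
# ASSUMPTIONS: a one-variable least-squares line replaces the ANN, and the
# descriptor/frequency numbers below are invented, not taken from the database.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def leave_one_out(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        preds.append(a * xs[i] + b)
    return preds


def mae(actual, predicted):
    """Mean absolute error, in the units of the target property."""
    return sum(abs(p - t) for t, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error; targets must be nonzero."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(actual, predicted)) / len(actual)


# Made-up values: one density descriptor and a C=O stretching frequency (cm^-1).
descriptor = [0.40, 0.41, 0.42, 0.43, 0.44, 0.45]
frequency = [1690.0, 1702.0, 1711.0, 1722.0, 1729.0, 1741.0]

predicted = leave_one_out(descriptor, frequency)
loo_mae = mae(frequency, predicted)
loo_mape = mape(frequency, predicted)
print(f"LOO MAE  = {loo_mae:.2f} cm^-1")
print(f"LOO MAPE = {loo_mape:.3f} %")
```

Because every molecule is predicted by a model that never saw it during training, the
resulting MAE and MAPE estimate performance on unseen compounds rather than the quality
of the fit to the training data.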
As a stretch test, we chose the E. coli enzyme D-fructose-6-phosphate aldolase (FSA)
[5], which catalyzes a nucleophilic addition reaction of a carbon nucleophile (ketone) to a
carbon electrophile (aldehyde). The covalent interaction energy between a nucleophile and
an electrophile within the binding pocket of an enzyme (FSA) is predicted by our ANN
with an absolute error of 3.2 kcal/mol.
Keywords
Computational chemistry,
Artificial intelligence