Machine Learning Predictions of Spectroscopic Properties and Carbonyl Reactivity From A Database of Charge Density Descriptors

Donthula, Kiran Kumar
Middle Tennessee State University
Carbonyl compounds are important to study because of their biological and industrial significance. We have created a database of critical point descriptors for the valence-shell charge concentrations and depletions of carbon atoms in a range of aldehydes, ketones, imides, and amides. For each critical point, the database records properties of the probability distribution of electrons. One is the value of the total electron density, ρ, at bond critical points, which has been correlated with bond strength. Others describe the curvature of ρ at the maxima and minima in carbon's valence shell charge concentration (VSCC), namely ∇²ρ(r) and the Hessian eigenvalues, which have been correlated with chemical reactivity. For both types of critical points, the database also includes the radii from the enveloped carbon nucleus. Artificial neural networks (ANNs) are powerful tools for predicting nonlinear functions. In this study they are used both to leverage charge density-based descriptors and to learn about the descriptors' relative chemical significance. We developed an ANN prediction scheme for the spectroscopic properties and interaction energies of carbonyl compounds. The scheme is based on the topological properties of the electron density obtained from the Quantum Theory of Atoms in Molecules (QTAIM), and these QTAIM data supplied the inputs for training and testing the proposed ANN scheme. In 2009, Balabin and Lomakina [1] used three-layer feed-forward artificial neural networks with back propagation to predict density functional theory (DFT) energies. Using lower-level energy values as training data, they obtained predictions comparable to energies computed with large basis sets. This study and others indicate that data-mining techniques used with artificial neural networks can productively predict properties that would otherwise be computationally expensive and time-consuming to calculate.
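The following plain-Python sketch shows a three-layer feed-forward network (input, one hidden layer, output) trained by back propagation, the architecture mentioned above. It illustrates the technique only. It is not the network used in this thesis or in [1]. The following design choices are all assumptions:

- the tanh hidden activation
- the single linear output unit
- per-sample gradient descent
- the hidden-layer size and learning rate
- the class name `ThreeLayerNet`
- the synthetic toy data

In the study itself, each input vector would hold one molecule's QTAIM descriptors. The target would be a property such as a ¹³C shift or a C=O stretching frequency.

```python
import math
import random


class ThreeLayerNet:
    """Input layer -> one tanh hidden layer -> one linear output unit (assumed design)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_inputs)
        # w1[j][i] connects input i to hidden unit j; b1[j] is that unit's bias.
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # w2[j] connects hidden unit j to the output; b2 is the output bias.
        self.w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (prediction, hidden activations) for one descriptor vector x."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return y, h

    def train_step(self, x, target, lr):
        """One back-propagation update for the loss 0.5 * (y - target) ** 2."""
        y, h = self.forward(x)
        delta_out = y - target  # dLoss/dy for a linear output unit
        for j, hj in enumerate(h):
            # Propagate the error back through tanh (derivative 1 - tanh**2),
            # using the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out

    def fit(self, X, Y, epochs=200, lr=0.05, seed=0):
        """Per-sample (stochastic) gradient descent over shuffled training data."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for k in order:
                self.train_step(X[k], Y[k], lr)
        return self

    def mse(self, X, Y):
        """Mean squared error of the current network on (X, Y)."""
        return sum((self.forward(x)[0] - t) ** 2 for x, t in zip(X, Y)) / len(X)


# Synthetic toy data (not from the thesis): 3 inputs and a nonlinear target.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(60)]
Y = [x[0] * x[1] + 0.5 * x[2] for x in X]

net = ThreeLayerNet(n_inputs=3, n_hidden=8, seed=0)
before = net.mse(X, Y)
net.fit(X, Y, epochs=200, lr=0.05)
after = net.mse(X, Y)
print(f"training MSE: {before:.4f} -> {after:.4f}")
```

Descriptors on very different scales, such as ρ values versus radii, are typically standardized before training. That preprocessing step is omitted here for brevity.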
For our study, we selected a training set of 225 small carbonyl-containing molecules. Each molecule is described by 18 bond critical point descriptors and 30 Laplacian critical point descriptors. We used these descriptors to train ANNs to predict C=O stretching frequencies and ¹³C chemical shifts. We also estimated additional properties, such as intermolecular interaction energies with nucleophiles. Predictions were made from the Laplacian critical point data and the bond critical point data, both separately and combined. The study used leave-one-out cross-validation. We compare the mean absolute percentage errors (MAPE) and mean absolute errors (MAE) obtained with these three data sets. The calculated MAPEs for neural network predictions of ¹³C shifts and C=O stretching frequencies are 1.38% and 0.53%, respectively. The MAEs for neural network predictions of covalent and van der Waals interaction energies are 3.44 kcal/mol and 4.78 kcal/mol, respectively. All molecular wave functions were generated with Gaussian 09 [2]. The electron density analysis was done with the programs AIMAll [3] and DenProp [4]. For the stretch test, we chose the E. coli enzyme D-fructose-6-phosphate aldolase (FSA) [5]. FSA catalyzes the nucleophilic addition of a carbon nucleophile (ketone) to a carbon electrophile (aldehyde). Our ANN predicts the covalent interaction energy between a nucleophile and an electrophile within the FSA binding pocket with an absolute error of 3.2 kcal/mol.
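The following sketch shows leave-one-out cross-validation and the two error metrics quoted above. Each molecule is held out in turn, the model is refit on the remaining molecules, and the held-out prediction is scored. MAE stays in the units of the target (for example kcal/mol), while MAPE expresses each error as a percentage of the reference value. The function names, toy data, and stand-in model are hypothetical. The stand-in is a least-squares line on a single descriptor, used in place of the ANN to keep the example short.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (e.g. kcal/mol)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; reference values must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_out(X, Y, fit, predict):
    """Hold out each sample once, refit on the rest, and collect held-out predictions."""
    preds = []
    for k in range(len(X)):
        model = fit(X[:k] + X[k + 1:], Y[:k] + Y[k + 1:])
        preds.append(predict(model, X[k]))
    return preds


# Hypothetical stand-in for the ANN: least-squares line on the first descriptor.
def fit_line(X_train, Y_train):
    xs = [x[0] for x in X_train]
    mx = sum(xs) / len(xs)
    my = sum(Y_train) / len(Y_train)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, Y_train))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x[0] + intercept


# Toy data (not from the thesis): one descriptor per sample, roughly linear targets.
X = [[0.30], [0.32], [0.35], [0.38], [0.40]]
Y = [170.0, 174.0, 181.0, 186.0, 191.0]
preds = leave_one_out(X, Y, fit_line, predict_line)
print(f"LOOCV MAE = {mae(Y, preds):.2f}, MAPE = {mape(Y, preds):.2f}%")
```

The same loop could be run three times: on the bond critical point descriptors, on the Laplacian critical point descriptors, and on their concatenation. That puts the three data sets on an equal footing for comparison.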
Computational chemistry, Artificial intelligence