
Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions

Lim, Megan A. and Yang, Song and Mai, Huanghao and Cheng, Alan C. (2022) Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions. Journal of Chemical Information and Modeling. ISSN 1549-9596. doi:10.1021/acs.jcim.2c00245. (In Press) https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000

PDF (Figures, hyperparameters for models, additional analysis plots) - Supplemental Material, 498kB
MS Excel (QM9-extension DFT descriptor data set) - Supplemental Material, 19MB
MS Excel (ChEMBL DFT descriptor data set) - Supplemental Material, 74kB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000

Abstract

Quantum mechanical (QM) descriptors of small molecules have wide applicability in understanding organic reactivity and molecular properties, but the substantial compute cost required for ab initio QM calculations limits their broad usage. Here, we investigate the use of deep learning for predicting QM descriptors, with the goal of enabling usage of near-QM-accuracy electronic properties on large molecular data sets such as those seen in drug discovery. Several deep learning approaches have previously been benchmarked on a published data set called QM9, where 12 ground-state properties have been calculated for molecules with up to nine heavy atoms, limited to C, H, N, O, and F elements. To advance the work beyond the QM9 chemical space and enable application to molecules encountered in drug discovery, we extend the QM9 data set by creating a QM9-extended data set covering an additional ∼20,000 molecules containing S and Cl atoms. Using this extended set, we generate new deep learning models and leverage ANI-2x models to provide predictions on larger, more diverse molecules common in drug discovery, and we find the models estimate 11 of 12 ground-state properties reasonably. We use the predicted QM descriptors to augment graph convolutional neural network (GCNN) models for selected ADME end points (rat microsomal clearance, hepatic clearance, total clearance, and P-glycoprotein efflux) and find varying degrees of performance improvement compared to nonaugmented GCNN models, including pronounced improvement in P-glycoprotein efflux prediction.


Item Type: Article
Related URLs:
URL                                         URL Type        Description
https://doi.org/10.1021/acs.jcim.2c00245    DOI             Article
https://github.com/chemprop                 Related Item    Software
ORCID:
Author             ORCID
Cheng, Alan C.     0000-0003-3645-172X
Additional Information: © 2022 American Chemical Society. Received 1 March 2022. Published online 27 June 2022. Data and Software Availability: ChEMBL data sets and computed descriptors are available in the Supporting Information. This work also leverages proprietary data sets from Merck & Co. (Kenilworth, NJ) to provide higher confidence conclusions. Software used to train models is freely available from Yang et al. (14) at https://github.com/chemprop. Software used for identifying low energy 3D conformations is available from the Chemical Computing Group (Montreal, Canada). We thank our computational and structural chemistry colleagues for feedback on the work. This work was supported in full by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA. Author Contributions: M. A. Lim and S. Yang contributed equally. All authors contributed to the research and the writing of the manuscript and have given approval to the final version of the manuscript. The authors declare no competing financial interest.
Funders:
Funding Agency            Grant Number
Merck Sharp and Dohme     UNSPECIFIED
Subject Keywords: Energy, Molecular modeling, Molecules, Peptides and proteins, Rodent models
DOI: 10.1021/acs.jcim.2c00245
Record Number: CaltechAUTHORS:20220729-894394000
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000
Official Citation: Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions. Megan A. Lim, Song Yang, Huanghao Mai, and Alan C. Cheng. Journal of Chemical Information and Modeling, Article ASAP. DOI: 10.1021/acs.jcim.2c00245
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115977
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 01 Aug 2022 22:38
Last Modified: 01 Aug 2022 22:38
