Lim, Megan A. and Yang, Song and Mai, Huanghao and Cheng, Alan C. (2022) Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions. Journal of Chemical Information and Modeling. ISSN 1549-9596. doi:10.1021/acs.jcim.2c00245. (In Press) https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000
PDF (Figures, hyperparameters for models, additional analysis plots)
- Supplemental Material
See Usage Policy. 498kB |
MS Excel (QM9-extension DFT descriptor data set)
- Supplemental Material
See Usage Policy. 19MB |
MS Excel (ChEMBL DFT descriptor data set)
- Supplemental Material
See Usage Policy. 74kB |
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000
Abstract
Quantum mechanical (QM) descriptors of small molecules have wide applicability in understanding organic reactivity and molecular properties, but the substantial compute cost required for ab initio QM calculations limits their broad usage. Here, we investigate the use of deep learning for predicting QM descriptors, with the goal of enabling usage of near-QM accuracy electronic properties on large molecular data sets such as those seen in drug discovery. Several deep learning approaches have previously been benchmarked on a published data set called QM9, where 12 ground-state properties have been calculated for molecules with up to nine heavy atoms, limited to C, H, N, O, and F elements. To advance the work beyond the QM9 chemical space and enable application to molecules encountered in drug discovery, we extend the QM9 data set by creating a QM9-extended data set covering an additional ∼20,000 molecules containing S and Cl atoms. Using this extended set, we generate new deep learning models as well as leverage ANI-2x models to provide predictions on larger, more diverse molecules common in drug discovery, and we find the models estimate 11 of 12 ground-state properties reasonably. We use the predicted QM descriptors to augment graph convolutional neural network (GCNN) models for selected ADME end points (rat microsomal clearance, hepatic clearance, total clearance, and P-glycoprotein efflux) and find varying degrees of performance improvement compared to nonaugmented GCNN models, including pronounced improvement in P-glycoprotein efflux prediction.
Item Type: | Article | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Additional Information: | © 2022 American Chemical Society. Received 1 March 2022. Published online 27 June 2022. Data and Software Availability: ChEMBL data sets and computed descriptors are available in the Supporting Information. This work also leverages proprietary data sets from Merck & Co. (Kenilworth, NJ) to provide higher confidence conclusions. Software used to train models is freely available from Yang et al. (14) at https://github.com/chemprop. Software used for identifying low energy 3D conformations is available from the Chemical Computing Group (Montreal, Canada). We thank our computational and structural chemistry colleagues for feedback on the work. This work was supported in full by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA. Author Contributions. M. A. Lim and S. Yang have contributed equally. All authors contributed to the research, writing of the manuscript, and have given approval to the final version of the manuscript. The authors declare no competing financial interest. | |||||||||
Subject Keywords: | Energy, Molecular modeling, Molecules, Peptides and proteins, Rodent models | |||||||||
DOI: | 10.1021/acs.jcim.2c00245 | |||||||||
Record Number: | CaltechAUTHORS:20220729-894394000 | |||||||||
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20220729-894394000 | |||||||||
Official Citation: | Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions. Megan A. Lim, Song Yang, Huanghao Mai, and Alan C. Cheng. Journal of Chemical Information and Modeling, Article ASAP. DOI: 10.1021/acs.jcim.2c00245 | |||||||||
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. | |||||||||
ID Code: | 115977 | |||||||||
Collection: | CaltechAUTHORS | |||||||||
Deposited By: | George Porter | |||||||||
Deposited On: | 01 Aug 2022 22:38 | |||||||||
Last Modified: | 01 Aug 2022 22:38 |