
OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy

Christensen, Anders S. and Sirumalla, Sai Krishna and Qiao, Zhuoran and O'Connor, Michael B. and Smith, Daniel G. A. and Ding, Feizhi and Bygrave, Peter J. and Anandkumar, Animashree and Welborn, Matthew and Manby, Frederick R. and Miller, Thomas F., III (2021) OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. Journal of Chemical Physics, 155 (20). Art. No. 204103. ISSN 0021-9606. doi:10.1063/5.0061990. https://resolver.caltech.edu/CaltechAUTHORS:20210831-203931813

PDF - Accepted Version (4MB). See Usage Policy.
PDF - Submitted Version (1MB). Creative Commons Attribution.
PDF - Supplemental Material (340kB). See Usage Policy.


Abstract

We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing graph neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 × 10⁶ DFT calculations on molecules and geometries. This dataset covers the most common elements in biochemistry and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) as well as charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coefficient of R² = 0.90 compared to the reference DLPNO-CCSD(T) calculation and R² = 0.97 compared to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For the diverse fragments in the TorsionNet500 dataset, OrbNet Denali reproduces the ωB97X-D3/def2-TZVP torsion profiles with an average mean absolute error of 0.12 kcal/mol over the potential energy surfaces.
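The abstract reports two kinds of error statistics: a mean absolute error (in kcal/mol) and a median R² taken over molecules. The sketch below illustrates how such statistics could be computed from predicted and reference relative energies. It is not taken from the paper: the function names and any input numbers are hypothetical, and it assumes that R² means the squared Pearson correlation evaluated per molecule, which the abstract does not spell out.

```python
import statistics


def mean_absolute_error(pred, ref):
    """Mean absolute deviation between predicted and reference energies."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r_squared(pred, ref):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need at least two paired values")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_p == 0.0 or var_r == 0.0:
        raise ValueError("R^2 is undefined when one series is constant")
    return (cov * cov) / (var_p * var_r)


def median_r_squared(per_molecule):
    """Median of per-molecule R^2 values.

    per_molecule maps a (hypothetical) molecule label to a pair
    (predicted_energies, reference_energies) over its conformers.
    """
    scores = [pearson_r_squared(p, r) for p, r in per_molecule.values()]
    return statistics.median(scores)
```

Taking the median over molecules, rather than pooling all conformers into one correlation, keeps a few molecules with large energy ranges from dominating the score.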


Item Type: Article
Related URLs:
URL                                           URL Type  Description
https://doi.org/10.1063/5.0061990             DOI       Article
https://arxiv.org/abs/2107.00299              arXiv     Discussion Paper
https://doi.org/10.6084/m9.figshare.14883867  DOI       OrbNet Denali training set
ORCID:
Author                   ORCID
Christensen, Anders S.   0000-0002-7253-6897
Sirumalla, Sai Krishna   0000-0002-1875-2062
Qiao, Zhuoran            0000-0002-5704-7331
Smith, Daniel G. A.      0000-0001-8626-0900
Bygrave, Peter J.        0000-0002-5505-5637
Welborn, Matthew         0000-0001-8659-6535
Manby, Frederick R.      0000-0001-7611-714X
Miller, Thomas F., III   0000-0002-1882-5380
Additional Information: © 2021 Author(s). Published under an exclusive license by AIP Publishing. Submitted: 1 July 2021; Accepted: 26 October 2021; Published Online: 23 November 2021.
Z.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon–Caltech AI4Science fellowship. T.F.M. and A.A. acknowledge partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. The authors acknowledge NVIDIA, including Abe Stern, Thorsten Kurth, Josh Romero, and Tom Gibbs, for helpful discussions regarding GPU implementations of graph neural networks. Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science, under Contract No. DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Conflict of Interest: Nine of the authors (A.S.C., S.K.S., M.B.O., D.G.A.S., F.D., P.J.B., M.W., F.R.M., and T.F.M.) are employees of Entos, Inc., or its affiliates.
Author Contributions: A.S.C. and S.K.S. contributed equally to this work.
Data Availability: The 2.3 × 10⁶ geometries and energy labels in the OrbNet Denali training set are openly available in FigShare at https://doi.org/10.6084/m9.figshare.14883867.
Funders:
Funding Agency                                         Grant Number
Amazon Web Services                                    UNSPECIFIED
Caltech De Logi Fund                                   UNSPECIFIED
Bren Professor of Computing and Mathematical Sciences  UNSPECIFIED
NVIDIA Corporation                                     UNSPECIFIED
Department of Energy (DOE)                             DE-AC02-05CH11231
Department of Energy (DOE)                             DE-AC05-00OR22725
Issue or Number: 20
DOI: 10.1063/5.0061990
Record Number: CaltechAUTHORS:20210831-203931813
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210831-203931813
Official Citation: Anders S. Christensen, Sai Krishna Sirumalla, Zhuoran Qiao, Michael B. O’Connor, Daniel G. A. Smith, Feizhi Ding, Peter J. Bygrave, Animashree Anandkumar, Matthew Welborn, Frederick R. Manby, and Thomas F. Miller III, "OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy", J. Chem. Phys. 155, 204103 (2021) https://doi.org/10.1063/5.0061990
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 110655
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 01 Sep 2021 14:10
Last Modified: 23 Nov 2021 21:57
