CaltechAUTHORS
  A Caltech Library Service

Molecular Dipole Moment Learning via Rotationally Equivariant Gaussian Process Regression with Derivatives in Molecular-orbital-based Machine Learning

Sun, Jiace and Cheng, Lixue and Miller, Thomas F., III (2022) Molecular Dipole Moment Learning via Rotationally Equivariant Gaussian Process Regression with Derivatives in Molecular-orbital-based Machine Learning. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20220707-204130247


Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220707-204130247

Abstract

This study extends the accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to modeling the contribution of electron correlation to dipole moments at the cost of Hartree-Fock computations. A molecular-orbital-based (MOB) pairwise decomposition of the correlation part of the dipole moment is applied, and these pair dipole moments can then be regressed as a universal function of molecular orbitals (MOs). The dipole MOB features consist of the energy MOB features and their responses to electric fields. An interpretable and rotationally equivariant Gaussian process regression (GPR) with derivatives algorithm is introduced to learn the dipole moment more efficiently. The proposed problem setup, feature design, and ML algorithm are shown to provide highly accurate models for both dipole moments and energies on water and fourteen small molecules. To demonstrate the ability of MOB-ML to function as generalized density-matrix functionals for molecular dipole moments and energies of organic molecules, we further apply the proposed MOB-ML approach to train and test on molecules from the QM9 dataset. The application of local scalable GPR with Gaussian mixture model unsupervised clustering (GMM/GPR) scales up MOB-ML to a large-data regime while retaining the prediction accuracy. In addition, compared with literature results, MOB-ML provides the best test MAEs of 4.21 mDebye and 0.045 kcal/mol for dipole moment and energy models, respectively, when training on 110,000 QM9 molecules. The excellent transferability of the resulting QM9 models is also illustrated by accurate predictions for four different series of peptides.
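The idea of a rotationally equivariant GPR with derivatives can be illustrated with a small toy sketch. It is not the authors' implementation and is not taken from the BBMM repository. It assumes an RBF kernel, rotation-invariant scalar features `X`, and a hypothetical array `J` holding the response of each feature to the three electric-field components. The vector label is modeled as `J.T @ grad f(X)`, so the kernel between two vector labels is a 3x3 block built from the cross-derivative of the scalar kernel. The physical sign convention of the dipole, the MOB pairwise decomposition, and the GMM clustering are all omitted, and every function name and the toy target are illustrative assumptions.

```python
import numpy as np

def rbf_cross_hessian(x, xp, length=1.0):
    """Return d^2 k / (dx dx'^T) for k(x, x') = exp(-|x - x'|^2 / (2 length^2))."""
    r = x - xp
    k = np.exp(-(r @ r) / (2.0 * length**2))
    return k * (np.eye(x.size) / length**2 - np.outer(r, r) / length**4)

def vector_kernel(X1, J1, X2, J2, length=1.0):
    """Block kernel whose 3x3 block (i, j) is J1[i].T @ H(X1[i], X2[j]) @ J2[j]."""
    K = np.zeros((3 * len(X1), 3 * len(X2)))
    for i in range(len(X1)):
        for j in range(len(X2)):
            H = rbf_cross_hessian(X1[i], X2[j], length)
            K[3 * i:3 * i + 3, 3 * j:3 * j + 3] = J1[i].T @ H @ J2[j]
    return K

def fit(X, J, Y, length=1.0, noise=1e-4):
    """Solve (K + noise * I) alpha = y for the stacked vector labels."""
    K = vector_kernel(X, J, X, J, length)
    return np.linalg.solve(K + noise * np.eye(K.shape[0]), Y.reshape(-1))

def predict(Xs, Js, X, J, alpha, length=1.0):
    """Posterior mean of the vector labels at the test points, shape (n_test, 3)."""
    return (vector_kernel(Xs, Js, X, J, length) @ alpha).reshape(-1, 3)

# Toy data (assumption): f(x) = sin(sum(x)), so grad f = cos(sum(x)) * ones.
rng = np.random.default_rng(0)
n_train, n_test, n_feat = 20, 5, 4
X = rng.normal(size=(n_train, n_feat))      # invariant features
J = rng.normal(size=(n_train, n_feat, 3))   # feature responses to the field
grad_f = np.cos(X.sum(axis=1))[:, None] * np.ones((n_train, n_feat))
Y = np.einsum("nda,nd->na", J, grad_f)      # vector labels J^T grad f
alpha = fit(X, J, Y)

# Rotating a test molecule rotates each row of J (J -> J R^T) and leaves X unchanged.
Xs = rng.normal(size=(n_test, n_feat))
Js = rng.normal(size=(n_test, n_feat, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
pred = predict(Xs, Js, X, J, alpha)
pred_rot = predict(Xs, Js @ R.T, X, J, alpha)

# Equivariance check: the prediction for the rotated input is the rotated prediction.
print(np.allclose(pred_rot, pred @ R.T, atol=1e-6))
```

In this sketch the equivariance follows from linearity: the features are invariant and the field responses enter the kernel linearly, so rotating the inputs rotates the predicted vector by the same matrix.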


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
https://doi.org/10.48550/arXiv.2205.15510 | arXiv | Discussion Paper
http://dx.doi.org/10.22002/D1.1177 | DOI | Data
https://github.com/SUSYUSTC/BBMM.git | Related Item | Implementation of the multi-GPU AltBBMM and GMM
https://resolver.caltech.edu/CaltechAUTHORS:20221010-454096500.3 | Related Item | Journal Article
ORCID:
Author | ORCID
Cheng, Lixue | 0000-0002-7329-0585
Miller, Thomas F., III | 0000-0002-1882-5380
Additional Information: We thank Vignesh Bhethanabotla for his help in improving the quality of this manuscript. TFM acknowledges support from the US Army Research Laboratory (W911NF-12-2-0023), the US Department of Energy (DE-SC0019390), the Caltech De Logi Fund, and the Camille and Henry Dreyfus Foundation (Award ML-20-196). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science under contract DE-AC02-05CH11231. The reference pairwise decomposed dipole moments and energies and the MOB features of QM7b-T, QM9, and GDB-13-T are available at Caltech Data: https://data.caltech.edu/records/1177. The corresponding HF dipole moments and energies are also available. The total dipole moment and energy data of challenging datasets are also included in this online dataset. The implementation of the multi-GPU AltBBMM and GMM is available online at https://github.com/SUSYUSTC/BBMM.git. The parameters used to train the models are included in the Supporting Information. Tables S1, S2, and S3 list the MOB-ML results plotted in Figs. 2, 4, and 6, respectively.
Funders:
Funding Agency | Grant Number
Army Research Office (ARO) | W911NF-12-2-0023
Department of Energy (DOE) | DE-SC0019390
Caltech De Logi Fund | UNSPECIFIED
Camille and Henry Dreyfus Foundation | ML-20-196
Department of Energy (DOE) | DE-AC02-05CH11231
Record Number: CaltechAUTHORS:20220707-204130247
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220707-204130247
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115413
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 08 Jul 2022 22:25
Last Modified: 17 Oct 2022 14:19
