A Universal Density Matrix Functional from Molecular Orbital-Based Machine Learning: Transferability across Organic Molecules

Lixue Cheng,1 Matthew Welborn,1 Anders S. Christensen,2 and Thomas F. Miller III1,a)

1) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
2) Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Basel, Switzerland

a) Electronic mail: tfm@caltech.edu

(Dated: 8 April 2019)
arXiv:1901.03309v3 [physics.chem-ph]
We address the degree to which machine learning can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for the MP2, CCSD, and CCSD(T) levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 millihartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with three-fold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 versus 5000 training calculations).
Machine learning (ML) has recently seen wide application in chemistry, including in the fields of drug discovery,[1-3] materials design,[4-7] and reaction prediction.[8-12] In the context of quantum chemistry, much work has focused on predicting electronic energies or densities based on atom- or geometry-specific features,[13-33] although other strategies have also been employed.[34] Recently, we reported an accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to the prediction of correlated wavefunction energies based on input features from a self-consistent field calculation, such as the Hartree-Fock (HF) method.[35] In this communication, we present refinements to the MOB-ML method with comparisons to test cases from our previous work. We then demonstrate the performance of MOB-ML across a broad swath of chemical space, as represented by the QM7b[18] and GDB-13[36] test sets of organic molecules.
I. THEORY
The current work aims to predict post-Hartree-Fock correlated wavefunction energies using features of the Hartree-Fock molecular orbitals (MOs). The starting point for the MOB-ML method[35] is that the correlation energy can be decomposed into pairwise occupied MO contributions[37,38]

$$E_\mathrm{c} = \sum_{ij}^{\mathrm{occ}} \varepsilon_{ij}, \qquad (1)$$

where the pair correlation energy ε_ij can be written as a functional of the full set of MOs, {φ_p}, appropriately indexed by i and j:

$$\varepsilon_{ij} = \varepsilon\!\left[\{\phi_p\}_{ij}\right]. \qquad (2)$$
The functional ε is universal across all chemical systems; for a given level of correlated wavefunction theory, there is a corresponding ε that maps the HF MOs to the pair correlation energy, regardless of the molecular composition or geometry. Furthermore, ε simultaneously describes the pair correlation energy for all pairs of occupied MOs (i.e., the functional form of ε does not depend on i and j). For example, in second-order Møller-Plesset perturbation theory (MP2),[39] the pair correlation energies are

$$\varepsilon^{\mathrm{MP2}}_{ij} = -\frac{1}{4}\sum_{ab}^{\mathrm{virt}} \frac{\left|\langle ij \| ab \rangle\right|^2}{e_a + e_b - e_i - e_j}, \qquad (3)$$

where a and b index virtual MOs, e_p is the Hartree-Fock orbital energy corresponding to MO φ_p, and ⟨ij‖ab⟩ are antisymmetrized electron repulsion integrals.[38] A corresponding expression for the pair correlation energy exists for any post-Hartree-Fock method, but it is typically costly to evaluate in closed form.
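For concreteness, given precomputed antisymmetrized integrals and HF orbital energies, the MP2 pair energies of Eq. 3 can be tabulated in a few lines. The following NumPy sketch is illustrative only; the array layout and function name are our assumptions, not part of the MOB-ML code:

```python
import numpy as np

def mp2_pair_energies(g_anti, e_occ, e_virt):
    """Tabulate the MP2 pair correlation energies of Eq. 3.
    g_anti: antisymmetrized integrals <ij||ab>, shape (nocc, nocc, nvirt, nvirt);
    e_occ, e_virt: HF orbital energies. The sum of the returned matrix
    recovers the total MP2 correlation energy of Eq. 1."""
    denom = ((e_virt[:, None] + e_virt[None, :])[None, None, :, :]
             - (e_occ[:, None] + e_occ[None, :])[:, :, None, None])
    return -0.25 * np.sum(np.abs(g_anti) ** 2 / denom, axis=(2, 3))
```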
In MOB-ML, a machine learning model is constructed for the pair energy functional,

$$\varepsilon_{ij} \approx \varepsilon^{\mathrm{ML}}\!\left[\mathbf{f}_{ij}\right], \qquad (4)$$

where f_ij denotes a vector of features associated with MOs i and j. Eq. 4 thus presents the opportunity for the machine learning of a universal density matrix functional for correlated wavefunction energies, which can be evaluated at the cost of the MO calculation.
Following our previous work,[35] the features f_ij correspond to unique elements of the Fock (F), Coulomb (J), and exchange (K) matrices between φ_i, φ_j, and the set of virtual orbitals. In the current work, we additionally include features associated with matrix elements between pairs of occupied orbitals for which one member of the pair differs from φ_i or φ_j (i.e., non-i,j occupied MO pairs). The feature vector takes the form
$$\mathbf{f}_{ij} = \big( F_{ii},\, F_{ij},\, F_{jj},\, \mathbf{F}^{\mathrm{o}}_{i},\, \mathbf{F}^{\mathrm{o}}_{j},\, \mathbf{F}^{\mathrm{vv}}_{ij},\, J_{ii},\, J_{ij},\, J_{jj},\, \mathbf{J}^{\mathrm{o}}_{i},\, \mathbf{J}^{\mathrm{o}}_{j},\, \mathbf{J}^{\mathrm{v}}_{i},\, \mathbf{J}^{\mathrm{v}}_{j},\, \mathbf{J}^{\mathrm{vv}}_{ij},\, K_{ij},\, \mathbf{K}^{\mathrm{o}}_{i},\, \mathbf{K}^{\mathrm{o}}_{j},\, \mathbf{K}^{\mathrm{v}}_{i},\, \mathbf{K}^{\mathrm{v}}_{j},\, \mathbf{K}^{\mathrm{vv}}_{ij} \big), \qquad (5)$$
where for a given matrix (F, J, or K) the superscript o denotes a row of its occupied–occupied block, the superscript v denotes a row of its occupied–virtual block, and the superscript vv denotes its virtual–virtual block. Redundant elements are removed, such that the virtual–virtual block is represented by its upper triangle and the diagonal elements of K (which are identical to those of J) are omitted. To increase transferability and accuracy, we choose φ_i and φ_j to be localized molecular orbitals (LMOs) rather than canonical MOs and employ valence virtual LMOs[40] in place of the set of all virtual MOs (as detailed in Ref. 35).
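To make the bookkeeping concrete, a schematic NumPy assembly of Eq. 5 is sketched below. It is our own illustration under assumed conventions (function name, argument layout, and index handling are illustrative), and the symmetrization and sorting steps described below are omitted:

```python
import numpy as np

def feature_vector(F, J, K, i, j, occ_other, virt):
    """Schematic assembly of the MOB-ML feature vector (Eq. 5) for the
    occupied LMO pair (i, j). F, J, K are Fock, Coulomb, and exchange
    matrices in the LMO basis; occ_other lists occupied LMOs k != i, j;
    virt lists valence virtual LMOs. Virtual-virtual blocks are reduced
    to their upper triangles, and the K diagonal (identical to J's) is
    dropped by keeping only K_ij among the scalar elements."""
    iu = np.triu_indices(len(virt))
    vv = lambda M: M[np.ix_(virt, virt)][iu]  # upper triangle of vv block
    return np.concatenate([
        [F[i, i], F[i, j], F[j, j]], F[i, occ_other], F[j, occ_other], vv(F),
        [J[i, i], J[i, j], J[j, j]], J[i, occ_other], J[j, occ_other],
        J[i, virt], J[j, virt], vv(J),
        [K[i, j]], K[i, occ_other], K[j, occ_other],
        K[i, virt], K[j, virt], vv(K),
    ])
```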
We separate Eq. 4 to independently machine learn the cases of i = j and i ≠ j,

$$\varepsilon_{ij} \approx \begin{cases} \varepsilon^{\mathrm{ML}}_{\mathrm{d}}\!\left[\mathbf{f}_{i}\right] & \text{if } i = j,\\[2pt] \varepsilon^{\mathrm{ML}}_{\mathrm{o}}\!\left[\mathbf{f}_{ij}\right] & \text{if } i \neq j, \end{cases} \qquad (6)$$
where f_i denotes f_ii (Eq. 5) with redundant elements removed. By separating the pair energies in this way, we avoid the situation where a single ML model is required to distinguish between the case of i = j and that of φ_i being nearly degenerate with φ_j, a distinction that can represent a sharp variation in the function to be learned.
In the current work, several technical refinements are introduced to improve training efficiency (i.e., the accuracy and transferability of the model as a function of the number of training examples). These are now described.
Occupied LMO symmetrization. The feature vector is preprocessed to specify a canonical ordering of the occupied and virtual LMO pairs. This reduces permutation of elements in the feature vector, resulting in greater ML training efficiency. Matrix elements M_ij (M = F, J, K) associated with φ_i and φ_j are rotated into gerade and ungerade combinations
$$\begin{aligned}
M_{ii} &\rightarrow \tfrac{1}{2}M_{ii} + \tfrac{1}{2}M_{jj} + M_{ij}\\
M_{jj} &\rightarrow \tfrac{1}{2}M_{ii} + \tfrac{1}{2}M_{jj} - M_{ij}\\
M_{ij} &\rightarrow \tfrac{1}{2}M_{ii} - \tfrac{1}{2}M_{jj}\\
M_{ip} &\rightarrow \tfrac{1}{\sqrt{2}}M_{ip} + \tfrac{1}{\sqrt{2}}M_{jp}\\
M_{jp} &\rightarrow \tfrac{1}{\sqrt{2}}M_{ip} - \tfrac{1}{\sqrt{2}}M_{jp}
\end{aligned} \qquad (7)$$
with the sign convention that F_ij is negative. Here, p indexes any LMO other than i or j (i.e., an occupied LMO k such that i ≠ k ≠ j, or a valence virtual LMO).
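A minimal NumPy sketch of this rotation is given below; the function and argument names are illustrative, and enforcement of the sign convention on F_ij is omitted:

```python
import numpy as np

def symmetrize_pair(M, i, j, others):
    """Rotate matrix elements of M involving LMOs i and j into
    gerade/ungerade combinations (Eq. 7). `others` lists the remaining
    LMO indices p (occupied k != i, j, or valence virtual). Returns the
    rotated scalar elements and the rotated rows M_ip, M_jp."""
    g_ii = 0.5 * M[i, i] + 0.5 * M[j, j] + M[i, j]   # gerade diagonal
    u_jj = 0.5 * M[i, i] + 0.5 * M[j, j] - M[i, j]   # ungerade diagonal
    gu_ij = 0.5 * M[i, i] - 0.5 * M[j, j]            # gerade-ungerade coupling
    g_p = (M[i, others] + M[j, others]) / np.sqrt(2.0)
    u_p = (M[i, others] - M[j, others]) / np.sqrt(2.0)
    return g_ii, u_jj, gu_ij, g_p, u_p
```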
LMO sorting. The virtual LMO pairs are sorted by increasing distance from occupied orbitals φ_i and φ_j. Sorting in this way ensures that features corresponding to valence virtual LMOs are listed in decreasing order of heuristic importance, and that the mapping between valence virtual LMOs and their associated features is roughly preserved. We recognize this issue could also potentially be addressed through the use of symmetry functions,[41] but these are not employed in the current work.
For purposes of sorting, distance is defined as

$$R^{ij}_a = \left\| \langle\phi_i|\hat{\mathbf{R}}|\phi_i\rangle - \langle\phi_a|\hat{\mathbf{R}}|\phi_a\rangle \right\| + \left\| \langle\phi_j|\hat{\mathbf{R}}|\phi_j\rangle - \langle\phi_a|\hat{\mathbf{R}}|\phi_a\rangle \right\|, \qquad (8)$$
where φ_a is a virtual LMO, R̂ is the Cartesian position operator, and ‖·‖ denotes the 2-norm; ‖⟨φ_i|R̂|φ_i⟩ − ⟨φ_a|R̂|φ_a⟩‖ represents the Euclidean distance between the centroids of orbital i and orbital a. Previously,[35] distances were defined based on Coulomb repulsion, which was found to sometimes lead to inconsistent sorting in systems with strongly polarized bonds. The non-i,j occupied LMO pairs are sorted in the same manner as the virtual LMO pairs.
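Given orbital centroids ⟨φ|R̂|φ⟩, the sort order implied by Eq. 8 reduces to a few lines of NumPy; the following sketch (names and shapes are our assumptions) illustrates it:

```python
import numpy as np

def sort_by_distance(centroid_i, centroid_j, centroids_virt):
    """Order virtual LMOs by increasing R_a^{ij} (Eq. 8): the sum of
    Euclidean distances from the centroid of virtual LMO a to the
    centroids of occupied LMOs i and j. Inputs are Cartesian centroids
    <phi|R|phi> with shapes (3,), (3,), and (n_virt, 3)."""
    R = (np.linalg.norm(centroids_virt - centroid_i, axis=1)
         + np.linalg.norm(centroids_virt - centroid_j, axis=1))
    return np.argsort(R)  # indices of virtual LMOs, nearest first
```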
Orbital localization. We employ Boys localization[42] to obtain the occupied LMOs, rather than the IBO localization[40] that was employed in our previous work.[35] Particularly for molecules that include triple bonds or multiple lone pairs, Boys localization is found to provide more consistent localization as a function of small geometry changes than IBO localization; and the chemically unintuitive mixing of σ and π bonds in Boys localization ("banana bonds")[43] does not present a problem for the MOB-ML method.
Feature selection. Prior to training, automatic feature selection is performed using random forest regression[44] with the mean-decrease-of-accuracy criterion (sometimes called permutation importance).[45] This technique was found to be more effective than our previous use[35] of the Gini importance score,[44] which led to worse accuracy and failed to select any features for the case of methane.
The reason for using feature selection in this way is twofold. First, GPR performance is known to degrade for high-dimensional datasets (in practice, 50-100 features);[46] and second, the use of the full feature set with small molecules can lead to overfitting as features become correlated.
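As an illustration of this selection step, the sketch below computes mean-decrease-of-accuracy importances around a scikit-learn random forest. It is a simplified construction of ours (importance is scored on the training data itself rather than on held-out data, and the threshold shown is only an example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_importance(X, y, n_estimators=100, seed=0):
    """Mean-decrease-of-accuracy importances: fit a random forest, then
    measure how much the R^2 score drops when each feature column is
    randomly shuffled. X: (n_samples, n_features); y: pair energies."""
    rng = np.random.default_rng(seed)
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    baseline = forest.score(X, y)
    importances = np.empty(X.shape[1])
    for col in range(X.shape[1]):
        X_shuffled = X.copy()
        rng.shuffle(X_shuffled[:, col])  # destroy this feature's information
        importances[col] = baseline - forest.score(X_shuffled, y)
    return importances

# Features above a chosen importance threshold are retained, e.g.:
# selected = np.where(permutation_importance(X, y) > 1e-3)[0]
```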
A reference Python implementation for generating MOB-ML features is provided online.[47]
II. COMPUTATIONAL DETAILS
Results are presented for a single water molecule; a series of alkane molecules; a thermalized version of the QM7b set of 7211 molecules with up to seven C, O, N, S, and Cl heavy atoms; and a thermalized version of the GDB-13 set of molecules with thirteen C, O, N, S, and Cl heavy atoms. All datasets employed in this work are provided in the Supporting Information.
Training and test geometries are sampled at 50 fs intervals from ab initio molecular dynamics trajectories performed with the Q-Chem 5.0 software package,[48] using the B3LYP[49-52]/6-31g*[53] level of theory and a Langevin thermostat[54] at 350 K.
The features and training pair energies associated with these geometries are computed using the Molpro 2018.0 software package[55] in a cc-pVTZ basis set unless otherwise noted.[56] Valence virtual orbitals used in feature construction are determined with the Intrinsic Bond Orbital method.[40] Reference pair correlation energies are computed with second-order Møller-Plesset perturbation theory (MP2)[39,57] and coupled cluster with singles and doubles (CCSD),[58,59] as well as with perturbative triples (CCSD(T)).[60,61] Density fitting for both Coulomb and exchange integrals[62] is employed for all results below except those corresponding to the water molecule. The frozen core approximation is used in all cases.
Gaussian process regression (GPR)[63] is employed to machine learn ε_d^ML and ε_o^ML (Eq. 6) using the GPy 1.9.6 software package.[64] The GPR kernel is Matérn 5/2 with white-noise regularization.[63] Kernel hyperparameters are optimized with respect to the log marginal likelihood objective for the water and alkane series results, as well as for ε_d^ML of the QM7b results. We use the Matérn 3/2 kernel instead of the Matérn 5/2 kernel for the case of ε_o^ML for the QM7b results, as it was empirically found to yield slightly better accuracy.[65]
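A minimal GPy sketch of this training setup is shown below. It is our illustration rather than the production MOB-ML code (array names and shapes are assumptions, and the Matérn 5/2 kernel is used for both models, whereas the text above swaps in Matérn 3/2 for ε_o^ML in the QM7b case):

```python
import numpy as np
import GPy

def train_mob_ml(F_diag, eps_diag, F_offdiag, eps_offdiag):
    """Fit the two GPR models of Eq. 6: eps_d on diagonal features f_i
    and eps_o on off-diagonal features f_ij, each with a Matern 5/2
    kernel plus white-noise regularization."""
    models = []
    for X, y in ((F_diag, eps_diag), (F_offdiag, eps_offdiag)):
        kern = GPy.kern.Matern52(input_dim=X.shape[1]) + GPy.kern.White(X.shape[1])
        m = GPy.models.GPRegression(X, y.reshape(-1, 1), kern)
        m.optimize()  # maximize the log marginal likelihood
        models.append(m)
    return models

def predict_correlation_energy(models, F_diag, F_offdiag):
    """Assemble E_c (Eq. 1) from the predicted pair energies."""
    eps_d, _ = models[0].predict(F_diag)
    eps_o, _ = models[1].predict(F_offdiag)
    return eps_d.sum() + eps_o.sum()
```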
Feature selection is performed using the random forest regression implementation in the scikit-learn v0.20.0 package.[66]
III. RESULTS
The ML model of Eq. 6 is a universal functional for any molecular Hamiltonian. In principle, with an adequate feature list and unlimited training data (and time), it should accurately and simultaneously describe all molecular systems. In practice, we must train the ML model using a truncated feature list and finite data. These choices determine the accuracy of the model.

Below, we examine the performance of the MOB-ML method in three increasingly broad regions of chemical space: (i) training on randomized water molecule geometries and predicting the energies of other water molecule geometries; (ii) training on geometries of short alkanes and predicting the energies of longer alkanes; and (iii) training on a small set of organic molecules and predicting the energies of a broader set of organic molecules. The first two test cases were introduced in our previous work,[35] and we explore how the refined methodology reported herein leads to improvements in accuracy and transferability. The last case represents a demanding new test of transferability across chemical space. In all cases, we report the ML prediction accuracy as a function of the number of training examples.
As a first example, we consider the performance of MOB-ML for a single water molecule. A separate model is trained to predict the correlation energy at the MP2, CCSD, and CCSD(T) levels of theory, using reference calculations on a subset of 1000 randomized water geometries to predict the correlation energy for the remainder. Feature selection with an importance threshold of 1×10⁻³ results in 12, 11, and 10 features for ε_o^ML for MP2, CCSD, and CCSD(T), respectively; ten features are selected for ε_d^ML for all three post-Hartree-Fock methods.
Figure 1 presents the test set prediction accuracy of each MOB-ML model as a function of the number of training geometries (i.e., the "learning curve"). MOB-ML predictions are shown for MP2, CCSD, and CCSD(T), and the model shows the same level of accuracy for all three methods. Remarkably, all three models achieve a prediction mean absolute error (MAE) of 1 mH when trained on only a single water geometry, indicating that only a single reference calculation is needed to provide chemical accuracy for the remaining 999 geometries at each level of theory. Since it contains 10 distinct LMO pairs, this single geometry provides enough information to yield a chemically accurate MOB-ML model for the global thermally accessible potential energy surface.

For all three methods (Fig. 1), the learning curve exhibits the expected[67] power-law behavior as a function of training data, and the total error reaches microhartree accuracy with tens of water training geometries. As compared to our previous results, where training on 200 geometries resulted in a prediction MAE of 0.027 mH for the case of CCSD,[35] the current implementation of the MOB-ML model is substantially improved; the improvement for this case stems primarily from the use of Boys localization,[42] which specifies unique and consistent LMOs corresponding to the oxygen lone pairs.
FIG. 1. Learning curves for MOB-ML models trained on the water molecule and used to predict the correlation energy of different water molecule geometries at three levels of post-Hartree-Fock theory (MP2, CCSD, and CCSD(T); log-log axes of prediction MAE in mH versus number of training geometries). Prediction errors are summarized in terms of mean absolute error (MAE).
Next, we explore the transferability of MOB-ML predictions for a model that is trained on thermalized geometries of short alkanes and then used for predictions on thermalized geometries of larger and more branched alkanes (n-butane and isobutane). For these predictions, the absolute zero of energy is shifted for each molecule to compare relative energies on its potential energy surface (i.e., parallelity errors are removed). These shifts are reported in the figure caption; for no other results reported in this paper are parallelity errors removed.

In our previous work,[35] this test was performed using training data that combined 100 geometries of methane, 300 of ethane, and 50 of propane; the resulting predictions are reproduced here in Fig. 2a. This earlier implementation of MOB-ML led to predictions for n-butane and isobutane with substantial errors (0.59 mH for n-butane and 0.93 mH for isobutane) and noticeable skew with respect to the true correlation energy.
The predictions of MOB-ML in the current work (Fig. 2b) are markedly improved. First, the overall prediction accuracy is improved for all four summary statistics (inset in Fig. 2) despite a substantial reduction in the number of training examples used. (The current work uses only 50 geometries of ethane, 20 geometries of propane, and no methane data.) Second, n-butane and isobutane are predicted with nearly identical accuracy. Finally, the prediction errors are no longer skewed as a function of the true correlation energy. The primary methodological sources of these improvements are found to be the symmetrization of occupied orbitals (Eq. 7) and the improved feature selection methodology. The MOB-ML features in the current work are selected with an importance threshold of 1×10⁻⁴, resulting in 27 features for ε_d^ML and 12 features for ε_o^ML; results presented in Fig. 2b for CCSD(T) are qualitatively identical to those obtained for CCSD (not shown).
FIG. 2. MOB-ML predictions of the correlation energy for 100 n-butane and isobutane geometries, using the MOB-ML features described in the (b) current work, compared to (a) the previous MOB-ML features of Ref. 35. Training sets are indicated in each panel [(a) 100 CH4 + 300 C2H6 + 50 C3H8; (b) 50 C2H6 + 20 C3H8]. MOB-ML prediction errors are plotted versus the (a) true CCSD and (b) true CCSD(T) correlation energy. To remove parallelity error, a global shift is applied to the predictions of n-butane and isobutane by (a) 3.3 and 0.73 mH and (b) 0.90 and 0.17 mH, respectively. Summary statistics that include this shift (indicated by an asterisk) are reported as (MAE*, Max*, Rel. MAE*, r): mean absolute error, maximum absolute error, MAE* as a percentage of E_c, and Pearson correlation coefficient.[68] Panel (a): n-butane (0.59 mH, 2.2 mH, 0.076%, 0.77), isobutane (0.93 mH, 3.2 mH, 0.12%, 0.32); panel (b): n-butane (0.32 mH, 1.4 mH, 0.040%, 0.95), isobutane (0.33 mH, 1.4 mH, 0.042%, 0.95). The gray shaded region corresponds to errors of ±2 mH.
We now examine the transferability of the MOB-ML method across a broad swath of chemical space. Specifically, we consider the QM7b dataset,[69] which is comprised of 7,211 plausible organic molecules with 7 or fewer heavy atoms. The chemical elements in QM7b are limited to those likely to be found in drug-like compounds: C, H, O, N, S, and Cl. We refer to the dataset used herein as QM7b-T to reflect the fact that it contains geometries sampled at a temperature of 350 K (as described in Sec. II), as opposed to DFT-optimized geometries. The MOB-ML model is trained on a randomly chosen subset of QM7b-T molecules and used to predict the correlation energy of the remainder. Active learning was also tested as a training-data selection strategy, but it was not found to improve the predictions in the regime of chemical accuracy, and in fact led to slightly worse transferability.
For comparison, a Δ-ML model[21] was trained on the same molecules with kernel ridge regression, using the FCHL representation[70] with a Gaussian kernel function (FCHL/Δ-ML), as implemented in the QML package.[71] All hyperparameters of the model were set to those obtained in Ref. 70, which have previously been demonstrated to work well for datasets containing structures similar to those in QM7b-T.[71]
A possible source of concern for MOB-ML is that the number of selected features might grow with the chemical complexity of the training data. For example, 27 features for ε_d^ML and 12 features for ε_o^ML were selected in the alkane test case using ethane + propane training data (Fig. 2b), whereas only 10 features for ε_d^ML and 10 features for ε_o^ML were selected for the water test case at the CCSD(T) level of theory (Fig. 1). To examine this, we perform feature selection on increasing numbers of randomly selected molecules from the QM7b-T dataset. Table I presents two statistics on the feature importance as a function of the number of training molecules: (i) the number of "important features" (i.e., those whose permutation importance[45] exceeds a set threshold of 2×10⁻⁴ and 5×10⁻⁵ for ε_d^ML and ε_o^ML, respectively) and (ii) the inverse participation ratio[72] of the feature importance scores. The latter is a threshold-less measure of the number of important features; it takes a value of 1 when only one feature has nonzero importance and N when all N features have equal importance. Although the QM7b-T dataset contains many different chemical elements and bonding motifs, Table I reveals that the set of selected features remains compact and does not grow with the number of training molecules. Indeed, for a large number of training molecules, the number of selected features slightly decreases, reaching 42 and 24 selected features for ε_d^ML and ε_o^ML, respectively, for the largest training sizes considered.
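For reference, the inverse participation ratio of a set of importance scores can be computed as in the short sketch below (a standard definition consistent with the limiting values quoted above; the normalization convention is our assumption):

```python
import numpy as np

def inverse_participation_ratio(scores):
    """IPR of feature importance scores: 1 if a single feature carries
    all the importance, N if all N features are equally important."""
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()          # normalize scores to a probability vector
    return 1.0 / np.sum(p**2)
```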
The learning curves for MOB-ML models trained at the MP2/cc-pVTZ and CCSD(T)/cc-pVDZ levels of theory are shown in Fig. 3a, together with the FCHL/Δ-ML learning curve for MP2/cc-pVTZ. At the MP2 level of theory, the MOB-ML model achieves an accuracy of 2 mH with 110 training calculations (representing 1.5% of the molecules in the QM7b-T dataset), whereas FCHL/Δ-ML requires over 300 training geometries to reach the same accuracy threshold. Fig. 3a also illustrates the relative insensitivity of MOB-ML to the level of electronic structure theory, with the learning curve for CCSD(T)/cc-pVDZ reaching 2 mH accuracy with 140 training calculations.
TABLE I. Number of features selected as a function of the number of randomly chosen training molecules for the QM7b-T dataset at the CCSD(T)/cc-pVDZ level. The number of features that exceed an importance threshold as well as the inverse participation ratio (IPR) of the feature importance scores are reported (see text).

                 # of important features    feature weight IPR
Training size    ε_d^ML       ε_o^ML        ε_d^ML      ε_o^ML
20               50           28            4.720       1.116
50               46           28            3.718       1.097
100              46           26            3.450       1.115
200              42           24            3.430       1.120
An analysis of the sensitivity of the MOB-ML predictions to the number of selected features is presented in supporting information Fig. S1, which indicates that the reported results are robust with respect to the number of selected features.
As a final test of the transferability of the MOB-ML and FCHL/Δ-ML methods across chemical space, Figs. 3b and 3c show results in which the ML methods are trained on QM7b-T molecules and then used to predict results for a dataset of 13-heavy-atom organic molecules at thermalized geometries, GDB-13-T, which includes six thermally sampled geometries each of 1,000 13-heavy-atom organic molecules chosen randomly from the GDB-13 dataset.[36] Like QM7b, the members of GDB-13 contain C, H, N, O, S, and Cl. The size of these molecules precludes the use of coupled cluster theory to generate reference data; we therefore make the comparison at the MP2/cc-pVTZ level of theory, noting that MOB-ML has consistently been shown to be insensitive to the employed post-Hartree-Fock method (as in Fig. 3a). Transfer learning results as a function of the number of training molecules are presented in Figs. 3b (on a linear-linear scale) and 3c (on a log-log scale).

Using the MOB-ML model that is trained on 110 seven-heavy-atom molecules (corresponding to a prediction MAE of 1.89 mH for QM7b-T), we observe a prediction MAE of 3.88 mH for GDB-13-T. Expressed in terms of size-intensive quantities, the prediction MAE per heavy atom is 0.277 mH and 0.298 mH for QM7b-T and GDB-13-T, respectively, indicating that the accuracy of the MOB-ML results is only slightly degraded when the model is transferred to the dataset of larger molecules. On a per-heavy-atom basis, MOB-ML reaches chemical accuracy with the same number of QM7b-T training calculations (approximately 100), regardless of whether it is tested on QM7b-T or GDB-13-T.
In contrast with MOB-ML, the FCHL/Δ-ML method is found to be significantly less transferable from QM7b-T to GDB-13-T. For models trained using 100 seven-heavy-atom molecules, the MAE per heavy atom of FCHL/Δ-ML is over twice that of MOB-ML (Fig. 3b). Moreover, whereas MOB-ML reaches the per-heavy-atom chemical accuracy threshold with 140 training calculations, the FCHL/Δ-ML method only reaches that threshold with 5000 training calculations.
FIG. 3. Learning curves for MOB-ML trained on QM7b-T and applied to QM7b-T and GDB-13-T (see text for definitions of these datasets). FCHL/Δ-ML[70] results are provided for comparison. (a) Predictions are made for QM7b-T at the MP2/cc-pVTZ (red) and CCSD(T)/cc-pVDZ (orange) levels of theory. (b) Using the same models trained on QM7b-T, predictions are made for GDB-13-T and reported in terms of MAE per heavy atom. (MOB-ML predictions for QM7b-T are included for reference.) (c) As in the previous panel, but plotted on a logarithmic scale and extended to show the full range of FCHL/Δ-ML predictions. Error bars for FCHL/Δ-ML represent prediction standard errors of the mean as measured over 10 models. The gray shaded area corresponds to errors of 2 mH per 7 heavy atoms.
IV. CONCLUSIONS
Molecular-orbital-based machine learning (MOB-ML) has been shown to be a simple and strikingly accurate strategy for predicting correlated wavefunction energies at the cost of a Hartree-Fock calculation, benefiting from the intrinsic transferability of the localized molecular orbital representation. The starting point for the MOB-ML method is a rigorous mapping from the Hartree-Fock molecular orbitals to the total correlation energy, which ensures that the use of sufficient training data and molecular orbital features will produce a model that matches the corresponding correlated wavefunction method across the entirety of chemical space. The current work explores this possibility within the subspace of organic molecules. It is shown that MOB-ML predicts energies of the QM7b-T dataset to within 2 millihartree accuracy using only 110 training calculations at the MP2/cc-pVTZ level of theory and 140 training calculations at the CCSD(T)/cc-pVDZ level of theory. Direct comparison with FCHL/Δ-ML reveals that MOB-ML is threefold more efficient in reaching chemical accuracy for describing QM7b-T. Further, a transferability test of a MOB-ML model trained on QM7b-T to GDB-13-T reveals that MOB-ML exhibits negligible degradation in accuracy; as a result, chemical accuracy is achieved with 36-fold fewer training calculations using MOB-ML versus FCHL/Δ-ML. These results suggest that MOB-ML provides a promising approach toward the development of density matrix functionals that are applicable across broad swaths of chemical space.
ACKNOWLEDGMENTS
We thank Daniel Smith (Molecular Sciences Software Institute) and Alberto Gobbi (Genentech) for a helpful discussion about available training datasets. T.F.M. acknowledges support from AFOSR award no. FA9550-17-1-0102. A.S.C. acknowledges support from the National Centre of Competence in Research (NCCR) Materials Revolution: Computational Design and Discovery of Novel Materials (MARVEL) of the Swiss National Science Foundation (SNSF). We additionally acknowledge support from the Resnick Sustainability Institute postdoctoral fellowship (M.W.) and the Camille Dreyfus Teacher-Scholar Award (T.F.M.). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science under contract DE-AC02-05CH11231.
SUPPORTING INFORMATION
The datasets used in this work are available for download;[73] they include MOB-ML features, HF energies, pair correlation energies, and geometries. MOB-ML and FCHL/Δ-ML predictions corresponding to Fig. 3 and an analysis of the sensitivity of the results of Fig. 3 to the number of selected features are available in the attached supporting information. A reference Python implementation for generating MOB-ML features is available at https://github.com/thomasfmiller/MOB-ML.
[1] A. Lavecchia, "Machine-learning approaches in drug discovery: methods and applications," Drug Discov. Today 20, 318–331 (2015).
[2] E. Gawehn, J. A. Hiss, and G. Schneider, "Deep learning in drug discovery," Mol. Inform. 35, 3–14 (2016).
[3] M. Popova, O. Isayev, and A. Tropsha, "Deep reinforcement learning for de novo drug design," Sci. Adv. 4, eaap7885 (2018).
[4] E. Kim, K. Huang, S. Jegelka, and E. Olivetti, "Virtual screening of inorganic materials synthesis parameters with deep learning," npj Comput. Mater. 3, 53 (2017).
[5] F. Ren, L. Ward, T. Williams, K. J. Laws, C. Wolverton, J. Hattrick-Simpers, and A. Mehta, "Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments," Sci. Adv. 4, eaaq1566 (2018).
[6] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, "Machine learning for molecular and materials science," Nature 559, 547–555 (2018).
[7] B. Sanchez-Lengeling and A. Aspuru-Guzik, "Inverse molecular design using machine learning: Generative models for matter engineering," Science 361, 360–365 (2018).
[8] J. N. Wei, D. Duvenaud, and A. Aspuru-Guzik, "Neural networks for the prediction of organic chemistry reactions," ACS Cent. Sci. 2, 725–732 (2016).
[9] P. Raccuglia, K. C. Elbert, P. D. F. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, and A. J. Norquist, "Machine-learning-assisted materials discovery using failed experiments," Nature 533, 73–76 (2016).
[10] Z. W. Ulissi, A. J. Medford, T. Bligaard, and J. K. Nørskov, "To address surface reaction network complexity using scaling relations machine learning and DFT calculations," Nat. Commun. 8, 14621 (2017).
[11] M. H. S. Segler and M. P. Waller, "Neural-symbolic machine learning for retrosynthesis and reaction prediction," Chem. - A Eur. J. 23, 5966–5971 (2017).
[12] M. H. S. Segler, M. Preuss, and M. P. Waller, "Planning chemical syntheses with deep neural networks and symbolic AI," Nature 555, 604–610 (2018).
[13] J. S. Smith, O. Isayev, and A. E. Roitberg, "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost," Chem. Sci. 8, 3192–3203 (2017).
[14] J. S. Smith, B. T. Nebgen, R. Zubatyuk, N. Lubbers, C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A. Roitberg, "Outsmarting quantum chemistry through transfer learning," (2018).
[15] N. Lubbers, J. S. Smith, and K. Barros, "Hierarchical modeling of molecular energies using a deep neural network," J. Chem. Phys. 148, 241715 (2018).
[16] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, "Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons," Phys. Rev. Lett. 104, 136403 (2010).
[17] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Fast and accurate modeling of molecular atomization energies with machine learning," Phys. Rev. Lett. 108, 58301 (2012).
[18] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Machine learning of molecular electronic properties in chemical compound space," New J. Phys. 15, 95003 (2013).
[19] K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller, "Assessment and validation of machine learning methods for predicting molecular atomization energies," J. Chem. Theory Comput. 9, 3404 (2013).
[20] P. Gasparotto and M. Ceriotti, "Recognizing molecular patterns by machine learning: an agnostic structural definition of the hydrogen bond," J. Chem. Phys. 141, 174110 (2014).
[21] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, "Big data meets quantum chemistry approximations: the Δ-machine learning approach," J. Chem. Theory Comput. 11, 2087 (2015).
[22] J. Behler, "Perspective: Machine learning potentials for atomistic simulations," J. Chem. Phys. 145, 170901 (2016).
[23] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, "Molecular graph convolutions: moving beyond fingerprints," J. Comput. Aided Mol. Des. 30, 595 (2016).
[24] F. Paesani, "Getting the right answers for the right reasons: toward predictive molecular simulations of water with many-body potential energy functions," Acc. Chem. Res. 49, 1844 (2016).
[25] K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and A. Tkatchenko, "Quantum-chemical insights from deep tensor neural networks," Nat. Commun. 8, 13890 (2017).
[26] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, "Bypassing the Kohn-Sham equations with machine learning," Nat. Commun. 8, 872 (2017).
[27] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, "MoleculeNet: a benchmark for molecular machine learning," Chem. Sci. 9, 513 (2018).
[28] T. T. Nguyen, E. Székely, G. Imbalzano, J. Behler, G. Csányi, M. Ceriotti, A. W. Götz, and F. Paesani, "Comparison of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials in representing water interactions through many-body expansions," J. Chem. Phys. 148, 241725 (2018).
[29] K. Yao, J. E. Herr, D. W. Toth, R. McKintyre, and J. Parkhill, "The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics," Chem. Sci. 9, 2261–2269 (2018).
[30] S. Fujikake, V. L. Deringer, T. H. Lee, M. Krynski, S. R. Elliott, and G. Csányi, "Gaussian approximation potential modeling of lithium intercalation in carbon nanostructures," J. Chem. Phys. 148, 241714 (2018).
[31] H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron, "A density functional tight binding layer for deep learning of chemical Hamiltonians," J. Chem. Theory Comput. 14, 5764–5776 (2018).
[32] A. Grisafi, A. Fabrizio, B. Meyer, D. M. Wilkins, C. Corminboeuf, and M. Ceriotti, "Transferable machine-learning model of the electron density," ACS Cent. Sci. 5, 57–64 (2019).
[33] L. Zhang, J. Han, H. Wang, R. Car, and W. E, "Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics," Phys. Rev. Lett. 120, 143001 (2018).
[34] R. T. McGibbon, A. G. Taube, A. G. Donchev, K. Siva, F. Hernández, C. Hargus, K.-H. Law, J. L. Klepeis, and D. E. Shaw, "Improving the accuracy of Møller-Plesset perturbation theory with neural networks," J. Chem. Phys. 147, 161725 (2017).
[35] M. Welborn, L. Cheng, and T. F. Miller III, "Transferability in machine learning for electronic structure via the molecular orbital basis," J. Chem. Theory Comput. 14, 4772–4779 (2018).
[36] L. C. Blum and J.-L. Reymond, "970 million druglike small molecules for virtual screening in the chemical universe database GDB-13," J. Am. Chem. Soc. 131, 8732 (2009).
[37] R. K. Nesbet, "Brueckner's theory and the method of superposition of configurations," Phys. Rev. 109, 1632 (1958).
[38] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry (Dover, Mineola, 1996), pp. 231–239.
[39] C. Møller and M. S. Plesset, "Note on an approximation treatment for many-electron systems," Phys. Rev. 46, 618 (1934).
[40] G. Knizia, "Intrinsic atomic orbitals: an unbiased bridge between quantum theory and chemical concepts," J. Chem. Theory Comput. 9, 4834 (2013).
[41] J. Behler and M. Parrinello, "Generalized neural-network representation of high-dimensional potential-energy surfaces," Phys. Rev. Lett. 98, 146401 (2007).
[42] S. F. Boys, "Construction of some molecular orbitals to be approximately invariant for changes from one molecule to another," Rev. Mod. Phys. 32, 296–299 (1960).
[43] U. Kaldor, "Localized orbitals for NH3, C2H4, and C2H2," J. Chem. Phys. 46, 1981–1987 (1967).
[44] L. Breiman, "Random forests," Mach. Learn. 45, 5–32 (2001).
[45] L. Breiman, "Statistical modeling: The two cultures," Stat. Sci. 16, 199–215 (2001).
[46] R. Tripathy, I. Bilionis, and M. Gonzalez, "Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation," J. Comput. Phys. 321, 191–223 (2016).
[47] See https://github.com/thomasfmiller/MOB-ML for the available code.
[48] Y. Shao, Z. Gan, E. Epifanovsky, A. T. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, D. Ghosh, M. Goldey, P. R. Horn, L. D. Jacobson, I. Kaliman, R. Z. Khaliullin, T. Kuś, A. Landau, J. Liu, E. I. Proynov, Y. M. Rhee, R. M. Richard, M. A. Rohrdanz, R. P. Steele, E. J. Sundstrom, H. L. Woodcock, P. M. Zimmerman, D. Zuev, B. Albrecht, E. Alguire, B. Austin, G. J. O. Beran, Y. A. Bernard, E. Berquist, K. Brandhorst, K. B. Bravaya, S. T. Brown, D. Casanova, C.-M. Chang, Y. Chen, S. H. Chien, K. D. Closser, D. L. Crittenden, M. Diedenhofen, R. A. DiStasio, H. Do, A. D. Dutoi, R. G. Edgar, S. Fatehi, L. Fusti-Molnar, A. Ghysels, A. Golubeva-Zadorozhnaya, J. Gomes, M. W. Hanson-Heine, P. H. Harbach, A. W. Hauser, E. G. Hohenstein, Z. C. Holden, T.-C. Jagau, H. Ji, B. Kaduk, K. Khistyaev, J. Kim, J. Kim, R. A. King, P. Klunzinger, D. Kosenkov, T. Kowalczyk, C. M. Krauter, K. U. Lao, A. D. Laurent, K. V. Lawler, S. V. Levchenko, C. Y. Lin, F. Liu, E. Livshits, R. C. Lochan, A. Luenser, P. Manohar, S. F. Manzer, S.-P. Mao, N. Mardirossian, A. V. Marenich, S. A. Maurer, N. J. Mayhall, E. Neuscamman, C. M. Oana, R. Olivares-Amaya, D. P. O'Neill, J. A. Parkhill, T. M. Perrine, R. Peverati, A. Prociuk, D. R. Rehn, E. Rosta, N. J. Russ, S. M. Sharada, S. Sharma, D. W. Small, A. Sodt, T. Stein, D. Stück, Y.-C. Su, A. J. Thom, T. Tsuchimochi, V. Vanovschi, L. Vogt, O. Vydrov, T. Wang, M. A. Watson, J. Wenzel, A. White, C. F. Williams, J. Yang, S. Yeganeh, S. R. Yost, Z.-Q. You, I. Y. Zhang, X. Zhang, Y. Zhao, B. R. Brooks, G. K. Chan, D. M. Chipman, C. J. Cramer, W. A. Goddard, M. S. Gordon, W. J. Hehre, A. Klamt, H. F. Schaefer, M. W. Schmidt, C. D. Sherrill, D. G. Truhlar, A. Warshel, X. Xu, A. Aspuru-Guzik, R. Baer, A. T. Bell, N. A. Besley, J.-D. Chai, A. Dreuw, B. D. Dunietz, T. R. Furlani, S. R. Gwaltney, C.-P. Hsu, Y. Jung, J. Kong, D. S. Lambrecht, W. Liang, C. Ochsenfeld, V. A. Rassolov, L. V. Slipchenko, J. E. Subotnik, T. Van Voorhis, J. M. Herbert, A. I. Krylov, P. M. Gill, and M. Head-Gordon, "Advances in molecular quantum chemistry contained in the Q-Chem 4 program package," Mol. Phys. 113, 184 (2015).
[49] S. H. Vosko, L. Wilk, and M. Nusair, "Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis," Can. J. Phys. 58, 1200 (1980).
[50] C. Lee, W. Yang, and R. G. Parr, "Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density," Phys. Rev. B 37, 785 (1988).
[51] A. D. Becke, "Density-functional thermochemistry. III. The role of exact exchange," J. Chem. Phys. 98, 5648 (1993).
[52] P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, "Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields," J. Phys. Chem. 98, 11623 (1994).
[53] P. C. Hariharan and J. A. Pople, "The influence of polarization functions on molecular orbital hydrogenation energies," Theor. Chim. Acta 28, 213 (1973).
[54] G. Bussi and M. Parrinello, "Accurate sampling using Langevin dynamics," Phys. Rev. E 75, 056707 (2007).
[55] H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Schütz, P. Celani, W. Györffy, D. Kats, T. Korona, R. Lindh, A. Mitrushenkov, G. Rauhut, K. R. Shamasundar, T. B. Adler, R. D. Amos, S. J. Bennie, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Deegan, A. J. Dobbyn, F. Eckert, E. Goll, C. Hampel, A. Hesselmann, G. Hetzer, T. Hrenar, G. Jansen, C. Köppl, S. J. R. Lee, Y. Liu, A. W. Lloyd, Q. Ma, R. A. Mata, A. J. May, S. J. McNicholas, W. Meyer, T. F. Miller III, M. E. Mura, A. Nicklass, D. P. O'Neill, P. Palmieri, D. Peng, K. Pflüger, R. Pitzer, M. Reiher, T. Shiozaki, H. Stoll, A. J. Stone, R. Tarroni, T. Thorsteinsson, M. Wang, and M. Welborn, "Molpro, version 2018.3, a package of ab initio programs," (2018), see http://www.molpro.net.
[56] T. H. Dunning, "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen," J. Chem. Phys. 90, 1007 (1989).
[57] S. Saebo and P. Pulay, "Local treatment of electron correlation," Annu. Rev. Phys. Chem. 44, 213–236 (1993).
[58] J. Čížek, "On the correlation problem in atomic and molecular systems. Calculation of wavefunction components in Ursell-type expansion using quantum-field theoretical methods," J. Chem. Phys. 45, 4256 (1966).
[59] C. Hampel and H. J. Werner, "Local treatment of electron correlation in coupled cluster theory," J. Chem. Phys. 104, 6286–6297 (1996).
[60] R. J. Bartlett, J. D. Watts, S. A. Kucharski, and J. Noga, "Non-iterative fifth-order triple and quadruple excitation energy corrections in correlated methods," Chem. Phys. Lett. 165, 513–522 (1990).
[61] M. Schütz, "Low-order scaling local electron correlation methods. III. Linear scaling local perturbative triples correction (T)," J. Chem. Phys. 113, 9986–10001 (2000).
[62] R. Polly, H.-J. Werner, F. R. Manby, and P. J. Knowles, "Fast Hartree-Fock theory using local density fitting approximations," Mol. Phys. 102, 2311–2321 (2004).
[63] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
[64] GPy, "GPy: A Gaussian process framework in Python," http://github.com/SheffieldML/GPy (since 2012).
[65] In principle, the smoothness of the Matérn kernel could be taken as a kernel hyperparameter; however, this possibility was not explored in this work.
[66] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: machine learning in Python," J. Mach. Learn. Res. 12, 2825 (2011).
[67] C. Cortes, L. D. Jackel, S. A. Solla, V. Vapnik, and J. S. Denker, "Learning curves: Asymptotic values and rate of convergence," in Advances in Neural Information Processing Systems 6, edited by J. D. Cowan, G. Tesauro, and J. Alspector (Morgan-Kaufmann, 1994), pp. 327–334.
[68] K. Pearson, "Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia," Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 187, 253–318 (1896).
[69] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Machine learning of molecular electronic properties in chemical compound space," New J. Phys. 15, 095003 (2013).
[70] F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, "Alchemical and structural distribution based representation for universal quantum machine learning," J. Chem. Phys. 148, 241717 (2018).
[71] A. S. Christensen, F. A. Faber, and O. A. von Lilienfeld, "Operators in quantum machine learning: Response properties in chemical space," J. Chem. Phys. 150, 064105 (2019).
[72] B. Kramer and A. MacKinnon, "Localization: theory and experiment," Rep. Prog. Phys. 56, 1469 (1993).
[73] L. Cheng, M. Welborn, A. S. Christensen, and T. F. Miller III, "Thermalized (350K) QM7b, GDB-13, water, and short alkane quantum chemistry dataset including MOB-ML features," (2019), 10.22002/D1.1177.