A Universal Density Matrix Functional from Molecular Orbital-Based Machine Learning: Transferability across Organic Molecules

Lixue Cheng,1 Matthew Welborn,1 Anders S. Christensen,2 and Thomas F. Miller III1,a)

1) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
2) Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Basel, Switzerland

a) Electronic mail: tfm@caltech.edu

(Dated: 8 April 2019)
arXiv:1901.03309v3 [physics.chem-ph]
We address the degree to which machine learning can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for the MP2, CCSD, and CCSD(T) levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 millihartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with three-fold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 versus 5000 training calculations).
Machine learning (ML) has recently seen wide application in chemistry, including in the fields of drug discovery,[1-3] materials design,[4-7] and reaction prediction.[8-12] In the context of quantum chemistry, much work has focused on predicting electronic energies or densities based on atom- or geometry-specific features,[13-33] although other strategies have also been employed.[34] Recently, we reported an accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to the prediction of correlated wavefunction energies based on input features from a self-consistent field calculation, such as the Hartree-Fock (HF) method.[35] In this communication, we present refinements to the MOB-ML method with comparisons to test cases from our previous work. We then demonstrate the performance of MOB-ML across a broad swath of chemical space, as represented by the QM7b[18] and GDB-13[36] test sets of organic molecules.
I. THEORY
The current work aims to predict post-Hartree-Fock correlated wavefunction energies using features of the Hartree-Fock molecular orbitals (MOs). The starting point for the MOB-ML method[35] is that the correlation energy can be decomposed into pairwise occupied MO contributions[37,38]

$$E_\mathrm{c} = \sum_{ij}^{\mathrm{occ}} \varepsilon_{ij}, \qquad (1)$$

where the pair correlation energy ε_ij can be written as a functional of the full set of MOs, {φ_p}, appropriately indexed by i and j:

$$\varepsilon_{ij} = \varepsilon\!\left[\{\phi_p\}_{ij}\right]. \qquad (2)$$
The functional ε is universal across all chemical systems; for a given level of correlated wavefunction theory, there is a corresponding ε that maps the HF MOs to the pair correlation energy, regardless of the molecular composition or geometry. Furthermore, ε simultaneously describes the pair correlation energy for all pairs of occupied MOs (i.e., the functional form of ε does not depend on i and j). For example, in second-order Møller-Plesset perturbation theory (MP2),[39] the pair correlation energies are

$$\varepsilon^{\mathrm{MP2}}_{ij} = -\frac{1}{4}\sum_{ab}^{\mathrm{virt}} \frac{\left|\langle ij \| ab \rangle\right|^2}{e_a + e_b - e_i - e_j}, \qquad (3)$$

where a and b index virtual MOs, e_p is the Hartree-Fock orbital energy corresponding to MO φ_p, and ⟨ij‖ab⟩ are antisymmetrized electron repulsion integrals.[38] A corresponding expression for the pair correlation energy exists for any post-Hartree-Fock method, but it is typically costly to evaluate in closed form.
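For concreteness, given precomputed antisymmetrized integrals and HF orbital energies, the MP2 pair energies of Eq. 3 can be tabulated in a few lines. The following NumPy sketch is illustrative only; the array layout and function name are our assumptions, not part of the MOB-ML code:

```python
import numpy as np

def mp2_pair_energies(g_anti, e_occ, e_virt):
    """Tabulate the MP2 pair correlation energies of Eq. 3.
    g_anti: antisymmetrized integrals <ij||ab>, shape (nocc, nocc, nvirt, nvirt);
    e_occ, e_virt: HF orbital energies. The sum of the returned matrix
    recovers the total MP2 correlation energy of Eq. 1."""
    denom = ((e_virt[:, None] + e_virt[None, :])[None, None, :, :]
             - (e_occ[:, None] + e_occ[None, :])[:, :, None, None])
    return -0.25 * np.sum(np.abs(g_anti) ** 2 / denom, axis=(2, 3))
```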
In MOB-ML, a machine learning model is constructed for the pair energy functional,

$$\varepsilon_{ij} \approx \varepsilon^{\mathrm{ML}}\!\left[\mathbf{f}_{ij}\right], \qquad (4)$$

where f_ij denotes a vector of features associated with MOs i and j. Eq. 4 thus presents the opportunity for the machine learning of a universal density matrix functional for correlated wavefunction energies, which can be evaluated at the cost of the MO calculation.
Following our previous work,[35] the features f_ij correspond to unique elements of the Fock (F), Coulomb (J), and exchange (K) matrices between φ_i, φ_j, and the set of virtual orbitals. In the current work, we additionally include features associated with matrix elements between pairs of occupied orbitals for which one member of the pair differs from φ_i or φ_j (i.e., non-i,j occupied MO pairs). The feature vector takes the form
$$\mathbf{f}_{ij} = \big( F_{ii},\, F_{ij},\, F_{jj},\, \mathbf{F}^{\mathrm{o}}_{i},\, \mathbf{F}^{\mathrm{o}}_{j},\, \mathbf{F}^{\mathrm{vv}}_{ij},\, J_{ii},\, J_{ij},\, J_{jj},\, \mathbf{J}^{\mathrm{o}}_{i},\, \mathbf{J}^{\mathrm{o}}_{j},\, \mathbf{J}^{\mathrm{v}}_{i},\, \mathbf{J}^{\mathrm{v}}_{j},\, \mathbf{J}^{\mathrm{vv}}_{ij},\, K_{ij},\, \mathbf{K}^{\mathrm{o}}_{i},\, \mathbf{K}^{\mathrm{o}}_{j},\, \mathbf{K}^{\mathrm{v}}_{i},\, \mathbf{K}^{\mathrm{v}}_{j},\, \mathbf{K}^{\mathrm{vv}}_{ij} \big), \qquad (5)$$
where for a given matrix (F, J, or K) the superscript o denotes a row of its occupied–occupied block, the superscript v denotes a row of its occupied–virtual block, and the superscript vv denotes its virtual–virtual block. Redundant elements are removed, such that the virtual–virtual block is represented by its upper triangle and the diagonal elements of K (which are identical to those of J) are omitted. To increase transferability and accuracy, we choose φ_i and φ_j to be localized molecular orbitals (LMOs) rather than canonical MOs and employ valence virtual LMOs[40] in place of the set of all virtual MOs (as detailed in Ref. 35).
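To make the bookkeeping concrete, a schematic NumPy assembly of Eq. 5 is sketched below. It is our own illustration under assumed conventions (function name, argument layout, and index handling are illustrative), and the symmetrization and sorting steps described below are omitted:

```python
import numpy as np

def feature_vector(F, J, K, i, j, occ_other, virt):
    """Schematic assembly of the MOB-ML feature vector (Eq. 5) for the
    occupied LMO pair (i, j). F, J, K are Fock, Coulomb, and exchange
    matrices in the LMO basis; occ_other lists occupied LMOs k != i, j;
    virt lists valence virtual LMOs. Virtual-virtual blocks are reduced
    to their upper triangles, and the K diagonal (identical to J's) is
    dropped by keeping only K_ij among the scalar elements."""
    iu = np.triu_indices(len(virt))
    vv = lambda M: M[np.ix_(virt, virt)][iu]  # upper triangle of vv block
    return np.concatenate([
        [F[i, i], F[i, j], F[j, j]], F[i, occ_other], F[j, occ_other], vv(F),
        [J[i, i], J[i, j], J[j, j]], J[i, occ_other], J[j, occ_other],
        J[i, virt], J[j, virt], vv(J),
        [K[i, j]], K[i, occ_other], K[j, occ_other],
        K[i, virt], K[j, virt], vv(K),
    ])
```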
We separate Eq. 4 to independently machine learn the cases of i = j and i ≠ j,

$$\varepsilon_{ij} \approx \begin{cases} \varepsilon^{\mathrm{ML}}_{\mathrm{d}}\!\left[\mathbf{f}_{i}\right] & \text{if } i = j,\\[2pt] \varepsilon^{\mathrm{ML}}_{\mathrm{o}}\!\left[\mathbf{f}_{ij}\right] & \text{if } i \neq j, \end{cases} \qquad (6)$$
where f_i denotes f_ii (Eq. 5) with redundant elements removed. By separating the pair energies in this way, we avoid the situation where a single ML model is required to distinguish between the case of i = j and that of φ_i being nearly degenerate with φ_j, a distinction that can represent a sharp variation in the function to be learned.
In the current work, several technical refinements are introduced to improve training efficiency (i.e., the accuracy and transferability of the model as a function of the number of training examples). These are now described.
Occupied LMO symmetrization. The feature vector is preprocessed to specify a canonical ordering of the occupied and virtual LMO pairs. This reduces permutation of elements in the feature vector, resulting in greater ML training efficiency. Matrix elements M_ij (M = F, J, K) associated with φ_i and φ_j are rotated into gerade and ungerade combinations
$$\begin{aligned}
M_{ii} &\rightarrow \tfrac{1}{2}M_{ii} + \tfrac{1}{2}M_{jj} + M_{ij}\\
M_{jj} &\rightarrow \tfrac{1}{2}M_{ii} + \tfrac{1}{2}M_{jj} - M_{ij}\\
M_{ij} &\rightarrow \tfrac{1}{2}M_{ii} - \tfrac{1}{2}M_{jj}\\
M_{ip} &\rightarrow \tfrac{1}{\sqrt{2}}M_{ip} + \tfrac{1}{\sqrt{2}}M_{jp}\\
M_{jp} &\rightarrow \tfrac{1}{\sqrt{2}}M_{ip} - \tfrac{1}{\sqrt{2}}M_{jp}
\end{aligned} \qquad (7)$$
with the sign convention that F_ij is negative. Here, p indexes any LMO other than i or j (i.e., an occupied LMO k such that i ≠ k ≠ j, or a valence virtual LMO).
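A minimal NumPy sketch of this rotation is given below; the function and argument names are illustrative, and enforcement of the sign convention on F_ij is omitted:

```python
import numpy as np

def symmetrize_pair(M, i, j, others):
    """Rotate matrix elements of M involving LMOs i and j into
    gerade/ungerade combinations (Eq. 7). `others` lists the remaining
    LMO indices p (occupied k != i, j, or valence virtual). Returns the
    rotated scalar elements and the rotated rows M_ip, M_jp."""
    g_ii = 0.5 * M[i, i] + 0.5 * M[j, j] + M[i, j]   # gerade diagonal
    u_jj = 0.5 * M[i, i] + 0.5 * M[j, j] - M[i, j]   # ungerade diagonal
    gu_ij = 0.5 * M[i, i] - 0.5 * M[j, j]            # gerade-ungerade coupling
    g_p = (M[i, others] + M[j, others]) / np.sqrt(2.0)
    u_p = (M[i, others] - M[j, others]) / np.sqrt(2.0)
    return g_ii, u_jj, gu_ij, g_p, u_p
```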
LMO sorting. The virtual LMO pairs are sorted by increasing distance from occupied orbitals φ_i and φ_j. Sorting in this way ensures that features corresponding to valence virtual LMOs are listed in decreasing order of heuristic importance, and that the mapping between valence virtual LMOs and their associated features is roughly preserved. We recognize this issue could also potentially be addressed through the use of symmetry functions,[41] but these are not employed in the current work.
For purposes of sorting, distance is defined as

$$R^{ij}_a = \left\| \langle\phi_i|\hat{\mathbf{R}}|\phi_i\rangle - \langle\phi_a|\hat{\mathbf{R}}|\phi_a\rangle \right\| + \left\| \langle\phi_j|\hat{\mathbf{R}}|\phi_j\rangle - \langle\phi_a|\hat{\mathbf{R}}|\phi_a\rangle \right\|, \qquad (8)$$
where φ_a is a virtual LMO, R̂ is the Cartesian position operator, and ‖·‖ denotes the 2-norm; ‖⟨φ_i|R̂|φ_i⟩ − ⟨φ_a|R̂|φ_a⟩‖ represents the Euclidean distance between the centroids of orbital i and orbital a. Previously,[35] distances were defined based on Coulomb repulsion, which was found to sometimes lead to inconsistent sorting in systems with strongly polarized bonds. The non-i,j occupied LMO pairs are sorted in the same manner as the virtual LMO pairs.
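Given orbital centroids ⟨φ|R̂|φ⟩, the sort order implied by Eq. 8 reduces to a few lines of NumPy; the following sketch (names and shapes are our assumptions) illustrates it:

```python
import numpy as np

def sort_by_distance(centroid_i, centroid_j, centroids_virt):
    """Order virtual LMOs by increasing R_a^{ij} (Eq. 8): the sum of
    Euclidean distances from the centroid of virtual LMO a to the
    centroids of occupied LMOs i and j. Inputs are Cartesian centroids
    <phi|R|phi> with shapes (3,), (3,), and (n_virt, 3)."""
    R = (np.linalg.norm(centroids_virt - centroid_i, axis=1)
         + np.linalg.norm(centroids_virt - centroid_j, axis=1))
    return np.argsort(R)  # indices of virtual LMOs, nearest first
```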
Orbital localization. We employ Boys localization[42] to obtain the occupied LMOs, rather than the IBO localization[40] that was employed in our previous work.[35] Particularly for molecules that include triple bonds or multiple lone pairs, Boys localization is found to provide more consistent localization as a function of small geometry changes than IBO localization; and the chemically unintuitive mixing of σ and π bonds in Boys localization ("banana bonds")[43] does not present a problem for the MOB-ML method.
Feature selection. Prior to training, automatic feature selection is performed using random forest regression[44] with the mean-decrease-of-accuracy criterion (sometimes called permutation importance).[45] This technique was found to be more effective than our previous use[35] of the Gini importance score,[44] which led to worse accuracy and failed to select any features for the case of methane.
The reason for using feature selection in this way is twofold. First, GPR performance is known to degrade for high-dimensional datasets (in practice, 50-100 features);[46] and second, the use of the full feature set with small molecules can lead to overfitting as features become correlated.
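As an illustration of this selection step, the sketch below computes mean-decrease-of-accuracy importances around a scikit-learn random forest. It is a simplified construction of ours (importance is scored on the training data itself rather than on held-out data, and the threshold shown is only an example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_importance(X, y, n_estimators=100, seed=0):
    """Mean-decrease-of-accuracy importances: fit a random forest, then
    measure how much the R^2 score drops when each feature column is
    randomly shuffled. X: (n_samples, n_features); y: pair energies."""
    rng = np.random.default_rng(seed)
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    baseline = forest.score(X, y)
    importances = np.empty(X.shape[1])
    for col in range(X.shape[1]):
        X_shuffled = X.copy()
        rng.shuffle(X_shuffled[:, col])  # destroy this feature's information
        importances[col] = baseline - forest.score(X_shuffled, y)
    return importances

# Features above a chosen importance threshold are retained, e.g.:
# selected = np.where(permutation_importance(X, y) > 1e-3)[0]
```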
A reference Python implementation for generating MOB-ML features is provided online.[47]
II. COMPUTATIONAL DETAILS
Results are presented for a single water molecule; a series of alkane molecules; a thermalized version of the QM7b set of 7211 molecules with up to seven C, O, N, S, and Cl heavy atoms; and a thermalized version of the GDB-13 set of molecules with thirteen C, O, N, S, and Cl heavy atoms. All datasets employed in this work are provided in the Supporting Information.
Training and test geometries are sampled at 50 fs intervals from ab initio molecular dynamics trajectories performed with the Q-Chem 5.0 software package,[48] using the B3LYP[49-52]/6-31g*[53] level of theory and a Langevin thermostat[54] at 350 K.
The features and training pair energies associated with these geometries are computed using the Molpro 2018.0 software package[55] in a cc-pVTZ basis set unless otherwise noted.[56] Valence virtual orbitals used in feature construction are determined with the Intrinsic Bond Orbital method.[40] Reference pair correlation energies are computed with second-order Møller-Plesset perturbation theory (MP2)[39,57] and coupled cluster with singles and doubles (CCSD),[58,59] as well as with perturbative triples (CCSD(T)).[60,61] Density fitting for both Coulomb and exchange integrals[62] is employed for all results below except those corresponding to the water molecule. The frozen core approximation is used in all cases.
Gaussian process regression (GPR)[63] is employed to machine learn ε_d^ML and ε_o^ML (Eq. 6) using the GPy 1.9.6 software package.[64] The GPR kernel is Matérn 5/2 with white-noise regularization.[63] Kernel hyperparameters are optimized with respect to the log marginal likelihood objective for the water and alkane series results, as well as for ε_d^ML of the QM7b results. We use the Matérn 3/2 kernel instead of the Matérn 5/2 kernel for the case of ε_o^ML for the QM7b results, as it was empirically found to yield slightly better accuracy.[65]
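A minimal GPy sketch of this training setup is shown below. It is our illustration rather than the production MOB-ML code (array names and shapes are assumptions, and the Matérn 5/2 kernel is used for both models, whereas the text above swaps in Matérn 3/2 for ε_o^ML in the QM7b case):

```python
import numpy as np
import GPy

def train_mob_ml(F_diag, eps_diag, F_offdiag, eps_offdiag):
    """Fit the two GPR models of Eq. 6: eps_d on diagonal features f_i
    and eps_o on off-diagonal features f_ij, each with a Matern 5/2
    kernel plus white-noise regularization."""
    models = []
    for X, y in ((F_diag, eps_diag), (F_offdiag, eps_offdiag)):
        kern = GPy.kern.Matern52(input_dim=X.shape[1]) + GPy.kern.White(X.shape[1])
        m = GPy.models.GPRegression(X, y.reshape(-1, 1), kern)
        m.optimize()  # maximize the log marginal likelihood
        models.append(m)
    return models

def predict_correlation_energy(models, F_diag, F_offdiag):
    """Assemble E_c (Eq. 1) from the predicted pair energies."""
    eps_d, _ = models[0].predict(F_diag)
    eps_o, _ = models[1].predict(F_offdiag)
    return eps_d.sum() + eps_o.sum()
```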
Feature selection is performed using the random forest regression implementation in the scikit-learn v0.20.0 package.[66]
III. RESULTS
The ML model of Eq. 6 is a universal functional for any molecular Hamiltonian. In principle, with an adequate feature list and unlimited training data (and time), it should accurately and simultaneously describe all molecular systems. In practice, we must train the ML model using a truncated feature list and finite data. These choices determine the accuracy of the model.

Below, we examine the performance of the MOB-ML method in three increasingly broad regions of chemical space: (i) training on randomized water molecule geometries and predicting the energies of other water molecule geometries; (ii) training on geometries of short alkanes and predicting the energies of longer alkanes; and (iii) training on a small set of organic molecules and predicting the energies of a broader set of organic molecules. The first two test cases were introduced in our previous work,[35] and we explore how the refined methodology reported herein leads to improvements in accuracy and transferability. The last case represents a demanding new test of transferability across chemical space. In all cases, we report the ML prediction accuracy as a function of the number of training examples.
As a first example, we consider the performance of MOB-ML for a single water molecule. A separate model is trained to predict the correlation energy at the MP2, CCSD, and CCSD(T) levels of theory, using reference calculations on a subset of 1000 randomized water geometries to predict the correlation energy for the remainder. Feature selection with an importance threshold of 1×10⁻³ results in 12, 11, and 10 features for ε_o^ML for MP2, CCSD, and CCSD(T), respectively; ten features are selected for ε_d^ML for all three post-Hartree-Fock methods.
Figure 1 presents the test set prediction accuracy of each MOB-ML model as a function of the number of training geometries (i.e., the "learning curve"). MOB-ML predictions are shown for MP2, CCSD, and CCSD(T), and the model shows the same level of accuracy for all three methods. Remarkably, all three models achieve a prediction mean absolute error (MAE) of 1 mH when trained on only a single water geometry, indicating that only a single reference calculation is needed to provide chemical accuracy for the remaining 999 geometries at each level of theory. Since it contains 10 distinct LMO pairs, this single geometry provides enough information to yield a chemically accurate MOB-ML model for the global thermally accessible potential energy surface.

For all three methods (Fig. 1), the learning curve exhibits the expected[67] power-law behavior as a function of training data, and the total error reaches microhartree accuracy with tens of water training geometries. As compared to our previous results, where training on 200 geometries resulted in a prediction MAE of 0.027 mH for the case of CCSD,[35] the current implementation of the MOB-ML model is substantially improved; the improvement for this case stems primarily from the use of Boys localization,[42] which specifies unique and consistent LMOs corresponding to the oxygen lone pairs.
FIG. 1. Learning curves for MOB-ML models trained on the water molecule and used to predict the correlation energy of different water molecule geometries at three levels of post-Hartree-Fock theory (MP2, CCSD, and CCSD(T); log-log axes of prediction MAE in mH versus number of training geometries). Prediction errors are summarized in terms of mean absolute error (MAE).
Next, we explore the transferability of MOB-ML predictions for a model that is trained on thermalized geometries of short alkanes and then used for predictions on thermalized geometries of larger and more branched alkanes (n-butane and isobutane). For these predictions, the absolute zero of energy is shifted for each molecule to compare relative energies on its potential energy surface (i.e., parallelity errors are removed). These shifts are reported in the figure caption; for no other results reported in this paper are parallelity errors removed.

In our previous work,[35] this test was performed using training data that combined 100 geometries of methane, 300 of ethane, and 50 of propane; the resulting predictions are reproduced here in Fig. 2a. This earlier implementation of MOB-ML led to predictions for n-butane and isobutane with substantial errors (0.59 mH for n-butane and 0.93 mH for isobutane) and noticeable skew with respect to the true correlation energy.
The predictions of MOB-ML in the current work (Fig. 2b) are markedly improved. First, the overall prediction accuracy is improved for all four summary statistics (inset in Fig. 2) despite a substantial reduction in the number of training examples used. (The current work uses only 50 geometries of ethane, 20 geometries of propane, and no methane data.) Second, n-butane and isobutane are predicted with nearly identical accuracy. Finally, the prediction errors are no longer skewed as a function of the true correlation energy. The primary methodological sources of these improvements are found to be the symmetrization of occupied orbitals (Eq. 7) and the improved feature selection methodology. The MOB-ML features in the current work are selected with an importance threshold of 1×10⁻⁴, resulting in 27 features for ε_d^ML and 12 features for ε_o^ML; results presented in Fig. 2b for CCSD(T) are qualitatively identical to those obtained for CCSD (not shown).
FIG. 2. MOB-ML predictions of the correlation energy for 100 n-butane and isobutane geometries, using the MOB-ML features described in the (b) current work, compared to (a) the previous MOB-ML features of Ref. 35. Training sets are indicated in each panel [(a) 100 CH4 + 300 C2H6 + 50 C3H8; (b) 50 C2H6 + 20 C3H8]. MOB-ML prediction errors are plotted versus the (a) true CCSD and (b) true CCSD(T) correlation energy. To remove parallelity error, a global shift is applied to the predictions of n-butane and isobutane by (a) 3.3 and 0.73 mH and (b) 0.90 and 0.17 mH, respectively. Summary statistics that include this shift (indicated by an asterisk) are reported as (MAE*, Max*, Rel. MAE*, r): mean absolute error, maximum absolute error, MAE* as a percentage of E_c, and Pearson correlation coefficient.[68] Panel (a): n-butane (0.59 mH, 2.2 mH, 0.076%, 0.77), isobutane (0.93 mH, 3.2 mH, 0.12%, 0.32); panel (b): n-butane (0.32 mH, 1.4 mH, 0.040%, 0.95), isobutane (0.33 mH, 1.4 mH, 0.042%, 0.95). The gray shaded region corresponds to errors of ±2 mH.
We now examine the transferability of the MOB-ML method across a broad swath of chemical space. Specifically, we consider the QM7b dataset,[69] which is comprised of 7,211 plausible organic molecules with 7 or fewer heavy atoms. The chemical elements in QM7b are limited to those likely to be found in drug-like compounds: C, H, O, N, S, and Cl. We refer to the dataset used herein as QM7b-T to reflect the fact that it contains geometries sampled at a temperature of 350 K (as described in Sec. II), as opposed to DFT-optimized geometries. The MOB-ML model is trained on a randomly chosen subset of QM7b-T molecules and used to predict the correlation energy of the remainder. Active learning was also tested as a training-data selection strategy, but it was not found to improve the predictions in the regime of chemical accuracy, and in fact led to slightly worse transferability.
For comparison, a Δ-ML model[21] was trained on the same molecules with kernel ridge regression, using the FCHL representation[70] with a Gaussian kernel function (FCHL/Δ-ML), as implemented in the QML package.[71] All hyperparameters of the model were set to those obtained in Ref. 70, which have previously been demonstrated to work well for datasets containing structures similar to those in QM7b-T.[71]
A possible source of concern for MOB-ML is that the number of selected features might grow with the chemical complexity of the training data. For example, 27 features for ε_d^ML and 12 features for ε_o^ML were selected in the alkane test case using ethane + propane training data (Fig. 2b), whereas only 10 features for ε_d^ML and 10 features for ε_o^ML were selected for the water test case at the CCSD(T) level of theory (Fig. 1). To examine this, we perform feature selection on increasing numbers of randomly selected molecules from the QM7b-T dataset. Table I presents two statistics on the feature importance as a function of the number of training molecules: (i) the number of "important features" (i.e., those whose permutation importance[45] exceeds a set threshold of 2×10⁻⁴ and 5×10⁻⁵ for ε_d^ML and ε_o^ML, respectively) and (ii) the inverse participation ratio[72] of the feature importance scores. The latter is a threshold-less measure of the number of important features; it takes a value of 1 when only one feature has nonzero importance and N when all N features have equal importance. Although the QM7b-T dataset contains many different chemical elements and bonding motifs, Table I reveals that the set of selected features remains compact and does not grow with the number of training molecules. Indeed, for a large number of training molecules, the number of selected features slightly decreases, reaching 42 and 24 selected features for ε_d^ML and ε_o^ML, respectively, for the largest training sizes considered.
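For reference, the inverse participation ratio of a set of importance scores can be computed as in the short sketch below (a standard definition consistent with the limiting values quoted above; the normalization convention is our assumption):

```python
import numpy as np

def inverse_participation_ratio(scores):
    """IPR of feature importance scores: 1 if a single feature carries
    all the importance, N if all N features are equally important."""
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()          # normalize scores to a probability vector
    return 1.0 / np.sum(p**2)
```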
The learning curves for MOB-ML models trained at the MP2/cc-pVTZ and CCSD(T)/cc-pVDZ levels of theory are shown in Fig. 3a, together with the FCHL/Δ-ML learning curve for MP2/cc-pVTZ. At the MP2 level of theory, the MOB-ML model achieves an accuracy of 2 mH with 110 training calculations (representing 1.5% of the molecules in the QM7b-T dataset), whereas FCHL/Δ-ML requires over 300 training geometries to reach the same accuracy threshold. Fig. 3a also illustrates the relative insensitivity of MOB-ML to the level of electronic structure theory, with the learning curve for CCSD(T)/cc-pVDZ reaching 2 mH accuracy with 140 training calculations.
TABLE I. Number of features selected as a function of the number of randomly chosen training molecules for the QM7b-T dataset at the CCSD(T)/cc-pVDZ level. The number of features that exceed an importance threshold as well as the inverse participation ratio (IPR) of the feature importance scores are reported (see text).

                 # of important features    feature weight IPR
Training size    ε_d^ML       ε_o^ML        ε_d^ML      ε_o^ML
20               50           28            4.720       1.116
50               46           28            3.718       1.097
100              46           26            3.450       1.115
200              42           24            3.430       1.120
An analysis of the sensitivity of the MOB-ML predictions to the number of selected features is presented in supporting information Fig. S1, which indicates that the reported results are robust with respect to the number of selected features.
As a final test of the transferability of the MOB-ML and FCHL/Δ-ML methods across chemical space, Figs. 3b and 3c show results in which the ML methods are trained on QM7b-T molecules and then used to predict results for a dataset of 13-heavy-atom organic molecules at thermalized geometries, GDB-13-T, which includes six thermally sampled geometries each of 1,000 13-heavy-atom organic molecules chosen randomly from the GDB-13 dataset.[36] Like QM7b, the members of GDB-13 contain C, H, N, O, S, and Cl. The size of these molecules precludes the use of coupled cluster theory to generate reference data; we therefore make the comparison at the MP2/cc-pVTZ level of theory, noting that MOB-ML has consistently been shown to be insensitive to the employed post-Hartree-Fock method (as in Fig. 3a). Transfer learning results as a function of the number of training molecules are presented in Figs. 3b (on a linear-linear scale) and 3c (on a log-log scale).

Using the MOB-ML model that is trained on 110 seven-heavy-atom molecules (corresponding to a prediction MAE of 1.89 mH for QM7b-T), we observe a prediction MAE of 3.88 mH for GDB-13-T. Expressed in terms of size-intensive quantities, the prediction MAE per heavy atom is 0.277 mH and 0.298 mH for QM7b-T and GDB-13-T, respectively, indicating that the accuracy of the MOB-ML results is only slightly degraded when the model is transferred to the dataset of larger molecules. On a per-heavy-atom basis, MOB-ML reaches chemical accuracy with the same number of QM7b-T training calculations (approximately 100), regardless of whether it is tested on QM7b-T or GDB-13-T.
In contrast with MOB-ML, the FCHL/Δ-ML method is found to be significantly less transferable from QM7b-T to GDB-13-T. For models trained using 100 seven-heavy-atom molecules, the MAE per heavy atom of FCHL/Δ-ML is over twice that of MOB-ML (Fig. 3b). Moreover, whereas MOB-ML reaches the per-heavy-atom chemical accuracy threshold with 140 training calculations, the FCHL/Δ-ML method only reaches that threshold with 5000 training calculations.
FIG. 3. Learning curves for MOB-ML trained on QM7b-T and applied to QM7b-T and GDB-13-T (see text for definitions of these datasets). FCHL/Δ-ML[70] results are provided for comparison. (a) Predictions are made for QM7b-T at the MP2/cc-pVTZ (red) and CCSD(T)/cc-pVDZ (orange) levels of theory. (b) Using the same models trained on QM7b-T, predictions are made for GDB-13-T and reported in terms of MAE per heavy atom. (MOB-ML predictions for QM7b-T are included for reference.) (c) As in the previous panel, but plotted on a logarithmic scale and extended to show the full range of FCHL/Δ-ML predictions. Error bars for FCHL/Δ-ML represent prediction standard errors of the mean as measured over 10 models. The gray shaded area corresponds to errors of 2 mH per 7 heavy atoms.
IV. CONCLUSIONS
Molecular-orbital-based machine learning (MOB-ML) has been shown to be a simple and strikingly accurate strategy for predicting correlated wavefunction energies at the cost of a Hartree-Fock calculation, benefiting from the intrinsic transferability of the localized molecular orbital representation. The starting point for the MOB-ML method is a rigorous mapping from the Hartree-Fock molecular orbitals to the total correlation energy, which ensures that the use of sufficient training data and molecular orbital features will produce a model that matches the corresponding correlated wavefunction method across the entirety of chemical space. The current work explores this possibility within the subspace of organic molecules. It is shown that MOB-ML predicts energies of the QM7b-T dataset to within 2 millihartree accuracy using only 110 training calculations at the MP2/cc-pVTZ level of theory and 140 training calculations at the CCSD(T)/cc-pVDZ level of theory. Direct comparison with FCHL/Δ-ML reveals that MOB-ML is threefold more efficient in reaching chemical accuracy for describing QM7b-T. Further, a transferability test of a MOB-ML model trained on QM7b-T to GDB-13-T reveals that MOB-ML exhibits negligible degradation in accuracy; as a result, chemical accuracy is achieved with 36-fold fewer training calculations using MOB-ML versus FCHL/Δ-ML. These results suggest that MOB-ML provides a promising approach toward the development of density matrix functionals that are applicable across broad swaths of chemical space.
ACKNOWLEDGMENTS
We thank Daniel Smith (Molecular Sciences Software Institute) and Alberto Gobbi (Genentech) for a helpful discussion about available training datasets. T.F.M. acknowledges support from AFOSR award no. FA9550-17-1-0102. A.S.C. acknowledges support from the National Centre of Competence in Research (NCCR) Materials Revolution: Computational Design and Discovery of Novel Materials (MARVEL) of the Swiss National Science Foundation (SNSF). We additionally acknowledge support from the Resnick Sustainability Institute postdoctoral fellowship (M.W.) and the Camille Dreyfus Teacher-Scholar Award (T.F.M.). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science under contract DE-AC02-05CH11231.
SUPPORTING INFORMATION
The datasets used in this work are available for download;[73] they include MOB-ML features, HF energies, pair correlation energies, and geometries. MOB-ML and FCHL/Δ-ML predictions corresponding to Fig. 3 and an analysis of the sensitivity of the results of Fig. 3 to the number of selected features are available in the attached supporting information. A reference Python implementation for generating MOB-ML features is available at https://github.com/thomasfmiller/MOB-ML.
[1] A. Lavecchia, "Machine-learning approaches in drug discovery: methods and applications," Drug Discov. Today 20, 318–331 (2015).
[2] E. Gawehn, J. A. Hiss, and G. Schneider, "Deep learning in drug discovery," Mol. Inform. 35, 3–14 (2016).
[3] M. Popova, O. Isayev, and A. Tropsha, "Deep reinforcement learning for de novo drug design," Sci. Adv. 4, eaap7885 (2018).
[4] E. Kim, K. Huang, S. Jegelka, and E. Olivetti, "Virtual screening of inorganic materials synthesis parameters with deep learning," npj Comput. Mater. 3, 53 (2017).
[5] F. Ren, L. Ward, T. Williams, K. J. Laws, C. Wolverton, J. Hattrick-Simpers, and A. Mehta, "Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments," Sci. Adv. 4, eaaq1566 (2018).
[6] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, "Machine learning for molecular and materials science," Nature 559, 547–555 (2018).
[7] B. Sanchez-Lengeling and A. Aspuru-Guzik, "Inverse molecular design using machine learning: Generative models for matter engineering," Science 361, 360–365 (2018).
[8] J. N. Wei, D. Duvenaud, and A. Aspuru-Guzik, "Neural networks for the prediction of organic chemistry reactions," ACS Cent. Sci. 2, 725–732 (2016).
[9] P. Raccuglia, K. C. Elbert, P. D. F. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, and A. J. Norquist, "Machine-learning-assisted materials discovery using failed experiments," Nature 533, 73–76 (2016).
[10] Z. W. Ulissi, A. J. Medford, T. Bligaard, and J. K. Nørskov, "To address surface reaction network complexity using scaling relations machine learning and DFT calculations," Nat. Commun. 8, 14621 (2017).
[11] M. H. S. Segler and M. P. Waller, "Neural-symbolic machine learning for retrosynthesis and reaction prediction," Chem. - A Eur. J. 23, 5966–5971 (2017).
[12] M. H. S. Segler, M. Preuss, and M. P. Waller, "Planning chemical syntheses with deep neural networks and symbolic AI," Nature 555, 604–610 (2018).
[13] J. S. Smith, O. Isayev, and A. E. Roitberg, "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost," Chem. Sci. 8, 3192–3203 (2017).
[14] J. S. Smith, B. T. Nebgen, R. Zubatyuk, N. Lubbers, C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A. Roitberg, "Outsmarting quantum chemistry through transfer learning," (2018).
[15] N. Lubbers, J. S. Smith, and K. Barros, "Hierarchical modeling of molecular energies using a deep neural network," J. Chem. Phys. 148, 241715 (2018).
[16] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, "Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons," Phys. Rev. Lett. 104, 136403 (2010).
[17] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Fast and accurate modeling of molecular atomization energies with machine learning," Phys. Rev. Lett. 108, 58301 (2012).
[18] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Machine learning of molecular electronic properties in chemical compound space," New J. Phys. 15, 95003 (2013).
[19] K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller, "Assessment and validation of machine learning methods for predicting molecular atomization energies," J. Chem. Theory Comput. 9, 3404 (2013).
[20] P. Gasparotto and M. Ceriotti, "Recognizing molecular patterns by machine learning: an agnostic structural definition of the hydrogen bond," J. Chem. Phys. 141, 174110 (2014).
[21] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, "Big data meets quantum chemistry approximations: the Δ-machine learning approach," J. Chem. Theory Comput. 11, 2087 (2015).
[22] J. Behler, "Perspective: Machine learning potentials for atomistic simulations," J. Chem. Phys. 145, 170901 (2016).
[23] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, "Molecular graph convolutions: moving beyond fingerprints," J. Comput. Aided Mol. Des. 30, 595 (2016).
[24] F. Paesani, "Getting the right answers for the right reasons: toward predictive molecular simulations of water with many-body potential energy functions," Acc. Chem. Res. 49, 1844 (2016).
[25] K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and A. Tkatchenko, "Quantum-chemical insights from deep tensor neural networks," Nat. Commun. 8, 13890 (2017).
[26] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, "Bypassing the Kohn-Sham equations with machine learning," Nat. Commun. 8, 872 (2017).
[27] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, "MoleculeNet: a benchmark for molecular machine learning," Chem. Sci. 9, 513 (2018).
[28] T. T. Nguyen, E. Székely, G. Imbalzano, J. Behler, G. Csányi, M. Ceriotti, A. W. Götz, and F. Paesani, "Comparison of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials in representing water interactions through many-body expansions," J. Chem. Phys. 148, 241725 (2018).
[29] K. Yao, J. E. Herr, D. W. Toth, R. McKintyre, and J. Parkhill, "The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics," Chem. Sci. 9, 2261–2269 (2018).
[30] S. Fujikake, V. L. Deringer, T. H. Lee, M. Krynski, S. R. Elliott, and G. Csányi, "Gaussian approximation potential modeling of lithium intercalation in carbon nanostructures," J. Chem. Phys. 148, 241714 (2018).
[31] H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron, "A density functional tight binding layer for deep learning of chemical Hamiltonians," J. Chem. Theory Comput. 14, 5764–5776 (2018).
[32] A. Grisafi, A. Fabrizio, B. Meyer, D. M. Wilkins, C. Corminboeuf, and M. Ceriotti, "Transferable machine-learning model of the electron density," ACS Cent. Sci. 5, 57–64 (2019).
[33] L. Zhang, J. Han, H. Wang, R. Car, and W. E, "Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics," Phys. Rev. Lett. 120, 143001 (2018).
[34] R. T. McGibbon, A. G. Taube, A. G. Donchev, K. Siva, F. Hernández, C. Hargus, K.-H. Law, J. L. Klepeis, and D. E. Shaw, "Improving the accuracy of Møller-Plesset perturbation theory with neural networks," J. Chem. Phys. 147, 161725 (2017).
[35] M. Welborn, L. Cheng, and T. F. Miller III, "Transferability in machine learning for electronic structure via the molecular orbital basis," J. Chem. Theory Comput. 14, 4772–4779 (2018).
[36] L. C. Blum and J.-L. Reymond, "970 million druglike small molecules for virtual screening in the chemical universe database GDB-13," J. Am. Chem. Soc. 131, 8732 (2009).
[37] R. K. Nesbet, "Brueckner's theory and the method of superposition of configurations," Phys. Rev. 109, 1632 (1958).
[38] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry (Dover, Mineola, 1996), pp. 231–239.
[39] C. Møller and M. S. Plesset, "Note on an approximation treatment for many-electron systems," Phys. Rev. 46, 618 (1934).
[40] G. Knizia, "Intrinsic atomic orbitals: an unbiased bridge between quantum theory and chemical concepts," J. Chem. Theory Comput. 9, 4834 (2013).
[41] J. Behler and M. Parrinello, "Generalized neural-network representation of high-dimensional potential-energy surfaces," Phys. Rev. Lett. 98, 146401 (2007).
[42] S. F. Boys, "Construction of some molecular orbitals to be approximately invariant for changes from one molecule to another," Rev. Mod. Phys. 32, 296–299 (1960).
[43] U. Kaldor, "Localized orbitals for NH3, C2H4, and C2H2," J. Chem. Phys. 46, 1981–1987 (1967).
[44] L. Breiman, "Random forests," Mach. Learn. 45, 5–32 (2001).
[45] L. Breiman, "Statistical modeling: The two cultures," Stat. Sci. 16, 199–215 (2001).
[46] R. Tripathy, I. Bilionis, and M. Gonzalez, "Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation," J. Comput. Phys. 321, 191–223 (2016).
[47] See https://github.com/thomasfmiller/MOB-ML for the available code.
[48] Y. Shao, Z. Gan, E. Epifanovsky, A. T. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, D. Ghosh, M. Goldey, P. R. Horn, L. D. Jacobson, I. Kaliman, R. Z. Khaliullin, T. Kuś, A. Landau, J. Liu, E. I. Proynov, Y. M. Rhee, R. M. Richard, M. A. Rohrdanz, R. P. Steele, E. J. Sundstrom, H. L. Woodcock, P. M. Zimmerman, D. Zuev, B. Albrecht, E. Alguire, B. Austin, G. J. O. Beran, Y. A. Bernard, E. Berquist, K. Brandhorst, K. B. Bravaya, S. T. Brown, D. Casanova, C.-M. Chang, Y. Chen, S. H. Chien, K. D. Closser, D. L. Crittenden, M. Diedenhofen, R. A. DiStasio, H. Do, A. D. Dutoi, R. G. Edgar, S. Fatehi, L. Fusti-Molnar, A. Ghysels, A. Golubeva-Zadorozhnaya, J. Gomes, M. W. Hanson-Heine, P. H. Harbach, A. W. Hauser, E. G. Hohenstein, Z. C. Holden, T.-C. Jagau, H. Ji, B. Kaduk, K. Khistyaev, J. Kim, J. Kim, R. A. King, P. Klunzinger, D. Kosenkov, T. Kowalczyk, C. M. Krauter, K. U. Lao, A. D. Laurent, K. V. Lawler, S. V. Levchenko, C. Y. Lin, F. Liu, E. Livshits, R. C. Lochan, A. Luenser, P. Manohar, S. F. Manzer, S.-P. Mao, N. Mardirossian, A. V. Marenich, S. A. Maurer, N. J. Mayhall, E. Neuscamman, C. M. Oana, R. Olivares-Amaya, D. P. O'Neill, J. A. Parkhill, T. M. Perrine, R. Peverati, A. Prociuk, D. R. Rehn, E. Rosta, N. J. Russ, S. M. Sharada, S. Sharma, D. W. Small, A. Sodt, T. Stein, D. Stück, Y.-C. Su, A. J. Thom, T. Tsuchimochi, V. Vanovschi, L. Vogt, O. Vydrov, T. Wang, M. A. Watson, J. Wenzel, A. White, C. F. Williams, J. Yang, S. Yeganeh, S. R. Yost, Z.-Q. You, I. Y. Zhang, X. Zhang, Y. Zhao, B. R. Brooks, G. K. Chan, D. M. Chipman, C. J. Cramer, W. A. Goddard, M. S. Gordon, W. J. Hehre, A. Klamt, H. F. Schaefer, M. W. Schmidt, C. D. Sherrill, D. G. Truhlar, A. Warshel, X. Xu, A. Aspuru-Guzik, R. Baer, A. T. Bell, N. A. Besley, J.-D. Chai, A. Dreuw, B. D. Dunietz, T. R. Furlani, S. R. Gwaltney, C.-P. Hsu, Y. Jung, J. Kong, D. S. Lambrecht, W. Liang, C. Ochsenfeld, V. A. Rassolov, L. V. Slipchenko, J. E. Subotnik, T. Van Voorhis, J. M. Herbert, A. I. Krylov, P. M. Gill, and M. Head-Gordon, "Advances in molecular quantum chemistry contained in the Q-Chem 4 program package," Mol. Phys. 113, 184 (2015).
[49] S. H. Vosko, L. Wilk, and M. Nusair, "Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis," Can. J. Phys. 58, 1200 (1980).
[50] C. Lee, W. Yang, and R. G. Parr, "Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density," Phys. Rev. B 37, 785 (1988).
[51] A. D. Becke, "Density-functional thermochemistry. III. The role of exact exchange," J. Chem. Phys. 98, 5648 (1993).
[52] P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, "Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields," J. Phys. Chem. 98, 11623 (1994).
[53] P. C. Hariharan and J. A. Pople, "The influence of polarization functions on molecular orbital hydrogenation energies," Theor. Chim. Acta 28, 213 (1973).
[54] G. Bussi and M. Parrinello, "Accurate sampling using Langevin dynamics," Phys. Rev. E 75, 056707 (2007).
[55] H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Schütz, P. Celani, W. Györffy, D. Kats, T. Korona, R. Lindh, A. Mitrushenkov, G. Rauhut, K. R. Shamasundar, T. B. Adler, R. D. Amos, S. J. Bennie, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Deegan, A. J. Dobbyn, F. Eckert, E. Goll, C. Hampel, A. Hesselmann, G. Hetzer, T. Hrenar, G. Jansen, C. Köppl, S. J. R. Lee, Y. Liu, A. W. Lloyd, Q. Ma, R. A. Mata, A. J. May, S. J. McNicholas, W. Meyer, T. F. Miller III, M. E. Mura, A. Nicklass, D. P. O'Neill, P. Palmieri, D. Peng, K. Pflüger, R. Pitzer, M. Reiher, T. Shiozaki, H. Stoll, A. J. Stone, R. Tarroni, T. Thorsteinsson, M. Wang, and M. Welborn, "Molpro, version 2018.3, a package of ab initio programs," (2018), see http://www.molpro.net.
[56] T. H. Dunning, "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen," J. Chem. Phys. 90, 1007 (1989).
[57] S. Saebo and P. Pulay, "Local treatment of electron correlation," Annu. Rev. Phys. Chem. 44, 213–236 (1993).
[58] J. Čížek, "On the correlation problem in atomic and molecular systems. Calculation of wavefunction components in Ursell-type expansion using quantum-field theoretical methods," J. Chem. Phys. 45, 4256 (1966).
[59] C. Hampel and H. J. Werner, "Local treatment of electron correlation in coupled cluster theory," J. Chem. Phys. 104, 6286–6297 (1996).
[60] R. J. Bartlett, J. D. Watts, S. A. Kucharski, and J. Noga, "Non-iterative fifth-order triple and quadruple excitation energy corrections in correlated methods," Chem. Phys. Lett. 165, 513–522 (1990).
[61] M. Schütz, "Low-order scaling local electron correlation methods. III. Linear scaling local perturbative triples correction (T)," J. Chem. Phys. 113, 9986–10001 (2000).
[62] R. Polly, H.-J. Werner, F. R. Manby, and P. J. Knowles, "Fast Hartree-Fock theory using local density fitting approximations," Mol. Phys. 102, 2311–2321 (2004).
[63] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
[64] GPy, "GPy: A Gaussian process framework in Python," http://github.com/SheffieldML/GPy (since 2012).
[65] In principle, the smoothness of the Matérn kernel could be taken as a kernel hyperparameter; however, this possibility was not explored in this work.
[66] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: machine learning in Python," J. Mach. Learn. Res. 12, 2825 (2011).
[67] C. Cortes, L. D. Jackel, S. A. Solla, V. Vapnik, and J. S. Denker, "Learning curves: Asymptotic values and rate of convergence," in Advances in Neural Information Processing Systems 6, edited by J. D. Cowan, G. Tesauro, and J. Alspector (Morgan-Kaufmann, 1994), pp. 327–334.
[68] K. Pearson, "Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia," Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 187, 253–318 (1896).
[69] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Machine learning of molecular electronic properties in chemical compound space," New J. Phys. 15, 095003 (2013).
[70] F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, "Alchemical and structural distribution based representation for universal quantum machine learning," J. Chem. Phys. 148, 241717 (2018).
[71] A. S. Christensen, F. A. Faber, and O. A. von Lilienfeld, "Operators in quantum machine learning: Response properties in chemical space," J. Chem. Phys. 150, 064105 (2019).
[72] B. Kramer and A. MacKinnon, "Localization: theory and experiment," Rep. Prog. Phys. 56, 1469 (1993).
[73] L. Cheng, M. Welborn, A. S. Christensen, and T. F. Miller III, "Thermalized (350K) QM7b, GDB-13, water, and short alkane quantum chemistry dataset including MOB-ML features," (2019), 10.22002/D1.1177.