Supplementary Material for OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy

Anders S. Christensen,1,a) Sai Krishna Sirumalla,1,a) Zhuoran Qiao,2 Michael B. O'Connor,1 Daniel G. A. Smith,1 Feizhi Ding,1 Peter J. Bygrave,1 Animashree Anandkumar,3,4 Matthew Welborn,1 Frederick R. Manby,1 and Thomas F. Miller III1,2

1) Entos, Inc., Los Angeles, CA 90027
2) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
3) Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, CA 91125
4) NVIDIA, Santa Clara, CA 95051
a) Indicates equal contribution.

(*Electronic mail: tom@entos.ai)
(Dated: 21 October 2021)

I. ADDITIONAL GMTKN55 STATISTICS

FIG. S1. Performance of OrbNet Denali on the GMTKN55 subsets that contain only elements, charge states, and spin states supported by OrbNet Denali, but whose chemistry is not represented in the training set (for example, water clusters, C60 molecules, and transition states). In the left column, the MAE values are with respect to the CCSD(T) references in the GMTKN55 dataset (Ref. 1), while in the right column the MAE values are with respect to ωB97X-D3/def2-TZVP reference energies (the same method used to generate the OrbNet training data). Due to an error in the training of OrbNet Denali 10%, that model does not support lithium, so the ALK8 test set is excluded here.

FIG. S2. Performance of OrbNet Denali on the GMTKN55 subsets that contain only elements, charge states, and spin states supported by OrbNet Denali, and for which molecules with similar chemistry are found in the OrbNet Denali training set. In the left column, the MAE values are with respect to the CCSD(T) references in the GMTKN55 dataset (Ref. 1), while in the right column the MAE values are with respect to ωB97X-D3/def2-TZVP reference energies (the same method used to generate the OrbNet training data).
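
The MAE reported against each of the two references is the mean of the absolute differences between predicted and reference energies. A minimal Python sketch of that calculation follows, assuming a subset is stored as equal-length lists of relative energies in kcal/mol. The function name and all numbers are hypothetical illustrations and are not taken from GMTKN55.

```python
from statistics import fmean


def mean_absolute_error(predicted, reference):
    """Return the MAE between two equal-length sequences of energies (kcal/mol)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one energy is required")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical relative energies (kcal/mol) for a three-entry subset.
model_energies = [1.2, -3.4, 10.0]
high_level_reference = [1.0, -3.0, 10.5]  # stands in for the CCSD(T) column
dft_reference = [1.1, -3.5, 10.2]         # stands in for the wB97X-D3 column

mae_vs_high_level = mean_absolute_error(model_energies, high_level_reference)
mae_vs_dft = mean_absolute_error(model_energies, dft_reference)
```

Evaluating the same predictions against both references separates the error of the model relative to its training method from the error that the training method itself carries relative to the high-level data.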

TABLE S1. Overview of the mean absolute error (MAE) of every GMTKN55 subset for every method, where applicable, as discussed in the text. The reference values are the high-level reference values from the GMTKN55 database (Ref. 1). The MAE values are given in kcal/mol, along with the WTMAD-n, which is calculated as discussed in the main text.
MAE [kcal/mol]
Subset                ωB97X-D3  B97-3c  OrbNet Denali  ANI-1ccx  ANI-2x  GFN2-xTB  GFN1-xTB  GFN0-xTB
W4-11                 3.52      7.48    -              -         -       -         -         -
G21EA                 7.07      8.24    -              -         -       -         -         -
G21IP                 3.07      3.69    -              -         -       -         -         -
DIPCS10               5.58      4.16    -              -         -       274.75    301.44    1936.47
PA26                  3.32      5.58    5.29           -         -       163.05    162.30    938.31
SIE4x4                12.18     22.54   -              -         -       -         -         -
ALKBDE10              5.28      7.90    -              -         -       -         -         -
YBDE18                2.17      5.13    -              -         -       -         -         -
AL2X6                 3.07      2.22    -              -         -       14.63     15.18     8.30
HEAVYSB11             2.56      2.51    -              -         -       -         -         -
NBPRC                 1.67      1.56    10.27          -         -       10.51     10.96     33.00
ALK8                  4.07      3.54    60.55          -         -       23.91     52.55     97.75
RC21                  3.28      6.39    -              -         -       -         -         -
G2RC                  4.58      8.34    -              -         -       21.92     29.27     52.60
BH76RC                2.31      3.64    -              -         -       -         -         -
FH51                  2.47      4.42    3.58           -         13.38   11.41     12.05     22.14
TAUT15                1.05      1.71    1.13           -         2.51    0.98      5.79      3.89
DC13                  6.81      11.33   -              -         -       -         -         -
MB16-43               40.75     27.39   -              -         -       -         -         -
DARC                  1.27      4.34    1.31           1.13      5.48    17.77     15.82     15.46
RSE43                 1.44      3.49    -              -         -       -         -         -
BSR36                 4.36      1.90    0.77           2.69      8.85    2.76      2.34      2.03
CDIE20                0.72      1.98    0.61           2.09      3.63    1.80      2.04      2.27
ISO34                 1.18      1.87    1.21           1.24      12.94   6.90      6.30      10.15
ISOL24                2.75      5.19    2.64           -         -       11.68     10.92     10.78
C60ISO                1.18      6.27    11.82          91.96     44.94   5.80      7.88      11.89
PArel                 0.67      1.80    1.60           -         -       5.86      4.54      7.09
BH76                  2.25      6.89    -              -         -       -         -         -
BHPERI                2.85      4.59    4.72           -         -       10.24     9.32      6.84
BHDIV10               1.01      5.80    6.83           -         -       8.12      8.40      9.99
INV24                 1.63      1.96    4.59           -         -       3.32      5.80      5.06
BHROT27               0.47      0.61    0.39           -         1.42    1.17      2.38      1.68
PX13                  3.18      7.08    14.84          -         13.32   2.74      8.30      17.16
WCPT18                2.14      5.46    4.91           -         6.13    3.84      5.30      8.40
RG18                  0.11      0.12    -              -         -       0.11      0.32      0.44
ADIM6                 0.36      0.53    0.40           0.64      0.34    1.15      1.01      0.51
S22                   0.36      0.29    0.45           5.00      1.51    0.76      1.33      1.63
S66                   0.52      0.32    0.48           2.52      1.11    0.73      1.07      1.29
HEAVY28               0.26      0.80    -              -         -       0.61      0.65      0.91
WATER27               14.23     9.41    2.39           -         -       3.05      7.39      17.81
CARBHB12              0.83      2.07    0.91           -         -       1.79      0.67      2.32
PNICO23               0.38      1.64    1.71           -         -       1.11      2.33      2.00
HAL59                 0.34      1.62    2.15           -         -       1.28      1.34      1.76
AHB21                 3.40      3.27    1.81           -         -       2.97      4.68      26.02
CHB6                  1.32      1.37    3.58           -         -       5.40      3.95      20.51
IL16                  2.09      2.34    4.60           -         -       4.32      5.69      55.34
IDISP                 2.78      3.91    2.61           12.39     20.64   6.78      6.53      13.27
ICONF                 0.34      0.38    1.25           -         -       1.63      2.63      1.55
ACONF                 0.09      0.21    0.06           0.56      0.16    0.19      0.66      0.44
Amino20x4             0.26      0.33    0.35           -         1.00    0.95      1.11      1.22
PCONF21               0.33      0.83    0.47           3.18      2.20    1.76      2.17      1.99
MCONF                 0.48      0.33    0.42           0.92      0.82    1.72      1.44      1.34
SCONF                 0.30      0.77    0.32           1.58      2.47    1.64      2.50      2.10
UPU23                 0.94      0.51    0.87           -         -       2.91      1.24      3.74
BUT14DIOL             0.41      0.41    0.40           0.61      1.42    1.25      0.95      0.60
WTMAD-1 [arb. units]  3.37      5.76    7.19           14.12     13.56   12.00     14.72     24.83
WTMAD-2 [arb. units]  5.87      10.22   9.84           23.80     24.11   18.89     22.63     37.99
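
The WTMAD-n rows aggregate the per-subset errors into a single weighted number. The sketch below is a Python illustration that assumes the original GMTKN55 definitions (Ref. 1) apply unchanged: WTMAD-1 averages the subset MADs with a weight of 10, 1, or 0.1 chosen from the mean absolute reference energy of each subset, and WTMAD-2 weights each subset by its size and by 56.84 kcal/mol divided by that mean absolute reference energy. The main text may normalise differently when only supported subsets are included, so this should be read as an assumption and not as the exact procedure behind the tabulated values. The dictionary keys and the input numbers are hypothetical.

```python
def wtmad1(subsets):
    """WTMAD-1 under the assumed GMTKN55 definition: mean of weighted subset MADs."""

    def weight(avg_abs_energy):
        # Small reference energies are up-weighted, large ones down-weighted.
        if avg_abs_energy < 7.5:
            return 10.0
        if avg_abs_energy > 75.0:
            return 0.1
        return 1.0

    return sum(weight(s["avg_abs_energy"]) * s["mad"] for s in subsets) / len(subsets)


def wtmad2(subsets, scale=56.84):
    """WTMAD-2 under the assumed GMTKN55 definition: size- and energy-weighted MADs."""
    total_entries = sum(s["n"] for s in subsets)
    weighted_sum = sum(
        s["n"] * scale / s["avg_abs_energy"] * s["mad"] for s in subsets
    )
    return weighted_sum / total_entries


# Hypothetical subsets: n entries, mean |reference energy| and MAD in kcal/mol.
example_subsets = [
    {"n": 10, "avg_abs_energy": 5.0, "mad": 0.5},
    {"n": 30, "avg_abs_energy": 100.0, "mad": 4.0},
]

example_wtmad1 = wtmad1(example_subsets)
example_wtmad2 = wtmad2(example_subsets)
```

Both schemes keep subsets with large reference energies from dominating the aggregate, which is why the WTMAD-n values are quoted in arbitrary units and not in kcal/mol.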

TABLE S2. Overview of the mean absolute error (MAE) of every GMTKN55 subset for every method, where applicable, as discussed in the text. The reference energies are calculated at the ωB97X-D3 level of theory. The MAE values are given in kcal/mol, along with the WTMAD-n, which is calculated as discussed in the main text.
MAE [kcal/mol]
Subset                B97-3c  Denali  Denali 10%  ANI-1ccx  ANI-2x  GFN2-xTB  GFN1-xTB  GFN0-xTB
W4-11                 7.77    -       -           -         -       -         -         -
G21EA                 2.16    -       -           -         -       -         -         -
G21IP                 2.87    -       -           -         -       -         -         -
DIPCS10               2.63    -       -           -         -       279.84    306.53    1941.57
PA26                  2.57    5.06    5.80        -         -       166.38    165.62    941.63
SIE4x4                10.82   -       -           -         -       -         -         -
ALKBDE10              4.71    -       -           -         -       -         -         -
YBDE18                3.81    -       -           -         -       -         -         -
AL2X6                 1.55    -       -           -         -       11.56     13.98     8.24
HEAVYSB11             2.10    -       -           -         -       -         -         -
NBPRC                 2.92    9.23    14.87       -         -       10.41     10.53     33.40
ALK8                  6.34    63.24   -           -         -       22.58     53.20     98.41
RC21                  4.43    -       -           -         -       -         -         -
G2RC                  5.75    -       -           -         -       19.71     25.91     53.23
BH76RC                2.58    -       -           -         -       -         -         -
FH51                  4.02    1.33    4.36        -         11.15   9.81      10.67     22.15
TAUT15                1.15    0.27    0.45        -         1.84    1.26      5.25      3.54
DC13                  11.15   -       -           -         -       -         -         -
MB16-43               28.97   -       -           -         -       -         -         -
RSE43                 2.06    -       -           -         -       -         -         -
BSR36                 2.47    3.97    4.64        1.68      4.48    6.39      4.13      5.97
CDIE20                1.65    0.82    1.24        2.48      4.14    1.14      1.42      2.43
ISO34                 1.56    0.32    0.87        1.78      12.07   6.52      6.40      10.20
ISOL24                5.43    1.17    1.39        -         -       11.37     10.03     10.79
C60ISO                19.66   2.08    2.61        105.51    58.49   19.34     21.43     25.43
PArel                 1.59    1.58    3.20        -         -       6.04      4.82      7.40
BH76                  5.28    -       -           -         -       -         -         -
BHPERI                7.34    3.09    2.87        -         -       12.22     11.29     9.53
BHDIV10               5.86    6.95    7.24        -         -       8.58      9.10      10.54
INV24                 2.79    4.72    6.59        -         -       4.05      5.78      6.06
BHROT27               0.60    0.29    0.51        -         1.61    1.40      2.77      2.05
PX13                  3.90    18.02   18.56       -         15.72   3.41      6.98      17.15
WCPT18                4.33    6.00    6.05        -         6.56    3.88      5.36      7.94
RG18                  0.17    -       -           -         -       0.16      0.38      0.45
ADIM6                 0.89    0.07    0.11        1.00      0.70    1.51      1.37      0.87
S22                   0.30    0.41    0.59        5.36      1.62    1.01      1.64      1.94
S66                   0.47    0.32    0.46        3.03      0.97    1.17      1.56      1.70
HEAVY28               0.86    -       -           -         -       0.57      0.71      0.89
WATER27               5.26    14.28   13.02       -         -       14.20     15.49     24.07
CARBHB12              1.27    0.78    1.51        -         -       2.45      1.05      3.06
PNICO23               1.55    1.84    2.15        -         -       1.04      2.30      2.18
HAL59                 1.69    2.09    2.45        -         -       1.28      1.29      1.84
AHB21                 0.97    2.60    1.96        -         -       4.79      5.10      23.89
CHB6                  1.01    3.45    -           -         -       5.44      4.41      21.31
IL16                  0.81    3.10    6.80        -         -       6.33      7.30      53.26
IDISP                 4.18    0.69    3.02        11.52     18.43   8.11      7.04      14.43
ICONF                 0.49    1.08    2.73        -         -       1.61      2.72      1.54
ACONF                 0.26    0.07    0.11        0.56      0.16    0.18      0.62      0.41
Amino20x4             0.32    0.16    0.36        -         1.09    0.93      1.15      1.25
PCONF21               0.92    0.29    0.86        3.27      2.26    1.63      2.04      1.78
MCONF                 0.36    0.10    0.30        1.18      0.93    1.97      1.85      1.61
SCONF                 0.53    0.15    0.19        1.51      2.21    1.39      2.25      1.85
UPU23                 0.54    0.62    0.84        -         -       2.12      1.21      4.49
BUT14DIOL             0.11    0.07    0.22        1.01      1.03    1.43      1.20      0.65
WTMAD-1 [arb. units]  4.98    6.42    7.40        15.82     13.23   12.32     15.06     25.89
WTMAD-2 [arb. units]  8.84    8.40    11.41       27.44     22.52   20.19     23.79     39.72

TABLE S3. Overview of which GMTKN55 subsets are supported by OrbNet Denali, ANI-1ccx, ANI-2x, and the GFNn-xTB methods. For OrbNet Denali, the allowed subsets are those that contain only singlet-state molecules with the elements H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I. For ANI-1ccx (Ref. 2) and ANI-2x (Ref. 3), only neutral singlet-state molecules containing the elements H, C, N, O (for ANI-1ccx) or H, C, N, O, F, Cl, S (for ANI-2x) are allowed. For the GFNn-xTB family of methods, only singlet-state molecules are allowed in this list.
Category / Subset  OrbNet Denali  ANI-1ccx  ANI-2x  GFNn-xTB
Basic properties and reaction energies
  YBDE18           -              -         -       -
  W4-11            -              -         -       -
  TAUT15           +              -         +       +
  SIE4x4           -              -         -       -
  RC21             -              -         -       -
  PA26             +              -         -       +
  NBPRC            +              -         -       +
  HEAVYSB11        -              -         -       -
  G2RC             -              -         -       +
  G21IP            -              -         -       -
  G21EA            -              -         -       -
  FH51             +              -         +       +
  DIPCS10          -              -         -       +
  DC13             -              -         -       -
  BH76RC           -              -         -       -
  ALKBDE10         -              -         -       -
  ALK8             +              -         -       +
  AL2X6            -              -         -       +
Reaction and isomerisation energies
  RSE43            -              -         -       -
  PArel            +              -         -       +
  MB16-43          -              -         -       -
  ISOL24           +              -         -       +
  ISO34            +              +         +       +
  DARC             +              +         +       +
  CDIE20           +              +         +       +
  C60ISO           +              +         +       +
  BSR36            +              +         +       +
Reaction barrier heights
  WCPT18           +              -         +       +
  PX13             +              -         +       +
  INV24            +              -         -       +
  BHROT27          +              -         +       +
  BHPERI           +              -         -       +
  BHDIV10          +              -         -       +
  BH76             +              -         -       -
Intermolecular noncovalent interactions
  WATER27          +              -         -       +
  S66              +              +         +       +
  S22              +              +         +       +
  RG18             -              -         -       +
  PNICO23          +              -         -       +
  IL16             +              -         -       +
  HEAVY28          -              -         -       +
  HAL59            +              -         -       +
  CHB6             +              -         -       +
  CARBHB12         +              -         -       +
  AHB21            +              -         -       +
  ADIM6            +              +         +       +
Intramolecular noncovalent interactions
  UPU23            +              -         -       +
  SCONF            +              +         +       +
  PCONF21          +              +         +       +
  MCONF            +              +         +       +
  IDISP            +              +         +       +
  ICONF            +              -         -       +
  BUT14DIOL        +              +         +       +
  Amino20x4        +              -         +       +
  ACONF            +              +         +       +
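
The selection rules in the caption of Table S3 can be expressed as a small filter over the molecules of a subset. The following Python sketch is illustrative only: the `Molecule` and `MethodRules` classes, the `RULES` dictionary, and the example molecules are hypothetical and are not part of any of the packages named in the table. It assumes that a subset counts as supported only if every molecule in it passes, and that the GFNn-xTB entry carries no element restriction beyond the singlet requirement stated in the caption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Molecule:
    elements: FrozenSet[str]  # element symbols present in the molecule
    charge: int = 0
    multiplicity: int = 1     # 1 denotes a singlet state


@dataclass(frozen=True)
class MethodRules:
    allowed_elements: Optional[FrozenSet[str]]  # None: no element restriction applied here
    neutral_only: bool


# Element lists copied from the caption of Table S3.
RULES = {
    "OrbNet Denali": MethodRules(
        frozenset("H Li B C N O F Na Mg Si P S Cl K Ca Br I".split()), False
    ),
    "ANI-1ccx": MethodRules(frozenset("H C N O".split()), True),
    "ANI-2x": MethodRules(frozenset("H C N O F Cl S".split()), True),
    "GFNn-xTB": MethodRules(None, False),
}


def molecule_supported(mol: Molecule, rules: MethodRules) -> bool:
    """Apply the singlet, charge, and element checks to one molecule."""
    if mol.multiplicity != 1:
        return False
    if rules.neutral_only and mol.charge != 0:
        return False
    if rules.allowed_elements is not None and not mol.elements <= rules.allowed_elements:
        return False
    return True


def subset_supported(molecules: Iterable[Molecule], method: str) -> bool:
    """A subset is supported only if every molecule in it is supported."""
    rules = RULES[method]
    return all(molecule_supported(mol, rules) for mol in molecules)


# Hypothetical example molecules.
water = Molecule(frozenset({"H", "O"}))
ammonium = Molecule(frozenset({"N", "H"}), charge=1)
methyl_radical = Molecule(frozenset({"C", "H"}), multiplicity=2)
argon_dimer = Molecule(frozenset({"Ar"}))
```

With these definitions, a subset containing the ammonium cation passes for OrbNet Denali but fails for both ANI models, while any subset containing a radical fails for every method.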

1 L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi, and S. Grimme, Phys. Chem. Chem. Phys. 19, 32184 (2017).
2 J. S. Smith, O. Isayev, and A. E. Roitberg, Sci. Data 4, 170193 (2017).
3 C. Devereux, J. S. Smith, K. K. Davis, K. Barros, R. Zubatyuk, O. Isayev, and A. E. Roitberg, J. Chem. Theory Comput. 16, 4192 (2020).