Supporting Information:
Multi-Label Classification Models for the
Prediction of Cross-Coupling Reaction Conditions
Michael R. Maser,†,§ Alexander Y. Cui,‡,§ Serim Ryou,¶,§ Travis J. DeLano,† Yisong Yue,‡ and Sarah E. Reisman∗,†
†Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
‡Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, United States
¶Computational Vision Lab, California Institute of Technology, Pasadena, California 91125, United States
§Equal contribution.
E-mail: reisman@caltech.edu
S1 Data preparation and reaction dictionaries
Full procedures for data processing are outlined in our previous preprint.[S1] An example protocol with full code is included in the associated GitHub repository, https://github.com/slryou41/reaction-gcnn.git, at the path data/data_processing_example.ipynb. The worked example includes procedures for sorting reagents into categories by reaction role and aggregating them into a full reaction dictionary. Final dictionaries for all four datasets are provided as .csv files in the repository path data/all_dictionaries/ and are tabulated below.
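As a rough illustration of the aggregation step, the sketch below counts reagent names per role and assigns bin labels in the style of Tables S1 to S4. It is not the repository's code: the function name `build_dictionary`, the `min_count` cutoff, the input layout, and the rule of numbering bins by descending frequency are all assumptions made for this example (the published tables follow that ordering only approximately).

```python
from collections import Counter

# Hypothetical label prefixes, mirroring the bin labels seen in the tables.
PREFIX = {"metal": "M", "ligand": "L", "base": "B", "solvent": "S", "additive": "A"}


def build_dictionary(reactions, min_count=1):
    """Aggregate reagents into (category, bin label, name, instances) rows.

    `reactions` is a list of dicts mapping a category name to the list of
    reagent names recorded for one reaction (an assumed input layout).
    """
    counts = {cat: Counter() for cat in PREFIX}
    for rxn in reactions:
        for cat, names in rxn.items():
            # Count each reagent at most once per reaction.
            counts[cat].update(set(names))

    rows = []
    for cat, counter in counts.items():
        # Keep reagents that occur at least `min_count` times, most common first.
        kept = [(name, c) for name, c in counter.most_common() if c >= min_count]
        for i, (name, c) in enumerate(kept, start=1):
            rows.append((cat, f"{PREFIX[cat]}{i}", name, c))
    return rows
```

Raising `min_count` would drop rarely used reagents from the dictionary; the cutoff actually used for each dataset is described in the cited preprint and repository rather than here.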
Table S1: Suzuki dataset dictionary.

| category | bin label | dataset name | instances |
|---|---|---|---|
| metal | M1 | tetrakis(triphenylphosphine) palladium(0) | 55829 |
| metal | M2 | palladium diacetate | 16927 |
| metal | M3 | (1,1’-bis(diphenylphosphino)ferrocene)palladium(II) dichloride | 13723 |
| metal | M4 | dichloro(1,1’-bis(diphenylphosphanyl)ferrocene)palladium(II)*CH2Cl2 | 8918 |
| metal | M5 | bis-triphenylphosphine-palladium(II) chloride | 8761 |
| metal | M6 | tris-(dibenzylideneacetone)dipalladium(0) | 5241 |
| metal | M7 | palladium dichloride | 1512 |
| metal | M8 | bis(dibenzylideneacetone)-palladium(0) | 1013 |
| metal | M9 | dichloro[1,1’-bis(di-t-butylphosphino)ferrocene]palladium(II) | 1074 |
| metal | M10 | bis(tri-t-butylphosphine)palladium(0) | 736 |
| metal | M11 | chloro(2-dicyclohexylphosphino-2′,4′,6′-triisopropyl-1,1′-biphenyl)[2-(2′-amino-1,1′-biphenyl)]palladium(II) | 729 |
| metal | M12 | bis(di-tert-butyl(4-dimethylaminophenyl)phosphine)dichloropalladium(II) | 711 |
| metal | M13 | bis(eta3-allyl-mu-chloropalladium(II)) | 559 |
| metal | M14 | tris(dibenzylideneacetone)dipalladium(0) chloroform complex | 509 |
| metal | M15 | palladium 10% on activated carbon | 861 |
| metal | M16 | sodium tetrachloropalladate(II) | 283 |
| metal | M17 | palladium | 280 |
| metal | M18 | (2-dicyclohexylphosphino-2′,4′,6′-triisopropyl-1,1′-biphenyl)[2-(2′-amino-1,1′-biphenyl)]palladium(II) methanesulfonate | 191 |
| metal | M19 | bis(benzonitrile)palladium(II) dichloride | 179 |
| metal | M20 | (1,2-dimethoxyethane)dichloronickel(II) | 158 |
| metal | M21 | bis(1,5-cyclooctadiene)nickel (0) | 155 |
| metal | M22 | [1,3-bis(2,6-diisopropylphenyl)imidazol-2-ylidene](3-chloropyridyl)palladium(ll) dichloride | 151 |
| metal | M23 | (bis(tricyclohexyl)phosphine)palladium(II) dichloride | 148 |
| metal | M24 | dichloro bis(acetonitrile) palladium(II) | 143 |
| metal | M25 | Pd EnCat-30TM | 137 |
| metal | M26 | nickel(II) nitrate hexahydrate | 106 |
| metal | M27 | palladium(II) trifluoroacetate | 106 |
| metal | M28 | dichlorobis[1-(dicyclohexylphosphanyl)piperidine]palladium(II) | 102 |
| ligand | L1 | triphenylphosphine | 4489 |
| ligand | L2 | dicyclohexyl-(2’,6’-dimethoxybiphenyl-2-yl)-phosphane | 3163 |
| ligand | L3 | XPhos | 2100 |
| ligand | L4 | tricyclohexylphosphine | 1808 |
| ligand | L5 | tris-(o-tolyl)phosphine | 902 |
| ligand | L6 | tri-tert-butyl phosphine | 694 |
| ligand | L7 | tri tert-butylphosphoniumtetrafluoroborate | 616 |
| ligand | L8 | trisodium tris(3-sulfophenyl)phosphine | 556 |
| ligand | L9 | 1,1’-bis-(diphenylphosphino)ferrocene | 486 |
| ligand | L10 | 4,5-bis(diphenylphos4,5-bis(diphenylphosphino)-9,9-dimethylxanthenephino)-9,9-dimethylxanthene | 424 |
| ligand | L11 | CyJohnPhos | 370 |
| ligand | L12 | ruphos | 293 |
| ligand | L13 | 1,3,5,7-tetramethyl-8-phenyl-2,4,6-trioxa-8-phosphatricyclo[3.3.1.13,7]decane | 279 |
| ligand | L14 | tricyclohexylphosphine tetrafluoroborate | 240 |
| ligand | L15 | johnphos | 223 |
| ligand | L16 | 4,4’-di-tert-butyl-2,2’-bipyridine | 216 |
| ligand | L17 | catacxium A | 192 |
| ligand | L18 | trifuran-2-yl-phosphane | 183 |
| ligand | L19 | triphenyl-arsane | 182 |
| ligand | L20 | 1,1’-bis(di-tertbutylphosphino)ferrocene | 142 |
| ligand | L21 | 2,2’-bis-(diphenylphosphino)-1,1’-binaphthyl | 129 |
| ligand | L22 | Tedicyp | 218 |
| ligand | L23 | bis[2-(diphenylphosphino)phenyl] ether | 108 |
| base | B1 | potassium carbonate | 48981 |
| base | B2 | sodium carbonate | 39769 |
| base | B3 | potassium phosphate | 17799 |
| base | B4 | caesium carbonate | 13345 |
| base | B5 | sodium hydrogencarbonate | 3722 |
| base | B6 | cesium fluoride | 2810 |
| base | B7 | sodium hydroxide | 2156 |
| base | B8 | potassium hydroxide | 2155 |
| base | B9 | potassium fluoride | 2097 |
| base | B10 | triethylamine | 1370 |
| base | B11 | potassium phosphate tribasic trihydrate | 1016 |
| base | B12 | potassium acetate | 931 |
| base | B13 | potassium tert-butylate | 912 |
| base | B14 | potassium phosphate monohydrate | 826 |
| base | B15 | sodium acetate | 418 |
| base | B16 | sodium t-butanolate | 392 |
| base | B17 | barium dihydroxide | 374 |
| base | B18 | N-ethyl-N,N-diisopropylamine | 336 |
| base | B19 | lithium hydroxide | 321 |
| base | B20 | potassium phosphate tribasic heptahydrate | 317 |
| base | B21 | diisopropylamine | 209 |
| base | B22 | sodium methylate | 175 |
| base | B23 | tetrabutyl ammonium fluoride | 173 |
| base | B24 | barium hydroxide octahydrate | 171 |
| base | B25 | potassium dihydrogenphosphate | 166 |
| base | B26 | potassium fluoride dihydrate | 156 |
| base | B27 | 1,4-diaza-bicyclo[2.2.2]octane | 154 |
| base | B28 | lithium hydroxide monohydrate | 143 |
| base | B29 | tetra-butylammonium acetate | 137 |
| base | B30 | sodium phosphate | 133 |
| base | B31 | potassium hydrogencarbonate | 131 |
| base | B32 | dipotassium hydrogenphosphate | 127 |
| base | B33 | tripotassium phosphate n hydrate | 123 |
| base | B34 | cesiumhydroxide monohydrate | 112 |
| base | B35 | sodium phosphate dodecahydrate | 103 |
| solvent | S1 | tetrahydrofuran | 18113 |
| solvent | S2 | ethanol | 24836 |
| solvent | S3 | methanol | 4374 |
| solvent | S4 | 1,4-dioxane | 39107 |
| solvent | S5 | 1,2-dimethoxyethane | 19131 |
| solvent | S6 | acetonitrile | 4366 |
| solvent | S7 | toluene | 28304 |
| solvent | S8 | N,N-dimethyl formamide | 15110 |
| solvent | S9 | water | 92175 |
| solvent | S10 | 1-methyl-pyrrolidin-2-one | 472 |
| additive | A1 | tetrabutylammomium bromide | 3003 |
| additive | A2 | water | 1606 |
| additive | A3 | lithium chloride | 819 |
| additive | A4 | hydrogenchloride | 780 |
| additive | A5 | copper(l) iodide | 546 |
| additive | A6 | silver(l) oxide | 405 |
| additive | A7 | copper diacetate | 183 |
| additive | A8 | dmap | 181 |
| additive | A9 | Aliquat 336 | 169 |
| additive | A10 | cetyltrimethylammonim bromide | 167 |
| additive | A11 | copper(l) chloride | 164 |
| additive | A12 | potassium bromide | 157 |
| additive | A13 | trifluoroacetic acid | 151 |
| additive | A14 | oxygen | 148 |
| additive | A15 | air | 112 |
| additive | A16 | 18-crown-6 ether | 127 |
| additive | A17 | sodium dodecyl-sulfate | 113 |
Table S2: C–N dataset dictionary.

| category | bin label | dataset name | instances |
|---|---|---|---|
| metal | M1 | copper(l) iodide | 8180 |
| metal | M2 | tris-(dibenzylideneacetone)dipalladium(0) | 6995 |
| metal | M3 | palladium diacetate | 4668 |
| metal | M4 | copper | 1875 |
| metal | M5 | bis(dibenzylideneacetone)-palladium(0) | 1292 |
| metal | M6 | copper(I) oxide | 932 |
| metal | M7 | copper(II) oxide | 402 |
| metal | M8 | copper(l) chloride | 386 |
| metal | M9 | copper(I) bromide | 348 |
| metal | M10 | bis(eta3-allyl-mu-chloropalladium(II)) | 433 |
| metal | M11 | copper(II) acetate monohydrate | 352 |
| metal | M12 | (1,1’-bis(diphenylphosphino)ferrocene)palladium(II) dichloride | 159 |
| metal | M13 | bis(tri-t-butylphosphine)palladium(0) | 181 |
| metal | M14 | iron(III) chloride | 116 |
| metal | M15 | copper(II) bis(trifluoromethanesulfonate) | 91 |
| metal | M16 | copper(ll) bromide | 88 |
| metal | M17 | bis-triphenylphosphine-palladium(II) chloride | 82 |
| metal | M18 | copper(II) sulfate | 154 |
| metal | M19 | bis(acetylacetonate)nickel(II) | 78 |
| metal | M20 | palladium 10% on activated carbon | 71 |
| metal | M21 | tetrakis(triphenylphosphine) palladium(0) | 68 |
| metal | M22 | dichlorobis(tri-O-tolylphosphine)palladium | 67 |
| metal | M23 | (1,2-dimethoxyethane)dichloronickel(II) | 66 |
| metal | M24 | palladium dichloride | 63 |
| metal | M25 | copper(I) thiophene-2-carboxylate | 58 |
| metal | M26 | cobalt(II) oxalate dihydrate | 56 |
| metal | M27 | copper dichloride | 52 |
| metal | M28 | dichloro(1,3-bis(2,6-bis(3-pentyl)phenyl)imidazolin-2-ylidene)(3-chloropyridyl)palladium(II) | 49 |
| metal | M29 | chloro[2-(dicyclohexylphosphino)-3,6-dimethoxy-2′,4′, 6′-triisopropyl- 1,1′-biphenyl][2-(2-aminoethyl)phenyl]palladium(II) | 97 |
| metal | M30 | [2-(di-tert-butylphosphino)-2′,4′,6′-triisopropyl-1,1′-biphenyl][2-((2-aminoethyl)phenyl)]palladium(II) chloride | 49 |
| metal | M31 | iron(III) oxide | 48 |
| metal | M32 | C36H45Cl2N3OPd | 46 |
| metal | M33 | nickel(II) bromide trihydrate | 45 |
| metal | M34 | copper acetylacetonate | 45 |
| metal | M35 | C36H43Cl2N3Pd | 45 |
| metal | M36 | C30H43O2P*C13H12N(1-)*CH3O3S(1-)*Pd(2+) | 45 |
| metal | M37 | bis(1,5-cyclooctadiene)nickel (0) | 45 |
| metal | M38 | CuPy2Cl2 | 42 |
| metal | M39 | dichloro(3-chloropyridinyl)(1,3-(diisopropylphenyl)-4,5-bis(dimethylamino)imidazol-2-ylidene)palladium(II) | 41 |
| metal | M40 | Al2O3*Cu(2+) | 40 |
| metal | M41 | C33H40ClN3O2Pd | 38 |
| metal | M42 | dichloro(1,1’-bis(diphenylphosphanyl)ferrocene)palladium(II)*CH2Cl2 | 36 |
| metal | M43 | (1,3-bis(2,6-diisopropylphenyl)-3,4,5,6-tetrahydropyrimidin-2-ylidene)Pd(cinnamyl, 3-phenylallyl)Cl | 36 |
| metal | M44 | copper(II)iodide | 35 |
| ligand | L1 | 2,2’-bis-(diphenylphosphino)-1,1’-binaphthyl | 3014 |
| ligand | L2 | tri-tert-butyl phosphine | 2137 |
| ligand | L3 | 4,5-bis(diphenylphos4,5-bis(diphenylphosphino)-9,9-dimethylxanthenephino)-9,9-dimethylxanthene | 1995 |
| ligand | L4 | N,N‘-dimethylethylenediamine | 1543 |
| ligand | L5 | XPhos | 830 |
| ligand | L6 | 1,10-Phenanthroline | 703 |
| ligand | L7 | L-proline | 620 |
| ligand | L8 | 1,1’-bis-(diphenylphosphino)ferrocene | 653 |
| ligand | L9 | johnphos | 444 |
| ligand | L10 | DavePhos | 374 |
| ligand | L11 | triphenylphosphine | 275 |
| ligand | L12 | ruphos | 266 |
| ligand | L13 | tri tert-butylphosphoniumtetrafluoroborate | 265 |
| ligand | L14 | tert-butyl XPhos | 242 |
| ligand | L15 | dicyclohexyl-(2’,6’-dimethoxybiphenyl-2-yl)-phosphane | 261 |
| ligand | L16 | trans-1,2-Diaminocyclohexane | 724 |
| ligand | L17 | 8-quinolinol | 206 |
| ligand | L18 | CyJohnPhos | 192 |
| ligand | L19 | trans-N,N’-dimethylcyclohexane-1,2-diamine | 535 |
| ligand | L20 | ethylenediamine | 175 |
| ligand | L21 | dimethylaminoacetic acid | 167 |
| ligand | L22 | dicyclohexyl[3,6-dimethoxy-2′,4′,6′-tris(1-methylethyl)[1,1′-biphenyl]-2-yl]phosphine | 165 |
| ligand | L23 | 2,2,6,6-tetramethylheptane-3,5-dione | 163 |
| ligand | L24 | 1,1’-bi-2-naphthol | 162 |
| ligand | L25 | bis[2-(diphenylphosphino)phenyl] ether | 170 |
| ligand | L26 | 1-dicyclohexylphosphino-2-di-tert-butylphosphinoethylferrocene | 142 |
| ligand | L27 | P(i-BuNCH2)3CMe | 110 |
| ligand | L28 | di-tert-butyl2′-isopropoxy-[1,1′-binaphthalen]-2-ylphosphane | 108 |
| ligand | L29 | di-tert-butyl(2,2-diphenyl-1-methyl-1-cyclopropyl)phosphine | 104 |
| ligand | L30 | P(i-BuNCH2CH2)3N | 98 |
| ligand | L31 | N,N-dimethylglycine hydrochoride | 96 |
| ligand | L32 | N-[2-(di(1-adamantyl)phosphino)phenyl]morpholine | 92 |
| ligand | L33 | 5-(di-tert-butylphosphino)-1′, 3′, 5′-triphenyl-1′H-[1,4′]bipyrazole | 91 |
| ligand | L34 | 2-[2-(dicyclohexylphosphino)-phenyl]-1-methyl-1H-indole | 86 |
| ligand | L35 | 4,4’-di-tert-butyl-2,2’-bipyridine | 85 |
| ligand | L36 | tris-(o-tolyl)phosphine | 77 |
| ligand | L37 | 2,8,9-tris(2-methylpropyl)-2,5,8,9-tetraaza-1-phosphabicyclo[3.3.3]undecane | 75 |
| ligand | L38 | cis-N,N’-dimethyl-1,2-diaminocyclohexane | 74 |
| ligand | L39 | monophosphine 1,2,3,4,5-pentaphenyl-1’-(di-tert-butylphosphino)ferrocene | 55 |
| ligand | L40 | 5-(di(adamantan-1-yl)phosphino)-1′,3′,5′-triphenyl-1′H-1,4′-bipyrazole | 55 |
| ligand | L41 | t-BuBrettPhos | 53 |
| ligand | L42 | 2-(N,N-dimethylamino)athanol | 53 |
| ligand | L43 | tricyclohexylphosphine | 46 |
| ligand | L44 | (E)-3-(dimethylamino)-1-(2-hydroxyphenyl)prop-2-en-1-one | 46 |
| ligand | L45 | di-tert-butylneopentylphosphonium tetrafluoroborate | 38 |
| ligand | L46 | 2-di-tertbutylphosphino-3,4,5,6-tetramethyl-2’,4’,6’-triisopropyl-1,1’-biphenyl | 37 |
| ligand | L47 | N,N,N,N,-tetramethylethylenediamine | 26 |
| base | B1 | sodium t-butanolate | 9103 |
| base | B2 | potassium carbonate | 7129 |
| base | B3 | caesium carbonate | 6957 |
| base | B4 | potassium phosphate | 3274 |
| base | B5 | potassium tert-butylate | 2167 |
| base | B6 | potassium hydroxide | 1420 |
| base | B7 | triethylamine | 500 |
| base | B8 | lithium hexamethyldisilazane | 432 |
| base | B9 | sodium hydroxide | 430 |
| base | B10 | sodium hydride | 228 |
| base | B11 | sodium carbonate | 200 |
| base | B12 | potassium phosphate monohydrate | 130 |
| base | B13 | sodium hydrogencarbonate | 128 |
| solvent | S1 | toluene | 11970 |
| solvent | S2 | 1,4-dioxane | 5273 |
| solvent | S3 | N,N-dimethyl-formamide | 4246 |
| solvent | S4 | dimethyl sulfoxide | 3790 |
| solvent | S5 | water | 2464 |
| solvent | S6 | tetrahydrofuran | 1457 |
| solvent | S7 | 1,2-dimethoxyethane | 878 |
| solvent | S8 | tert-butyl alcohol | 841 |
| solvent | S9 | acetonitrile | 780 |
| solvent | S10 | ethanol | 549 |
| solvent | S11 | 5,5-dimethyl-1,3-cyclohexadiene | 497 |
| solvent | S12 | isopropyl alcohol | 316 |
| solvent | S13 | nitrobenzene | 315 |
| solvent | S14 | 1-methyl-pyrrolidin-2-one | 292 |
| solvent | S15 | hexane | 286 |
| solvent | S16 | N,N-dimethyl acetamide | 281 |
| solvent | S17 | 1,2-dichloro-benzene | 254 |
| solvent | S18 | neat (no solvent) | 240 |
| solvent | S19 | o-xylene | 219 |
| solvent | S20 | xylene | 208 |
| solvent | S21 | methanol | 180 |
| solvent | S22 | ethyl acetate | 163 |
| additive | A1 | 18-crown-6 ether | 455 |
| additive | A2 | tetrabutylammomium bromide | 372 |
| additive | A3 | 8-quinolinol | 206 |
| additive | A4 | dimethylaminoacetic acid | 167 |
| additive | A5 | 1,1’-bi-2-naphthol | 162 |
| additive | A6 | water | 160 |
| additive | A7 | sodium sulfate | 132 |
| additive | A8 | 2-(2-methyl-1-oxopropyl)cyclohexanone | 121 |
| additive | A9 | phenylboronic acid | 120 |
| additive | A10 | 1,3-bis[(2,6-diisopropyl)phenyl]imidazolinium chloride | 109 |
| additive | A11 | potassium iodide | 108 |
| additive | A12 | hydrogenchloride | 107 |
| additive | A13 | ethylene glycol | 102 |
| additive | A14 | N,N-dimethylglycine hydrochoride | 96 |
| additive | A15 | 1,3-bis[2,6-diisopropylphenyl]imidazolium chloride | 95 |
| additive | A16 | N-ethylmorpholine | 93 |
| additive | A17 | tert-butyl alcohol | 87 |
| additive | A18 | aluminum oxide | 84 |
| additive | A19 | D-glucose | 83 |
| additive | A20 | cetyltrimethylammonim bromide | 71 |
| additive | A21 | 1,3-dimethyl-3,4,5,6-tetrahydro-2(1H)-pyrimidinone | 68 |
| additive | A22 | N’,N’-diphenyl-1H-pyrrole-2-carbohydrazide | 63 |
| additive | A23 | manganese(II) fluoride | 63 |
| additive | A24 | dimethyl sulfoxide | 55 |
| additive | A25 | 2-(N,N-dimethylamino)athanol | 53 |
| additive | A26 | air | 48 |
| additive | A27 | iron(III) oxide | 48 |
| additive | A28 | (E)-3-(dimethylamino)-1-(2-hydroxyphenyl)prop-2-en-1-one | 46 |
| additive | A29 | lithium bromide | 44 |
| additive | A30 | 6,7-dihydro-5H-quinolin-8-one oxime | 43 |
| additive | A31 | CVT-2537 | 42 |
| additive | A32 | ammonium chloride | 42 |
| additive | A33 | 1-methyl-pyrrolidin-2-one | 42 |
| additive | A34 | tetra(n-butyl)ammonium hydroxide | 40 |
| additive | A35 | salicylaldehyde-oxime | 39 |
| additive | A36 | potassium fluoride on basic alumina | 39 |
| additive | A37 | toluene-4-sulfonic acid | 38 |
| additive | A38 | lithium chloride | 38 |
| additive | A39 | pipecolic Acid | 37 |
| additive | A40 | oxygen | 37 |
| additive | A41 | metformin hydrochloride | 37 |
| additive | A42 | 8-Hydroxyquinoline-N-oxide | 37 |
| additive | A43 | 1-(5,6,7,8-tetrahydroquinolin-8-yl)ethan-1-one | 36 |
| additive | A44 | tetrabutyl ammonium fluoride | 36 |
| additive | A45 | N1,N2-bis(thiophen-2-ylmethyl)oxalamide | 36 |
| additive | A46 | N-phenyl-2-pyridincarboxamide-1-oxide | 35 |
| additive | A47 | N-((1-oxy-pyridin-2-yl)methyl)oxalamic acid | 35 |
| additive | A48 | C19H19N5O | 35 |
| additive | A49 | manganese(II) chloride tetrahydrate | 34 |
| additive | A50 | 1-tetralone oxime | 32 |
| additive | A51 | N1,N2-bis(2,4,6-trimethoxyphenyl)oxalamide | 31 |
| additive | A52 | N-methoxy-1H-pyrrole-2-carboxamide | 29 |
| additive | A53 | ammonia | 29 |
| additive | A54 | 1,2,3-Benzotriazole | 29 |
| additive | A55 | dimethylenecyclourethane | 28 |
| additive | A56 | isopropylmagnesium chloride | 27 |
| additive | A57 | N-(2-cyanophenyl)pyridine-2-carboxamide | 27 |
| additive | A58 | C20H18N2O2 | 27 |
| additive | A59 | 2-acetylcyclohexanone | 27 |
| additive | A60 | 2,6-di-tert-butyl-4-methyl-phenol | 26 |
| additive | A61 | 2-hydroxy-pyridine N-oxide | 26 |
| additive | A62 | TPGS-750-M | 25 |
| additive | A63 | N′-phenyl-1H-pyrrole-2-carbohydrazide | 25 |
| additive | A64 | lanthanum(III) oxide | 25 |
| additive | A65 | ethylmagnesium bromide | 25 |
| additive | A66 | ethyl 2-oxocyclohexane carboxylate | 25 |
| additive | A67 | 1,4-dimethyl-1,2,3,4-tetrahydro-5H-benzo[e][1,4]diazepin-5-one | 25 |
| additive | A68 | tetraethoxy orthosilicate | 24 |
| additive | A69 | N,N,N’,N’-tetramethylguanidine | 24 |
| additive | A70 | C20H26N4O4 | 24 |
| additive | A71 | 2-methyl-8-quinolinol | 24 |
| additive | A72 | 2-carbomethoxy-3-hydroxyquinoxaline-di-N-oxide | 24 |
| additive | A73 | 1,3-diisopropyl-1H-imidazol-3-ium chloride | 24 |
| additive | A74 | MOF-199 | 24 |
Table S3: Negishi dataset dictionary.

| category | bin label | dataset name | instances |
|---|---|---|---|
| metal | M1 | tetrakis(triphenylphosphine) palladium(0) | 1902 |
| metal | M2 | tris-(dibenzylideneacetone)dipalladium(0) | 572 |
| metal | M3 | bis-triphenylphosphine-palladium(II) chloride | 418 |
| metal | M4 | palladium diacetate | 370 |
| metal | M5 | bis(dibenzylideneacetone)-palladium(0) | 344 |
| metal | M6 | (1,1’-bis(diphenylphosphino)ferrocene)palladium(II) dichloride | 334 |
| metal | M7 | bis(tri-t-butylphosphine)palladium(0) | 273 |
| metal | M8 | dichloro(1,1’-bis(diphenylphosphanyl)ferrocene)palladium(II)*CH2Cl2 | 248 |
| metal | M9 | dichlorobis[1-(dicyclohexylphosphanyl)piperidine]palladium(II) | 168 |
| metal | M10 | palladium(l) tri-tert-butylphosphine iodide dimer | 101 |
| metal | M11 | bis(tricyclohexylphosphine)nickel(II) dichloride | 99 |
| metal | M12 | [(C10H13-1,3-(CH2P(C6H11)2)2)Pd(Cl)] | 87 |
| metal | M13 | 1,3-bis[(diphenylphosphino)propane]dichloronickel(II) | 63 |
| metal | M14 | bis(1,5-cyclooctadiene)nickel (0) | 56 |
| metal | M15 | nickel dichloride | 56 |
| metal | M16 | tris(dibenzylideneacetone)dipalladium(0) chloroform complex | 46 |
| metal | M17 | dichlorobis(tri-O-tolylphosphine)palladium | 46 |
| metal | M18 | palladium | 44 |
| metal | M19 | [1,3-bis(2,6-diisopropylphenyl)imidazol-2-ylidene](3chloro-pyridyl)palladium(II) dichloride | 136 |
| metal | M20 | C20H20ClN3Ni | 42 |
| metal | M21 | dichloro(1,3-bis(2,6-bis(3-pentyl)phenyl)imidazolin-2-ylidene)(3-chloropyridyl)palladium(II) | 39 |
| metal | M22 | bis(triphenylphosphine)nickel(II) chloride | 38 |
| metal | M23 | C26H24ClN2NiP*0.1C7H8 | 35 |
| metal | M24 | cobalt(II) chloride | 34 |
| metal | M25 | copper(I) bromide | 31 |
| metal | M26 | C40H55Cl5N3Pd | 30 |
| metal | M27 | [1,3-bis(2,6-diisoheptylphenyl)-4,5-dichloroimidazol-2-ylidene](3-chloropyridyl)palladium(II) dichloride | 29 |
| metal | M28 | dichloro bis(acetonitrile) palladium(II) | 29 |
| metal | M29 | palladium(II) trifluoroacetate | 27 |
| metal | M30 | 1,2-bis(diphenylphosphino)ethane nickel(II) chloride | 27 |
| metal | M31 | C27H22Cl2N3NiP | 24 |
| metal | M32 | C38H34Br2N4Ni2P2 | 23 |
| ligand | L1 | 1,1’-bis-(diphenylphosphino)ferrocene | 233 |
| ligand | L2 | dicyclohexyl-(2’,6’-dimethoxybiphenyl-2-yl)-phosphane | 196 |
| ligand | L3 | XPhos | 187 |
| ligand | L4 | triphenylphosphine | 161 |
| ligand | L5 | trifuran-2-yl-phosphane | 128 |
| ligand | L6 | monophosphine 1,2,3,4,5-pentaphenyl-1’-(di-tert-butylphosphino)ferrocene | 95 |
| ligand | L7 | tris-(o-tolyl)phosphine | 70 |
| ligand | L8 | Ruphos | 61 |
| ligand | L9 | 2′-(dicyclohexylphophanyl)-N2,N2,N6,N6-tetramethyl[1,1′-biphenyl]-2,6-diamine | 37 |
| ligand | L10 | tripiperidino-phosphine | 37 |
| ligand | L11 | tri tert-butylphosphoniumtetrafluoroborate | 35 |
| ligand | L12 | 1,2-bis-(dicyclohexylphosphino)ethane | 33 |
| ligand | L13 | 4,5-bis(diphenylphos4,5-bis(diphenylphosphino)-9,9-dimethylxanthenephino)-9,9-dimethylxanthene | 31 |
| ligand | L14 | N,N,N,N,-tetramethylethylenediamine | 24 |
| ligand | L15 | [2,2]bipyridinyl | 22 |
| ligand | L16 | 4,4’-di-tert-butyl-2,2’-bipyridine | 21 |
| ligand | L17 | 1,2-Ph2-3,4-bis(2,4,6-(t-Bu)3-phenylphophinidene)cyclobutene | 20 |
| ligand | L18 | johnphos | 20 |
| ligand | L19 | tri-tert-butyl phosphine | 19 |
| ligand | L20 | tricyclohexylphosphine | 18 |
| temperature | T1 | -163 - 18 | 101 |
| temperature | T2 | 18 - 23 | 2313 |
| temperature | T3 | 23 - 50 | 643 |
| temperature | T4 | 50 - 61 | 975 |
| temperature | T5 | 61 - 80 | 658 |
| temperature | T6 | 80 - 100 | 673 |
| temperature | T7 | 100 - 120 | 696 |
| temperature | T8 | 120 - 220 | 479 |
| solvent | S1 | tetrahydrofuran | 4525 |
| solvent | S2 | N,N-dimethyl-formamide | 1003 |
| solvent | S3 | 1-methyl-pyrrolidin-2-one | 674 |
| solvent | S4 | toluene | 541 |
| solvent | S5 | 1,4-dioxane | 335 |
| solvent | S6 | N,N-dimethyl acetamide | 247 |
| solvent | S7 | hexane | 219 |
| solvent | S8 | diethyl ether | 203 |
| solvent | S9 | water | 122 |
| solvent | S10 | 1,2-dimethoxyethane | 67 |
| additive | A1 | lithium chloride | 243 |
| additive | A2 | zinc | 207 |
| additive | A3 | copper(l) iodide | 154 |
| additive | A4 | water | 62 |
| additive | A5 | diisobutylaluminium hydride | 59 |
| additive | A6 | tetrabutylammomium bromide | 52 |
| additive | A7 | ammonium chloride | 51 |
| additive | A8 | n-butyllithium | 46 |
| additive | A9 | 1-Methylpyrrolidine | 42 |
| additive | A10 | Li2CoCl4 | 42 |
| additive | A11 | sodium formate | 42 |
| additive | A12 | hydrogenchloride | 36 |
| additive | A13 | caesium carbonate | 36 |
| additive | A14 | zinc diacetate | 32 |
| additive | A15 | potassium carbonate | 30 |
| additive | A16 | norborn-2-ene | 30 |
| additive | A17 | lithium bromide | 28 |
| additive | A18 | 1,3-dimethyl-3,4,5,6-tetrahydro-2(1H)-pyrimidinone | 23 |
| additive | A19 | methylzinc chloride | 22 |
| additive | A20 | 1-methyl-pyrrolidin-2-one | 21 |
| additive | A21 | zinc(II) chloride | 21 |
| additive | A22 | isoquinoline | 20 |
| additive | A23 | sodium carbonate | 19 |
| additive | A24 | 1-ethyl-2-pyrrolidinone | 18 |
| additive | A25 | sodium | 16 |
| additive | A26 | 1-methyl-1H-imidazole | 15 |
| additive | A27 | oxovanadium(V) ethoxydichloride | 12 |
| additive | A28 | 2-(N,N-dimethylamino)athanol | 11 |
| additive | A29 | [bdmim][BF4] | 11 |
| additive | A30 | 1-butyl-2-(diphenylphosphanyl)-3-methylimidazolium hexafluorophosphate | 11 |
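Table S4: PKR dataset dictionary.

| category | bin label | dataset name | instances |
|---|---|---|---|
| metal | M1 | dicobalt octacarbonyl | 614 |
| metal | M2 | di(rhodium)tetracarbonyl dichloride | 333 |
| metal | M3 | chloro(1,5-cyclooctadiene)rhodium(I) dimer | 140 |
| metal | M4 | [RhCl(CO)dppp]2 | 92 |
| metal | M5 | cobalt(II) bromide | 44 |
| metal | M6 | palladium dichloride | 33 |
| metal | M7 | dodecacarbonyl-triangulo-triruthenium | 32 |
| metal | M8 | Co2Rh2 nanoparticles immobilized on charcoal | 50 |
| metal | M9 | tetracobaltdodecacarbonyl | 44 |
| metal | M10 | molybdenum hexacarbonyl | 23 |
| metal | M11 | Rh(dppp)2Cl | 19 |
| metal | M12 | cobalt nanoparticles on charcoal | 36 |
| metal | M13 | methylidynetricobalt nonacarbonyl | 25 |
| metal | M14 | bis(triphenylphosphine)(carbonyl)rhodium chloride | 11 |
| metal | M15 | PdCl(OHNCCH3C6H4)(C5H5N) | 10 |
| metal | M16 | bis(1,5-cyclooctadiene)diiridium(I) dichloride | 9 |
| metal | M17 | diiron nonacarbonyl | 9 |
| metal | M18 | iron(II) bis(trimethylsilyl)amide | 9 |
| ligand | L1 | 1,1,3,3-tetramethyl-2-thiourea | 128 |
| ligand | L2 | 1,3-bis-(diphenylphosphino)propane | 93 |
| ligand | L3 | 2,2’-bis-(diphenylphosphino)-1,1’-binaphthyl | 31 |
| ligand | L4 | triphenylphosphine | 16 |
| ligand | L5 | tri-n-butylphosphine sulfide | 15 |
| ligand | L6 | (S)-3,5-di-tert-butyl-4-methoxyphenyl-(6,6′-dimethoxybiphenyl-2,2′-diyl)-bis(diphenylphosphine) | 12 |
| temperature | T1 | -98 - 20 | 83 |
| temperature | T2 | 20 | 961 |
| temperature | T3 | 20 - 60 | 299 |
| temperature | T4 | 60 - 77 | 370 |
| temperature | T5 | 77 - 94 | 338 |
| temperature | T6 | 94 - 120 | 395 |
| temperature | T7 | 120 - 180 | 303 |
| solvent | S1 | toluene | 966 |
| solvent | S2 | dichloromethane | 601 |
| solvent | S3 | tetrahydrofuran | 318 |
| solvent | S4 | 1,2-dichloro-ethane | 171 |
| solvent | S5 | 1,2-dimethoxyethane | 145 |
| solvent | S6 | acetonitrile | 141 |
| solvent | S7 | not listed | 102 |
| solvent | S8 | water | 71 |
| solvent | S9 | benzene | 76 |
| solvent | S10 | para-xylene | 136 |
| solvent | S11 | hexane | 43 |
| solvent | S12 | dimethyl sulfoxide | 39 |
| solvent | S13 | 1,4-dioxane | 33 |
| solvent | S14 | dibutyl ether | 33 |
| solvent | S15 | diethyl ether | 22 |
| activator | A1 | 4-methylmorpholine N-oxide | 420 |
| activator | A2 | trimethylamine-N-oxide | 212 |
| activator | A3 | dimethyl sulfoxide | 137 |
| activator | A4 | cyclohexylamine | 68 |
| activator | A5 | n-butyl methyl sulfide | 27 |
| activator | A6 | silver trifluoromethanesulfonate | 23 |
| activator | A7 | silver tetrafluoroborate | 18 |
| activator | A8 | silver hexafluoroantimonate | 19 |
| activator | A9 | (4-fluorobenzyl)(methyl)sulfide | 14 |
| activator | A10 | dinitrogen monoxide | 14 |
| activator | A11 | 4-methylmorpholine 4-oxide monohydrate | 13 |
| CO (g) | G1 | carbon monoxide | 1169 |
| CO (g) | G2 | none | 1580 |
| additive | O1 | 4 A molecular sieve | 84 |
| additive | O2 | zinc | 50 |
| additive | O3 | hydrogen | 40 |
| additive | O4 | ethylene glycol | 30 |
| additive | O5 | cetyltrimethylammonim bromide | 22 |
| additive | O6 | Celite | 17 |
| additive | O7 | Triton X(R)-100 | 37 |
| additive | O8 | acetic anhydride | 15 |
| additive | O9 | lithium chloride | 15 |
| additive | O10 | water | 11 |
| additive | O11 | oxygen | 10 |
| additive | O12 | potassium carbonate | 8 |
| additive | O13 | triethylsilane | 8 |
| pressure | P1 | 37 - 760 | 35 |
| pressure | P2 | 760 | 2392 |
| pressure | P3 | 760 - 7600 | 169 |
| pressure | P4 | 7600 - 7500600 | 153 |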
S2 Computational details and hyperparameters
S2.1 Gradient-boosting machines (GBMs)
Numerical inputs for GBM models were constructed by tokenizing the SMILES string of each molecule in a reaction with character-to-number mappings and by calculating chemical descriptor vectors using Mordred.[S2] Coded examples of these processing protocols are provided in the associated GitHub repository at the path models/gbms/parsing_example_suzuki/. All GBM classifiers were implemented using Microsoft's LightGBM.[S3] Specific non-default parameter settings are included in Table S5.
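The character-to-number tokenization described above can be sketched as follows. This is a minimal illustration, not the repository's parser: the helper names `build_vocab` and `tokenize`, the use of 0 as a padding index, and the fixed `max_len` are assumptions. Because the mapping is per character, two-letter atoms such as Cl and Br become two tokens.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a positive integer."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    # Index 0 is reserved for padding (an assumption of this sketch).
    return {ch: i + 1 for i, ch in enumerate(chars)}


def tokenize(smiles, vocab, max_len):
    """Encode one SMILES string as a fixed-length list of integers."""
    ids = [vocab[ch] for ch in smiles[:max_len]]  # truncate long strings
    return ids + [0] * (max_len - len(ids))       # right-pad short strings


vocab = build_vocab(["CCO", "c1ccccc1Br"])
encoded = tokenize("CCO", vocab, max_len=5)
```

In a full pipeline, the token vectors of all molecules in a reaction would presumably be concatenated with the Mordred descriptor vectors to form one input row; that step is omitted here.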
Table S5: Computational details and general parameters used for GBM models.

| parameter | value | description |
|---|---|---|
| train/valid/test | 81/9/10 | data splitting [a] |
| max_depth | 7 | maximum tree depth for base learners |
| tree_method | 'gpu_hist' | split continuous features into discrete bins |
| eval_metric | 'aucpr' | evaluation metric |

[a] Training, validation, and test sets were identical to those in GCNs.
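One way to arrive at an 81/9/10 split is to hold out 10% of the data for testing and then 10% of the remainder for validation. The sketch below illustrates that arithmetic only; the two-stage procedure, the function name `split_indices`, and the seed are assumptions, and the actual splits were shared with the GCN models as noted in the table footnote.

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and split it into train/valid/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # reproducible shuffle

    n_test = round(0.10 * n)              # 10% of all data for testing
    n_valid = round(0.10 * (n - n_test))  # 10% of the rest (9% overall)

    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]        # remaining 81%
    return train, valid, test


train, valid, test = split_indices(1000)
```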
S2.1.1 Binary relevance method (BM)
In BM experiments, an independent lightgbm.LGBMClassifier was fit for each label bin in a dataset's dictionary using the full input representation.
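The binary relevance scheme amounts to one independent binary classifier per label column. The sketch below shows that structure with a trivial stand-in model so that it runs without extra dependencies; `MajorityClassifier`, `fit_binary_relevance`, and `predict_binary_relevance` are hypothetical names for this example and are not taken from the repository.

```python
class MajorityClassifier:
    """Stand-in model with a fit/predict interface; it ignores the features."""

    def fit(self, X, y):
        # Predict 1 if at least half of the training labels are 1.
        self.label_ = int(2 * sum(y) >= len(y))
        return self

    def predict(self, X):
        return [self.label_] * len(X)


def fit_binary_relevance(X, Y, make_model=MajorityClassifier):
    """Fit one independent model per label bin (column of Y)."""
    models = []
    for j in range(len(Y[0])):
        y_j = [row[j] for row in Y]  # binary targets for label bin j
        models.append(make_model().fit(X, y_j))
    return models


def predict_binary_relevance(models, X):
    """Stack per-label predictions into one multi-label row per sample."""
    columns = [model.predict(X) for model in models]
    return [list(row) for row in zip(*columns)]
```

Under these assumptions, passing something like `make_model=lambda: lightgbm.LGBMClassifier(max_depth=7)` would swap in the classifier named in the text, since it exposes the same `fit` and `predict` methods.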
S2.1.2 Classifier trellises (CTs)
In CT experiments, lightgbm.LGBMClassifier models were fit for each label bin in a dataset's dictionary as part of a grid structure in which predictions are made sequentially and passed to downstream models as additional inputs (see main text for explanation). Mutual information (MI) matrices were constructed for each dataset's label dictionary using scikit-learn's sklearn.metrics.mutual_info_score function.[S4] Classifier trellises were then constructed following the algorithm reported by Read et al. (see main text and associated code for details).[S5] As shown in the example in the main text, each model takes additional input from the bins to its north, west, and northwest. Models on the edges of the trellis take input only from bins in the available directions (i.e., propagation does not wrap between rows). Here, each trellis was initialized using the label M1, the most commonly used metal in each dataset. The initial label can instead be chosen by user preference, expert intuition, or at random. Full MI matrices and trellis structures for all four datasets are provided below.
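Two pieces of this procedure lend themselves to a short sketch: the pairwise MI matrix over label columns and the lookup of a model's upstream neighbours in the trellis. The code below is illustrative only. It computes MI in nats from empirical counts, as a dependency-free stand-in for `sklearn.metrics.mutual_info_score`, and it assumes a row-major grid of a given `width`; the function names are hypothetical, and the greedy placement of labels into the grid described by Read et al. is not reproduced.

```python
import math


def mutual_info(a, b):
    """Mutual information (nats) between two equal-length label sequences."""
    n = len(a)
    joint, count_a, count_b = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        count_a[x] = count_a.get(x, 0) + 1
        count_b[y] = count_b.get(y, 0) + 1
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def mi_matrix(Y):
    """Pairwise MI between all label columns of a multi-label matrix Y."""
    cols = list(zip(*Y))
    return [[mutual_info(ci, cj) for cj in cols] for ci in cols]


def trellis_parents(pos, width):
    """Row-major indices of the north, west, and northwest neighbours."""
    row, col = divmod(pos, width)
    parents = []
    if row > 0:
        parents.append(pos - width)      # north
    if col > 0:
        parents.append(pos - 1)          # west
    if row > 0 and col > 0:
        parents.append(pos - width - 1)  # northwest
    return parents
```

Because the column check is done per row, a model in the first column never receives input from the last bin of the previous row, which matches the statement that propagation does not wrap between rows.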
Figure S1: Optimized classifier trellis for the Suzuki dataset.
Figure S2: Mutual information matrix for the Suzuki dataset.
Figure S3: Optimized classifier trellis for the C–N dataset.
Figure S4: Mutual information matrix for the C–N dataset.