Multilabel Classification Models for the Prediction of Cross-Coupling Reaction Conditions
Abstract
Machine-learned ranking models have been developed for the prediction of substrate-specific cross-coupling reaction conditions. Data sets of published reactions were curated for Suzuki, Negishi, and C–N couplings, as well as Pauson–Khand reactions. String, descriptor, and graph encodings were tested as input representations, and models were trained to predict the set of conditions used in a reaction as a binary vector. Unique reagent dictionaries categorized by expert-crafted reaction roles were constructed for each data set, leading to context-aware predictions. We find that relational graph convolutional networks and gradient-boosting machines are very effective for this learning task, and we disclose a novel reaction-level graph attention operation in the top-performing model.
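To illustrate the binary-vector formulation described in the abstract, the following is a minimal sketch of how a reaction's conditions could be encoded as a multi-hot vector over a role-categorized reagent dictionary. The dictionary entries, role names, and the helper functions `encode_conditions` and `decode_conditions` are hypothetical placeholders chosen for illustration; they are not the dictionaries or code used in the paper.

```python
# Hypothetical reagent dictionary grouped by reaction role. The entries are
# illustrative placeholders, not the curated dictionaries from the paper.
REAGENT_DICTIONARY = {
    "metal": ["Pd(PPh3)4", "Pd(OAc)2"],
    "ligand": ["PPh3", "SPhos"],
    "base": ["K2CO3", "Cs2CO3"],
    "solvent": ["THF", "dioxane", "water"],
}

# Fixed label order: one (role, reagent) pair per position in the vector.
LABELS = [
    (role, reagent)
    for role, reagents in REAGENT_DICTIONARY.items()
    for reagent in reagents
]


def encode_conditions(conditions):
    """Return a multi-hot list with 1 at every (role, reagent) label in use."""
    used = {
        (role, reagent)
        for role, reagents in conditions.items()
        for reagent in reagents
    }
    unknown = used - set(LABELS)
    if unknown:
        raise ValueError(f"Reagents not in dictionary: {sorted(unknown)}")
    return [1 if label in used else 0 for label in LABELS]


def decode_conditions(vector):
    """Map a multi-hot list back to a {role: [reagents]} dictionary."""
    decoded = {}
    for bit, (role, reagent) in zip(vector, LABELS):
        if bit:
            decoded.setdefault(role, []).append(reagent)
    return decoded


# One made-up reaction: each role may carry zero, one, or several reagents.
example = {
    "metal": ["Pd(OAc)2"],
    "ligand": ["SPhos"],
    "base": ["K2CO3"],
    "solvent": ["dioxane", "water"],
}
vector = encode_conditions(example)
```

Because each position is tied to a role as well as a reagent, a predicted vector can be read back role by role with `decode_conditions`, which is one way to picture the context-aware predictions the abstract mentions.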
Additional Information
© 2021 American Chemical Society. Received: October 23, 2020; Publication Date: January 8, 2021. We thank Prof Pietro Perona for mentorship guidance and helpful project discussions and Chase Blagden for help in structuring the GBM experiments. Fellowship support was provided by the NSF (M.R.M., T.J.D. Grant No. DGE-1144469). S.E.R. is a Heritage Medical Research Institute Investigator. Y.Y. is supported in part by NSF 1645832 and NSF 1918839 and funding from Raytheon and Beyond Limits. S.R. is supported by grants from Disney Research and from Nissan Corporation. Financial support from Research Corporation is warmly acknowledged. Author Contributions: M.R.M., A.Y.C., and S.R. contributed equally to this work. The authors declare no competing financial interest.
Attached Files
Submitted - Multi-Label_Classification_Models_for_the_Prediction_of_Cross-Coupling_Reaction_Conditions_v1.pdf
Supplemental Material - ci0c01234_si_001.pdf
Files
Name | Size |
---|---|
md5:417a8fe52db78718748a6c1e0af8861c | 5.9 MB |
md5:6aa4bc50f66e76e68e47a7b385a8176f | 3.4 MB |
Additional details
- Alternative title
- Multi-Label Classification Models for the Prediction of Cross-Coupling Reaction Conditions
- Eprint ID
- 106094
- Resolver ID
- CaltechAUTHORS:20201015-152733539
- NSF Graduate Research Fellowship
- DGE-1144469
- Heritage Medical Research Institute
- NSF
- CNS-1645832
- NSF
- 1918839
- Raytheon Company
- Beyond Limits
- Disney Research
- Nissan Corporation
- Research Corporation
- Created
- 2020-10-16 (created from EPrint's datestamp field)
- Updated
- 2023-06-01 (created from EPrint's last_modified field)
- Caltech groups
- Heritage Medical Research Institute, Center for Autonomous Systems and Technologies (CAST)