Published December 8, 2023 | in press
Journal Article Open

Dataset Design for Building Models of Chemical Reactivity

Abstract

Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.

Copyright and License

© 2023 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY 4.0.

Acknowledgement

The authors gratefully acknowledge the financial support of this work through the NSF Center for Computer Assisted Synthesis (C-CAS) under Grant CHE-2202693. We also thank CAS for providing the subset of the CAS Content Collection to enable our visualization of the reaction yields shown in Figure 2.

Conflict of Interest

The authors declare no competing financial interest.

Files

raghavan-et-al-2023-dataset-design-for-building-models-of-chemical-reactivity.pdf

Additional details

Created: December 13, 2023
Modified: December 13, 2023