Active learning-assisted directed evolution
Abstract
Directed evolution (DE) is a powerful tool to optimize protein fitness for a specific application. However, DE can be inefficient when mutations exhibit non-additive, or epistatic, behavior. Here, we present Active Learning-assisted Directed Evolution (ALDE), an iterative machine learning-assisted DE workflow that leverages uncertainty quantification to explore the search space of proteins more efficiently than current DE methods. We apply ALDE to an engineering landscape that is challenging for DE: optimization of five epistatic residues in the active site of an enzyme. In three rounds of wet-lab experimentation, we improve the yield of a desired product of a non-native cyclopropanation reaction from 12% to 93%. We also perform computational simulations on existing protein sequence-fitness datasets to support our argument that ALDE can be more effective than DE. Overall, ALDE is a practical and broadly applicable strategy to unlock improved protein engineering outcomes.
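The iterative, uncertainty-guided loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the four-letter alphabet, the four-site design space, the synthetic `toy_landscape`, the bootstrap ensemble of ridge regressors used as a stand-in uncertainty model, and the upper-confidence-bound (UCB) acquisition rule with hypothetical parameters `batch` and `beta`.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"  # hypothetical reduced alphabet, chosen to keep the toy space small
N_SITES = 4        # hypothetical number of mutated sites


def one_hot(variants):
    """One-hot encode each variant string into a flat feature vector."""
    idx = {a: i for i, a in enumerate(ALPHABET)}
    X = np.zeros((len(variants), N_SITES * len(ALPHABET)))
    for r, v in enumerate(variants):
        for s, a in enumerate(v):
            X[r, s * len(ALPHABET) + idx[a]] = 1.0
    return X


def toy_landscape(variants, seed=1):
    """Synthetic fitness with additive and pairwise (epistatic) terms."""
    rng = np.random.default_rng(seed)
    X = one_hot(variants)
    additive = X @ rng.normal(size=X.shape[1])
    pair = rng.normal(size=(X.shape[1], X.shape[1]))
    epistatic = np.einsum("ni,ij,nj->n", X, pair, X)
    return additive + 0.5 * epistatic


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def ensemble_predict(X_train, y_train, X_all, rng, n_models=10):
    """Mean and spread of predictions from bootstrap-resampled ridge models."""
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, len(y_train), size=len(y_train))
        w = fit_ridge(X_train[boot], y_train[boot])
        preds.append(X_all @ w)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def active_learning(fitness, variants, n_rounds=3, batch=16, beta=2.0, seed=0):
    """Measure a random first batch, then pick later batches by UCB score."""
    rng = np.random.default_rng(seed)
    X_all = one_hot(variants)
    measured = rng.choice(len(variants), size=batch, replace=False).tolist()
    for _ in range(n_rounds - 1):
        y = fitness[measured]  # stands in for a round of wet-lab screening
        mean, std = ensemble_predict(X_all[measured], y, X_all, rng)
        ucb = mean + beta * std   # favor high predicted fitness and high uncertainty
        ucb[measured] = -np.inf   # never re-select an already measured variant
        measured.extend(np.argsort(ucb)[-batch:].tolist())
    return measured


variants = ["".join(v) for v in itertools.product(ALPHABET, repeat=N_SITES)]
fitness = toy_landscape(variants)
measured = active_learning(fitness, variants)
print("best measured variant:", variants[max(measured, key=lambda i: fitness[i])])
```

In this sketch the lookup `fitness[measured]` takes the place of an experimental round, and `beta` sets how strongly the selection favors uncertain variants over variants that are merely predicted to be good.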
Copyright and License
© 2025, The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Acknowledgement
This work was supported by the U.S. Army Research Office cooperative agreement for the Institute for Collaborative Biotechnologies (W911NF-19-2-0026 to F.H.A.). J.Y. and R.G.L. are partially supported by National Science Foundation Graduate Research Fellowships. The authors thank Yueming Long and Emre Guersoy for help with sequencing and Shilong Gao for collecting useful initial data. The authors also thank Christopher Yeh for helpful discussions and Sabine Brinkmann-Chen for critical reading of the manuscript. Finally, the authors thank Miguel González-Duque and Richard Michael for pointing out consideration of the length-scale prior for GP models and Hunter Nisonoff for insight into the poor calibration of DKL models.
Contributions
J.Y.: conceptualization, methodology, software, investigation, analysis, writing – original draft, writing – editing. R.G.L.: conceptualization, methodology, investigation, analysis, writing – original draft, writing – editing. J.C.B.: methodology, software, writing – editing. R.A.: methodology, software, writing – editing. M.A.H.: investigation, writing – editing. S.K.: resources, DNA synthesis. M.H.: resources, DNA synthesis. Y.Y.: resources, writing – editing, supervision, funding. F.H.A.: resources, writing – editing, supervision, funding.
Supplemental Material
Supplementary information: https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-025-55987-8/MediaObjects/41467_2025_55987_MOESM1_ESM.pdf
Files
Name | Size |
---|---|
md5:a8fc9dabd44b486fe5dd05c4948b6433 | 15.5 MB |
md5:a5c6c2b3b948249c774c24a439b6c9fc | 2.3 MB |
Additional details
- Institute for Collaborative Biotechnologies: W911NF-19-2-0026
- National Science Foundation: Graduate Research Fellowship
- Accepted: 2025-01-02
- Available: 2025-01-16 (published online)
- Caltech groups: Division of Chemistry and Chemical Engineering (CCE), Division of Biology and Biological Engineering (BBE), Division of Engineering and Applied Science (EAS)
- Publication Status: Published