Published January 16, 2025 | Published
Journal Article Open

Active learning-assisted directed evolution

  • 1. ROR icon California Institute of Technology
  • 2. ROR icon University of California, Berkeley
  • 3. Elegen Corp, 1300 Industrial Road #16, San Carlos, CA, USA
An error occurred while generating the citation.

Abstract

Directed evolution (DE) is a powerful tool to optimize protein fitness for a specific application. However, DE can be inefficient when mutations exhibit non-additive, or epistatic, behavior. Here, we present Active Learning-assisted Directed Evolution (ALDE), an iterative machine learning-assisted DE workflow that leverages uncertainty quantification to explore the search space of proteins more efficiently than current DE methods. We apply ALDE to an engineering landscape that is challenging for DE: optimization of five epistatic residues in the active site of an enzyme. In three rounds of wet-lab experimentation, we improve the yield of a desired product of a non-native cyclopropanation reaction from 12% to 93%. We also perform computational simulations on existing protein sequence-fitness datasets to support our argument that ALDE can be more effective than DE. Overall, ALDE is a practical and broadly applicable strategy to unlock improved protein engineering outcomes.

Copyright and License

© 2025, The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Acknowledgement

This work was supported by the U.S. Army Research Office cooperative agreement for the Institute for Collaborative Biotechnologies (W911NF-19-2-0026 to F.H.A.). J.Y. and R.G.L. are partially supported by National Science Foundation Graduate Research Fellowships. The authors thank Yueming Long and Emre Guersoy for help with sequencing and Shilong Gao for collecting useful initial data. The authors also thank Christopher Yeh for helpful discussions and Sabine Brinkmann-Chen for critical reading of the manuscript. Finally, the authors thank Miguel González-Duque and Richard Michael for pointing out consideration of the length-scale prior for GP models and Hunter Nisonoff for insight into the poor calibration of DKL models.

Contributions

J.Y.: conceptualization, methodology, software, investigation, analysis, writing—original draft, writing—editing. R.G.L: conceptualization, methodology, investigation, analysis, writing—original draft, writing—editing. J.C.B: methodology, software, writing—editing. R.A.: methodology, software, writing—editing. M.A.H.: investigation, writing—editing. S.K.: resources, DNA synthesis. M.H.: resources, DNA synthesis. Y.Y.: resources, writing—editing, supervision, funding. F.H.A: resources, writing—editing, supervision, funding.

Supplemental Material

Supplementary information: https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-025-55987-8/MediaObjects/41467_2025_55987_MOESM1_ESM.pdf

Files

s41467-025-55987-8.pdf
Files (17.8 MB)
Name Size Download all
md5:a8fc9dabd44b486fe5dd05c4948b6433
15.5 MB Preview Download
md5:a5c6c2b3b948249c774c24a439b6c9fc
2.3 MB Preview Download

Additional details

Created:
April 3, 2025
Modified:
April 3, 2025