Machine-learning-guided directed evolution for protein engineering
- Creators
- Yang, Kevin K.
- Wu, Zachary
- Arnold, Frances H.
Abstract
Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence–function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.
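The abstract describes a loop: learn a sequence–function model from characterized variants, then use it to pick untested sequences likely to perform better. Below is a minimal sketch of that loop under stated assumptions. It is not the authors' method.

- Sequences are equal-length strings over the 20 canonical amino acids.
- Each sequence is one-hot encoded.
- The model is closed-form ridge regression, an illustrative choice.
- `one_hot`, `fit_ridge` and `propose` are hypothetical helpers written for this example.

```python
import numpy as np

# Hypothetical alphabet: the 20 canonical amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seqs):
    """Flattened one-hot encoding; assumes all sequences have equal length."""
    length = len(seqs[0])
    x = np.zeros((len(seqs), length * len(AMINO_ACIDS)))
    for n, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            x[n, pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return x


def fit_ridge(x, y, alpha=1.0):
    """Closed-form ridge regression; centering keeps the intercept unpenalized."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def propose(train_seqs, train_fitness, candidates, k=1, alpha=1.0):
    """Fit on characterized variants, then return the k candidates with the
    highest predicted fitness together with those predictions."""
    w, b = fit_ridge(one_hot(train_seqs), np.asarray(train_fitness, dtype=float), alpha)
    preds = one_hot(candidates) @ w + b
    order = np.argsort(preds)[::-1][:k]
    return [candidates[i] for i in order], preds[order]


# Toy data: fitness rises with A at position 1 and E at position 3.
top, scores = propose(["ACD", "GCE", "GCD"], [1.0, 1.0, 0.0], ["GCD", "ACE", "AWD"])
print(top, scores)
```

On this toy data the model ranks the untested variant `ACE` first, with a predicted fitness of about 1.2. It has combined the two beneficial substitutions seen separately in the training set. In a real campaign the top-ranked sequences would be synthesized and screened, and the new measurements added to the training set for the next round.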
Additional Information
© 2019 Springer Nature Publishing AG. Received 25 October 2018; Accepted 17 June 2019; Published 15 July 2019.
The authors thank Y. Chen, K. Johnston, B. Wittmann, and H. Yang for comments on early versions of the manuscript, as well as members of the Arnold lab, J. Bois, and Y. Yue for general advice and discussions on protein engineering and machine learning.
This work was supported by the US Army Research Office Institute for Collaborative Biotechnologies (W911F-09-0001 to F.H.A.), the Donna and Benjamin M. Rosen Bioengineering Center (to K.K.Y.), and the National Science Foundation (GRF2017227007 to Z.W.).
Author Contributions: K.K.Y., Z.W., and F.H.A. conceptualized the project. K.K.Y. wrote the manuscript with input and editing from all authors.
The authors declare no competing interests.
Attached Files
Submitted - 1811.10775.pdf
Additional details
- Eprint ID: 97142
- DOI: 10.1038/s41592-019-0496-6
- Resolver ID: CaltechAUTHORS:20190715-092459913
- Funding: Army Research Office (ARO), W911F-09-0001
- Funding: Donna and Benjamin M. Rosen Bioengineering Center
- Funding: NSF Graduate Research Fellowship, GRF2017227007
- Created: 2019-07-15 (from EPrint's datestamp field)
- Updated: 2021-11-16 (from EPrint's last_modified field)
- Caltech groups: Rosen Bioengineering Center