A Caltech Library Service

Machine-learning-guided directed evolution for protein engineering

Yang, Kevin K. and Wu, Zachary and Arnold, Frances H. (2019) Machine-learning-guided directed evolution for protein engineering. Nature Methods, 16 (8). pp. 687-694. ISSN 1548-7091. doi:10.1038/s41592-019-0496-6.

PDF - Submitted Version
See Usage Policy.




Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence–function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.
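The abstract describes the core loop of machine-learning-guided directed evolution at a high level: fit a sequence–function model to characterized variants, then use its predictions to choose which sequences to test next. The sketch below walks through one such round. It assumes a one-hot sequence encoding, a linear model fit by ridge regression, and greedy top-k selection over a small combinatorial library. These modeling choices, the hypothetical function names (`one_hot`, `fit_ridge`, `propose`), and the toy fitness values are illustrative assumptions and are not the authors' method or data.

```python
from itertools import product

# Illustrative sketch only: one-hot features + ridge regression + greedy top-k
# selection. These modeling choices are assumptions, not the paper's method.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flattened one-hot encoding: one indicator per (position, amino acid)."""
    return [1.0 if res == aa else 0.0 for res in seq for aa in AMINO_ACIDS]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            if factor:
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(seqs, fitness, lam=1.0):
    """Fit a linear sequence-function model by ridge regression.

    Features and targets are centered so the intercept is not penalized.
    Returns a function mapping a sequence to its predicted fitness.
    """
    xs = [one_hot(s) for s in seqs]
    d = len(xs[0])
    x_mean = [sum(col) / len(xs) for col in zip(*xs)]
    y_mean = sum(fitness) / len(fitness)
    xc = [[v - mu for v, mu in zip(x, x_mean)] for x in xs]
    yc = [y - y_mean for y in fitness]
    # Normal equations: (Xc^T Xc + lam * I) w = Xc^T yc
    gram = [[sum(row[i] * row[j] for row in xc) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * y for row, y in zip(xc, yc)) for i in range(d)]
    w = solve(gram, rhs)

    def predict(seq):
        return y_mean + sum(wi * (v - mu) for wi, v, mu in zip(w, one_hot(seq), x_mean))

    return predict


def propose(predict, parent, positions, k, tested=frozenset()):
    """Rank all untested variants of `parent` mutated at `positions`; return the top k."""
    candidates = []
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        seq = list(parent)
        for pos, aa in zip(positions, combo):
            seq[pos] = aa
        variant = "".join(seq)
        if variant not in tested:
            candidates.append(variant)
    return sorted(candidates, key=predict, reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up measurements for a 3-residue "protein" (not real data).
    train = {"AAA": 0.0, "WAA": 2.0, "AAY": 1.0, "CAA": 0.0, "AAC": 0.0}
    model = fit_ridge(list(train), list(train.values()), lam=1e-3)
    print(propose(model, "AAA", positions=[0, 2], k=1, tested=set(train)))
    # expected: ['WAY'] (the additive model combines the two beneficial single mutations)
```

A linear one-hot model is about the simplest possible sequence–function model; the same loop works with any regressor that exposes a prediction function. Exhaustive enumeration is only practical for a few positions, because the library grows as 20^n for n mutated sites.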

Item Type: Article
Related URLs: ReadCube access (Paper)
Author | ORCID
Yang, Kevin K. | 0000-0001-9045-6826
Arnold, Frances H. | 0000-0002-4027-364X
Additional Information: © 2019 Springer Nature Publishing AG. Received 25 October 2018; Accepted 17 June 2019; Published 15 July 2019. The authors thank Y. Chen, K. Johnston, B. Wittmann, and H. Yang for comments on early versions of the manuscript, as well as members of the Arnold lab, J. Bois, and Y. Yue for general advice and discussions on protein engineering and machine learning. This work was supported by the US Army Research Office Institute for Collaborative Biotechnologies (W911F-09-0001 to F.H.A.), the Donna and Benjamin M. Rosen Bioengineering Center (to K.K.Y.), and the National Science Foundation (GRF2017227007 to Z.W.). Author Contributions: K.K.Y., Z.W., and F.H.A. conceptualized the project. K.K.Y. wrote the manuscript with input and editing from all authors. The authors declare no competing interests.
Group: Rosen Bioengineering Center
Funding Agency | Grant Number
Army Research Office (ARO) | W911F-09-0001
Donna and Benjamin M. Rosen Bioengineering Center | UNSPECIFIED
NSF Graduate Research Fellowship | GRF2017227007
Subject Keywords: Machine learning; Proteins
Issue or Number: 8
Record Number: CaltechAUTHORS:20190715-092459913
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 97142
Deposited By: Tony Diaz
Deposited On: 15 Jul 2019 16:37
Last Modified: 16 Nov 2021 17:27
