Machine learning-assisted directed protein evolution with combinatorial libraries

Wu, Zachary and Kan, S. B. Jennifer and Lewis, Russell D. and Wittmann, Bruce J. and Arnold, Frances H. (2019) Machine learning-assisted directed protein evolution with combinatorial libraries. Proceedings of the National Academy of Sciences of the United States of America, 116 (18). pp. 8852-8858. ISSN 0027-8424. PMCID PMC6500146. doi:10.1073/pnas.1901979116.

To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si–H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
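The loop described in the abstract (test a sample of variants, train a model on them, then use the model to pick what to test next) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the reduced alphabet, the synthetic `toy_fitness` function standing in for a lab assay, and the simple per-site averaging model are hypothetical and are not the models, encodings, or library design used in the paper, which this record does not describe.

```python
import itertools
import random

ALPHABET = "ACDE"   # hypothetical reduced alphabet; real libraries use 20 amino acids
N_POSITIONS = 3     # hypothetical number of simultaneously mutated positions


def toy_fitness(variant):
    """Synthetic stand-in for an experimental assay; not data from the paper."""
    site_scores = {"A": 0.1, "C": 0.4, "D": 0.7, "E": 1.0}
    additive = sum(site_scores[aa] for aa in variant)
    # Small epistatic bonus so the landscape is not purely additive.
    bonus = 0.5 if variant[0] == "E" and variant[1] == "D" else 0.0
    return additive + bonus


def fit_additive_model(tested):
    """Average the measured fitness of tested variants per (position, residue)."""
    sums, counts = {}, {}
    for variant, fitness in tested.items():
        for pos, aa in enumerate(variant):
            sums[(pos, aa)] = sums.get((pos, aa), 0.0) + fitness
            counts[(pos, aa)] = counts.get((pos, aa), 0) + 1
    global_mean = sum(tested.values()) / len(tested)
    means = {key: sums[key] / counts[key] for key in sums}
    return means, global_mean


def predict(model, variant):
    """Score a variant as the sum of its per-site averages."""
    means, global_mean = model
    # Unseen (position, residue) pairs fall back to the global mean.
    return sum(means.get((pos, aa), global_mean) for pos, aa in enumerate(variant))


def propose_next_round(tested, all_variants, batch_size):
    """Rank every untested variant in silico and return the top candidates."""
    model = fit_additive_model(tested)
    untested = [v for v in all_variants if v not in tested]
    ranked = sorted(untested, key=lambda v: predict(model, v), reverse=True)
    return ranked[:batch_size]


rng = random.Random(0)
all_variants = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_POSITIONS)]

# Round 1: "assay" a random sample of the combinatorial library.
initial = rng.sample(all_variants, 16)
tested = {v: toy_fitness(v) for v in initial}

# Round 2: assay only the variants the model ranks highest.
next_batch = propose_next_round(tested, all_variants, batch_size=8)
tested.update({v: toy_fitness(v) for v in next_batch})

best = max(tested, key=tested.get)
print(best, round(tested[best], 2))
```

In this toy setting only 24 of the 64 possible variants are ever "assayed"; the remainder are screened by the model alone, which is the sense in which in silico modeling increases throughput.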

Item Type:Article
ORCID:
Kan, S. B. Jennifer    0000-0001-6371-8042
Lewis, Russell D.      0000-0002-5776-7347
Wittmann, Bruce J.     0000-0001-8144-9157
Arnold, Frances H.     0000-0002-4027-364X
Additional Information:© 2019 National Academy of Sciences. Published under the PNAS license. Contributed by Frances H. Arnold, March 18, 2019 (sent for review February 4, 2019; reviewed by Marc Ostermeier and Justin B. Siegel). The authors thank Yisong Yue for initial guidance; Scott Virgil (Caltech Center for Catalysis and Chemical Synthesis) for providing critical instrument support; and Kevin Yang, Anders Knight, Oliver Brandenburg, and Ruijie Kelly Zhang for helpful discussions. This work is supported by National Science Foundation Grant GRF2017227007 (to Z.W.), the Rothenberg Innovation Initiative Program (S.B.J.K. and F.H.A.), and the Jacobs Institute for Molecular Engineering for Medicine at Caltech (S.B.J.K. and F.H.A.). Author contributions: Z.W., S.B.J.K., R.D.L., and F.H.A. designed research; Z.W. and B.J.W. performed research; Z.W. contributed new reagents/analytic tools; Z.W., S.B.J.K., R.D.L., and B.J.W. analyzed data; and Z.W., S.B.J.K., R.D.L., B.J.W., and F.H.A. wrote the paper. Reviewers: M.O., Johns Hopkins University; and J.B.S., UC Davis Health System. The authors declare no conflict of interest. Data deposition: The data reported in this paper have been deposited in the ProtaBank database. This article contains supporting information online.
Group:Jacobs Institute for Molecular Engineering for Medicine
Funding Agency                                            Grant Number
NSF Graduate Research Fellowship                          GRF2017227007
Rothenberg Innovation Initiative (RI2)                    UNSPECIFIED
Jacobs Institute for Molecular Engineering for Medicine   UNSPECIFIED
Subject Keywords:protein engineering; machine learning; directed evolution; enzyme; catalysis
Issue or Number:18
PubMed Central ID:PMC6500146
Record Number:CaltechAUTHORS:20190415-082330973
Official Citation:Machine learning-assisted directed protein evolution with combinatorial libraries. Zachary Wu, S. B. Jennifer Kan, Russell D. Lewis, Bruce J. Wittmann, Frances H. Arnold. Proceedings of the National Academy of Sciences Apr 2019, 116 (18) 8852-8858; DOI: 10.1073/pnas.1901979116
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:94697
Deposited By: Tony Diaz
Deposited On:16 Apr 2019 21:54
Last Modified:16 Nov 2021 17:07