CaltechAUTHORS
  A Caltech Library Service

Navigating the protein fitness landscape with Gaussian processes

Romero, Philip A. and Krause, Andreas and Arnold, Frances H. (2013) Navigating the protein fitness landscape with Gaussian processes. Proceedings of the National Academy of Sciences of the United States of America, 110 (3). E193-E201. ISSN 0027-8424. PMCID PMC3549130. https://resolver.caltech.edu/CaltechAUTHORS:20130225-163008905

Files:
  PDF - Published Version (908 KB)
  PDF - Supplemental Material (1091 KB)
  Plain Text (Dataset S01) - Supplemental Material (2651 bytes)
  Plain Text (Dataset S02) - Supplemental Material (20 KB)
  Plain Text (Dataset S03) - Supplemental Material (16 KB)
  See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20130225-163008905

Abstract

Knowing how protein sequence maps to function (the “fitness landscape”) is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.
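
The abstract's core idea, a Gaussian process that predicts a property for untested sequences together with an uncertainty, and a search rule that uses both, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the exponential Hamming-distance kernel stands in for whatever kernel the authors used (this record does not specify it), the upper-confidence-bound rule is one common way to trade off predicted value against uncertainty and is not claimed to be the paper's exact criterion, and the function names (hamming, kernel, gp_posterior, select_ucb) and toy sequences are hypothetical.

```python
import math

def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))

def kernel(s, t, length_scale=2.0):
    """Exponential Hamming kernel (illustrative assumption, not the paper's kernel)."""
    return math.exp(-hamming(s, t) / length_scale)

def cholesky(A):
    """Lower-triangular L with L * L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_sub(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(train_seqs, train_y, query, noise=0.01, length_scale=2.0):
    """Posterior mean and variance at `query`, with a zero-mean prior on centered data."""
    n = len(train_seqs)
    y_mean = sum(train_y) / n
    centered = [y - y_mean for y in train_y]
    # Kernel matrix of the training sequences plus observation noise on the diagonal.
    K = [[kernel(train_seqs[i], train_seqs[j], length_scale)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, centered))
    k_star = [kernel(query, s, length_scale) for s in train_seqs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = kernel(query, query, length_scale) - sum(x * x for x in v)
    return mean, max(var, 0.0)

def select_ucb(train_seqs, train_y, candidates, beta=2.0):
    """Pick the candidate with the highest mean + beta * standard deviation."""
    def score(seq):
        mean, var = gp_posterior(train_seqs, train_y, seq)
        return mean + beta * math.sqrt(var)
    return max(candidates, key=score)

# Toy data invented purely for illustration (not measurements from the paper).
seqs = ["AAAA", "AABB", "BBBB"]
values = [50.0, 55.0, 52.0]
candidates = ["ABAB", "AABA", "BBBA"]
next_seq = select_ucb(seqs, values, candidates)
```

In an iterative design loop of the kind the abstract describes, the selected sequence would be constructed and measured, appended to the training data, and the selection repeated. A larger beta favors uncertain regions of sequence space; a smaller one favors sequences already predicted to score well.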


Item Type: Article
Related URLs:
  DOI (Article): http://dx.doi.org/10.1073/pnas.1215251110
  PubMed Central (Article): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549130/
ORCID:
  Romero, Philip A.: 0000-0002-2586-7263
  Krause, Andreas: 0000-0001-7260-9673
  Arnold, Frances H.: 0000-0002-4027-364X
Additional Information: © 2013 National Academy of Sciences. Edited by Michael Levitt, Stanford University School of Medicine, Stanford, CA, and approved November 28, 2012 (received for review September 9, 2012). Published online before print December 31, 2012. We thank C. D. Snow for helpful discussions, E. M. Brustad for assistance with the P450 cloning and expression, and E. T. Bax for feedback on the manuscript. P.A.R. was supported by a National Institutes of Health training grant. This work was supported by the Institute for Collaborative Biotechnologies through Grant W911NF-09-0001 from the US Army Research Office (to F.H.A.), as well as by Swiss National Science Foundation Grant 200021_137971 (to A.K.). Author contributions: P.A.R., A.K., and F.H.A. designed research; P.A.R. performed research; P.A.R. and A.K. contributed new reagents/analytic tools; P.A.R., A.K., and F.H.A. analyzed data; and P.A.R., A.K., and F.H.A. wrote the paper.
Funders:
  NIH Predoctoral Fellowship: grant number unspecified
  Army Research Office (ARO): W911NF-09-0001
  Swiss National Science Foundation (SNSF): 200021_137971
Subject Keywords: protein engineering; recombination; machine learning; experimental design; active learning
Issue or Number: 3
PubMed Central ID: PMC3549130
Record Number: CaltechAUTHORS:20130225-163008905
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20130225-163008905
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 37123
Collection: CaltechAUTHORS
Deposited By: Jason Perez
Deposited On: 26 Feb 2013 21:50
Last Modified: 23 Apr 2020 23:09