Towards a Universal Characterization of the Membrane Protein Expression Landscape
Integral membrane proteins (IMPs) are critical players in cellular homeostasis and as such represent over 60% of all drug targets. However, their complex biochemical makeup makes integration into membrane bilayers and heterologous overexpression in E. coli particularly difficult. This is a major roadblock to biochemical and structural studies and, until now, a problem largely intractable to empirical and computational methods alike. Our recent work demonstrated that a statistical model can predict IMP expression in E. coli from sequence-derived features alone. Because this model can assign a score to any IMP, it enables us, for the first time, to rigorously investigate expression landscapes across a broad set of sequence spaces and modifications and to extract principles for sequence redesign. Here, we assess the effect of synonymous codon substitutions on IMP expression and explore the corresponding expression landscape. Using a custom-developed Metropolis-Hastings algorithm, a type of Markov chain Monte Carlo method well suited for searching high-dimensional and complex spaces, we generated random mutational walks in the synonymous-coding sequence space for several representative IMPs. We examined effects across entire sequences as well as across membrane topology-informed spans of the protein sequence. Our work confirms earlier reports that sequence modifications in the first 150 base pairs contribute disproportionately to improving expression, and further shows that very small portions of the sequence strongly influence the expression score and display a Fujiyama-type landscape. We present a quantitative characterization of the expression fitness landscape, consider implications for broadly improving IMP expression in E. coli, and propose experiments to test our predictions across the landscape.
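As an illustration only, a random mutational walk in synonymous-coding space of the kind described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the custom algorithm used in this work: the scoring function `toy_score` is a hypothetical stand-in for the statistical expression model, and the proposal rule, the `temperature` parameter, the step count, and the example gene are all chosen purely for demonstration.

```python
import math
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(codons):
    """Translate a list of codons into a protein string."""
    return "".join(CODON_TO_AA[c] for c in codons)


def toy_score(codons):
    """Hypothetical placeholder for the expression model (an assumption).

    Rewards A/T at the third codon position within the first 50 codons
    (the first 150 base pairs). It is NOT the model from this work.
    """
    window = codons[:50]
    return sum(c[2] in "AT" for c in window) / len(window)


def metropolis_walk(codons, score, steps=2000, temperature=0.05, seed=0):
    """Metropolis-Hastings walk over synonymous codon substitutions.

    Each step picks a random position and proposes a different codon for
    the same amino acid, chosen uniformly. The proposal is symmetric, so
    the acceptance probability reduces to min(1, exp(delta / temperature)).
    """
    rng = random.Random(seed)
    current = list(codons)
    current_score = score(current)
    trace = [current_score]
    for _ in range(steps):
        pos = rng.randrange(len(current))
        residue = CODON_TO_AA[current[pos]]
        alternatives = [c for c in SYNONYMS[residue] if c != current[pos]]
        if alternatives:  # Met and Trp have no synonymous alternatives
            proposal = current.copy()
            proposal[pos] = rng.choice(alternatives)
            proposed_score = score(proposal)
            delta = proposed_score - current_score
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = proposal, proposed_score
        trace.append(current_score)
    return current, trace


# Usage on a short, made-up coding sequence (11 codons).
gene = "ATGAAACTGGGCTTTCCGGAACGTTGGATTAGC"
start = [gene[i:i + 3] for i in range(0, len(gene), 3)]
final, trace = metropolis_walk(start, toy_score)
```

Every proposal swaps a codon for a synonym, so the encoded protein is unchanged throughout the walk; only the score moves. The recorded `trace` of scores is the kind of raw material from which a landscape could be characterized, and restricting `pos` to a chosen span would give a walk confined to one region of the sequence.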
© 2017 Elsevier B.V. Available online 3 February 2017. 932-Plat