CaltechAUTHORS
  A Caltech Library Service

Decoding sequence-level information to predict membrane protein expression

Saladi, Shyam M. and Javed, Nauman and Müller, Axel and Clemons, William M., Jr. (2017) Decoding sequence-level information to predict membrane protein expression. (Submitted) https://resolver.caltech.edu/CaltechAUTHORS:20170619-100806436

PDF - Submitted Version (2438 KB)
Creative Commons Attribution Non-commercial.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20170619-100806436

Abstract

The expression and purification of integral membrane proteins remain a major bottleneck in the characterization of these important proteins. Expression levels are currently unpredictable, which renders the pursuit of these targets challenging and highly inefficient. Evidence demonstrates that small changes in the nucleotide or amino-acid sequence can dramatically affect membrane protein biogenesis, yet these observations have not resulted in generalizable approaches to improve expression. In this study, we develop a data-driven statistical model that predicts membrane protein expression in E. coli directly from sequence. The model, trained on experimental data, combines a set of sequence-derived variables into a score that predicts the likelihood of expression. We test the model against various independent datasets from the literature that span a variety of scales and experimental outcomes, demonstrating that the model significantly enriches for expressed proteins. The model is then used to score expression for membrane proteomes and protein families, highlighting areas where the model excels. Surprisingly, analysis of the underlying features reveals the importance of nucleotide sequence-derived parameters for expression. This computational model, as illustrated here, can immediately be used to identify favorable targets for characterization.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL                                                  URL Type      Description
http://dx.doi.org/10.1101/098673                     DOI           Article
http://biorxiv.org/content/early/2017/01/05/098673   Organization  Article
ORCID:
Author                      ORCID
Clemons, William M., Jr.    0000-0002-0021-889X
Additional Information: The copyright holder for this preprint is the author/funder. It is made available under a CC-BY-NC 4.0 International license. We thank Daniel Daley and Thomas Miller’s group for discussion, Yaser Abu-Mostafa and Yisong Yue for guidance regarding machine learning, Niles Pierce for providing NUPACK source code, and Welison Floriano and Naveed Near-Ansari for maintaining local computing resources. We thank James Bowie, Michiel Niesen, Stephen Marshall, Thomas Miller, Reid van Lehn, and Tom Rapoport for critical reading of the manuscript. Models and analyses are possible thanks to raw experimental data provided by Daniel Daley and Mikaela Rapp; Nir Fluman; Edda Kloppmann, Brian Kloss, and Marco Punta from NYCOMPS; Pikyee Ma; Renaud Wagner; and Florent Bernaudat. We acknowledge funding from an NIH Pioneer Award to W.M.C. (5DP1GM105385); a Benjamin M. Rosen graduate fellowship, an NIH/NRSA training grant (5T32GM07616), and an NSF Graduate Research fellowship to S.M.S.; and an Arthur A. Noyes Summer Undergraduate Research Fellowship to N.J. Computational time was provided by Stephen Mayo and Douglas Rees. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575. Author Contributions: S.M.S., A.M., and W.M.C. conceived the project. S.M.S. developed the approach. S.M.S., A.M., and N.J. compiled sequence and experimental data. N.J. created code to demonstrate feasibility. S.M.S. performed all published calculations. S.M.S. and W.M.C. wrote the manuscript.
Funders:
Funding Agency                                              Grant Number
NIH                                                         5DP1GM105385
Benjamin M. Rosen Fellowship                                UNSPECIFIED
NIH Predoctoral Fellowship                                  5T32GM07616
NSF Graduate Research Fellowship                            UNSPECIFIED
Arthur A. Noyes Summer Undergraduate Research Fellowship    UNSPECIFIED
NSF                                                         ACI-1053575
Caltech Summer Undergraduate Research Fellowship (SURF)     UNSPECIFIED
Record Number: CaltechAUTHORS:20170619-100806436
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20170619-100806436
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 78327
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 19 Jun 2017 17:28
Last Modified: 03 Oct 2019 18:07
