Triandafillou, Catherine G. and Pan, Rosalind Wenshan and Dinner, Aaron R. and Drummond, D. Allan (2023) Pervasive, conserved secondary structure in highly charged protein regions. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20230316-182174000.17
PDF - Submitted Version. Creative Commons Attribution Non-commercial No Derivatives. 5MB
PDF (Supplemental Figures) - Supplemental Material. Creative Commons Attribution Non-commercial No Derivatives. 1MB
Abstract
Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder—high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length—are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
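The abstract's working definition of a highly charged region (more than 40% charged residues) can be illustrated with a short sketch. This is not the authors' pipeline: the choice of D, E, K, and R as the charged set (histidine excluded), the sliding-window scan, and the 12-residue window length are all assumptions made for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only; not taken from the preprint's code.
# Assumption: "charged" means Asp, Glu, Lys, Arg (histidine excluded).
CHARGED = frozenset("DEKR")


def charged_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are D, E, K, or R."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(res in CHARGED for res in seq) / len(seq)


def highly_charged_windows(seq: str, window: int = 12, threshold: float = 0.4):
    """Return merged half-open (start, end) spans covered by windows whose
    charged fraction is strictly greater than threshold.

    The window length is a hypothetical parameter; only the 40% threshold
    comes from the abstract.
    """
    seq = seq.upper()
    spans = []
    for start in range(len(seq) - window + 1):
        if charged_fraction(seq[start:start + window]) > threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span, so extend it.
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans
```

Calling `highly_charged_windows(protein_sequence)` on a one-letter amino acid string returns the candidate spans, which could then be compared against predicted or experimental secondary structure for the same positions.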
Item Type: | Report or Paper (Discussion Paper) |
---|---|
Additional Information: | The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. C.G.T. is a Damon Runyon Postdoctoral Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2465-22). R.W.P. acknowledges support from the UChicago Biological Sciences Collegiate Division Summer Fellowship, Liew Family College Research Fellows Fund, and the UChicago Quantitative Biology Summer Fellowship. D.A.D. acknowledges support from the NIH (award numbers GM144278 and GM127406) and the US Army Research Office (W911NF-14-1-0411). A.R.D. acknowledges support from the NIH (award number R35 GM136381). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank Alex Holehouse for helpful discussions, and Alexander Cope for providing the structure-annotated PDB data. Author contributions. D.A.D., A.R.D. and C.G.T. developed ideas and direction, R.W.P. and C.G.T. performed analyses, R.W.P., C.G.T. and D.A.D. made figures, and all authors contributed to the text. Data Availability. Data used in this study are from publicly available datasets: AlphaFold protein structure prediction available at https://alphafold.ebi.ac.uk/download#proteomes-section, yeast proteome available from Saccharomyces Genome Database http://sgd-archive.yeastgenome.org/sequence/S288C_reference/orf_protein/, AYbRAH fungal ortholog database available at https://github.com/LMSE/aybrah, and DisProt yeast disordered regions https://www.disprot.org/browse?sort_field=disprot_id&sort_value=asc&page_size=20&page=0&release=current&show_ambiguous=true&show_obsolete=false&ncbi_taxon_id=559292. All additional data generated in this study are available at https://github.com/drummondlab/highly-charged-regions-2022. Code availability. All analyses and code used to generate the figures in this work can be found at https://github.com/drummondlab/highly-charged-regions-2022. The authors have declared no competing interest. |
DOI: | 10.1101/2023.02.15.528637 |
Record Number: | CaltechAUTHORS:20230316-182174000.17 |
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20230316-182174000.17 |
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
ID Code: | 120135 |
Collection: | CaltechAUTHORS |
Deposited By: | George Porter |
Deposited On: | 22 Mar 2023 00:57 |
Last Modified: | 22 Mar 2023 00:57 |