Pervasive, conserved secondary structure in highly charged protein regions
Abstract
Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder—high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length—are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
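The ">40% charged residues" criterion in the abstract can be illustrated with a short sliding-window scan. This is a minimal sketch, not the authors' pipeline: the choice of D, E, K, and R as the charged residues, the window length of 12, and the function names `charged_fraction` and `highly_charged_windows` are all assumptions made for illustration.

```python
# Assumption: "charged" means Asp, Glu, Lys, Arg; histidine is not counted.
CHARGED = frozenset("DEKR")


def charged_fraction(seq):
    """Fraction of residues in seq that belong to CHARGED (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(residue in CHARGED for residue in seq) / len(seq)


def highly_charged_windows(seq, window=12, threshold=0.4):
    """Return merged half-open (start, end) spans covered by windows whose
    charged fraction is strictly greater than threshold.

    The window length is a hypothetical parameter, not a value taken from the paper.
    """
    seq = seq.upper()
    spans = []
    for start in range(len(seq) - window + 1):
        if charged_fraction(seq[start:start + window]) > threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

For example, `highly_charged_windows("A" * 20 + "EK" * 10 + "A" * 20, window=10)` flags a single span around the central EK repeat. Note that a region flagged this way says nothing about whether it is helical or disordered; per the abstract, that distinction requires structural information or a composition-based classifier.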
Copyright and License
© 2023 Triandafillou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Acknowledgement
The authors thank Alex Holehouse for helpful discussions, and Alexander Cope for providing the structure-annotated PDB data.
Contributions
Conceptualization: Catherine G. Triandafillou, Aaron R. Dinner, D. Allan Drummond. Formal analysis: Catherine G. Triandafillou, Rosalind Wenshan Pan. Funding acquisition: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond. Investigation: Catherine G. Triandafillou, Rosalind Wenshan Pan. Methodology: Catherine G. Triandafillou, D. Allan Drummond. Supervision: Aaron R. Dinner, D. Allan Drummond. Visualization: Catherine G. Triandafillou, Rosalind Wenshan Pan, D. Allan Drummond. Writing – original draft: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond. Writing – review & editing: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond.
Funding
C.G.T. is a Damon Runyon Postdoctoral Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2465-22). R.W.P. acknowledges support from the UChicago Biological Sciences Collegiate Division Summer Fellowship, Liew Family College Research Fellows Fund, and the UChicago Quantitative Biology Summer Fellowship. D.A.D. acknowledges support from the NIH (award numbers GM144278 and GM127406) and the US Army Research Office (W911NF-14-1-0411). A.R.D. acknowledges support from the NIH (award number R35 GM136381). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability
Data used in this study are from publicly available datasets: AlphaFold protein structure predictions, available at https://alphafold.ebi.ac.uk/download#proteomes-section; the yeast proteome, available from the Saccharomyces Genome Database at http://sgd-archive.yeastgenome.org/?prefix=sequence/S288C_reference/orf_protein/; the AYbRAH fungal ortholog database, available at https://github.com/LMSE/aybrah; and DisProt yeast disordered regions, available at https://www.disprot.org/browse?sort_field=disprot_id&sort_value=asc&page_size=20&page=0&release=current&show_ambiguous=true&show_obsolete=false&ncbi_taxon_id=559292. All additional data generated in this study are available at https://github.com/drummondlab/highly-charged-regions-2022.
Code Availability
All analyses and code used to generate the figures in this work can be found at https://github.com/drummondlab/highly-charged-regions-2022.
Conflict of Interest
The authors have declared that no competing interests exist.
Files
Name | Size |
---|---|
md5:27679bf3c80122a9a5bfddcbf06e603b | 4.0 MB |
Additional details
- PMCID: PMC10602382
- Damon Runyon Cancer Research Foundation: DRG-2465-22
- University of Chicago
- National Institutes of Health: GM144278
- National Institutes of Health: GM127406
- United States Army Research Office: W911NF-14-1-0411
- National Institutes of Health: R35 GM136381