Published October 2023
Journal Article Open

Pervasive, conserved secondary structure in highly charged protein regions

Abstract

Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder—high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length—are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
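
The ">40% charged residues" criterion in the abstract can be illustrated with a minimal sketch. This is not the authors' code (their analysis is in the repository listed under Code Availability). The charged residue set (D, E, K, R), the function names, and the use of a whole-sequence fraction rather than a windowed scan are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from the published analysis code.
# Assumption: "charged" means Asp (D), Glu (E), Lys (K), Arg (R).
CHARGED = frozenset("DEKR")


def charged_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are in CHARGED."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(residue in CHARGED for residue in seq) / len(seq)


def is_highly_charged(seq: str, threshold: float = 0.40) -> bool:
    """Apply the abstract's criterion: more than 40% charged residues."""
    return charged_fraction(seq) > threshold
```

For example, under these assumptions `is_highly_charged("EEKKAA")` is true (4 of 6 residues are charged), while `is_highly_charged("AAAAAAAAAE")` is false (1 of 10).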

Copyright and License

© 2023 Triandafillou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Acknowledgement

The authors thank Alex Holehouse for helpful discussions, and Alexander Cope for providing the structure-annotated PDB data.

Contributions

Conceptualization: Catherine G. Triandafillou, Aaron R. Dinner, D. Allan Drummond. Formal analysis: Catherine G. Triandafillou, Rosalind Wenshan Pan. Funding acquisition: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond. Investigation: Catherine G. Triandafillou, Rosalind Wenshan Pan. Methodology: Catherine G. Triandafillou, D. Allan Drummond. Supervision: Aaron R. Dinner, D. Allan Drummond. Visualization: Catherine G. Triandafillou, Rosalind Wenshan Pan, D. Allan Drummond. Writing – original draft: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond. Writing – review & editing: Catherine G. Triandafillou, Rosalind Wenshan Pan, Aaron R. Dinner, D. Allan Drummond.

Funding

C.G.T. is a Damon Runyon Postdoctoral Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2465-22). R.W.P. acknowledges support from the UChicago Biological Sciences Collegiate Division Summer Fellowship, Liew Family College Research Fellows Fund, and the UChicago Quantitative Biology Summer Fellowship. D.A.D. acknowledges support from the NIH (award numbers GM144278 and GM127406) and the US Army Research Office (W911NF-14-1-0411). A.R.D. acknowledges support from the NIH (award number R35 GM136381). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

Data used in this study are from publicly available datasets: AlphaFold protein structure predictions, available at https://alphafold.ebi.ac.uk/download#proteomes-section; the yeast proteome, available from the Saccharomyces Genome Database at http://sgd-archive.yeastgenome.org/?prefix=sequence/S288C_reference/orf_protein/; the AYbRAH fungal ortholog database, available at https://github.com/LMSE/aybrah; and DisProt yeast disordered regions, available at https://www.disprot.org/browse?sort_field=disprot_id&sort_value=asc&page_size=20&page=0&release=current&show_ambiguous=true&show_obsolete=false&ncbi_taxon_id=559292. All additional data generated in this study are available at https://github.com/drummondlab/highly-charged-regions-2022.

Code Availability

All analyses and code used to generate the figures in this work can be found at https://github.com/drummondlab/highly-charged-regions-2022.

Conflict of Interest

The authors have declared that no competing interests exist.

Files

pcbi.1011565.pdf (4.0 MB)
md5:27679bf3c80122a9a5bfddcbf06e603b

Additional details

Created: November 15, 2023
Modified: November 15, 2023