De novo protein identification in mammalian sperm using in situ cryoelectron tomography and AlphaFold2 docking

Creators: Chen, Zhen; Shiozaki, Momoko; Haas, Kelsey M.; Skinner, Will M.; Zhao, Shumei; Guo, Caiying; Polacco, Benjamin J.; Yu, Zhiheng; Krogan, Nevan J.; Lishko, Polina V.; Kaake, Robyn M.; Vale, Ronald D.; Agard, David A.

Abstract

To understand the molecular mechanisms of cellular pathways, contemporary workflows typically require multiple techniques to identify proteins, track their localization, and determine their structures in vitro. Here, we combined cellular cryoelectron tomography (cryo-ET) and AlphaFold2 modeling to address these questions and understand how mammalian sperm are built in situ. Our cellular cryo-ET and subtomogram averaging provided 6.0-Å reconstructions of axonemal microtubule structures. The well-resolved tertiary structures allowed us to match sperm-specific densities, in an unbiased manner, against 21,615 AlphaFold2-predicted protein models of the mouse proteome. We identified Tektin 5, CCDC105, and SPACA9 as novel microtubule-associated proteins. These proteins form an extensive interaction network crosslinking the lumen of axonemal doublet microtubules, suggesting their roles in modulating the mechanical properties of the filaments. Indeed, Tekt5 −/− sperm possess more deformed flagella with 180° bends. Together, our studies present a cellular visual proteomics workflow and shed light on the in vivo functions of Tektin 5.

Acknowledgement

We are grateful to members of the Agard and Vale laboratories for the discussions and critical reading of the manuscript. We thank Xiaowei Zhao, Shixin Yang, and Rui Yan from the CryoEM Facility at the Janelia Research Campus for their assistance with data collection. We thank Zanlin Yu and Hao Wu at UCSF for their suggestions on sample processing and model building. We thank Garrett Greenan, Shawn Zheng, and Sam Li at UCSF for discussions on cryo-ET data processing. We thank Willy Wriggers from Old Dominion University for his input and suggestions on the Situs package. EM data processing used workstations at the CryoEM Facility at the Janelia Research Campus and the UCSF Wynton HPC cluster. We also thank David Bulkley, Glenn Gilbert, and Matt Harrington from the UCSF cryo-EM facility for their discussion on data collection and processing. We also thank Colin Morrow, Gillian Harris, Crystall Lopez, and Catherine Lindsey from Janelia Vivarium for mouse experiments. Z.C. was supported by the Helen Hay Whitney Foundation Postdoctoral Fellowship. W.M.S. was supported by the National Science Foundation Graduate Research Fellowship Program under grant numbers DGE 1752814 and DGE 2146752. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. P.V.L. received funding from a Pew Biomedical Scholars award and a GCRLE grant from the Global Consortium for Reproductive Longevity and Equality made possible by the Bia-Echo Foundation. D.A.A. received funding from NIH R35GM118099. R.D.V. received funding from NIH R35GM118106 and the Howard Hughes Medical Institute. The UCSF cryo-EM facility was supported by NIH instrumentation grants 1S10OD026881, 1S10OD020054, and 1S10OD021741.

Contributions

Conceptualization, Z.C., R.D.V., and D.A.A.; mouse sample preparation, S.Z., C.G., and W.M.S.; cryo-EM sample preparation, Z.C. and M.S.; data processing, Z.C.; biochemical extraction, Z.C.; mass spectrometry, K.M.H., R.M.K., B.J.P., and N.J.K.; sperm analyses, W.M.S., Z.C., and P.V.L.; writing – original draft, Z.C., K.M.H., and R.M.K.; writing – review & editing, Z.C., R.D.V., and D.A.A.

Data Availability

Cryo-EM maps of the 48-nm repeating structures of doublets from wildtype mouse, Tekt5 -/- mouse, and human sperm have been deposited in the Electron Microscopy Data Bank (EMDB) with accession codes EMD-41431, EMD-41320, and EMD-41317, respectively. EMD-41431 is a composite map; its two submaps have been deposited with accession codes EMD-41450 and EMD-41451. Maps from focused refinement of the 16-nm repeating structures of the A- and B-tubules from wildtype mouse have also been deposited: EMD-41315 and EMD-41316. The atomic model of the 48-nm repeat of the mouse sperm doublets has been deposited in the Protein Data Bank (PDB) with accession code 8TO0. MS data are available through the ProteomeXchange Consortium via the PRIDE partner repository under the dataset identifier PXD036885 (username: reviewer_pxd036885@ebi.ac.uk; password: tMEZ90MC).⁵⁶ R package source materials for MSstats (version 3) are publicly available through the Krogan Lab GitHub: https://github.com/kroganlab.
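To retrieve the deposited maps and model from the command line, the sketch below builds download URLs from the accession codes above. It assumes the public EMDB FTP layout (structures/EMD-N/map/emd_N.map.gz) and the RCSB file-server URL scheme; the function names emdb_map_url and pdb_cif_url are hypothetical helpers, not part of any official tool, and the EMDB and PDB entry pages remain the authoritative source.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build download URLs for deposited entries.
# Assumption: current EMDB FTP layout and RCSB file-server URL scheme.

emdb_map_url() {                 # accepts "EMD-41431" or "41431"
    local n=${1#EMD-}            # strip an optional "EMD-" prefix
    printf 'https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-%s/map/emd_%s.map.gz\n' "$n" "$n"
}

pdb_cif_url() {                  # mmCIF, since very large entries often lack a legacy .pdb file
    printf 'https://files.rcsb.org/download/%s.cif\n' "$1"
}
```

For example, curl -O "$(emdb_map_url EMD-41431)" would fetch the composite wildtype map, and curl -O "$(pdb_cif_url 8TO0)" the atomic model. The model is requested as mmCIF because entries as large as a 48-nm axonemal repeat are often distributed without a PDB-format file.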

After the AlphaFold2 library of the mouse proteome has been downloaded, the following code distributes the PDB files into subdirectories of 50 files each:

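## Assumes the downloaded models have been decompressed to .pdb files in the current directory
i=0
for f in *.pdb
do
    ## Put 50 PDBs in each subdirectory (dir_001, dir_002, ...)
    d=dir_$(printf %03d $((i / 50 + 1)))
    mkdir -p "$d"
    mv "$f" "$d/"
    i=$((i + 1))
done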
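The subdirectory name is computed with integer division, so the mapping from a file's 0-based position in the loop to its subdirectory can be checked by hand. The sketch below wraps that arithmetic in two hypothetical helper functions; applying it to 21,615 files assumes exactly one PDB per AlphaFold2 model, matching the number of models quoted in the abstract.

```shell
#!/usr/bin/env bash
# Worked example of the splitting arithmetic used above: a file at 0-based
# position i goes to dir_$((i / 50 + 1)), so positions 0-49 -> dir_001,
# 50-99 -> dir_002, and so on. Function names are hypothetical.

subdir_for_index() {             # subdirectory name for 0-based file position $1
    printf 'dir_%03d\n' $(( $1 / 50 + 1 ))
}

n_subdirs() {                    # ceil($1 / $2): subdirectories needed for $1 files, $2 per directory
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

Under that assumption, n_subdirs 21615 50 gives 433 subdirectories (dir_001 to dir_433), and the last one holds 21,615 − 432 × 50 = 15 files.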

Within each subdirectory, the following loop fits every PDB model, without preselection, into the target density using colores from the Situs package:

for file in *.pdb
do
    echo "$file"
    ## CCDC105_flipped_b150.mrc is the target density; the colores options are described on the Situs website
    mkdir -p "../output/${file}_out"
    colores ../CCDC105_flipped_b150.mrc "$file" -res 6.0 -cutoff 0.0048 -deg 15.0
    ## Move the colores output files (col_*) into this model's output directory
    mv col_* "../output/${file}_out/"
done
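Because the loop above uses paths relative to ../, it is meant to be run from inside each dir_NNN subdirectory. A minimal driver that visits every subdirectory in turn might look like the following; it assumes the loop has been saved as a script file next to the dir_* subdirectories (the function name run_in_subdirs and the script name match_subdir.sh are hypothetical).

```shell
#!/usr/bin/env bash
# Hypothetical driver: run a per-subdirectory script inside every dir_NNN.
# $1 is the script's file name, relative to the top-level directory.
run_in_subdirs() {
    local script=$1
    local d
    mkdir -p output                       # shared output directory (../output from inside dir_NNN)
    for d in dir_*/; do
        [ -d "$d" ] || continue           # skip if the glob matched nothing
        echo "Processing $d"
        (cd "$d" && bash "../$script")    # subshell keeps the caller's working directory
    done
}
```

Running run_in_subdirs match_subdir.sh from the top-level directory processes the subdirectories one after another. Since each subdirectory is independent, the same loop body could instead be submitted as separate jobs on a cluster; how that is done depends on the local scheduler.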

From the output directory, the cross-correlation scores can then be extracted with the following script:

for f in *_out
do
    echo "$f"
    grep structure "$f"/*.pdb >> TheResultFile
    grep Unnormalized "$f"/*.pdb >> TheResultFile
done

grep "correlation" TheResultFile > JustCCResults

The final output can then be sorted by cross-correlation score in Excel. Note that each PDB is fitted to the target density in multiple orientations, which yields multiple entries for the same PDB with different cross-correlation scores; the duplicate entries for each PDB can be deleted in Excel.
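As a scripted alternative to sorting and de-duplicating in Excel, the sketch below keeps the highest score for each model and sorts models from best to worst. It assumes that every line of JustCCResults starts with grep's file-name prefix (for example <model>.pdb_out/col_best_001.pdb:), which grep adds when it searches several files, and that the score is the last whitespace-separated field on the line; the exact colores output format should be checked before relying on it. The function name best_cc_per_model is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the best cross-correlation score per model and
# sort the models from highest to lowest score.
# Assumed input line format (grep "file:" prefix, score as the last field):
#   <model>.pdb_out/col_best_001.pdb:<... correlation ...> <score>
best_cc_per_model() {
    local infile=${1:-JustCCResults}
    awk '{
        split($0, parts, "_out/")   # model name = text before "_out/"
        model = parts[1]
        score = $NF + 0             # last whitespace-separated field
        if (!(model in best) || score > best[model]) best[model] = score
    }
    END { for (m in best) printf "%s\t%.6f\n", m, best[m] }' "$infile" |
        LC_ALL=C sort -t "$(printf '\t')" -k2,2gr
}
```

For example, best_cc_per_model JustCCResults > BestCCPerModel.tsv writes a two-column, tab-separated list (model, best score) whose first rows are the top candidate matches.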

Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Conflict of Interest

We declare that one or more authors have a competing interest as defined by Nature Portfolio. The Krogan Laboratory has received research support from Vir Biotechnology, F. Hoffmann-La Roche, and Rezo Therapeutics. N.J.K. has previously held financially compensated consulting agreements with the Icahn School of Medicine at Mount Sinai, New York, and Twist Bioscience Corp. He currently has financially compensated consulting agreements with Maze Therapeutics, Interline Therapeutics, Rezo Therapeutics, and GEn1E Lifesciences, Inc. He is on the Board of Directors of Rezo Therapeutics and is a shareholder in Tenaya Therapeutics, Maze Therapeutics, Rezo Therapeutics, and Interline Therapeutics.

Additional Information

We support inclusive, diverse, and equitable conduct of research.

Files (25.8 MB)

Name	Size	MD5
1-s2.0-S0092867423010383-main.pdf	21.0 MB	526d9ea290b7f7cfc967ec3a96babccc
1-s2.0-S0092867423010383-mmc5.xlsx	96.9 kB	3708756d89700dadc23649e1b387493e
1-s2.0-S0092867423010383-mmc6.xlsx	1.8 MB	6c47c6373ae12fa0674a5694ada67c8d
1-s2.0-S0092867423010383-mmc7.xlsx	272.0 kB	7a4afcb9bfe1f732610ccf5d8d99c886
1-s2.0-S0092867423010383-mmc8.xlsx	2.6 MB	7405508b160bb4af546d984c0a3ad5a1

Additional details

Resource type: Journal Article
Publisher: Cell Press
Published in: Cell, 186(23), 5041-5053.e19, ISSN: 0092-8674.
Languages: English