The client-binding domain of the cochaperone Sgt2 has a helical-hand structure that binds a short hydrophobic helix

The targeting and insertion of tail-anchored (TA) integral membrane proteins (IMPs) into the correct membrane is critical for cellular homeostasis. The fungal protein Sgt2 and its human homolog SGTA bind hydrophobic clients and are the entry point for targeting of ER-bound TA IMPs. Here we reveal molecular details that underlie the mechanism of Sgt2 binding to TA IMP clients. We establish that the Sgt2 C-terminal region is flexible but conserved and sufficient for client binding. A molecular model for this domain reveals a helical hand forming a hydrophobic groove, consistent with a higher affinity for TA IMP clients with hydrophobic faces and a minimal length of 11 residues. This work places Sgt2 into a broader family of TPR-containing co-chaperone proteins.


Introduction The conserved region of the C-domain is sufficient for substrate binding
We then asked if the flexible Sgt2-C is the site of client binding in the co-chaperone and, if so, where within this domain the binding region lies. During purification, Sgt2-C is susceptible to proteolytic activity, being cut at several specific sites (Fig. 1A). Proteolysis occurred primarily at Leu327 and in the poorly conserved N-terminal region (between Asp235 and Gly258). Given that the intervening region, between Gly258 and Leu327 on ySgt2, is conserved (Fig. 1A), it and the corresponding region on hSgt2 may mediate TA client binding (Fig. 2A, grey). To test this, we established a set of his-tagged Sgt2 constructs of various lengths (Fig. 2C). These Sgt2-C mutants were co-expressed with an MBP-tagged TA client, Sbh1, and binding was detected by the presence of captured TA clients in nickel elution fractions (Fig. 2B). As previously seen [13], we confirm that Sgt2-TPR-C alone is sufficient for capturing a TA client (Fig. 2C). As one might expect, the C-domain was also sufficient for binding the TA client. A predicted six α-helical methionine-rich region of Sgt2-C (Fig. 1A), hereafter referred to as Sgt2-Ccons, is sufficient for binding to Sbh1. For ySgt2, a minimal region H1-H5 (ΔH0) poorly captures Sbh1, while for hSgt2 the equivalent minimal region is sufficient for capturing the client at a similar level as the longer Ccons domain (Fig. 2C). The predicted helices in Sgt2-Ccons are amphipathic and their hydrophobic patches could be used for client binding (Fig. 2D).

To test this, each of the six helices in Sgt2-Ccons was mutated to replace the larger hydrophobic residues with alanines, dramatically reducing the overall hydrophobicity. For all of the helices, alanine replacement of the hydrophobic residues significantly reduces binding of Sbh1 to Sgt2-C (Fig. 2E & F).
While these mutants expressed at similar levels to the wild-type sequence, one cannot rule out that some of these changes may affect the tertiary structure of this domain. In general, these results imply that these amphipathic helices are necessary for client binding, since removal of the hydrophobic faces disrupts binding. The overall effect on binding by each helix is different, with mutations in helices 1-3 having the most dramatic reduction in binding, suggesting that these are more crucial for Sgt2-TA client (Sgt2-TA) complex formation. It is also worth noting, as this is a general trend, that hSgt2 is more resistant to mutations that affect binding (Fig. 2F) than ySgt2, which likely reflects different thresholds for binding.

Despite the need for a molecular model, the C-domain has resisted structural studies, likely due to the demonstrated inherent flexibility. Based on the six conserved α-helical amphipathic segments (Fig. 1A) that contain hydrophobic residues critical for TA client binding (Fig. 2D-E), we expect some folded structure to exist. Therefore, we performed ab initio molecular modeling of Sgt2-C using a variety of prediction methods, resulting in a diversity of putative structures [48-52]. As expected, all models showed buried hydrophobic residues, as this is a major criterion for in silico protein folding. Residues outside the ySgt2-Ccons region adopted varied conformations consistent with their expected higher flexibility. Pruning these N- and C-terminal regions to focus on the ySgt2-Ccons region (Fig. S2A) where a general consistent architecture was seen (Fig. 3A) [48]. The overall model contained a potential TA client binding site, a hydrophobic groove formed by the amphipathic helices. The groove is approximately 15 Å long, 12 Å wide, and 10 Å deep, which is sufficient to accommodate three helical turns of an α-helix, ~11 amino acids (Fig. 3B).
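The correspondence between groove length, helical turns, and residue count can be checked with a short calculation. The sketch below assumes textbook α-helix geometry (3.6 residues per turn, 1.5 Å rise per residue); these constants and the function names are illustrative assumptions and are not stated in the text.

```python
# Rough alpha-helix arithmetic for the groove dimensions quoted above.
# Assumption: ideal alpha-helix geometry (textbook values, not from the paper).
RESIDUES_PER_TURN = 3.6      # residues per helical turn
RISE_PER_RESIDUE_A = 1.5     # axial rise per residue, in angstroms


def residues_in_turns(turns: float) -> float:
    """Number of residues spanned by a given number of helical turns."""
    return turns * RESIDUES_PER_TURN


def helix_length_angstrom(n_residues: int) -> float:
    """Axial length of an ideal alpha-helix with n_residues residues."""
    return n_residues * RISE_PER_RESIDUE_A


# Three turns span about 10.8 residues, i.e. roughly 11 amino acids.
residues_three_turns = residues_in_turns(3)

# An 11-residue helix is about 16.5 angstroms long, close to the ~15 angstrom groove.
length_11_residues = helix_length_angstrom(11)

print(round(residues_three_turns), length_11_residues)
```

Under these assumptions, an 11-residue helix is slightly longer than the modeled groove, which is compatible with the "approximately 15 Å" estimate.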
To validate the model, we interrogated the accuracy of the predicted structural arrangement by determining distance constraints from crosslinking experiments. We selected four pairs of residues in close spatial proximity and one pair far apart based on the Quark models (Fig. 4A). Calculating a Cβ-Cβ distance between residue pairs for each model (Fig. 4E), the Quark models 2 and 3 were the most consistent with an expected distance of 9 Å or less for the close pairs. In all alternative models, the overall distances are much larger and the pairs should not be expected to form disulfide bonds in vitro if the models represent a TMD-bound state. For Robetta, a number of the models have pairs of residues within 9 Å, and Robetta's per-residue error estimate suggests relatively high confidence in the Ccons region (Fig. S2B).

As a control, we first confirmed that the cysteine-mutant pairs do not affect the function of ySgt2. We utilized an in vitro transfer assay where a yeast Hsp70 homolog, Ssa1, loaded with a TA client delivers the client to ySgt2 [21,49,50] (Fig. 4C). Purified Ssa1 is mixed with detergent-solubilized strep-tagged Bos1-TMD (a model ER TA client) containing a p-benzoyl-l-phenylalanine (BPA) labeled residue, Bos1BPA, and the mixture is diluted to below the critical micelle concentration, resulting in soluble complexes of Bos1BPA/Ssa1. Full-length ySgt2 variants were each tested for the ability to capture Bos1BPA from Ssa1. After the transfer reaction, each was UV-treated to generate Bos1 crosslinks. Successful capture of the TA clients by ySgt2 was detected using an anti-strep Western blot and the appearance of a Bos1BPA/ySgt2 crosslink band (Fig. 4D). All of the cysteine variants of ySgt2 successfully captured Bos1BPA from Ssa1 similarly to wild-type, suggesting that the cysteine mutations did not affect the structure or function of ySgt2.
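The Cβ-Cβ distance criterion used to compare models can be sketched as below. This is a minimal illustration, not the authors' analysis script: the coordinates are made-up placeholder values chosen only to show a close pair and a distant pair, and the function and variable names are hypothetical. Only the 9 Å cutoff and the residue pairings (M289/A319 as a close pair, N285/G329 as the distant pair) come from the text.

```python
import math

# Cutoff from the text: pairs with Cb-Cb distance of 9 angstroms or less are
# treated as plausible disulfide partners.
CUTOFF_ANGSTROM = 9.0


def classify_pairs(cb_coords, pairs, cutoff=CUTOFF_ANGSTROM):
    """Return {(i, j): (distance, within_cutoff)} for each residue pair.

    cb_coords maps residue number -> (x, y, z) of the C-beta atom.
    """
    report = {}
    for res_i, res_j in pairs:
        distance = math.dist(cb_coords[res_i], cb_coords[res_j])
        report[(res_i, res_j)] = (distance, distance <= cutoff)
    return report


# Placeholder coordinates (hypothetical, NOT taken from any model in the paper).
cb_coords = {
    285: (0.0, 0.0, 0.0),
    289: (3.0, 4.0, 0.0),
    319: (6.0, 8.0, 0.0),
    329: (12.0, 9.0, 8.0),
}
pairs = [(289, 319), (285, 329)]

report = classify_pairs(cb_coords, pairs)
for pair, (distance, close) in report.items():
    print(pair, f"{distance:.1f} A", "close" if close else "distant")
```

With real models, the coordinates would be read from each predicted structure and the same pairs evaluated across all models.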
For the distance experiment, each of the cysteine-mutant pairs was made in ySgt2-TPR-C, which lacks the dimerization domain. Each variant was coexpressed with an artificial TA client, a cMyc-tagged BRIL (small, 4-helix bundle protein [51]) with a C-terminal TMD consisting of eight leucines and three alanines, denoted 11[L8], and purified via nickel-affinity chromatography in reducing buffer (Fig. S3A). All of the ySgt2 mutants bound the TA client and behaved similarly to the wild-type (cysteine-free) construct, further suggesting the mutants did not perturb the native structure (Fig. S3B). For disulfide crosslink formation, each eluate was oxidized and crosslinks were identified by the visualization of a reducing-agent sensitive ~7.7 kDa fragment in gel electrophoresis (Fig. 3B). For both the wild-type construct and N285C/G329C, where the pairs are predicted from the Quark models to be too distant for disulfide bond formation, no higher molecular weight band was observed. For the remaining pairs that are predicted to be close enough for bond formation, the 7.7 kDa fragment was observed in each case and is labile in reducing conditions. Again, these results support the Ccons model derived from Quark.

With the four crosslinked pairs as distance constraints, new models were generated using Robetta with a restraint that the corresponding pairs of Cβ atoms lie within 9 Å (Fig. S4A). The Robetta models from these runs are similar to the top scoring models from Quark (Fig. 3). Satisfyingly, the pair of residues that do not form disulfide crosslinks are generally consistent (Fig. S4B).

The improvement of the ySgt2 models predicted by Robetta with restraints included encouraged us to generate models for hSgt2-C with constraints. For this, pairs were defined based on sequence alignments of Sgt2 (Fig. 1A) and used as restraints.
The resulting predictions had architectures consistent with the equivalent regions predicted for ySgt2-Ccons, for example Robetta 4 (Fig. S4C, top). Although in general the predicted hSgt2 model is similar to that for ySgt2, the region that corresponds to H2 occupies a position that precludes a clear hydrophobic groove. For ySgt2, the longer N-terminal loop occupies the groove, preventing the exposure of hydrophobics to solvent (Fig. 3C, grey). For hSgt2, the shorter N-terminal loop may not be sufficient to similarly occupy the groove and allow for the clear hydrophobic hand seen for ySgt2-C. To correct for this, we replaced the sequence of the N-terminal loop of hSgt2-C with the ySgt2-C loop and ran structure prediction with the pairwise distance restraints. This resulted in a model where the loop occupies the groove and, when pruned away, suggests the hydrophobic hand seen in yeast (Fig. S4C, middle boxed). Of note, we also generated models of hSgt2-C using the most recent Robetta method (transform-restrained), which produces new structures with a groove and similar helical-hand architecture across the board (Fig. S4C, bottom).

We sought to further test the robustness of our model, considering the intrinsic flexibility of Sgt2-C, by probing for disulfide bond formation with neighboring residues of one of our crosslinking pairs. While the Cβ-Cβ distance puts these adjacent pairs at farther than 9 Å, mutating residues to cysteines and measuring S-S distances across all possible pairs of rotamers provides a wider interval on possible distances and, therefore, on the likelihood a disulfide bond will form (Fig. 4F). Cysteine mutants were introduced to the residues adjacent to M289 and A319 in ySgt2-TPR-C for four additional pairs: K288C/A319C, M290C/A319C, M289C/P318C, and M289C/L320C.
As described previously, these mutants were coexpressed with a TA substrate, in this case the artificial BRIL-11[L8], which has an MBP-tag instead of a cMyc-tag. Complexes were purified by amylose and then nickel affinity chromatography to ensure eluates contained only Sgt2-TPR-C bound to substrate. Eluates were incubated in oxidizing conditions, quenched with 50 mM NEM, and digested with Glu-C protease. Again, a reductant-sensitive band at 7.7 kDa is observed for each of these adjacent pairs. While the geometry of each of these C-C pairs might argue against disulfide bond formation, given the intrinsic flexibility of Sgt2-C, it is not surprising that each of these pairs is able to form disulfide bonds. As before, disulfide bond formation was detected for

The lengths of the α-helices in this structure concur with those inferred from the alignment in Fig. 4A. Our molecular model of Sgt2-Ccons is strikingly similar to this DP2 structure (Fig. 5B,C). An

We examined the Sgt2-Ccons surface that putatively interacts with TA clients by constructing hydrophobic-to-charged residue mutations that are expected to disrupt capture of TA clients by Sgt2. Similar to the helix mutations in Fig. 2, the capture assay was employed to establish the relative effects of individual mutations. A baseline was established based on the amount of the TA client Sbh1 captured by wild-type Sgt2-TPR-C. In each experiment, Sbh1 was expressed at the same level; therefore, differences in binding should directly reflect the affinity of Sgt2 mutants for clients. In all cases, groove mutations from hydrophobic to aspartate led to a reduction in TA client binding (Fig. 6). The effects are most dramatic with ySgt2, where each mutant significantly reduced binding by 60% or more (Fig. 6A). While all hSgt2 individual mutants saw a significant loss in binding, the results were more subtle, with the strongest being a ~36% reduction (M233D, Fig. 6B). Double mutants were stronger, with a significant decrease in binding relative to the individual mutants, more reflective of the individual mutants in ySgt2. As seen before (Fig. 2E & F), we observe that mutations toward the N-terminus of Sgt2-C have a stronger effect on binding than those later in the sequence.

Sgt2-C domain binds clients with a hydrophobic segment ≥ 11 residues
With a molecular model for ySgt2-Ccons and multiple lines of evidence for a hydrophobic groove, we sought to better understand the specific requirements for TMD binding. We and others have demonstrated that a monomeric C-domain from Sgt2 is sufficient for binding to TA clients [13]. To study the minimal constraints on TA client binding, we chose to focus on a monomeric construct of Sgt2 (Sgt2-TPR-C) binding to variable TMDs. TA clients were designed where the overall (sum) and average (mean) hydrophobicity, length, and the distribution of hydrophobic character were varied in the TMDs. These artificial TMDs, a Leu/Ala helical stretch followed by a Trp, were constructed as C-terminal fusions to the soluble protein BRIL (Fig. 7A). The total and mean hydrophobicity are controlled by varying the helix length and the Leu/Ala ratio. For clarity, we define a syntax for the various artificial TA clients to highlight the properties under consideration: hydrophobicity, length, and distribution. The generic notation is TMD-length[number of leucines], which is represented, for example, as 18[L6] for a TMD of 18 amino acids containing six leucines.

Our first goal with the artificial clients was to define the minimal length of a TMD to bind to the C-domain. As described earlier, captures of his-tagged Sgt2-TPR-C with the various TA clients were performed. We define a relative binding efficiency as the ratio of TA client captured by Sgt2-TPR-C normalized to the ratio of a wild-type TA client captured by Sgt2-TPR-C. In this case we replaced the TMD in our artificial clients with the native TMD of Bos1 (Bos1TMD). The artificial client 18[L13] shows a comparable binding efficiency to Sgt2-TPR-C as that of Bos1TMD (Fig. 7B). From the helical wheel diagram of the TMD for Bos1, we noted that the hydrophobic residues favored one face of the helix.
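The TMD-length[number of leucines] notation lends itself to a small parser, sketched below. The class and function names are hypothetical, and the use of the Kyte-Doolittle hydropathy values for Leu (3.8) and Ala (1.8) is an assumption made for illustration; the text does not state which hydrophobicity scale underlies the total and mean values. The terminal Trp is not counted, following the description of 11[L8] as eight leucines and three alanines.

```python
import re
from dataclasses import dataclass

# Assumption: Kyte-Doolittle hydropathy values; the scale used in the paper
# is not specified in this section.
KYTE_DOOLITTLE = {"L": 3.8, "A": 1.8}

_LABEL = re.compile(r"^(\d+)\[L(\d+)\]$")


@dataclass(frozen=True)
class ArtificialTMD:
    """A Leu/Ala TMD described as length[Lk], e.g. 18[L6]."""
    length: int
    leucines: int

    @property
    def alanines(self) -> int:
        return self.length - self.leucines

    @property
    def total_hydrophobicity(self) -> float:
        return (self.leucines * KYTE_DOOLITTLE["L"]
                + self.alanines * KYTE_DOOLITTLE["A"])

    @property
    def mean_hydrophobicity(self) -> float:
        return self.total_hydrophobicity / self.length


def parse_tmd(label: str) -> ArtificialTMD:
    """Parse a label such as '18[L6]' into an ArtificialTMD."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a TMD label: {label!r}")
    length, leucines = int(match.group(1)), int(match.group(2))
    if length == 0 or leucines > length:
        raise ValueError(f"inconsistent TMD label: {label!r}")
    return ArtificialTMD(length, leucines)


for label in ("18[L6]", "18[L13]", "11[L8]"):
    tmd = parse_tmd(label)
    print(label, tmd.alanines, round(tmd.total_hydrophobicity, 1),
          round(tmd.mean_hydrophobicity, 2))
```

Separating total from mean hydrophobicity in this way mirrors the design of the clients, where helix length and the Leu/Ala ratio are varied independently.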
We explored this 'hydrophobic face' by using model clients consistent with the dimensions of the groove predicted from the structural model (Fig. 3). In the context of the full-length Sgt2, which exists as a dimer, an 11-residue cut-off suggests that two C-domains could come together and bind to a single TA client whose TMD length ranges from 18 to 24 residues.
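One common way to put a number on a hydrophobic face is the mean helical hydrophobic moment, sketched below. This is a swapped-in illustration, not a method described in the text: the 100° per-residue rotation of an ideal α-helix, the Kyte-Doolittle values for Leu and Ala, and the two example sequences (both 18[L9] in the notation above) are all assumptions chosen for the example.

```python
import cmath
import math

# Assumptions (not from the paper): Kyte-Doolittle hydropathy values and an
# ideal alpha-helix with 100 degrees of rotation per residue.
KYTE_DOOLITTLE = {"L": 3.8, "A": 1.8}
HELIX_ANGLE_DEG = 100.0


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE, angle_deg=HELIX_ANGLE_DEG):
    """Mean hydrophobic moment: |sum_i H_i * exp(i * delta * n)| / N."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    step = math.radians(angle_deg)
    vector_sum = sum(scale[res] * cmath.exp(1j * step * i)
                     for i, res in enumerate(sequence))
    return abs(vector_sum) / len(sequence)


# Two hypothetical 18-residue TMDs with identical composition (nine Leu, nine Ala).
clustered = "LLALLAALLAALAALLAA"    # leucines fall on one face of the helix
alternating = "LALALALALALALALALA"  # leucines spread evenly around the helix

print(f"clustered:   {hydrophobic_moment(clustered):.3f}")
print(f"alternating: {hydrophobic_moment(alternating):.3f}")
```

Both sequences have the same total and mean hydrophobicity, so any difference in the moment reflects only how the leucines are distributed around the helix axis.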
Since a detected binding event occurs with TMDs of at least 11 amino acids, we decided to probe this limitation further. The dependency on client hydrophobicity was tested by measuring complex formation of Sgt2-TPR-C and artificial TA clients containing an 11 amino acid TMD with

Sgt2-C preferentially binds to TMDs with a hydrophobic face
Next, we address the properties within the TMD of TA clients responsible for Sgt2 binding. In the case of ySgt2, it has been suggested that the co-chaperone binds to TMDs based on hydrophobicity and helical propensity [56]. In our system, the artificial TMDs consist of only alanines and leucines, which have high helical propensities [57], and despite keeping the helical propensity constant and in a range that favors Sgt2 binding, there is still variation in binding efficiency. For the most part, varying the hydrophobicity of an artificial TA client acts as expected: the more hydrophobic TMDs bind more efficiently to Sgt2-TPR-C (Fig. 7C). Our Ccons model suggests the hydrophobic groove of ySgt2-C protects a TMD with highly hydrophobic residues clustered to one side (see Fig. 3B). To test this, we compared various TMD pairs with the same hydrophobicity but different distributions of hydrophobic residues; TA clients with clustered leucines have a higher relative binding efficiency than those with a more uniform distribution (Fig. 7D). Helical wheel diagrams demonstrate the distribution of hydrophobic residues along the helix (e.g.

and to provide a mechanistic explanation for binding a TMD of at least 11 hydrophobic residues. We confidently identify the C-domain of Sgt2 as containing a STI1 domain for client binding through sequence alignments and structural homology. This places Sgt2 into a larger context of conserved co-chaperones (Fig. 8A)

Computational modeling reveals a conserved region sufficient for TA client binding that consists of a helical hand of five α-helices that form a hydrophobic groove to bind the client TMD. The concept of TMD binding by a helical hand is reminiscent of other proteins involved in membrane protein targeting. Like Sgt2, the signal recognition particle (SRP) contains a methionine-rich domain that binds signal sequences and TMDs.
While the helical order is inverted, again five amphipathic helices form a hydrophobic groove that cradles the client signal peptide [60] (Fig. 8B). Here once more, the domain has been observed to be flexible in the absence of

correspondence to the demonstrated minimal 11 amino acids for a TA client to bind to the monomeric Sgt2-TPR-C. In the context of the full-length Sgt2, one can speculate that the Sgt2 dimer may utilize both C-domains to bind to a full TMD, similar to calmodulin. Cooperation of the two Sgt2 C-domains in client binding could elicit conformational changes in the complex that could be recognized by downstream factors, such as additional interactions that increase the affinity to Get5/Ubl4A.

Intriguingly, Sgt2-TPR-C preferentially binds to artificial clients with clustered leucines. If the C-domain forms a hydrophobic groove as suggested by the computational model, it provides an attractive explanation for this preference. In order to bind to the hydrophobic groove, a client buries a portion of its TMD in the groove, leaving the other face exposed. Clustering hydrophobic residues contributes to the hydrophobic effect, driving binding efficiency and protecting them from the aqueous environment. Indeed, GET pathway substrates have been suggested to be more

While hSgt2 and ySgt2 share many properties, there are a number of differences between the two homologs that may explain the different biochemical behavior. For the Ccons-domains, hSgt2 appears to be more ordered in the absence of client, as the peaks in its NMR spectra are broader (Fig. 1E). Comparing the domains at the sequence level, while the high glutamine content in the C-domain is conserved, it is higher in hSgt2 (8.8% versus 15.2%). The additional glutamines are concentrated in the predicted longer H4 helix (Fig. 1A). The linker to the TPR domain is shorter compared to ySgt2, while the loop between H3 and H4 is longer.
Do these differences reflect different roles? As noted, in every case the threshold for hydrophobicity of client binding is lower for hSgt2 than ySgt2 (Fig. 1E, 5, and 6), implying that the mammalian protein is more permissive in client binding. The two C-domains have similar hydrophobicity, so this difference in binding might be due to a lower entropic cost paid by having the hSgt2 C-domain more ordered in the absence of client, or to the lack of an unstructured N-terminal loop.

The targeting of TA clients presents an intriguing and enigmatic problem for understanding the biogenesis of IMPs. How subtle differences in each client modulate the interplay of hand-offs that direct these proteins to the correct membrane remains to be understood. In this study, we focus on a central player, Sgt2, and its client-binding domain. Through biochemistry and computational analysis, we provide a structural model that adds more clarity to client discrimination.

The resin was washed with 20 mM Tris, 150 mM NaCl, 25 mM imidazole, 10 mM βME, pH 7.5.

Material and Methods
The complexes of interest were eluted in 20 mM Tris, 150 mM NaCl, 300 mM imidazole, 10 mM βME, pH 7.5.

For structural analysis, the affinity tag was removed from complexes collected after the nickel elution by an overnight TEV digestion against lysis buffer followed by size-exclusion chromatography using a HiLoad 16/60 Superdex 75 prep grade column (GE Healthcare).

Measurement of Sgt2 protein concentration was carried out using the bicinchoninic acid (BCA) assay with bovine serum albumin (BSA) as standard (Pierce Chemical Co.). Samples for NMR and CD analyses were concentrated to 10-15 mg/mL for storage at −80°C before

CD Spectroscopy
The CD spectra were recorded at 24°C with an Aviv 202 spectropolarimeter using a 1 mm path length cuvette with 10 µM protein in 20 mM phosphate buffer, pH 7.0. The CD spectrum of each sample was recorded as the average over three scans from 190/195 to 250 nm in 1 nm steps. Each spectrum was then decomposed into its most probable secondary structure elements using BeStSel [68].

Glu-C digestion of the double cysteine mutants on ySgt2-C
Complexes of the co-expressed wild type or double cysteine mutated His-ySgt2-TPR-C and the artificial TA client, 11[L8], with either a cMyc or MBP tag were purified as the other His-Sgt2 complexes described above or initially purified via amylose affinity chromatography before nickel

(2)

Each E_TA and T_TA value was obtained by blotting both simultaneously, i.e. adjacently on the same blotting paper. To facilitate comparison between TA clients, the Sgt2-TPR-C/TA client complex efficiency E_complex,TA is normalized by the Sgt2-TPR-C/Bos1 complex efficiency E_complex,Bos1.

(3)
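The normalization to Bos1 described above can be sketched in a few lines. The sketch rests on two assumptions that are not confirmed by the surviving text: that the E and T values denote eluted and total TA client band intensities from the same blot, and that a complex efficiency is their ratio. The function names and the numbers in the example are hypothetical.

```python
def complex_efficiency(eluted: float, total: float) -> float:
    """Fraction of the TA client recovered in complex with Sgt2-TPR-C.

    Assumption: efficiency = eluted band intensity / total band intensity.
    """
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return eluted / total


def relative_binding_efficiency(eluted_ta: float, total_ta: float,
                                eluted_bos1: float, total_bos1: float) -> float:
    """Complex efficiency of a TA client normalized to that of Bos1."""
    reference = complex_efficiency(eluted_bos1, total_bos1)
    if reference == 0:
        raise ValueError("reference (Bos1) efficiency must be non-zero")
    return complex_efficiency(eluted_ta, total_ta) / reference


# Hypothetical band intensities (arbitrary units), for illustration only.
relative = relative_binding_efficiency(eluted_ta=30.0, total_ta=100.0,
                                       eluted_bos1=60.0, total_bos1=100.0)
print(f"relative binding efficiency: {relative:.2f}")
```

Normalizing to a reference client measured in the same way lets clients be compared on one scale, with the Bos1 reference at 1 by construction.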

Sequence alignments
An alignment of Sgt2-C domains was carried out as follows: all sequences with an annotated

For hSgt2, using the same set of structure prediction servers above, we were only able to produce a clear structural model using the Robetta transform-restrained mode. We were also unable to generate a reliable model by directly using the ySgt2-C model as a template [82]. To use crosslink distance data from ySgt2 as restraints for hSgt2, pair positions were transferred from one protein to the other via an alignment of Sgt2-C domains (excerpt in Fig. 1A), and Robetta ab initio was run. Also, we grafted the N-terminal loop of ySgt2-C on hSgt2-C with the same set of restraints. Images were rendered using PyMOL 2.3 (www.pymol.org).