
Supporting Information

Bedbrook et al. 10.1073/pnas.1700269114

Parental ChR Constructs

Each of the three ChR library parent genes was built using a

consistent vector backbone (pFCK) with the same promoter

(CMV), trafficking signal (TS) sequence, and fluorescent protein

(mKate). We used the pFCK vector from the construct FCK-

CheRiff-eGFP [Addgene plasmid #51693 (41)]. A TS sequence

(42) was inserted between the opsin and the fluorescent protein.

The TS sequence has been shown to enhance opsin membrane

trafficking (42). The GFP was replaced with mKate2.5 (43). Use

of a red fluorescent protein as the marker for the opsin expression

enabled use of SpyCatcher-GFP labeling for membrane-localized

proteins. mKate2.5 is a monomeric far-red fluorescent protein

that shows no aggregation. The mKate2.5 sequence was synthe-

sized by IDT with overhangs for cloning into the desired vector

system.

For the SpyTag/SpyCatcher membrane localization assay, it was

necessary to add the SpyTag sequence close to the N terminus of

each of the parental proteins and C-terminal to the signal peptide

sequence cleavage site. For C1C2, an optimal position of the

SpyTag had already been published. The SpyTag-C1C2 gene was

amplified from the construct pLenti-CaMKIIa-SpyTag-C1C2-TS-

mCherry (44) and inserted into the pFCK backbone. For CheRiff

and CsChrimR, it was necessary to test various N-terminal

SpyTag locations. The CheRiff gene was first amplified from

FCK-CheRiff-eGFP [Addgene plasmid #51693 (41)], and the

SpyTag sequence was added at different N-terminal positions by

assembly PCR methods. The CsChrimR gene was built by as-

sembly of the Cs N-terminal sequence (synthesized by IDT) with

the C-terminal end of ChrimsonR amplified from the FCK-

ChrimsonR-GFP construct [Addgene plasmid #59049 (39)]. The

sequence of CsChrimR was designed to be identical to the pre-

viously published sequence (39). The SpyTag sequence was then

inserted at different positions in the N-terminal region of the

protein using assembly PCR methods. We tested three different

pFCK-SpyTag-CheRiff-TS-mKate designs and three different

pFCK-SpyTag-CsChrimR-TS-mKate designs and selected the

design that showed expression and localization levels most similar

to the nontagged parent.

Assembly-based methods and traditional cloning were used for

vector construction and parental gene insertion. Annotated

vector sequences of the three SpyTagged parental constructs are

included as Datasets S3–S5.

Library Design

SCHEMA was used to design recombination libraries of the three

parental ChRs to minimize the library-average disruption of the

ChR structure (10, 25, 28). For the contiguous library, the

SCHEMA-predicted block definitions were not modified. This

10-block library had roughly even-length blocks (14–43 residues), a relatively low average E value (E = 25), and sequences with an average of 73 mutations from the nearest parent. For the

noncontiguous library, the SCHEMA-predicted block definitions

were modified to group the N- or C-terminal domains into single

blocks, maintain the presumptive dimer interface, and minimize

the number of small blocks (less than five mutations). Specifically,

a 13-block noncontiguous recombination library was generated for

which two N-terminal blocks were combined, two C-terminal

blocks were combined, two of four blocks in TM 5 were com-

bined, and two residues of TM 3 were switched to the same block

as TM 4 (where TM 3 and 4 make up the dimer interface observed

for C1C2). The two loops that were not modeled in the

C1C2 structure, between TM 1 and TM 2 and in the β-turn of the

C-terminal motif, were added to the block containing TM2 and

the C-terminal block, respectively. The unmodeled residues of the

N and C termini were added to the N- and C-terminal blocks. The

resulting noncontiguous library had 10 blocks, an average E value

of 23, an average of 71 mutations, and block size similar to the

contiguous library (Fig. 2

and

Among the three ChR parents, five unique N-linked glycosylation sites have been predicted by the NetNGlyc 1.0 (www.cbs.dtu.dk/services/NetNGlyc/) and GlycoEP servers (52). C1C2 harbors

four of these sites with by far the highest confidence at each site.

With one exception, the putative N-linked glycosylation sites do

not overlap with recombination block borders. The exception site

(SpyTag-C1C2 N95) is located in between the N-terminal domain

and the first TM helix.
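As a rough illustration of the border check described above, the sketch below scans a sequence for the canonical N-X-S/T sequon (X not proline) and reports which motifs straddle a block border. This is a simple pattern match, not the NetNGlyc or GlycoEP predictors used in this work; the function names, the toy sequence, and the block start positions are all hypothetical.

```python
import re

# N-X-S/T sequon with X != P; the lookahead lets neighboring motifs overlap.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 0-based positions of the Asn in each N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(protein)]


def sequons_spanning_borders(sites, block_starts):
    """Return sequon positions whose three residues straddle a block start."""
    return [s for s in sites if any(s < b <= s + 2 for b in block_starts)]


toy = "AANGSAANPTANKT"  # invented sequence, not a ChR
sites = find_sequons(toy)
spanning = sequons_spanning_borders(sites, block_starts=[0, 4, 9])
print(sites, spanning)
```

A motif is counted as spanning a border only when a block begins at its second or third residue, which is one possible reading of "overlap" here.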

Contiguous recombination design was done using a software

package for calculating SCHEMA energies and running the

RASPP algorithm (23) openly available at cheme.che.caltech.edu/groups/fha/Software.htm (53). Noncontiguous recombination design was done using a software package for performing noncontiguous protein recombination (24) openly available at cheme.che.caltech.edu/groups/fha/Software.htm (54). Both software packages are written in the Python programming language.
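To make the two library statistics quoted above concrete (the E value and the number of mutations from the nearest parent), the following minimal sketch computes both for a toy chimera. It is not the SCHEMA/RASPP package cited above. It assumes gap-free aligned parents and a precomputed list of contacting residue pairs, and it uses the standard SCHEMA definition in which a contact counts as disrupted when the chimera's residue pair at those positions occurs in no parent. All names and toy data are hypothetical.

```python
def build_chimera(parents, block_of_position, block_parents):
    """Assemble a chimera by taking each position from the parent assigned to its block."""
    return "".join(
        parents[block_parents[block]][pos]
        for pos, block in enumerate(block_of_position)
    )


def schema_energy(chimera, parents, contacts):
    """Count contacts whose residue pair is not found in any parent at the same positions."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


def mutations_from_nearest_parent(chimera, parents):
    """Smallest Hamming distance between the chimera and any parent."""
    return min(sum(a != b for a, b in zip(chimera, p)) for p in parents)


# Toy data: three aligned "parents", three blocks, three structural contacts.
parents = ["AKLMDE", "GKIMNE", "ARLFDQ"]
block_of_position = [0, 0, 1, 1, 2, 2]
contacts = [(0, 2), (1, 4), (3, 5)]

chimera = build_chimera(parents, block_of_position, block_parents=(0, 1, 2))
print(chimera, schema_energy(chimera, parents, contacts),
      mutations_from_nearest_parent(chimera, parents))
```

Averaging these two quantities over every chimera in a library gives library-level values of the kind reported above; a noncontiguous design is represented here simply by a `block_of_position` list in which a block's positions are not adjacent in sequence.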

Construction of Chimeras

The SCHEMA software outputs the amino acid sequences of all

chimeras in a library. The amino acid sequence for each chimera

chosen for experimental testing was converted into a nucleotide

sequence using the following method to define codon use:

1. Align the amino acid sequence to the C1C2 parent.

2. Assign conserved amino acids in the alignment to the C1C2

parental codon.

3. Assign nonconserved amino acids to the parental codon from

which the amino acid is derived.
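The three steps above can be sketched as follows. This is an illustrative reading of the rule, not the script used in this work: it assumes that "conserved" means identical to C1C2 at the aligned position, that the parental origin of every chimera position is known from its block assignment, and that each parent is supplied as an aligned amino acid string plus a per-position codon list. All names and the three-residue toy parents are hypothetical.

```python
def back_translate(chimera_aa, origin, parent_aa, parent_codons, reference="C1C2"):
    """Convert an aligned chimera protein sequence into DNA using parental codons.

    chimera_aa    -- aligned amino acid string ('-' marks an alignment gap)
    origin        -- parent name contributing each aligned position
    parent_aa     -- dict: parent name -> aligned amino acid string
    parent_codons -- dict: parent name -> list of codons, one per aligned position
    """
    dna = []
    for pos, (aa, source) in enumerate(zip(chimera_aa, origin)):
        if aa == "-":
            continue  # gaps contribute no codon
        # Step 2: residues shared with the reference take the reference codon.
        # Step 3: all other residues take the codon of the parent they came from.
        donor = reference if parent_aa[reference][pos] == aa else source
        if parent_aa[donor][pos] != aa:
            raise ValueError(f"residue {aa!r} at position {pos} not found in {donor}")
        dna.append(parent_codons[donor][pos])
    return "".join(dna)


# Toy three-residue parents (invented for illustration).
parent_aa = {"C1C2": "MKL", "CheRiff": "MRL", "CsChrimR": "MKV"}
parent_codons = {
    "C1C2": ["ATG", "AAG", "CTG"],
    "CheRiff": ["ATG", "CGT", "TTA"],
    "CsChrimR": ["ATG", "AAA", "GTC"],
}

print(back_translate("MRV", ["CheRiff", "CheRiff", "CsChrimR"], parent_aa, parent_codons))
```

In the second toy case used in the test, a chimera identical in protein sequence to C1C2 receives C1C2 codons even where its blocks come from other parents, which is what keeps codon use consistent across chimeras.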

This method was used for all chimeras to ensure that codon use

was consistent. Once amino acid sequences were converted into

nucleotide sequences, additional 3′ and 5′ sequences containing a

BamHI and a NotI restriction enzyme cut site, respectively, were

appended to the gene sequence. These sequences were necessary

for cloning in the pFCK vector using either restriction ligation or

homology-based cloning strategies. Gene sequences for the 223-

chimera set were synthesized by Twist Bioscience, using its

proprietary silicon-based DNA writing technology. After as-

sembly, each fragment was cloned in the pFCK vector by

homology-based cloning strategy and transformed into Stbl3 cells

(Invitrogen) or Endura cells (Lucigen). Individual clones were

picked and sequenced by NGS. Perfect clones were stored as

individual glycerol stocks. Eight of the single-block swap se-

quences failed either the synthesis or cloning steps; these were

not included in the chimera set.

Purified plasmid DNA of each chimera was prepared for HEK

cell transfection. Each construct was streaked onto LB-amp plates

from a glycerol stock, and an individual colony from each construct

was picked and used to inoculate a 5-mL LB-ampicillin liquid culture. Cultures were then grown overnight to reach saturation.

Plasmid DNA for each construct was then purified using the

QIAprep Spin Miniprep Kit. DNA concentrations for all constructs

were measured and normalized before HEK cell transfection.

HEK Cell Maintenance and Transfection

HEK 293T cells were cultured at 37 °C and 5% CO₂ in D10

[DMEM supplemented with 10% (vol/vol) FBS, 1% sodium

Bedbrook et al.

www.pnas.org/cgi/content/short/1700269114

1of8

bicarbonate, and 1% sodium pyruvate]. For 96-well transfec-

tions, HEK cells were plated on poly-

-lysine

–coated glass-bottom 96-well plates at 20–30% confluency. Cells were left to divide until they reached 70–80% confluency. HEK cells were then transfected with one library variant per well at a prenormalized DNA concentration using Fugene6 reagent according to the manufacturer’s recommendations. Cells were given 48 h to express and then subjected to the SpyCatcher-GFP labeling assay and imaged.

Recombinant SpyCatcher-GFP Expression and Purification

The SpyCatcher-GFP was produced from a previously published construct, pQE80l-T5::6xhis-SpyCatcher-Elp-GFP [for details, see Bedbrook et al. (44)]. E. coli expression strain BL21(DE3) harboring the pQE80l-T5::6xhis-SpyCatcher-Elp-GFP plasmid was grown at 37 °C in TB medium to an optical density of 0.6–0.8 at 600 nm, and protein expression was induced using 1 mM isopropyl β-D-1-thiogalactopyranoside at 30 °C. After 4 h of induction, cells were harvested and frozen at −80 °C before protein purification. Protein purification was carried out using HisTrap columns (GE Healthcare) following the column manufacturer’s recommendations. Protein was buffer exchanged into sterile PBS at 4 °C. Protein was stable through multiple freeze/thaws and over many months.

SpyCatcher Labeling of HEK Cells

HEK cells were subjected to SpyCatcher labeling 48 h post-

transfection. Labeling was done in a 96-well format using mul-

tichannel pipettes. SpyCatcher-GFP was added directly into the

D10 media of wells containing HEK cells at a final concentration

of 30 µM, and the cells were then incubated for 45 min at 25 °C. To avoid variability in labeling in the 96-well format screen, we used a saturating concentration of the SpyCatcher (30 µM) for

labeling experiments. After labeling, HEK cells were washed

with D10 three times, and then cells were incubated at 37 °C for

1 h to allow any remaining SpyCatcher to diffuse off of the well

surface. For cell imaging, D10 medium was replaced with extracellular buffer (in mM: 140 NaCl, 5 KCl, 10 Hepes, 2 MgCl₂, 2 CaCl₂, 10 glucose; pH 7.35) to avoid the high autofluorescence

of the D10. Cells were washed two times with extracellular buffer

to fully remove any residual D10 before imaging.

Imaging and Image Processing of ChR Expression and

Localization

Imaging of ChR expression and localization was done using a Leica

DMI 6000 microscope. Four positions in each well were imaged in

all 96-well plates using a fully automated system with motorized

stage and automated

focus. Three channels were imaged at each

position (mKate, GFP, and bright field). Cell segmentation was

done using CellProfiler (55), an open-source image-processing

software, and whole population intensity measurements were

done using custom image-processing scripts written using open-

source packages in the SciPy ecosystem (56

–

58). Both processing

methods require a series of filtering steps and background sub-

traction. Whole population intensity measurements required a

thresholding step when defining a pixel mask for image processing. We used wells containing nontransfected HEK cells that went

through the labeling experiment as a background for establishing a

threshold. A threshold was set to 2 SDs above the mean intensity

values calculated in these background wells for each channel

(mKate and GFP). For each image, a mask was defined for each

channel (mKate and GFP) as the pixels above a set threshold. The

masks for the two channels were then combined so that the mask

included any pixel that was above threshold in the GFP channel or

the mKate channel. This combined pixel mask was used to cal-

culate the mean mKate fluorescence intensity (expression) and

mean GFP fluorescence intensity (localization) across the pixels in

the mask. The ratio mean mKate intensity/mean GFP intensity is

the localization efficiency.
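The thresholding and masking procedure above can be sketched in a few lines. This is not the custom script used in this work: it is a pure-Python stand-in for what would normally be array operations in the SciPy ecosystem, it treats each image as a flat list of already background-subtracted pixel intensities, it uses the population SD for the threshold (the text does not say which SD estimator was used), and it computes the ratio exactly as written above (mean mKate/mean GFP). All names and pixel values are hypothetical.

```python
from statistics import mean, pstdev


def background_threshold(background_pixels, n_sd=2.0):
    """Threshold = mean + n_sd * SD of pixels from nontransfected, labeled wells."""
    return mean(background_pixels) + n_sd * pstdev(background_pixels)


def localization_metrics(mkate, gfp, mkate_thresh, gfp_thresh):
    """Return (expression, localization, efficiency) over the combined pixel mask.

    A pixel is kept if it exceeds the threshold in the mKate OR the GFP channel.
    Returns None when no pixel passes (nothing detected in this image).
    """
    mask = [m > mkate_thresh or g > gfp_thresh for m, g in zip(mkate, gfp)]
    if not any(mask):
        return None
    expression = mean(m for m, keep in zip(mkate, mask) if keep)
    localization = mean(g for g, keep in zip(gfp, mask) if keep)
    # Ratio as stated in the text: mean mKate intensity / mean GFP intensity.
    efficiency = expression / localization if localization else float("nan")
    return expression, localization, efficiency


# Toy four-pixel image in each channel (invented values).
mkate = [5.0, 20.0, 30.0, 5.0]
gfp = [5.0, 5.0, 40.0, 50.0]
print(localization_metrics(mkate, gfp, mkate_thresh=10.0, gfp_thresh=10.0))
```

Using the union of the two channel masks means a cell that expresses but fails to localize (mKate signal only) still contributes to both means, which is what allows the ratio to report poor localization.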

Electrophysiology for ChR Photocurrents

Conventional whole-cell patch-clamp recordings were done in

cultured HEK cells at 2 d posttransfection. Cells were continuously

perfused with extracellular solution at room temperature (in mM:

140 NaCl, 5 KCl, 10 Hepes, 2 MgCl₂, 2 CaCl₂, 10 glucose; pH 7.35)

while mounted on the microscope stage. Patch pipettes were

fabricated from borosilicate capillary glass tubing (1B150-4; World

Precision Instruments) using a model P-2000 laser puller (Sutter

Instruments) to resistances of 2

–

. Pipettes were filled with intracellular solution containing the following (in mM): 134 K gluconate, 5 EGTA, 10 Hepes, 2 MgCl₂, 0.5 CaCl₂, 3 ATP, and

0.2 GTP. Whole-cell patch-clamp recordings were made using

a Multiclamp 700B amplifier (Molecular Devices), a Digidata

1440 digitizer (Molecular Devices), and a PC running pClamp

(version 10.4) software (Molecular Devices) to generate current

injection waveforms and to record voltage and current traces.

Patch-clamp recordings were done with short light pulses to

measure photocurrents. Photocurrents for each chimera were

induced by three different wavelengths of light (473 ± 10, 560 ± 25, and 650 ± 13 nm) at 2 mW (∼0.1 mW

−

). Photocurrents

were recorded from cells in voltage clamp held at −50 mV with one light pulse for 1 s with each wavelength of light tested sequentially with 2 min between light exposures. Because ChRs show some level of desensitization to light after continued light exposure, we ran all colors in one direction (red → green → blue) and then again in the other direction (blue → green → red). The means of peak and steady-state currents were calculated for each color between the two trials for a given cell. Light

wavelengths were produced using LED illumination using a

Lumencor SPECTRAX light engine with quad band 387/485/

559/649-nm excitation filter, quad band 410/504/582/669-nm di-

chroic mirror, and quad band 440/521/607/700-nm emission filter

(all SEMROCK).

Electrophysiology data were analyzed using custom data-

processing scripts written using open-source packages in the

Python programming language to do baseline adjustments, find

the peak inward currents, and find the steady-state currents.
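A minimal sketch of the three analysis steps named above (baseline adjustment, peak inward current, steady-state current) is given below, together with the averaging of the two color-order trials. It is not the analysis script used in this work. The baseline window (all samples before light onset) and the steady-state window (the final fraction of the light pulse) are assumptions, as are all names and the toy trace.

```python
from statistics import mean


def analyze_sweep(current_pa, light_on, light_off, steady_fraction=0.1):
    """Return (peak, steady_state) photocurrent in pA for one voltage-clamp sweep.

    current_pa      -- sampled current trace in pA
    light_on/off    -- sample indices bounding the light pulse
    steady_fraction -- assumed fraction of the pulse, taken from its end,
                       that is averaged as the steady-state current
    """
    baseline = mean(current_pa[:light_on])  # pre-light holding current
    pulse = [i - baseline for i in current_pa[light_on:light_off]]
    peak = min(pulse)  # inward current is negative at a -50 mV holding potential
    n_tail = max(1, int(len(pulse) * steady_fraction))
    steady_state = mean(pulse[-n_tail:])
    return peak, steady_state


def average_trials(forward, reverse):
    """Average (peak, steady_state) between the two color-order trials of one cell."""
    return tuple(mean(pair) for pair in zip(forward, reverse))


# Toy trace: 5 baseline samples, a 10-sample light pulse, 5 recovery samples.
trace = [2.0] * 5 + [-98.0, -48.0] + [-28.0] * 8 + [2.0] * 5
first = analyze_sweep(trace, light_on=5, light_off=15)
print(first, average_trials(first, (-80.0, -20.0)))
```

Averaging the forward and reverse runs in this way is intended to balance out the order-dependent desensitization mentioned above.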


CheRiff          EYHAPAGYQVNPPYHPVHGYE---EQCSSIYIYYGALWEQETARGFQWFAVFLSALFLAF   57
C1C2             RMLFQTSYTLENNGSVICIPNNGQCFCLAWLKSNGTNAEKLAANILQWITFALSALCLMF   60
CsChrimsonR      GFDELAKGAVVPEDHFVCGPA-DKCYCSAWLHSRGTPGEKIGAQVCQWIAFSIAIALLTF   59
                      :   :      :         * :     *:  *:  *.  **::. ::   * *

CheRiff          YGWHAYKASVGWEEVYVCSVELIKVILEIYFEFTSPAMLFLYGGNITPWLRYAEWLLTCP   117
C1C2             YGYQTWKSTCGWEEIYVATIEMIKFIIEYFHEFDEPAVIYSSNGNKTVWLRYAEWLLTCP   120
CsChrimsonR      YGFSAWKATCGWEEVYVCCVEVLFVTLEIFKEFSSPATVYLSTGNHAYCLRYFEWLLSCP   119
                 **: ::*:: ****:**. :*:: . :* : ** .** ::   ** :  *** ****:**

CheRiff          VILIHLSNITGLSEAYNKRTMALLVSDLGTICMGVTAALATGWVKWLFYCIGLVYGTQTF   177
C1C2             VILIHLSNLTGLANDYNKRTMGLLVSDIGTIVWGTTAALSKGYVRVIFFLMGLCYGIYTF   180
CsChrimsonR      VILIRLSNLSGLKNDYSKRTMGLIVSCVGMIVFGMAAGLATDWLKWLLYIVSCIYGGYMY   179
                 ****:***::** : *.****.*:** :* *  * :*.*:. ::: ::: :.  **   :

CheRiff          YNAGIIYVESYYIMPAGGCKKLVLAMTAVYYSSWLMFPGLFIFGPEGMHTLSVAGSTIGH   237
C1C2             FNAAKVYIEAYHTVPKGRCRQVVTGMAWLFFVSWGMFPILFILGPEGFGVLSVYGSTVGH   240
CsChrimsonR      FQAAKCYVEANHSVPKGHCRMVVKLMAYAYFASWGSYPILWAVGPEGLLKLSPYANSIGH   239
                 ::*.  *:*: : :* * *: :*  *:  :: **  :* *: .****:  **  ..::**

CheRiff          TIADLLSKNIWGLLGHFLRIKIHEHIIMYGDIRRPVSSQFLGRKVDVLAFVTEE   291
C1C2             TIIDLMSKNCWGLLGHYLRVLIHEHILIHGDIRKTTKLNIGGTEIEVETLVEDE   294
CsChrimsonR      SICDIIAKEFWTFLAHHLRIKIHEHILIHGDIRKTTKMEIGGEEVEVEEFVEEE   293
                 :* *:::*: * :*.*.**: *****:::****: .. :: * :::*  :* :*

Fig. S1. Amino acid alignment of parental sequences and recombination block designs. Alignment showing the contiguous and noncontiguous block designs. Each color represents a different block, and white shows the conserved residues. Amino acids thought to be important for ChR spectral properties are bolded and underlined. The conserved lysine residue that participates in a Schiff base linkage with retinal is highlighted in red text. The secondary structure is shown below the alignment.


Fig. S2. Interdependencies of chimera properties. Chimera data are plotted as gray points, and parental data points are highlighted in color (red, CsChrimR; green, C1C2; and blue, CheRiff). (A) Plot of measured localization [mean GFP fluorescence (in arbitrary units)] vs. measured expression [mean mKate fluorescence (in arbitrary units)] shows no clear correlation. (B) Plot of measured localization vs. number of mutations from closest parent. (C) Plot of measured expression vs. number of mutations from closest parent. Dashed lines show the measured properties of the lowest-performing parent (CheRiff).

Fig. S3. Chimeras from the contiguous and noncontiguous libraries, ranked by expression, localization, and localization efficiency. Block identity of the chimeras ranked according to performance for each given property, with the best-ranking chimera at the top of the list, for the contiguous and noncontiguous library chimeras. Each row represents a chimera. The colors represent the parental origin of the block (red, CsChrimR; green, C1C2; and blue, CheRiff). The properties shown are measured expression [mean mKate fluorescence (in arbitrary units)], localization [mean GFP fluorescence (in arbitrary units)], and localization efficiency (mean mKate/GFP fluorescence).


Fig. S4. Comparison of chimeras from the contiguous and noncontiguous recombination libraries. Swarm plot showing each chimera’s expression [mean mKate fluorescence (in arbitrary units)] (A), localization [mean GFP fluorescence (in arbitrary units)] (B), and localization efficiency (mean mKate/GFP fluorescence) (C) for the contiguous and noncontiguous recombination libraries. Chimera data are plotted as gray points, and parental data points are highlighted in color (red, CsChrimR; green, C1C2; and blue, CheRiff).


Fig. S5. Comparison of measured expression and membrane localization efficiency for each chimera set. Swarm plots of expression [mean mKate fluorescence (in arbitrary units)] (A) and localization efficiency (mean mKate/GFP fluorescence) (B) showing measurements for each dataset compared with parents: single-block swaps, maximally informative with mutation cap, and maximally informative. Chimera data are plotted as gray points, and parental data points are highlighted in color (red, CsChrimR; green, C1C2; and blue, CheRiff). Comparison of single-block swap chimeras’ measured expression (C) and localization efficiency (D) relative to the dominant parent. Each single-block swap chimera is grouped based on the dominant parent with data points colored based on the identity of the single block being swapped in (red, CsChrimR block; green, C1C2 block; and blue, CheRiff block). The large point in each group shows the performance of the dominant parent.


Fig. S6. Photocurrents vs. measured localization for all tested chimeras. Chimera data are plotted as gray points and parental data points are highlighted in color (red, CsChrimR; green, C1C2; and blue, CheRiff). Plot of measured photocurrents vs. measured localization [mean GFP fluorescence (in arbitrary units)] for three different wavelengths: 473 nm (Top, blue shading), 560 nm (Middle, green shading), and 650 nm (Bottom, red shading).

Fig. S7. One multiblock swap chimera with unique properties. (A) Chimera photocurrents upon 1-s exposure to 473-nm (Top), 560-nm (Middle), and 650-nm (Bottom) light. (B) Sequential activation of chimera with 473-nm and then 560-nm light. (C) Sequential activation of chimera with 560-nm and then 560-nm light.


Dataset S1. Localization and expression of 218 ChR chimeras

Measured localization and expression properties for each chimera tested and associated chimera name, chimera_block_ID, and sequence. Chimera name and chimera_block_ID begin with either

“

”

“

”

to indicate the contiguous or noncontiguous library. The following 10 digits in the chimera_block_ID

indicate, in block order, the parent that contributes each of the 10 blocks (

“

”

CheRiff;

“

”

C1C2; and

“

”

CsChrimR). For the contiguous library, blocks in the

chimera_block_ID are listed from N to C termini; for the noncontiguous library, the block order is arbitrary. Sequences list only the ChR ORF; the C-terminal trafficking and mKate2.5 sequences have been removed but are available in the annotated GenBank files of the parental sequences. The table shows mean properties (mKate_mean, GFP_mean, and intensity_ratio_mean) and the SD of properties (mKate_SD, GFP_SD, and intensity_ratio_SD). ND, not detected, below the limit of detection for our assay.

Dataset S2. Functional characteristics of 75 ChR chimeras

Functional characteristics of each tested chimera and associated chimera name and chimera_block_ID. Photocurrent was measured using patch-clamp electrophysiology in voltage-clamp mode upon exposure to 473-nm (cyan), 560-nm (green), or 650-nm (red) wavelength light. The table has mean peak and steady-state photocurrent (in picoamperes) and the SD of peak and steady-state photocurrent (in picoamperes) at each wavelength. The chimera_block_ID begins with either

“

”

“

”

to indicate the contiguous or noncontiguous library. The following 10 digits in the chimera_block_ID indicate, in block order, the

parent that contributes each of the 10 blocks (

“

”

CheRiff;

“

”

C1C2; and

“

”

CsChrimR).

Dataset S3. SpyTagged C1C2 sequence

Dataset S4. SpyTagged CheRiff sequence

Dataset S5. SpyTagged CsChrimsonR sequence
