SPIDR: a highly multiplexed method for mapping RNA-protein interactions uncovers a potential mechanism for selective translational suppression upon cellular stress

June 6, 2023


Erica Wolin*, Jimmy K. Guo2,3*, Mario R. Blanco, Andrew A. Perez, Isabel N. Goronzy, Ahmed A. Abdou, Darvesh Gorhe, Mitchell Guttman2†, Marko Jovanovic1†

RNA binding proteins (RBPs) play crucial roles in regulating every stage of the mRNA life cycle and mediating non-coding RNA functions. Despite their importance, the specific roles of most RBPs remain unexplored because we do not know what specific RNAs most RBPs bind. Current methods, such as crosslinking and immunoprecipitation followed by sequencing (CLIP-seq), have expanded our knowledge of RBP-RNA interactions but are generally limited by their ability to map only one RBP at a time. To address this limitation, we developed SPIDR (Split and Pool Identification of RBP targets), a massively multiplexed method to simultaneously profile global RNA binding sites of dozens to hundreds of RBPs in a single experiment. SPIDR employs split-pool barcoding coupled with antibody-bead barcoding to increase the throughput of current CLIP methods by two orders of magnitude. SPIDR reliably identifies precise, single-nucleotide RNA binding sites for diverse classes of RBPs simultaneously. Using SPIDR, we explored changes in RBP binding upon mTOR inhibition and identified that 4EBP1 acts as a dynamic RBP that selectively binds to 5’-untranslated regions of specific translationally repressed mRNAs only upon mTOR inhibition. This observation provides a potential mechanism to explain the specificity of translational regulation controlled by mTOR signaling. SPIDR has the potential to revolutionize our understanding of RNA biology and both transcriptional and post-transcriptional gene regulation by enabling rapid, de novo discovery of RNA-protein interactions at an unprecedented scale.

INTRODUCTION

RNA binding proteins (RBPs) play key roles in controlling all stages of the mRNA life cycle, including transcription, processing, nuclear export, translation, and degradation [1–5]. Recent estimates suggest that up to 30% of all human proteins (several thousand in total) bind to RNA [6–10], indicative of their broad activity and central importance in cell biology. Moreover, mutations in RBPs have been causally linked to various human diseases, including immunoregulatory and neurological disorders as well as cancer [2–4,11]. Yet, we still do not know what specific roles most of these RBPs play because the RNAs they bind remain mostly unknown.

In addition, there are many thousands of regulatory non-coding RNAs (ncRNAs) whose functional roles remain largely unknown [12,13]; understanding how they work requires defining the proteins to which they bind [13–15]. For example, uncovering the mechanism by which the Xist long noncoding RNA (lncRNA) silences the inactive X chromosome required identification of the SPEN/SHARP RBP that binds to Xist [16–20], a process that took >25 years after the lncRNA was discovered. Given the large discrepancy between the number of ncRNAs and putative RBPs identified, and the number of RNA-protein interactions demonstrated to be functionally relevant, there is an urgent need to generate high-resolution binding maps to enable functional characterization.

Currently, the most rigorous and widely utilized method to characterize RBP-RNA interactions is crosslinking and immunoprecipitation followed by next generation sequencing (CLIP-seq) [21–26]. Briefly, CLIP uses UV light to covalently crosslink RNA and directly interacting proteins. Cells are then lysed, and the protein of interest is purified by immunoprecipitation under stringent conditions (e.g., 1 M salt). This is followed by gel electrophoresis, transfer to a nitrocellulose membrane, and excision of the protein-RNA complex prior to sequencing and identification of the bound RNAs. CLIP and its related variants have greatly expanded our knowledge of RNA-RBP interactions and our understanding of gene expression from mRNA splicing to microRNA targeting [21–26].

1. Department of Biological Sciences, Columbia University, New York City, New York 10027, USA

2. Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125, USA

3. Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA

* These authors contributed equally

† To whom correspondence should be addressed: mguttman@caltech.edu & mj2794@columbia.edu

bioRxiv preprint doi: https://doi.org/10.1101/2023.06.05.543769; this version posted June 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Yet, CLIP and all of its variants (with one recent exception, which we discuss in more detail below; see Note 1) are limited to mapping a single RBP at a time. As such, efforts to generate reference maps for hundreds of RBPs in even a limited number of cell types have required major financial investment and the work of large teams in international consortiums (e.g., ENCODE) [23,28,29]. Despite these herculean efforts and the important advances they have enabled, there are critical limitations: (i) Only a small fraction of the total number of predicted RBPs have been successfully mapped using genome-wide methods (ENCODE has so far characterized the binding patterns of < 10% of known RBPs); (ii) Of these, most have been mapped in only a small number of cell lines (mainly K562 and HepG2); (iii) Because each protein map is generated from an individual experiment, a large number of cells is required to map dozens, let alone hundreds, of RBPs; this is particularly challenging for studying primary cells, disease models, or other populations of rare cells. Further, because these datasets are highly cell type-specific, the generated maps are not likely to be directly useful for studying these RBPs within other cell types or model systems (e.g., patient samples, animal models, or perturbations). Thus, it is critically important to enable the generation of comprehensive RBP binding maps for any cell type of interest in a manner that is accessible to any individual lab.

To overcome these challenges, we developed SPIDR (Split and Pool Identification of RBP targets), a massively multiplexed method to simultaneously profile the global RNA binding sites of dozens to hundreds of RBPs in a single experiment. SPIDR is based on our split-pool barcoding strategy that maps multiway nucleic acid interactions using high throughput sequencing [30–32]; the vastly simplified version of split-pool barcoding we present here, when combined with antibody-bead barcoding, increases the throughput of current CLIP methods by two orders of magnitude. Using this approach, we can reliably identify the precise, single-nucleotide RNA binding sites of dozens of RBPs simultaneously and can detect changes in RBP binding upon perturbation. With it, we uncovered a mechanism driven by dynamic RBP binding to mRNA that may explain the specificity of translational regulation controlled by mTOR signaling. Thus, SPIDR enables rapid, de novo discovery of RNA-protein interactions at an unprecedented scale and has the potential to transform our understanding of RNA biology and both transcriptional and post-transcriptional gene regulation.

RESULTS

SPIDR: A highly multiplexed method for mapping RBP-RNA interactions

We developed SPIDR to enable highly multiplexed mapping of RBPs to individual RNAs transcriptome-wide. Briefly, SPIDR involves: (i) generating highly multiplexed antibody-bead pools by tagging individual antibody-bead conjugates with a specific oligonucleotide (tagged bead pools), (ii) performing RBP purification using these tagged antibody-bead pools in UV-crosslinked cell lysates, and (iii) linking individual antibodies to their associated RNAs using split-and-pool barcoding (Figure 1A and Supplemental Figure).

We first devised a highly modular scheme to generate hundreds of tagged antibody-beads such that each unique bead population is labeled with a specific oligonucleotide tag and all bead populations are combined to generate an antibody-bead pool (Figure and Supplemental Figure 1). Because this approach does not require direct chemical modification of the antibody, we can utilize any antibody (in any storage buffer) and rapidly link it to a defined sequence on a bead at high efficiency using the same coupling procedure utilized in traditional CLIP-based approaches (see Methods). Using this pool, we perform on-bead immunopurification (IP) of RBPs in UV-crosslinked lysates using standard conditions and assign individual protein identities to their associated RNAs using split-and-pool barcoding, where the same barcode strings are added to both the oligonucleotide bead tag and immunopurified RNA (Figure 1A). We dramatically simplified our split-and-pool tagging method such that the entire protocol can be performed without the need for specialized equipment in ~1 hour (see Methods).

After split-and-pool tagging and subsequent library preparation, we sequenced all barcoded DNA molecules (antibody-bead tags and the converted cDNA of RNAs bound to corresponding RBPs). We then matched all antibody-bead tags and RNA reads by their shared barcodes; we refer to these as SPIDR clusters (Figure 1A). We merged all SPIDR clusters by protein identity (specified by the antibody-bead tag) to generate a high-depth binding map for each protein. The resulting datasets are analogous to those generated by traditional individual CLIP approaches.
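The cluster-building step described above (group sequenced molecules by shared split-pool barcode, then merge clusters by the protein named on the bead tag) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, the function names, and the `min_fraction` purity cutoff used to discard clusters with mixed bead tags are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_clusters(records):
    """Group molecules by their split-pool barcode string.

    `records` is an iterable of (barcode, kind, value) tuples, where kind is
    "bead" (value = antibody ID on the bead tag) or "rna" (value = read ID).
    This layout is hypothetical.
    """
    clusters = defaultdict(lambda: {"bead": [], "rna": []})
    for barcode, kind, value in records:
        clusters[barcode][kind].append(value)
    return clusters


def assign_reads(clusters, min_fraction=0.8):
    """Merge clusters by protein identity.

    A cluster is assigned to the most common antibody ID among its bead tags,
    provided that ID makes up at least `min_fraction` of the tags (an assumed
    cutoff). Clusters without any bead tag are skipped.
    """
    by_protein = defaultdict(list)
    for members in clusters.values():
        tags = Counter(members["bead"])
        if not tags:
            continue
        protein, count = tags.most_common(1)[0]
        if count / sum(tags.values()) < min_fraction:
            continue
        by_protein[protein].extend(members["rna"])
    return by_protein


# Toy input, for illustration only.
records = [
    ("bc1", "bead", "PTBP1"), ("bc1", "bead", "PTBP1"), ("bc1", "rna", "read_a"),
    ("bc2", "bead", "SLBP"), ("bc2", "bead", "IgG"), ("bc2", "rna", "read_b"),
    ("bc3", "rna", "read_c"),
]
print(dict(assign_reads(build_clusters(records))))
```

In this toy input only the first cluster is kept: the second carries tags from two different antibodies and the third carries no bead tag at all.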


To ensure that IP using a pool containing multiple antibodies can successfully and specifically purify each of the individual proteins, we performed an IP in K562 cells using a pool of antibodies against 39 RBPs and measured the purified proteins by liquid chromatography tandem mass spectrometry (LC-MS/MS). We confirmed that 35 of the 39 targeted RBPs enriched at least 2-fold relative to a negative control, showing that multiplexed enrichment of several RBPs simultaneously is possible (Supplemental Figure 2). The few exceptions were RBPs that were simply not detected (neither in the pooled IP nor under control conditions) and likely reflect either a poor antibody or lack of RBP expression in this cell line.

SPIDR accurately maps dozens of RBPs within a single experiment

To test whether SPIDR accurately maps RBPs to RNA, we performed SPIDR in two widely studied human cell lines (K562 and HEK293T cells). Specifically, we generated antibody-bead pools containing 68 uniquely tagged antibody-beads targeting 62 distinct RBPs across the RNA life cycle, including splicing, processing, and translation factors (Figure 1B, Supplemental Tables 1, 2). As negative controls, we included antibodies against epitopes not present in endogenous human cells (GFP and V5), antibodies that lack affinity to any epitope (mouse IgG), and oligonucleotide-labeled beads lacking any antibody (empty beads).

Figure 1: SPIDR (Split and Pool Identification of RBP targets), a highly multiplexed method to map protein-RNA interactions. (A) Schematic overview of the SPIDR method. The bead pool is incubated with UV crosslinked lysate in a single tube. After immunopurification, each bead is uniquely labeled by split-and-pool barcoding. The complexity of the barcode generated depends on the number of individual tags used in each split-pool round and the number of split-pool rounds. For example, after 8 rounds of split and pool barcoding, using 12 barcodes in each round, the likelihood that two beads will end up with the same barcode is ~1 in 430 million (1/12^8). Oligos and RNA molecules and their linked barcodes are sequenced and RNAs are matched to proteins based on their shared barcodes. (The bead labeling strategy was adapted from ChIP-DIP, a protocol for multiplexed mapping of proteins to DNA, https://guttmanlab.caltech.edu/technologies/) (B) Schematic list of the different RBPs mapped by SPIDR in K562 and/or HEK293T cells, functional assignments based on literature review. (C) An example of the raw alignment data for the pool (all reads before splitting by bead identities) and for specific RBPs (all reads assigned to specific RBP beads) across the XIST RNA. Blocks represent exons, lines introns, and thick blocks are the annotated XIST repeat regions (A-E). (D) Raw alignment data for SLBP across the H3C2 histone mRNA. Top track is pooled alignment data; tracks below are reads assigned to SLBP or other RBPs and controls.

Panel B category labels: Transcription, RNA Processing, mRNA Splicing, Translation, Structural. RBPs shown in panel B: BUD13, FUS, HNRNPA1, HNRNPC, HNRNPK, HNRNPL, HNRNPM, KHSRP, PTBP1, RBFOX2, SRSF9, TARDBP, TIAL1, TRA2A, AQR, LARP7, LSM11, RPS2, RPS3, RPS6, SMNDC1, U2AF1, EWSR1, HNRNPU, LBR, SAFB, SHARP, SSB, TAF15, CPSF6, SLBP, UPF1, XRN1, DGCR8, DROSHA, HuR, DDX52, DDX55, DDX6, DHX30, FASTKD2, RBM15, ADAR1, NOLC1, WDR43, PCBP1, PCBP2, 4EBP1, EIF4A, EIF4E, EIF4G1, FUBP3, ILF3, IMP1, IMP2, IMP3, LARP1, LARP4, LIN28B, PUM1.
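The barcode arithmetic quoted in the Figure 1 legend (8 rounds of split-pool tagging with 12 barcodes per round giving a ~1 in 430 million chance that two beads share a barcode) can be checked with a short Python sketch. The function names are illustrative, and the calculation assumes each bead picks a tag uniformly and independently in every round.

```python
def barcode_space(tags_per_round: int, rounds: int) -> int:
    """Number of distinct barcode strings after `rounds` of split-pool tagging."""
    return tags_per_round ** rounds


def pairwise_collision_probability(tags_per_round: int, rounds: int) -> float:
    """Chance that two given beads receive the identical barcode string,
    assuming uniform, independent tag choice in every round."""
    return 1.0 / barcode_space(tags_per_round, rounds)


def expected_colliding_pairs(n_beads: int, tags_per_round: int, rounds: int) -> float:
    """Expected number of bead pairs sharing a barcode among `n_beads` beads."""
    pairs = n_beads * (n_beads - 1) // 2
    return pairs / barcode_space(tags_per_round, rounds)


print(barcode_space(12, 8))  # 429981696, i.e. ~430 million
print(pairwise_collision_probability(12, 8))
```

Because the barcode space grows exponentially with the number of rounds, adding rounds is a cheap way to keep the expected number of colliding bead pairs low as the bead pool grows; `expected_colliding_pairs` makes that trade-off explicit for a chosen bead count.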

Using these pools, we performed SPIDR on 10 million UV-crosslinked cells. Focusing on the K562 data (which were sequenced at greater depth), we generated a median of 4 oligonucleotide tags per SPIDR cluster with the majority of clusters (>80%) containing tags representing only a single antibody type (Supplemental Figure 3), indicating that there is minimal ‘crosstalk’ between beads in a SPIDR experiment. This specificity enables us to uniquely assign RNA molecules to their corresponding RBPs. After removing PCR duplicates, we assigned each sequenced RNA read to its associated RBP and identified high confidence binding sites by comparing read coverage across an RNA to the coverage in all other targets in the pooled IP (Supplemental Figure 4, Supplemental Figure; see Methods for details). Using this approach, we detected the precise binding sites for SAF-A, PTBP1, SPEN, and HNRNPK on the XIST RNA [17,20,23] (Figure). Although most proteins (38/53 RBPs in K562) contained more than 2 million mapped RNA reads (Supplemental Figure 4), we observed specific binding to known target sites even for RBPs with lower numbers of reads. For example, SLBP (Stem Loop Binding Protein) had only 1.5 million mapped reads yet displayed strong enrichment specifically at the 3’ ends of histone mRNAs as expected (Figure 1D).

Figure 2: SPIDR accurately maps binding of a diverse set of RBPs. (A) RNA binding patterns of selected RBPs (rows) relative to 100nt windows across each classical non-coding RNA (columns). Each bin is colored based on the enrichment of read coverage per RBP relative to background. (B) Sequence read coverage for LSM11 binding to U7 snRNA. For all tracks, “pool” refers to all reads prior to splitting them by paired barcodes (shown in gray), and individual tracks (shown in teal) reflect reads after assignment to specific antibodies. (C) Enrichment of read coverage relative to background for WDR43 and LIN28B over the 5’ ETS region of 45S RNA. (D) Sequence read coverage for LIN28B binding to let-7 miRNAs. (E) Sequence read coverage for DROSHA/DGCR8, UPF1, SPEN, and TARDBP to their respective mRNAs. (F) Sequence read coverage for two distinct antibodies to HNRNPL in a single SPIDR experiment. For comparison, HNRNPL coverage from the ENCODE-generated eCLIP data is shown (bright green).
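The idea of calling binding sites by comparing an RBP's read coverage with the coverage of all other targets in the same pooled IP can be sketched as follows. The exact procedure is given in the Methods, which are not part of this section, so the depth normalization, the pseudocount, and all names below are assumptions rather than the authors' implementation.

```python
def window_enrichment(counts, target, window, pseudocount=1.0):
    """Enrichment of `target` in `window` relative to all other targets.

    `counts` maps each target (RBP or control) to a dict of window -> read count.
    Each side is expressed as the fraction of its total reads that fall in the
    window, so targets sequenced at different depths remain comparable. The
    pseudocount (an assumed choice) avoids division by zero in empty windows.
    """
    target_counts = counts[target]
    target_total = sum(target_counts.values())

    others = [c for name, c in counts.items() if name != target]
    background_in_window = sum(c.get(window, 0) for c in others)
    background_total = sum(sum(c.values()) for c in others)

    target_rate = (target_counts.get(window, 0) + pseudocount) / (target_total + pseudocount)
    background_rate = (background_in_window + pseudocount) / (background_total + pseudocount)
    return target_rate / background_rate


# Hypothetical read counts, for illustration only.
counts = {
    "SLBP": {"histone_3p_end": 90, "other_window": 10},
    "PTBP1": {"histone_3p_end": 5, "other_window": 95},
    "IgG": {"histone_3p_end": 5, "other_window": 95},
}
print(window_enrichment(counts, "SLBP", "histone_3p_end"))
print(window_enrichment(counts, "SLBP", "other_window"))
```

Using the rest of the pool as background means that RNAs that are simply abundant, and therefore recovered by every bead, do not score as enriched for any single RBP.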

To systematically assess the quality, accuracy, and resolution of our SPIDR binding maps and the scope of the SPIDR method, we explored several key features:

(i) Accurate mapping of classical RNPs. We targeted RBPs of diverse functionality, such as those which bind preferentially to RNAs coding for proteins and/or lncRNAs, to introns, exons, miRNAs, etc., as well as more “classical” ribonucleoprotein (RNP) complexes, such as the ribosome or spliceosome (Figure 2A). We observed precise binding to the expected RNAs and binding sites. For example, we observed binding of:

• LSM11 to the U7 small nuclear RNA (snRNA) and the telomerase RNA component (TERC) (Figure 2A and 2B)

• WDR43, a protein that is involved in ribosomal RNA (rRNA) processing, to the 45S pre-rRNA and the U3 small nucleolar RNA (snoRNA), which is involved in rRNA modification (Figure 2A and 2C)

• LIN28B to a distinct region of the 45S pre-rRNA, consistent with recent reports of its role in ribosomal RNA biogenesis in the nucleolus (Figure 2A)

• NOLC1 (also known as NOPP140), a protein that localizes within the nucleolus and Cajal bodies [37,38], to both the 45S pre-rRNA (enriched within the nucleolus) and various small Cajal-body associated RNAs (scaRNAs) (Figure 2A)

• DDX52, a DEAD-box protein that is predicted to be involved in the maturation of the small ribosomal subunit [39,40], and RPS3, a structural protein contained within the small ribosomal subunit, to distinct sites on the 18S rRNA (Figure 2A)

• FUS and TAF15 to distinct locations on the U1 snRNA [41,42] (Figure 2A)

• SMNDC1 specifically to the U2 snRNA (Figure 2A)

• SSB (also known as La protein) to tRNA precursors, consistent with its known role in the biogenesis of RNA Polymerase III transcripts [44,45] (Figure 2A)

• LIN28B to the let-7 miRNA [46–50] (Figure 2D)

• LARP7 to 7SK (Figure 2A, Supplemental Figure 5)

(ii) Many RBPs bind their own mRNAs to autoregulate expression levels. Many RBPs have been reported to bind their own mRNAs to control their overall protein levels through post-transcriptional regulatory feedback [52–54]. For example, SPEN protein binds its own mRNA to suppress its transcription, UPF1 binds its mRNA to target it for Nonsense Mediated Decay, TARDBP binds its own 3’-UTR to trigger an alternative splicing event that results in degradation of its own mRNA [57,58], and DGCR8, which together with DROSHA forms the known microprocessor complex, binds a hairpin structure in DGCR8 mRNA to induce cleavage and destabilization of the mRNA (Figure 2E). In addition to these cases, we observed autoregulatory binding of proteins to their own mRNAs for nearly a third of our targeted RBPs (15 proteins) (Supplemental Figure 6).

(iii) Different antibodies that capture the same protein or multiple proteins within the same complex show similar binding. We considered the possibility that including antibodies against multiple proteins contained within the same complex, or that otherwise bind to the same RNA, within the same pooled sample could compete against each other and therefore limit the utility of large-scale multiplexing. However, we did not observe this to be the case; in fact, antibodies against different proteins known to occupy the same complex displayed highly comparable binding sites on the same RNAs. For example, DROSHA and DGCR8, two proteins that bind as part of the microprocessor complex, showed highly consistent binding patterns across known miRNA precursors with significant overlap in their binding sites (odds-ratio of 316-fold, hypergeometric p-value < 10^-100). Similarly, when we included two distinct antibodies targeting the same protein, HNRNPL, we observed highly comparable binding profiles for both antibodies (Figure 2F) and significant overlap in defined binding sites (odds-ratio of 15-fold, hypergeometric p-value < 10^-100). Taken together, our results indicate that SPIDR can be used to map different RBPs that bind to the same RNA targets and can successfully map multiple antibodies targeting the same protein. As such, SPIDR may be a particularly useful tool for directly screening multiple antibodies targeting the same protein to evaluate utility for use in CLIP-like studies.
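The overlap statistics quoted above (an odds ratio and a hypergeometric p-value for shared binding sites) can be computed along the following lines. This is a generic sketch of the two statistics using only the Python standard library; how the authors defined the universe of candidate sites is not stated in this section, so the 2x2 table layout and the example counts are hypothetical.

```python
from math import comb


def overlap_odds_ratio(n_both, n_a_only, n_b_only, n_neither):
    """Odds ratio of a 2x2 table over candidate sites (bound by A and/or B).

    Undefined (ZeroDivisionError) when either discordant cell is zero;
    callers may add a pseudocount to each cell in that case.
    """
    return (n_both * n_neither) / (n_a_only * n_b_only)


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable.

    `population` candidate sites contain `successes` sites bound by RBP A;
    `draws` sites are bound by RBP B; X is the number bound by both.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
print(overlap_odds_ratio(50, 10, 10, 1000))
print(hypergeom_upper_tail(3, 10, 3, 4))
```

Exact integer arithmetic via `math.comb` keeps the tail sum accurate even when the resulting p-value is extremely small, although values below the smallest representable float are reported as 0.0.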

(iv) Transcriptome-wide SPIDR maps are highly comparable with CLIP. Because K562 represents the ENCODE-mapped cell line with the largest number of eCLIP datasets, we were able to benchmark our SPIDR results directly against those generated by ENCODE. To do this, we compared the profiles for each of the 33 RBPs that overlap between SPIDR and ENCODE datasets in K562 cells [23,28,29] (see Methods). We observed highly overlapping binding patterns for most RBPs, including: HNRNPK binding to POLR2A (Figure), PTBP1 binding to AGO1 (Figure 3B), RBFOX2 to NDEL1 (Figure 3C) and the binding of several known nuclear RBPs to XIST (Figure 3D). To explore this data on a global scale, we compared RNA binding sites for each RBP and observed significant overlap between
