June 6, 2023
SPIDR: a highly multiplexed method for mapping RNA-protein interactions uncovers a potential mechanism for selective translational suppression upon cellular stress
Erica Wolin¹*, Jimmy K. Guo²,³*, Mario R. Blanco², Andrew A. Perez², Isabel N. Goronzy², Ahmed A. Abdou¹, Darvesh Gorhe¹, Mitchell Guttman²†, Marko Jovanovic¹†
RNA binding proteins (RBPs) play crucial roles in regulating every stage of the mRNA life cycle and
mediating non-coding RNA functions. Despite their importance, the specific roles of most RBPs
remain unexplored because the specific RNAs that most RBPs bind are unknown. Current methods,
such as crosslinking and immunoprecipitation followed by sequencing (CLIP-seq), have expanded
our knowledge of RBP-RNA interactions but are generally limited to mapping only one
RBP at a time. To address this limitation, we developed SPIDR (Split and Pool Identification of
RBP targets), a massively multiplexed method to simultaneously profile global RNA binding sites of
dozens to hundreds of RBPs in a single experiment. SPIDR employs split-pool barcoding coupled
with antibody-bead barcoding to increase the throughput of current CLIP methods by two orders
of magnitude. SPIDR reliably identifies precise, single-nucleotide RNA binding sites for diverse
classes of RBPs simultaneously. Using SPIDR, we explored changes in RBP binding upon mTOR
inhibition and identified that 4EBP1 acts as a dynamic RBP that selectively binds to 5’-untranslated
regions of specific translationally repressed mRNAs only upon mTOR inhibition. This observation
provides a potential mechanism to explain the specificity of translational regulation controlled by
mTOR signaling. SPIDR has the potential to revolutionize our understanding of RNA biology and
both transcriptional and post-transcriptional gene regulation by enabling rapid, de novo discovery
of RNA-protein interactions at an unprecedented scale.
INTRODUCTION
RNA binding proteins (RBPs) play key roles in controlling all stages of the mRNA life cycle, including transcription, processing, nuclear export, translation, and degradation¹⁻⁵. Recent estimates suggest that up to 30% of all human proteins (several thousand in total) bind to RNA⁶⁻¹⁰, indicative of their broad activity and central importance in cell biology. Moreover, mutations in RBPs have been causally linked to various human diseases, including immunoregulatory and neurological disorders as well as cancer²⁻⁴,¹¹. Yet, we still do not know what specific roles most of these RBPs play because the RNAs they bind remain mostly unknown. In addition, there are many thousands of regulatory non-coding RNAs (ncRNAs) whose functional roles remain largely unknown¹²,¹³; understanding how they work requires defining the proteins to which they bind¹³⁻¹⁵. For example, uncovering the mechanism by which the Xist long noncoding RNA (lncRNA) silences the inactive X chromosome required identification of the SPEN/SHARP RBP that binds to Xist¹⁶⁻²⁰, a process that took more than 25 years after the lncRNA was discovered¹⁴. Given the large discrepancy between the number of ncRNAs and putative RBPs identified and the number of RNA-protein interactions demonstrated to be functionally relevant, there is an urgent need to generate high-resolution binding maps to enable functional characterization¹⁴.
Currently, the most rigorous and widely used method to characterize RBP-RNA interactions is crosslinking and immunoprecipitation followed by next-generation sequencing (CLIP-seq)²¹⁻²⁶. Briefly, CLIP uses UV light to covalently crosslink RNA to directly interacting proteins, followed by cell lysis, immunoprecipitation of a protein of interest under stringent conditions (e.g., 1 M salt), gel electrophoresis, transfer to a nitrocellulose membrane, and excision of the protein-RNA complex prior to sequencing and identification of the bound RNAs. CLIP and its related variants have greatly expanded our knowledge of RNA-RBP interactions and our understanding of gene expression, from mRNA splicing to microRNA targeting²¹⁻²⁶.

1. Department of Biological Sciences, Columbia University, New York City, New York 10027, USA
2. Division of Biology and Bioengineering, California Institute of Technology, Pasadena CA 91125, USA
3. Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA.
* These authors contributed equally
† To whom correspondence should be addressed: mguttman@caltech.edu & mj2794@columbia.edu

bioRxiv preprint doi: https://doi.org/10.1101/2023.06.05.543769; this version posted June 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Yet, CLIP and all of its variants (with one recent exception²⁷, which we discuss in more detail below; see Note 1) are limited to mapping a single RBP at a time. As such, efforts to generate reference maps for hundreds of RBPs in even a limited number of cell types have required major financial investment and the work of large teams in international consortia (e.g., ENCODE)²³,²⁸,²⁹. Despite these herculean efforts and the important advances they have enabled, critical limitations remain: (i) only a small fraction of the predicted RBPs have been successfully mapped using genome-wide methods (ENCODE has so far characterized the binding patterns of <10% of known RBPs); (ii) of these, most have been mapped in only a small number of cell lines (mainly K562 and HepG2); and (iii) because each protein map is generated from an individual experiment, a large number of cells is required to map dozens, let alone hundreds, of RBPs, which is particularly challenging for studying primary cells, disease models, or other rare cell populations. Further, because these datasets are highly cell type-specific, the generated maps are not likely to be directly useful for studying these RBPs in other cell types or model systems (e.g., patient samples, animal models, or perturbations). Thus, it is critically important to enable the generation of comprehensive RBP binding maps for any cell type of interest in a manner that is accessible to any individual lab.
To overcome these challenges, we developed SPIDR (Split and Pool Identification of RBP targets), a massively multiplexed method to simultaneously profile the global RNA binding sites of dozens to hundreds of RBPs in a single experiment. SPIDR is based on our split-pool barcoding strategy for mapping multiway nucleic acid interactions using high-throughput sequencing³⁰⁻³²; the vastly simplified version of split-pool barcoding we present here, when combined with antibody-bead barcoding, increases the throughput of current CLIP methods by two orders of magnitude. Using this approach, we can reliably identify the precise, single-nucleotide RNA binding sites of dozens of RBPs simultaneously and detect changes in RBP binding upon perturbation. In doing so, we uncovered a mechanism driven by dynamic RBP binding to mRNA that may explain the specificity of translational regulation controlled by mTOR signaling. Thus, SPIDR enables rapid, de novo discovery of RNA-protein interactions at an unprecedented scale and has the potential to transform our understanding of RNA biology and of both transcriptional and post-transcriptional gene regulation.
RESULTS
SPIDR: A highly multiplexed method for mapping RBP-RNA interactions
We developed SPIDR to enable highly multiplexed mapping of RBPs to individual RNAs transcriptome-wide. Briefly, SPIDR involves: (i) generating highly multiplexed antibody-bead pools by tagging individual antibody-bead conjugates with a specific oligonucleotide (tagged bead pools), (ii) performing RBP purification using these tagged antibody-bead pools in UV-crosslinked cell lysates, and (iii) linking individual antibodies to their associated RNAs using split-and-pool barcoding (Figure 1A and Supplemental Figure 1).
We first devised a highly modular scheme to generate hundreds of tagged antibody-beads such that each unique bead population is labeled with a specific oligonucleotide tag, and all bead populations are combined to generate an antibody-bead pool (Figure 1A and Supplemental Figure 1). Because this approach does not require direct chemical modification of the antibody, we can use any antibody (in any storage buffer) and rapidly link it to a defined sequence on a bead at high efficiency using the same coupling procedure used in traditional CLIP-based approaches (see Methods). Using this pool, we perform on-bead immunopurification (IP) of RBPs in UV-crosslinked lysates under standard conditions and assign individual protein identities to their associated RNAs using split-and-pool barcoding, in which the same barcode strings are added to both the oligonucleotide bead tag and the immunopurified RNA (Figure 1A). We dramatically simplified our split-and-pool tagging method such that the entire protocol can be performed in ~1 hour without specialized equipment (see Methods).
After split-and-pool tagging and subsequent library preparation, we sequenced all barcoded DNA molecules (antibody-bead tags and the converted cDNA of RNAs bound to the corresponding RBPs). We then matched all antibody-bead tags and RNA reads by their shared barcodes; we refer to these groups as SPIDR clusters (Figure 1A). We merged all SPIDR clusters by protein identity (specified by the antibody-bead tag) to generate a high-depth binding map for each protein. The resulting datasets are analogous to those generated by traditional individual CLIP approaches.
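The barcode-matching and merging steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPIDR analysis pipeline: the hypothetical `Read` record and its fields, and the rule that a cluster is kept only when its bead tags agree on one antibody (controlled by the hypothetical `min_fraction` parameter), are simplifications chosen for clarity; the actual processing is described in the Methods.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical record: one sequenced molecule after barcode parsing.
    barcode: str   # full split-pool barcode string shared by a cluster
    kind: str      # "tag" = antibody-bead oligo read, "rna" = cDNA read
    payload: str   # antibody ID for tags; alignment label for RNA reads


def build_clusters(reads):
    """Group tag and RNA reads that carry the same barcode string."""
    clusters = defaultdict(lambda: {"tags": [], "rna": []})
    for r in reads:
        clusters[r.barcode]["tags" if r.kind == "tag" else "rna"].append(r.payload)
    return dict(clusters)


def assign_protein(tags, min_fraction=1.0):
    """Return the majority antibody ID if it reaches min_fraction of tags, else None."""
    if not tags:
        return None  # RNA-only cluster: no bead identity to assign
    antibody, count = Counter(tags).most_common(1)[0]
    return antibody if count / len(tags) >= min_fraction else None


def merge_by_protein(clusters, min_fraction=1.0):
    """Pool the RNA reads of every assignable cluster under its antibody ID."""
    per_protein = defaultdict(list)
    for cluster in clusters.values():
        antibody = assign_protein(cluster["tags"], min_fraction)
        if antibody is not None:
            per_protein[antibody].extend(cluster["rna"])
    return dict(per_protein)
```

Lowering `min_fraction` (for example to 0.8) would tolerate a minority of stray tags per cluster; how strictly to treat mixed clusters is a design choice, and the default here is illustrative only.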
To ensure that IP using a pool containing multiple antibodies can successfully and specifically purify each of the individual proteins, we performed an IP in K562 cells using a pool of antibodies against 39 RBPs and measured the purified proteins by liquid chromatography tandem mass spectrometry (LC-MS/MS). We confirmed that 35 of the 39 targeted RBPs were enriched at least 2-fold relative to a negative control, showing that multiplexed enrichment of several RBPs simultaneously is possible (Supplemental Figure 2). The few exceptions were RBPs that were simply not detected (neither in the pooled IP nor under control conditions) and likely reflect either a poor antibody or lack of RBP expression in this cell line.
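The 2-fold criterion above amounts to a simple ratio test per targeted protein. The Python sketch below assumes each protein has one summarized intensity in the pooled IP and one in the negative control; the protein labels and intensity values used with it are hypothetical, not the measured data, and treating proteins absent from both samples as "not detected" mirrors the exceptions described in the text.

```python
import math


def fold_enrichment(ip_intensity, control_intensity):
    """IP / control ratio; None if the protein is absent from both samples."""
    if ip_intensity <= 0 and control_intensity <= 0:
        return None                 # not detected in either condition
    if control_intensity <= 0:
        return math.inf             # detected only in the pooled IP
    return ip_intensity / control_intensity


def classify_targets(targets, ip, control, min_fold=2.0):
    """Split targeted proteins into enriched, not enriched, and undetected."""
    enriched, not_enriched, undetected = [], [], []
    for protein in targets:
        fold = fold_enrichment(ip.get(protein, 0.0), control.get(protein, 0.0))
        if fold is None:
            undetected.append(protein)
        elif fold >= min_fold:
            enriched.append(protein)
        else:
            not_enriched.append(protein)
    return enriched, not_enriched, undetected
```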
SPIDR accurately maps dozens of RBPs within a single experiment
To test whether SPIDR accurately maps RBPs to RNA, we performed SPIDR in two widely studied human cell lines (K562 and HEK293T cells). Specifically, we generated antibody-bead pools containing 68 uniquely tagged antibody-beads targeting 62 distinct RBPs across the RNA life cycle, including splicing, processing, and translation factors (Figure 1B, Supplemental Tables 1, 2). As negative controls, we included antibodies against epitopes not present in endogenous human cells (GFP and V5), antibodies that lack affinity to any epitope (mouse IgG), and oligonucleotide-labeled beads lacking any antibody (empty beads).

[Figure 1. (A) SPIDR workflow: crosslinked cells are lysed and incubated with the antibody-bead pool, followed by ≥6 rounds of split-pool tagging in a 96-well plate; tag oligo and RNA/cDNA reads are sequenced, assigned to clusters by barcode, and analyzed by protein ID. (B) RBPs mapped, grouped by function: transcription, RNA processing, mRNA splicing, translation, structural. (C) XIST coverage tracks: pool, HNRNPK, HNRNPM, SAF-A (HNRNPU), PTBP1, SPEN (SHARP), IgG. (D) H3C2 coverage tracks: pool, SLBP, AQR, LARP7, HNRNPK, PTBP1, IgG.]

Figure 1: SPIDR (Split and Pool Identification of RBP targets): a highly multiplexed method to map protein-RNA interactions. (A) Schematic overview of the SPIDR method. The bead pool is incubated with UV-crosslinked lysate in a single tube. After immunopurification, each bead is uniquely labeled by split-and-pool barcoding. The complexity of the barcode generated depends on the number of individual tags used in each split-pool round and the number of split-pool rounds. For example, after 8 rounds of split-and-pool barcoding using 12 barcodes in each round, the likelihood that two beads will end up with the same barcode is ~1 in 430 million (1/12⁸). Oligos and RNA molecules and their linked barcodes are sequenced, and RNAs are matched to proteins based on their shared barcodes. (The bead labeling strategy was adapted from ChIP-DIP, a protocol for multiplexed mapping of proteins to DNA, https://guttmanlab.caltech.edu/technologies/.) (B) Schematic list of the different RBPs mapped by SPIDR in K562 and/or HEK293T cells, with functional assignments based on literature review. (C) An example of the raw alignment data for the pool (all reads before splitting by bead identities) and for specific RBPs (all reads assigned to specific RBP beads) across the XIST RNA. Blocks represent exons, lines introns, and thick blocks the annotated XIST repeat regions (A-E). (D) Raw alignment data for SLBP across the H3C2 histone mRNA. The top track is pooled alignment data; tracks below are reads assigned to SLBP or other RBPs and controls.
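The barcode-complexity arithmetic in the Figure 1 legend can be checked directly. The sketch assumes each bead is sent to one of the `tags_per_round` wells uniformly and independently in every round; under that assumption the chance that two given beads share a full barcode is 1/12⁸ for 8 rounds of 12 tags. Because an experiment contains many beads, it also computes the birthday-problem expectation for the number of colliding bead pairs; the bead count used below is purely illustrative, not a value from this study.

```python
from fractions import Fraction


def barcode_space(tags_per_round, rounds):
    """Number of distinct barcode strings after `rounds` split-pool rounds."""
    return tags_per_round ** rounds


def pair_collision_probability(tags_per_round, rounds):
    """Chance that two specific beads receive the same barcode (uniform splits assumed)."""
    return Fraction(1, barcode_space(tags_per_round, rounds))


def expected_colliding_pairs(n_beads, tags_per_round, rounds):
    """Birthday-problem expectation of bead pairs sharing a barcode."""
    n_pairs = n_beads * (n_beads - 1) // 2
    return n_pairs / barcode_space(tags_per_round, rounds)


print(barcode_space(12, 8))                                 # 429981696, i.e. ~430 million
print(pair_collision_probability(12, 8))                    # 1/429981696
print(round(expected_colliding_pairs(100_000, 12, 8), 1))   # 11.6 for a hypothetical 100,000 beads
```

The per-pair probability and the per-experiment expectation answer different questions: with a hypothetical 100,000 beads, about a dozen pairs would be expected to share a barcode, and each additional round divides that expectation by the number of tags per round.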
Using these pools, we performed SPIDR on 10 million UV-crosslinked cells. Focusing on the K562 data (which were sequenced at greater depth), we generated a median of 4 oligonucleotide tags per SPIDR cluster, with the majority of clusters (>80%) containing tags representing only a single antibody type (Supplemental Figure 3), indicating that there is minimal ‘crosstalk’ between beads in a SPIDR experiment. This specificity enables us to uniquely assign RNA molecules to their corresponding RBPs. After removing PCR duplicates, we assigned each sequenced RNA read to its associated RBP and identified high-confidence binding sites by comparing read coverage across an RNA to the coverage in all other targets in the pooled IP (Supplemental Figures 4 and 5; see Methods for details). Using this approach, we detected the precise binding sites for SAF-A, PTBP1, SPEN, and HNRNPK on the XIST RNA¹⁷,²⁰,²³ (Figure 1C). Most proteins (38/53 RBPs in K562) had more than 2 million mapped RNA reads (Supplemental Figure 4), but we observed specific binding to known target sites even for RBPs with fewer reads. For example, SLBP (Stem Loop Binding Protein) had only 1.5 million mapped reads yet displayed strong enrichment specifically at the 3’ ends of histone mRNAs, as expected²⁹ (Figure 1D).

[Figure 2. (A) Enrichment heatmap (scale 2 to 10) of selected RBPs (LSM11, WDR43, NOLC1, SMNDC1, FUS, TAF15, PCBP2, DDX52, RPS3, LARP1, ADAR1, DDX55, ILF3, LARP7, TARDBP, LIN28B, SSB) across classical ncRNAs (18S, 28S, scaRNAs, TERC, tRNAs, 45S 5’ETS, 3’ETS, ITS1, ITS2, U7, 7SK, U6, U1, U11, U3, U4, U5, U2, U12, MIRLET7F1, MIRLET7D). (B) U7: pool, LSM11, IgG. (C) 45S 5’ ETS enrichment for LIN28B and WDR43. (D) let-7: pool, LIN28B, IgG. (E) DGCR8 (pool, DROSHA, DGCR8, IgG), SPEN, TARDBP, and UPF1 (pool, RBP, IgG each). (F) ADARB1: pool, ENCODE eCLIP, SPIDR antibody 1, SPIDR antibody 2, IgG for hnRNPL.]

Figure 2: SPIDR accurately maps binding of a diverse set of RBPs. (A) RNA binding patterns of selected RBPs (rows) relative to 100-nt windows across each classical non-coding RNA (columns). Each bin is colored based on the enrichment of read coverage per RBP relative to background. (B) Sequence read coverage for LSM11 binding to U7 snRNA. For all tracks, “pool” refers to all reads prior to splitting them by paired barcodes (shown in gray), and individual tracks (shown in teal) reflect reads after assignment to specific antibodies. (C) Enrichment of read coverage relative to background for WDR43 and LIN28B over the 5’ ETS region of the 45S pre-rRNA. (D) Sequence read coverage for LIN28B binding to let-7 miRNAs. (E) Sequence read coverage for DROSHA/DGCR8, UPF1, SPEN, and TARDBP on their respective mRNAs. (F) Sequence read coverage for two distinct antibodies to HNRNPL in a single SPIDR experiment. For comparison, HNRNPL coverage from the ENCODE-generated eCLIP data is shown (bright green).
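The two quality checks described earlier in this section, the fraction of clusters whose tags name a single antibody and the removal of PCR duplicates before read assignment, can be sketched as follows. Representing each cluster as a list of antibody IDs, and treating reads with the same barcode, chromosome, start, and strand as PCR duplicates, are simplifications assumed here for illustration; SPIDR's own deduplication rules are given in the Methods.

```python
from statistics import median


def single_antibody_fraction(clusters):
    """Fraction of tag-containing clusters whose tags all name one antibody."""
    tagged = [tags for tags in clusters if tags]
    pure = sum(1 for tags in tagged if len(set(tags)) == 1)
    return pure / len(tagged)


def median_tags_per_cluster(clusters):
    """Median number of oligonucleotide tags among tag-containing clusters."""
    return median(len(tags) for tags in clusters if tags)


def remove_pcr_duplicates(reads):
    """Keep the first read per (barcode, chrom, start, strand) key.

    Each read is a hypothetical tuple (barcode, chrom, start, strand, read_name).
    """
    seen, unique = set(), []
    for read in reads:
        key = read[:4]
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```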
To systematically assess the quality, accuracy, and
resolution of our SPIDR binding maps and the scope of
the SPIDR method, we explored several key features:
(i) Accurate mapping of classical RNPs. We targeted RBPs of diverse functionality, such as those that bind preferentially to protein-coding RNAs and/or lncRNAs, introns, exons, or miRNAs, as well as more “classical” ribonucleoprotein (RNP) complexes, such as the ribosome or spliceosome (Figure 2A). We observed precise binding to the expected RNAs and binding sites. For example, we observed binding of:
• LSM11 to the U7 small nuclear RNA (snRNA)³³ and the telomerase RNA component (TERC)³⁴ (Figure 2A and 2B).
• WDR43, a protein involved in ribosomal RNA (rRNA) processing, to the 45S pre-rRNA and the U3 small nucleolar RNA (snoRNA), which is involved in rRNA modification³⁵ (Figure 2A and 2C).
• LIN28B to a distinct region of the 45S pre-rRNA, consistent with recent reports of its role in ribosomal RNA biogenesis in the nucleolus³⁶ (Figure 2A and 2C).
• NOLC1 (also known as NOPP140), a protein that localizes within the nucleolus and Cajal bodies³⁷,³⁸, to both the 45S pre-rRNA (enriched within the nucleolus) and various small Cajal body-associated RNAs (scaRNAs) (Figure 2A).
• DDX52, a DEAD-box protein predicted to be involved in the maturation of the small ribosomal subunit³⁹,⁴⁰, and RPS3, a structural protein of the small ribosomal subunit, to distinct sites on the 18S rRNA (Figure 2A).
• FUS and TAF15 to distinct locations on the U1 snRNA⁴¹,⁴² (Figure 2A).
• SMNDC1 specifically to the U2 snRNA⁴³ (Figure 2A).
• SSB (also known as La protein) to tRNA precursors, consistent with its known role in the biogenesis of RNA Polymerase III transcripts⁴⁴,⁴⁵ (Figure 2A).
• LIN28B to the let-7 miRNAs⁴⁶⁻⁵⁰ (Figure 2D).
• LARP7 to 7SK⁵¹ (Figure 2A, Supplemental Figure 5).
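Figure 2A summarizes binding as the enrichment of an RBP's read coverage in 100-nt windows relative to background. A minimal version of that calculation is sketched below, assuming read positions are 0-based offsets along one RNA, that background is the pooled (pre-split) reads, and that each track is normalized to its own total with a pseudocount of 1; these normalization choices are illustrative assumptions, not necessarily those used to draw the figure.

```python
def window_counts(positions, rna_length, window=100):
    """Count read positions falling in consecutive windows along one RNA."""
    counts = [0] * ((rna_length + window - 1) // window)
    for pos in positions:
        if 0 <= pos < rna_length:
            counts[pos // window] += 1
    return counts


def window_enrichment(rbp_positions, background_positions, rna_length,
                      window=100, pseudocount=1.0):
    """Per-window ratio of the RBP's coverage fraction to the background's."""
    rbp = window_counts(rbp_positions, rna_length, window)
    bg = window_counts(background_positions, rna_length, window)
    rbp_total = sum(rbp) + pseudocount * len(rbp)
    bg_total = sum(bg) + pseudocount * len(bg)
    return [((r + pseudocount) / rbp_total) / ((b + pseudocount) / bg_total)
            for r, b in zip(rbp, bg)]
```

Normalizing each track to its own total makes the ratio insensitive to how deeply a given antibody was sequenced, which matters when RBPs in one pool range from about 1.5 million to more than 2 million reads.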
(ii) Many RBPs bind their own mRNAs to autoregulate expression levels. Many RBPs have been reported to bind their own mRNAs to control their overall protein levels through post-transcriptional regulatory feedback⁵²⁻⁵⁴. For example, SPEN protein binds its own mRNA to suppress its transcription⁵⁵, UPF1 binds its mRNA to target it for nonsense-mediated decay⁵⁶, TARDBP binds its own 3’-UTR to trigger an alternative splicing event that results in degradation of its own mRNA⁵⁷,⁵⁸, and DGCR8, which together with DROSHA forms the microprocessor complex, binds a hairpin structure in DGCR8 mRNA to induce cleavage and destabilization of the mRNA⁵⁹ (Figure 2E). In addition to these cases, we observed autoregulatory binding of proteins to their own mRNAs for nearly a third of our targeted RBPs (15 proteins) (Supplemental Figure 6).
(iii) Different antibodies that capture the same protein or multiple proteins within the same complex show similar binding. We considered the possibility that antibodies against multiple proteins within the same complex, or against proteins that otherwise bind the same RNA, might compete with each other when included in the same pooled sample and thereby limit the utility of large-scale multiplexing. However, we did not observe this to be the case; in fact, antibodies against different proteins known to occupy the same complex displayed highly comparable binding sites on the same RNAs. For example, DROSHA and DGCR8, two proteins that bind as part of the microprocessor complex, showed highly consistent binding patterns across known miRNA precursors, with significant overlap in their binding sites (odds ratio of 316, hypergeometric p-value < 10⁻¹⁰⁰). Similarly, when we included two distinct antibodies targeting the same protein, HNRNPL, we observed highly comparable binding profiles for both antibodies (Figure 2F) and significant overlap in defined binding sites (odds ratio of 15, hypergeometric p-value < 10⁻¹⁰⁰). Taken together, our results indicate that SPIDR can map different RBPs that bind to the same RNA targets and can successfully map multiple antibodies targeting the same protein. As such, SPIDR may be a particularly useful tool for directly screening multiple antibodies against the same protein to evaluate their utility for CLIP-like studies.
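The overlap statistics quoted above (an odds ratio and a hypergeometric p-value) can be computed from a 2×2 table of binding-site calls. The sketch assumes a shared universe of candidate sites (for example, fixed windows), with each protein's binding sites given as a subset of that universe; the universe definition and the one-sided upper-tail test are assumptions for illustration, not a description of the exact procedure used here.

```python
import math


def overlap_stats(universe, sites_a, sites_b):
    """Odds ratio and one-sided hypergeometric p-value for site-set overlap."""
    universe, a_set, b_set = set(universe), set(sites_a), set(sites_b)
    if not (a_set <= universe and b_set <= universe):
        raise ValueError("binding sites must be drawn from the universe")
    n_total, n_a, n_b = len(universe), len(a_set), len(b_set)
    k = len(a_set & b_set)
    # 2x2 table: sites bound by both, by A only, by B only, by neither
    both, a_only, b_only = k, n_a - k, n_b - k
    neither = n_total - n_a - n_b + k
    odds_ratio = math.inf if a_only * b_only == 0 else (both * neither) / (a_only * b_only)
    # P(overlap >= k) when B's sites are drawn at random without replacement
    tail = sum(math.comb(n_a, i) * math.comb(n_total - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return odds_ratio, tail / math.comb(n_total, n_b)
```

Exact integer arithmetic keeps the tail sum precise, but the final division returns a float, so p-values below roughly 1e-308 would underflow to 0.0; working with log-probabilities would avoid that.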
(iv) Transcriptome-wide SPIDR maps are highly comparable with CLIP. Because K562 is the ENCODE cell line with the largest number of eCLIP datasets, we were able to benchmark our SPIDR results directly against those generated by ENCODE. To do this, we compared the profiles for each of the 33 RBPs shared between the SPIDR and ENCODE datasets in K562 cells²³,²⁸,²⁹ (see Methods). We observed highly overlapping binding patterns for most RBPs, including HNRNPK binding to POLR2A (Figure 3A), PTBP1 binding to AGO1 (Figure 3B), RBFOX2 binding to NDEL1 (Figure 3C), and the binding of several known nuclear RBPs to XIST (Figure 3D). To explore these data on a global scale, we compared RNA binding sites for each RBP and observed significant overlap between