January 4, 2023
3D genome organization around nuclear
speckles drives mRNA splicing efficiency
Prashant Bhat (1,2), Amy Chow (1), Benjamin Emert (1), Olivia Ettlin (1), Sofia A. Quinodoz (1,3), Yodai Takei (1), Wesley Huang (1), Mario R. Blanco (1), Mitchell Guttman (1*)
The nucleus is highly organized such that factors involved in transcription and processing of
distinct classes of RNA are organized within specific nuclear bodies. One such nuclear body is
the nuclear speckle, which is defined by high concentrations of protein and non-coding RNA
regulators of pre-mRNA splicing. What functional role, if any, speckles might play in the process
of mRNA splicing remains unknown. Here we show that genes localized near nuclear speckles
display higher spliceosome concentrations, increased spliceosome binding to their pre-mRNAs,
and higher co-transcriptional splicing levels relative to genes that are located farther from nuclear
speckles. We show that directed recruitment of a pre-mRNA to nuclear speckles is sufficient to drive
increased mRNA splicing levels. Finally, we show that gene organization around nuclear speckles
is highly dynamic with differential localization between cell types corresponding to differences in
Pol II occupancy. Together, our results integrate the longstanding observations of nuclear speckles
with the biochemistry of mRNA splicing and demonstrate a critical role for dynamic 3D spatial
organization of genomic DNA in driving spliceosome concentrations and controlling the efficiency
of mRNA splicing.
INTRODUCTION
The nucleus is highly organized such that DNA, RNA and protein molecules involved in transcription and processing of distinct RNA classes (e.g., ribosomal RNA, histone mRNAs, snRNAs, mRNAs) are spatially organized within or near specific nuclear bodies [1–5] (e.g., nucleolus [6,7], histone locus body [8,9], Cajal body [9–11], nuclear speckles [12,13]). Yet, despite being first described more than a century ago, the functional roles of these nuclear bodies remain untested [14–16]. In theory, they could represent structures that are critical for transcription and/or processing of specialized classes of RNA [2], or instead they could represent an emergent property of co-regulation whereby regions of shared regulation simply self-assemble in three-dimensional (3D) space [17]. Distinguishing between these possibilities has proven challenging [14–16] because many of the molecular components contained within these nuclear bodies serve dual roles: as catalytic components required for transcription or RNA processing and as structural components required for the integrity of these structures [18–22].
To explore this question, we focused on the relationship between nuclear structure and mRNA splicing. In higher eukaryotes, most RNA Polymerase II (Pol II) transcribed genes contain intronic sequences that must be removed from precursor messenger RNAs (pre-mRNAs) to generate mature mRNA transcripts [23,24]. mRNA splicing is predominantly co-transcriptional such that nascent pre-mRNAs are spliced as they are transcribed [25–31]. Incomplete splicing yields mRNAs that are degraded by nonsense-mediated decay and results in decreased protein levels [32], and disruption of mRNA splicing is associated with many human diseases [33] including cancer [34–36], neurodegeneration [37–40], and immune dysregulation [41,42]. Due to this central importance, splicing needs to be highly efficient to ensure the fidelity of mRNA and protein production.
Early studies visualizing the localization of mRNA splicing factors, including proteins (e.g., SRRM1, SRSF1, SF3a66) and non-coding RNAs (e.g., U1, U2) [43,44], observed that these factors were not uniformly distributed throughout the nucleus but instead were enriched within specific 3D territories referred to as nuclear speckles [45–47]. Because of the preferential localization of splicing regulators, nuclear speckles were initially thought to represent the site of mRNA splicing in the nucleus [12,13]. However, subsequent studies showed that splicing does not occur within nuclear speckles, but instead splicing factors diffuse away from speckles to bind nascent pre-mRNAs and catalyze the splicing reaction [48–52]. These observations led to the prevailing notion that nuclear speckles simply act as storage bodies of inactive spliceosomes rather than functional structures involved in mRNA splicing [53–58]. Accordingly, despite their initial description over 40 years ago [45–47], what functional role, if any, speckles might play in the process of mRNA splicing remains unknown [59].
1. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125, USA
2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles CA 90095, USA
3. Current address: Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
* Correspondence: mguttman@caltech.edu
bioRxiv preprint doi: https://doi.org/10.1101/2023.01.04.522632; this version posted January 4, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Recently, we developed genome-wide methods to explore the higher-order three-dimensional organization of DNA and RNA in the nucleus [60–62]. Using these and related approaches [63,64], we and others identified that nuclear speckles represent major structural hubs that organize interchromosomal contacts corresponding to genomic regions containing highly transcribed Pol II genes and their associated nascent pre-mRNAs [61,62]. Because co-localizing splicing factors (enzymes) and their target pre-mRNAs (substrates) would concentrate splicing factors at the locations where they must act (nascent pre-mRNA), we hypothesized that organization of highly transcribed Pol II genes on the periphery of nuclear speckles would increase the concentration of spliceosomes at these nascent pre-mRNAs, thereby increasing their splicing efficiency. In this way, spatial organization may act to effectively couple Pol II transcription and mRNA splicing efficiency. Here we demonstrate an essential role for 3D organization of genomic DNA in controlling the efficiency of mRNA splicing.
RESULTS
snRNAs preferentially bind pre-mRNAs of genes
that are close to speckles
To explore DNA localization around the nuclear speckle, we first computed speckle contacts for all genomic regions using both genomic (RNA & DNA SPRITE) [62] and microscopy (seqFISH+) [64] approaches in mouse embryonic stem (ES) cells. We observed that DNA regions that exhibit high SPRITE-based speckle contact frequencies (e.g., Tcf3, Foxj1, and Nrxn2) were preferentially located adjacent to SF3a66, a protein marker of nuclear speckles (Figure 1A). Conversely, DNA regions with low SPRITE-based speckle contact frequencies on the same chromosomes (e.g., Grik2, Efemp1, Zfand5) were located farther away from SF3a66 foci (Figure 1A). Comparing 2,460 paired genomic regions, we observed that SPRITE-based speckle contact frequency and DNA distance to SF3a66 were inversely correlated (r = -0.72), indicating that SPRITE accurately measures genomic distance to nuclear speckles (Figure 1B). We refer to genomic regions with the highest 5% of speckle contact frequencies as speckle close and those with the lowest 5% as speckle far.
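The two operations described in this paragraph, correlating contact frequency with imaging distance and labeling the extreme 5% of regions, can be illustrated with a minimal Python sketch. The function names, the region identifiers, and all numeric values below are hypothetical and chosen only for illustration; the sketch does not reproduce the pipeline used in this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def classify_regions(contact_by_region, fraction=0.05):
    """Label the top and bottom `fraction` of regions by speckle contact frequency."""
    # Sort region names from lowest to highest contact frequency.
    ranked = sorted(contact_by_region, key=contact_by_region.get)
    k = max(1, int(len(ranked) * fraction))
    return {"speckle_far": set(ranked[:k]), "speckle_close": set(ranked[-k:])}

# Toy input (invented): 40 regions where higher contact means shorter distance.
contacts = {f"region_{i}": float(i) for i in range(40)}
distances = [2.0 - 0.04 * c for c in contacts.values()]

labels = classify_regions(contacts)
r = pearson_r(list(contacts.values()), distances)
```

With real data the correlation would be computed over the paired regions measured by both methods, and the 5% cutoff would be applied genome-wide.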
Having defined genome-wide proximity to nuclear speckles, we explored the localization of the spliceosome (the molecular machinery that carries out splicing and consists of U-rich small nuclear RNAs (snRNAs) and associated proteins [65]) across the genome. We considered two possible models for spliceosome association with pre-mRNA. In the direct-recruitment model, the spliceosome is directly recruited by either Pol II or the nascent pre-mRNA, which would result in the spliceosome associating with transcribed regions proportional to their mRNA levels. Alternatively, in the speckle-recruitment model, the spliceosome would accumulate preferentially at nascent pre-mRNAs that are localized near nuclear speckles.
To test these two models, we mapped the localization of the U1, U2, U4, and U6 snRNAs across the genome using RNA & DNA SPRITE (RD-SPRITE, Figure 1C). As expected, these snRNAs are enriched over genomic DNA regions that are actively transcribed into pre-mRNA. However, rather than simply reflecting pre-mRNA levels as would be predicted by the direct-recruitment model, we observed that regions that are close to nuclear speckles display ~10-fold higher enrichment of snRNAs independent of gene expression levels (Figure 1D, Supplemental Figure 1A-E). For example, two neighboring genomic regions on mouse chromosome 7 that are transcribed at comparable levels, but are located within a speckle close and a speckle far region, respectively, display a ~4-fold difference in snRNA levels (Figure 1E). These results indicate that spliceosome concentrations are highest at nascent pre-mRNAs that are in proximity to nuclear speckles.
Because RD-SPRITE utilizes protein-protein crosslinking (formaldehyde + DSG) to map RNA-DNA contacts [60], this approach captures associations that are indirect and therefore may not reflect the proportion of pre-mRNAs directly engaged by spliceosomes [61,62] (Figure 1C). To measure the number of spliceosomes that directly bind to nascent pre-mRNAs, we used psoralen-mediated crosslinking (which forms covalent crosslinks only between directly hybridized nucleic acids [66]) to map U1 interactions with pre-mRNAs (Figure 1F). We previously showed that this approach is highly specific at mapping U1 binding to 5’ splice sites at exon-intron junctions [67]. Using these data, we computed the frequency of U1 binding to each pre-mRNA (number of U1-bound RNAs divided by RNA abundance) and compared U1 binding frequency to the distance between the nascent locus and nuclear speckles. We observed ~3-fold higher
levels of U1 binding to pre-mRNAs transcribed from
speckle close genes compared to those transcribed
from speckle far genes (Figure 1G).
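The normalization behind the U1 binding frequency (U1-bound RNAs divided by RNA abundance) can be written out as a small Python sketch. The function names and the read counts are invented for illustration and are not data from this study; how abundance is measured and how genes are aggregated are assumptions here.

```python
def u1_binding_frequency(u1_bound_reads, rna_abundance):
    """U1-bound read count divided by RNA abundance for one pre-mRNA."""
    if rna_abundance <= 0:
        raise ValueError("RNA abundance must be positive")
    return u1_bound_reads / rna_abundance

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical genes as (U1-bound reads, RNA abundance); values are invented.
speckle_close = [(90, 30.0), (60, 20.0)]
speckle_far = [(20, 20.0), (10, 10.0)]

close_freq = mean(u1_binding_frequency(u, a) for u, a in speckle_close)
far_freq = mean(u1_binding_frequency(u, a) for u, a in speckle_far)

# Fold difference in abundance-normalized U1 binding between the two groups.
fold_difference = close_freq / far_freq
```

Dividing by abundance is what separates a difference in binding per transcript from a difference that merely reflects expression level.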
Together, these results indicate that proximity of
genomic DNA regions to nuclear speckles is associated
with increased concentrations of spliceosomes and
spliceosome engagement on pre-mRNA.
Co-transcriptional splicing efficiency varies based
on proximity to nuclear speckles
Because the efficiency of a reaction is dependent on substrate and enzyme concentration, we reasoned that higher concentration of spliceosome components (enzyme) at pre-mRNAs (substrate) located proximal to nuclear speckles would lead to increased co-transcriptional splicing efficiencies (i.e., the proportion of spliced products to total mRNA produced, Figure 2A) relative to pre-mRNAs that are located farther from the speckle.
To focus on splicing of pre-mRNAs that occurs near the DNA locus from which they are transcribed (which we refer to as co-transcriptional splicing), we analyzed nascent RNA that is associated with chromatin using a stringent biochemical purification procedure [68,69] (Figure 2B). Using these data, we computed the splicing efficiency for each gene by taking the ratio of spliced reads relative to total pre-mRNA reads (spliced counts + unspliced counts) (Figure 2A). Overall, we observed that genes that were located closest to nuclear speckles showed a >2-fold higher splicing ratio compared to genes that are farthest from nuclear speckles (41.0% vs 19.1%) (Figure 2C-D). More generally, we observed a strong correlation between speckle contact frequency and splicing efficiency (r = 0.92, p < 0.0001, Figure 2E).
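The per-gene splicing efficiency defined above, spliced / (spliced + unspliced), can be sketched in a few lines of Python. The gene names and counts are placeholders invented for illustration, and returning `None` for genes with no informative reads is an assumption of this sketch rather than a documented step of the analysis.

```python
def splicing_efficiency(spliced, unspliced):
    """Fraction of spliced reads among all informative reads for one gene."""
    total = spliced + unspliced
    if total == 0:
        return None  # no informative reads, so the ratio is undefined
    return spliced / total

# Hypothetical read counts as (spliced, unspliced); values are invented.
counts = {
    "gene_near_speckle": (40, 60),
    "gene_far_from_speckle": (20, 80),
    "gene_without_reads": (0, 0),
}

efficiency = {gene: splicing_efficiency(s, u) for gene, (s, u) in counts.items()}

# Ratio of the two efficiencies, analogous to the fold difference reported above.
fold = efficiency["gene_near_speckle"] / efficiency["gene_far_from_speckle"]
```

Because the ratio is computed within each gene, it is independent of how highly that gene is expressed.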
To further validate this effect and exclude the possibility that the observed splicing differences might reflect mature mRNA in our biochemical purification, we used an orthogonal method to measure mRNA levels on chromatin. Specifically, we used RD-SPRITE to analyze splicing ratios of RNAs [70] exclusively when they were associated with the DNA of their own nascent locus (Figure 2B). We then computed splicing efficiency as the exon counts divided by the total of exon and intron counts. Consistent with the chromatin RNA-Seq data, we observed ~3-fold higher splicing in speckle-close (16.1%) compared to speckle-far (5.5%) regions (Figure 2F). Furthermore, we observed a strong correlation between the splicing efficiency per gene and its speckle contact frequency (r = 0.91, p < 0.0001; Figure 2G).
Figure 1: snRNAs preferentially bind pre-mRNAs of genes that are close to speckles
(A) Three reconstructed images for DNA seqFISH+ and immunofluorescence (SF3a66) in mouse ES cells comparing speckle close genes (Tcf3, Foxj1, Nrxn2 in blue) and speckle far genes (Grik2, Efemp1, Zfand5 in purple) (top). Images are maximum intensity z-projected for a 1 μm section. White lines represent nuclear segmentation. Scale bars in zoom out panels are 5 μm and zoom in panels are 2.5 μm. Speckle contact frequencies from SPRITE for chromosomes 10, 11, and 19 at 100-kb resolution (bottom). Zoom in, speckle contact frequencies from SPRITE for the 2 Mb region around genes shown in top.
(B) Genome-wide comparison of DNA seqFISH+ distance to exterior of speckle (μm) and SPRITE speckle hub contact frequency for 2460 paired genomic regions. Pearson r correlation is -0.72.
(C) Schematic of types of RNA-DNA interactions captured by SPRITE. Formaldehyde and DSG crosslink nucleic acids and proteins to each other and SPRITE can measure the number, type (DNA or RNA), and sequence of molecules within each crosslinked complex.
(D) Normalized density of U1, U2, U4, U6 snRNAs on speckle close versus speckle far genomic regions. Normalization for each snRNA is to the mode of the speckle far distribution to visualize all snRNA densities on the same scale. RPKM for both speckle far and close genes is thresholded between 2.5-7.5.
(E) Whole chromosome 7 view of SPRITE contact frequencies at 1-Mb resolution for speckle hub, U1, U2, U4 and U6 snRNAs. Pol II-S2P ChIP-seq density at 100-kb resolution.
(F) Schematic of direct RNA-RNA interactions captured by AMT RAP RNA [67]. Psoralen forms direct crosslinks between RNA-RNA hybrids, affinity purification (not shown) selectively captures U1 snRNA, and all directly hybridized pre-mRNAs.
(G) U1 snRNA density from AMT RAP RNA for speckle close versus speckle far regions.
Together, these results indicate that the pre-mRNA
splicing efficiency is highest for speckle-associated
genes and that this splicing efficiency is achieved while
the pre-mRNA is bound at its nascent locus.
pre-mRNA organization around nuclear speckles is
sufficient to drive increased mRNA splicing
Because genes differ in multiple ways beyond their
nuclear speckle proximity (e.g., gene length, alternative
splicing patterns, and sequence-specific features), it
remains possible that the observed increase in splicing
efficiency is due to other gene-specific or genomic
DNA features (e.g., chromatin structure) that might also
correlate with speckle proximity.
To directly test whether speckle proximity drives splicing efficiency, we designed a splicing reporter that can be directly recruited to nuclear speckles, allowing us to measure its splicing efficiency within individual cells. Specifically, we generated a reporter that produces an mRNA that is translated into GFP when spliced, but not when unspliced (Figure 3A). Increased GFP signal reflects increased reporter splicing and can be quantitatively measured within each cell via a fluorescence readout (Figure 3A). In the intron of this reporter, we embedded an MS2 bacteriophage RNA hairpin that binds with high affinity to the MS2 bacteriophage coat protein (MCP) [71]. We used this system to localize the pre-mRNA reporter to specific nuclear locations by co-expressing the splicing reporter together with specific MCP-fusion proteins that are known to localize at different locations within the nucleus (Figure 3B). Specifically, we expressed SRRM1 and SRSF1, two proteins that localize within nuclear speckles [22,72]. SRRM1 is primarily localized in nuclear speckles (punctate), while SRSF1 exhibits both speckle (punctate) and nucleoplasmic (diffuse) localization. As controls, we expressed several non-speckle proteins, including SRSF3 and SRSF9 (two splicing proteins that are not enriched within nuclear speckles but are localized throughout the nucleoplasm [73,74]) and LBR (a protein that is anchored in the nuclear membrane and associates with the transcriptionally inactive nuclear lamina [75]).
Figure 2: Co-transcriptional splicing efficiency varies based on proximity to nuclear speckles
(A) Nascent RNA splicing efficiency calculation. Splicing efficiency of a gene is calculated by taking the ratio of exon to total pre-mRNA counts from RNA sequencing (exons + introns).
(B) Schematic of nascent RNA sequencing and SPRITE methods used to measure splicing efficiency.
(C) SPRITE speckle hub contact frequency for a 20-Mb region on chromosome 8 (top). Nascent RNA coverage from chromatin RNA sequencing for a speckle far (Nae1) and speckle close (Aars) gene around a single 3’ splice site (bottom). Percent spliced across entire gene is 27% (Nae1) and 56% (Aars).
(D) Density plot of percent spliced for genes located within speckle close or speckle far 100-kb genomic regions (461 speckle close genes and 460 speckle far genes).
(E) SPRITE speckle hub contact frequency (x axis) and percent spliced for genes from nascent RNA sequencing within each bin (y axis) across 50 bins. Each point/bin contains at least 20 genes and reflects the average splicing for that bin. Pearson r correlation = 0.92.
(F) Density plot of percent spliced within 100-kb genomic intervals from SPRITE for speckle close and speckle far regions (312 speckle close and 311 speckle far 100-kb regions).
(G) SPRITE speckle hub contact frequency (x axis) and percent spliced within genomic bins from SPRITE (y axis) across 50 bins. Each point/bin contains at least 20 regions and reflects the average splicing for that bin. Pearson r correlation = 0.91.
We transfected each of these proteins fused to MCP and mCherry (to directly visualize localization) and, using fluorescence microscopy, confirmed that each protein localized in the nucleus as expected (Figure 3B, Supplemental Figure 2A-E). We observed that SRRM1-MCP co-localized with endogenous SC35, a well-characterized marker of nuclear speckles (Figure 3C), while SRSF3 and SRSF9 localized diffusely throughout the nucleus and LBR localized to the periphery of the nucleus (Figure 3B, Supplemental Figure 2A-E). Next, we confirmed that the MS2-containing reporter RNA co-localized with the MCP fusion protein using RNA FISH coupled with fluorescence microscopy of mCherry (Figure 3D-E). We observed that the MS2-RNA localizes within nuclear speckles when co-expressed with SRRM1-MCP and localizes at the nuclear periphery when co-expressed with LBR-MCP. As expected, cells that express higher
Figure 3: pre-mRNA organization around nuclear speckles drives splicing efficiency
(A) Schematic of pre-mRNA splicing assay via a fluorescence-based readout. Individual proteins of interest are mCherry-tagged, with (shown) or without (not shown) an MCP tag. MCP protein binds to the complementary MS2 stem loop embedded within the intron of the pre-mRNA reporter. GFP is expressed only when the reporter is spliced and measured via FACS.
(B) Schematic of specific nuclear locations (speckle, nuclear periphery, nucleoplasm, top) and mCherry fluorescence of their corresponding proteins (SRRM1, SRSF1; LBR; SRSF3, SRSF9, bottom). Nucleus is outlined in white. Scale bar is 5 μm.
(C) Fluorescence microscopy for mCherry-SRRM1 (top left), co-immunofluorescence for SC35 (top middle), and merge (top right). Scale bar is 5 μm.
(D) Localization of SRRM1+MCP with mCherry reporter and single-molecule RNA FISH. Nucleus is outlined in white. Scale bars, 5 μm (top). GFP levels (x axis) versus fluorescence intensity (levels) of SRRM1 (y axis) (bottom). Error bars are S.E.M. for three replicates.
(E) Localization of LBR+MCP with mCherry reporter and single-molecule RNA FISH. Nucleus is outlined in white. Scale bars, 5 μm (top). GFP levels (x axis) versus fluorescence intensity (levels) of LBR (y axis) (bottom). Error bars are S.E.M. for three replicates.
(F) Difference of GFP expression between constructs with MCP and no MCP (y axis) versus mCherry fluorescence intensity (x axis) for all constructs tested. Error bars are S.E.M. for three replicates.
(G) Fluorescence microscopy for mCherry-SRRM1-∆NS (bottom left), co-immunofluorescence for SC35 (bottom middle), and merge (bottom right). Error bars are S.E.M. for three replicates. Scale bar is 5 μm.
(H) Difference of GFP expression between SRRM1 full length and SRRM1 ∆NS constructs with MCP and no MCP (y axis) versus mCherry fluorescence intensity (x axis). Error bars are S.E.M.
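The quantity plotted in panels F and H, the difference in GFP signal between a construct with and without the MCP tag together with its S.E.M., can be illustrated with a short Python sketch. The legend does not specify how replicates were paired or how the error bars were computed, so the per-replicate pairing and the sample-standard-deviation S.E.M. below are assumptions, and the readout values are invented.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sem(values):
    """Standard error of the mean using the sample standard deviation."""
    n = len(values)
    m = mean(values)
    variance = sum((v - m) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)

def gfp_difference(with_mcp, without_mcp):
    """Per-replicate GFP difference (MCP minus no MCP), as (mean, S.E.M.)."""
    diffs = [a - b for a, b in zip(with_mcp, without_mcp)]
    return mean(diffs), sem(diffs)

# Invented GFP readouts for three replicates within one mCherry intensity bin.
with_mcp = [12.0, 14.0, 16.0]
without_mcp = [10.0, 10.0, 10.0]

mean_diff, sem_diff = gfp_difference(with_mcp, without_mcp)
```

Subtracting the no-MCP value isolates the effect of tethering the reporter from any effect of simply overexpressing the protein, which is why the comparison is made at matched mCherry intensity.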