3D genome organization around nuclear speckles drives mRNA splicing efficiency

January 4, 2023


Prashant Bhat (1,2), Amy Chow, Benjamin Emert, Olivia Ettlin, Sofia A. Quinodoz (1,3), Yodai Takei, Wesley Huang, Mario R. Blanco, Mitchell Guttman

The nucleus is highly organized such that factors involved in transcription and processing of

distinct classes of RNA are organized within specific nuclear bodies. One such nuclear body is

the nuclear speckle, which is defined by high concentrations of protein and non-coding RNA

regulators of pre-mRNA splicing. What functional role, if any, speckles might play in the process

of mRNA splicing remains unknown. Here we show that genes localized near nuclear speckles

display higher spliceosome concentrations, increased spliceosome binding to their pre-mRNAs,

and higher co-transcriptional splicing levels relative to genes that are located farther from nuclear

speckles. We show that directed recruitment of a pre-mRNA to nuclear speckles is sufficient to drive

increased mRNA splicing levels. Finally, we show that gene organization around nuclear speckles

is highly dynamic with differential localization between cell types corresponding to differences in

Pol II occupancy. Together, our results integrate the longstanding observations of nuclear speckles

with the biochemistry of mRNA splicing and demonstrate a critical role for dynamic 3D spatial

organization of genomic DNA in driving spliceosome concentrations and controlling the efficiency

of mRNA splicing.

INTRODUCTION

The nucleus is highly organized such that DNA, RNA

and protein molecules involved in transcription and

processing of distinct RNA classes (e.g., ribosomal

RNA, histone mRNAs, snRNAs, mRNAs) are spatially

organized within or near specific nuclear bodies [1–5]

(e.g., nucleolus [6,7], histone locus body [8,9], Cajal body [9–11],

nuclear speckles [12,13]). Yet, despite being first

described more than a century ago, the functional roles

of these nuclear bodies remain untested [14–16]. In

theory, they could represent structures that are critical

for transcription and/or processing of specialized

classes of RNA [

], or instead they could represent an

emergent property of co-regulation whereby regions

of shared regulation simply self-assemble in three-

dimensional (3D) space [

]. Distinguishing between

these possibilities has proven challenging [14–16]

because many of the molecular components contained

within these nuclear bodies serve dual roles – as

catalytic components required for transcription or RNA

processing and as structural components required for

the integrity of these structures [18–22].

To explore this question, we focused on the relationship

between nuclear structure and mRNA splicing. In higher

eukaryotes, most RNA Polymerase II (Pol II) transcribed

genes contain intronic sequences that must be removed

from precursor messenger RNAs (pre-mRNAs) to

generate mature mRNA transcripts [23,24]. mRNA

splicing is predominantly co-transcriptional such that

nascent pre-mRNAs are spliced as they are transcribed

[25–31]. Incomplete splicing yields mRNAs that are

degraded by nonsense-mediated decay and results in

decreased protein levels [

], and disruption of mRNA

splicing is associated with many human diseases [

]

including cancer [34–36], neurodegeneration [37–40],

and immune dysregulation [41,42]. Due to this central

importance, splicing needs to be highly efficient to

ensure the fidelity of mRNA and protein production.

Early studies visualizing the localization of mRNA

splicing factors, including proteins (e.g., SRRM1,

SRSF1, SF3a66) and non-coding RNAs (e.g., U1, U2)

[43,44], observed that these factors were not uniformly

distributed throughout the nucleus but instead were

enriched within specific, 3D territories referred to as

1. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125, USA

2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles CA 90095, USA

3. Current address: Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA

* Correspondence: mguttman@caltech.edu

bioRxiv preprint doi: https://doi.org/10.1101/2023.01.04.522632; this version posted January 4, 2023.

nuclear speckles [45–47]. Because of the preferential

localization of splicing regulators, nuclear speckles were

initially thought to represent the site of mRNA splicing

in the nucleus [12,13]. However, subsequent studies

showed that splicing does not occur within nuclear

speckles, but instead splicing factors diffuse away from

speckles to bind nascent pre-mRNAs and catalyze

the splicing reaction [48–52]. These observations led

to the prevailing notion that nuclear speckles simply

act as storage bodies of inactive spliceosomes rather

than functional structures involved in mRNA splicing

[53–58]. Accordingly, despite their initial description

over 40 years ago [45–47], what functional role, if any,

speckles might play in the process of mRNA splicing

remains unknown [ ].

Recently, we developed genome-wide methods

to explore the higher-order three-dimensional

organization of DNA and RNA in the nucleus [60–62].

Using these and related approaches [63,64], we and

others identified that nuclear speckles represent

major structural hubs that organize interchromosomal

contacts corresponding to genomic regions containing

highly transcribed Pol II genes and their associated

nascent pre-mRNAs [61,62]. Because co-localizing

splicing factors (enzymes) and their target pre-mRNAs

(substrates) would concentrate splicing factors at the

locations where they must act (nascent pre-mRNA), we

hypothesized that organization of highly transcribed

Pol II genes on the periphery of nuclear speckles

would increase the concentration of spliceosomes at

these nascent pre-mRNAs, thereby increasing their

splicing efficiency. In this way, spatial organization may

act to effectively couple Pol II transcription and mRNA

splicing efficiency. Here we demonstrate an essential

role for 3D organization of genomic DNA in controlling

the efficiency of mRNA splicing.

RESULTS

snRNAs preferentially bind pre-mRNAs of genes

that are close to speckles

To explore DNA localization around the nuclear speckle,

we first computed speckle contacts for all genomic

regions using both genomic (RNA & DNA SPRITE)

[

] and microscopy (seqFISH+) [

] approaches in

mouse embryonic stem (ES) cells. We observed that

DNA regions that exhibit high SPRITE-based speckle

contact frequencies (e.g., Tcf3, Foxj1, and Nrxn2) were

preferentially located adjacent to SF3a66, a protein

marker of nuclear speckles (Figure 1A). Conversely,

DNA regions with low SPRITE-based speckle contact

frequencies on the same chromosomes (e.g., Grik2,

Efemp1, Zfand5) were located farther away from

SF3a66 foci (Figure 1A). Comparing 2,460 paired

genomic regions, we observed that SPRITE-based

speckle contact frequency and DNA distance to

SF3a66 were inversely correlated (r = -0.72), indicating

that SPRITE accurately measures genomic distance

to nuclear speckles (Figure 1B). We refer to genomic

regions with the highest 5% of speckle contact

frequencies as speckle close and those with the lowest

5% as speckle far.
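The top-5% and bottom-5% definition above can be sketched as a simple rank-based split. This is a minimal illustration, not the authors' pipeline: the function name `classify_speckle_proximity`, the `"intermediate"` label, and the toy contact frequencies are all assumptions introduced here.

```python
def classify_speckle_proximity(contact_freq, fraction=0.05):
    """Label each region 'close', 'far', or 'intermediate' by contact-frequency rank.

    contact_freq maps a region name to its speckle contact frequency.
    The top `fraction` of regions become 'close', the bottom `fraction` 'far'.
    """
    ranked = sorted(contact_freq, key=contact_freq.get)  # lowest frequency first
    k = max(1, round(len(ranked) * fraction))            # regions per tail
    labels = {region: "intermediate" for region in ranked}
    for region in ranked[:k]:
        labels[region] = "far"
    for region in ranked[-k:]:
        labels[region] = "close"
    return labels


# Toy input (hypothetical): 40 genomic bins with made-up contact frequencies.
toy_contacts = {f"bin_{i}": float(i) for i in range(40)}
toy_labels = classify_speckle_proximity(toy_contacts)
```

With 40 toy bins and a 5% cutoff, two bins fall in each tail. Ties at the cutoff are broken by sort order here, which a real analysis might handle differently.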

Having defined genome-wide proximity to nuclear

speckles, we explored the localization of the

spliceosome – the molecular machinery that carries

out splicing and consists of U-rich small nuclear

RNAs (snRNAs) and associated proteins [

] – across

the genome. We considered two possible models for

spliceosome association with pre-mRNA. In the direct-

recruitment model, the spliceosome is directly recruited

by either Pol II or the nascent pre-mRNA, which would

result in the spliceosome associating with transcribed

regions proportional to their mRNA levels. Alternatively,

in the speckle-recruitment model, the spliceosome

would accumulate preferentially at nascent pre-mRNAs

that are localized near nuclear speckles.

To test these two models, we mapped the localization

of the U1, U2, U4, and U6 snRNAs across the genome

using RNA & DNA SPRITE (RD-SPRITE, Figure

1C). As expected, these snRNAs are enriched over

genomic DNA regions that are actively transcribed

into pre-mRNA. However, rather than simply reflecting

pre-mRNA levels as would be predicted by the direct-

recruitment model, we observed that regions that

are close to nuclear speckles display ~10-fold higher

enrichment of snRNAs independent of gene expression

levels (Figure 1D, Supplemental Figure 1A-E). For

example, two neighboring genomic regions on mouse

chromosome 7 that are transcribed at comparable

levels, but are located within a speckle close and a

speckle far region, respectively, display a ~4-fold difference in

snRNA levels (Figure 1E). These results indicate that

spliceosome concentrations are highest at nascent

pre-mRNAs that are in proximity to nuclear speckles.
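One way to separate the two models is to compare snRNA density between speckle close and speckle far regions only within a shared expression window, so that expression level cannot explain the difference. The sketch below assumes a hypothetical record layout (`label`, `expression`, `snrna_density`) and uses made-up toy values; the 2.5 to 7.5 RPKM window is taken from the Figure 1D legend, and comparing group means is an assumption.

```python
from statistics import mean


def matched_fold_enrichment(regions, expr_min=2.5, expr_max=7.5):
    """Ratio of mean snRNA density (close / far) among expression-matched regions."""

    def densities(label):
        # Keep only regions of this group whose expression lies in the window.
        return [
            r["snrna_density"]
            for r in regions
            if r["label"] == label and expr_min <= r["expression"] <= expr_max
        ]

    close, far = densities("close"), densities("far")
    if not close or not far:
        raise ValueError("need at least one region per group inside the window")
    return mean(close) / mean(far)


# Toy records (hypothetical values, for illustration only).
toy_regions = [
    {"label": "close", "expression": 3.0, "snrna_density": 40.0},
    {"label": "close", "expression": 5.0, "snrna_density": 60.0},
    {"label": "close", "expression": 50.0, "snrna_density": 1000.0},  # outside window
    {"label": "far", "expression": 4.0, "snrna_density": 4.0},
    {"label": "far", "expression": 6.0, "snrna_density": 6.0},
]
toy_fold = matched_fold_enrichment(toy_regions)
```

Under the direct-recruitment model the matched ratio would be expected to sit near 1; a ratio well above 1 is what the speckle-recruitment model predicts.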

Because RD-SPRITE utilizes protein-protein

crosslinking (formaldehyde + DSG) to map RNA-DNA

contacts [

], this approach captures associations

that are indirect and therefore may not reflect the

proportion of pre-mRNAs directly engaged by

spliceosomes [61,62] (Figure 1C). To measure the

number of spliceosomes that directly bind to nascent

pre-mRNAs, we used psoralen-mediated crosslinking

(which forms covalent crosslinks only between directly

hybridized nucleic acids [

]) to map U1 interactions

with pre-mRNAs (Figure 1F). We previously showed

that this approach is highly specific at mapping U1

binding to 5’ splice sites at exon-intron junctions [ ].

Using these data, we computed the frequency of U1

binding to each pre-mRNA (number of U1 bound RNAs

divided by RNA abundance) and compared U1 binding

frequency to the distance between the nascent locus

and nuclear speckles. We observed ~3-fold higher


levels of U1 binding to pre-mRNAs transcribed from

speckle close genes compared to those transcribed

from speckle far genes (Figure 1G).
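The U1 binding frequency described above (U1-bound RNA count divided by RNA abundance) can be sketched as follows. The function names, the use of the median to summarize each group, and the toy read counts are assumptions made for illustration; the source does not state which summary statistic was used.

```python
from statistics import median


def u1_binding_frequency(u1_bound_reads, rna_abundance):
    """U1-bound read count normalized by the abundance of the same pre-mRNA."""
    if rna_abundance <= 0:
        raise ValueError("rna_abundance must be positive")
    return u1_bound_reads / rna_abundance


def group_fold_difference(close_genes, far_genes):
    """Median binding frequency of speckle close genes over speckle far genes.

    Each gene is a (u1_bound_reads, rna_abundance) pair.
    """
    close = median([u1_binding_frequency(b, a) for b, a in close_genes])
    far = median([u1_binding_frequency(b, a) for b, a in far_genes])
    return close / far


# Toy counts (hypothetical), chosen so the groups differ about 3-fold.
toy_close = [(30, 100), (60, 100), (90, 100)]
toy_far = [(10, 100), (20, 100), (30, 100)]
toy_u1_fold = group_fold_difference(toy_close, toy_far)
```

Normalizing by abundance matters here: without it, highly expressed genes would appear to bind more U1 simply because there is more RNA to bind.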

Together, these results indicate that proximity of

genomic DNA regions to nuclear speckles is associated

with increased concentrations of spliceosomes and

spliceosome engagement on pre-mRNA.

Co-transcriptional splicing efficiency varies based

on proximity to nuclear speckles

Because the efficiency of a reaction is dependent

on substrate and enzyme concentration, we

reasoned that higher concentration of spliceosome

components (enzyme) at pre-mRNAs (substrate)

located proximal to nuclear speckles would lead to

increased co-transcriptional splicing efficiencies (i.e.,

the proportion of spliced products to total mRNA

produced, Figure 2A) relative to pre-mRNAs that are

located farther from the speckle.

To focus on splicing of pre-mRNAs that occurs near the

DNA locus from which it is transcribed (which we refer

to as co-transcriptional splicing), we analyzed nascent

RNA that is associated with chromatin using a stringent

biochemical purification procedure [68,69] (Figure 2B).

Using these data, we computed the splicing efficiency

for each gene by taking the ratio of spliced reads relative

to total pre-mRNA reads (spliced counts + unspliced

counts) (Figure 2A). Overall, we observed that genes

that were located closest to nuclear speckles showed

a >2-fold higher splicing ratio compared to genes that

are farthest from nuclear speckles (41.0% vs 19.1%)

(Figure 2C-D). More generally, we observed a strong

correlation between speckle contact frequency and

splicing efficiency (r=0.92, p<0.0001, Figure 2E).
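The per-gene splicing efficiency defined above, spliced reads over spliced plus unspliced reads, is simple enough to write out directly. The function names below are hypothetical, and returning `None` for genes without coverage is an assumption; the fold check reuses the 41.0% and 19.1% values reported in the text.

```python
def splicing_efficiency(spliced, unspliced):
    """Fraction of reads that are spliced: spliced / (spliced + unspliced).

    Returns None when a gene has no informative reads.
    """
    total = spliced + unspliced
    if total == 0:
        return None
    return spliced / total


def fold_change(efficiency_a, efficiency_b):
    """Ratio of two splicing efficiencies (a over b)."""
    return efficiency_a / efficiency_b


# Fold difference between the reported group values (41.0% vs 19.1%).
reported_fold = fold_change(0.410, 0.191)
```

The reported values give a ratio of roughly 2.15, consistent with the ">2-fold" statement.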

To further validate this effect and exclude the possibility

that the observed splicing differences might reflect

mature mRNA in our biochemical purification, we used

an orthogonal method to measure mRNA levels on

chromatin. Specifically, we used RD-SPRITE to analyze

splicing ratios of RNAs [

] exclusively when they were

associated with the DNA of their own nascent locus

Figure 1: snRNAs preferentially bind pre-mRNAs of genes that are close to speckles

(A) Three reconstructed images for DNA seqFISH+ and immunofluorescence (SF3a66) in mouse ES cells comparing speckle close genes (Tcf3, Foxj1, Nrxn2 in blue) and speckle far genes (Grik2, Efemp1, Zfand5 in purple) (top). Images are maximum intensity z-projected for a 1 μm section. White lines represent nuclear segmentation. Scale bars in zoom out panels are 5 μm and zoom in panels are 2.5 μm. Speckle contact frequencies from SPRITE for chromosomes 10, 11, and 19 at 100-kb resolution (bottom). Zoom in, speckle contact frequencies from SPRITE for the 2 Mb region around genes shown in top.

(B) Genome-wide comparison of DNA seqFISH+ distance to exterior of speckle (μm) and SPRITE speckle hub contact frequency for 2,460 paired genomic regions. Pearson r correlation is -0.72.

(C) Schematic of types of RNA-DNA interactions captured by SPRITE. Formaldehyde and DSG crosslink nucleic acids and proteins to each other and SPRITE can measure the number, type (DNA or RNA), and sequence of molecules within each crosslinked complex.

(D) Normalized density of U1, U2, U4, U6 snRNAs on speckle close versus speckle far genomic regions. Normalization for each snRNA is to the mode of the speckle far distribution to visualize all snRNA densities on the same scale. RPKM for both speckle far and close genes is thresholded between 2.5-7.5.

(E) Whole chromosome 7 view of SPRITE contact frequencies at 1-Mb resolution for speckle hub, U1, U2, U4 and U6 snRNAs. Pol II-S2P ChIP-seq density at 100-kb resolution.

(F) Schematic of direct RNA-RNA interactions captured by AMT RAP RNA [67]. Psoralen forms direct crosslinks between RNA-RNA hybrids, and affinity purification (not shown) selectively captures U1 snRNA and all directly hybridized pre-mRNAs.

(G) U1 snRNA density from AMT RAP RNA for speckle close versus speckle far regions.


(Figure 2B). We then computed splicing efficiency as

the fraction of exons over the total number of exons

and introns. Consistent with the chromatin RNA-Seq

data, we observed ~3-fold higher splicing in speckle-close

(16.1%) than in speckle-far (5.5%) regions (Figure 2F).

Furthermore, we observed a strong correlation between

the splicing efficiency per gene and its speckle contact

frequency (r=0.91, p<0.0001; Figure 2G).
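The binned correlations reported for Figures 2E and 2G can be approximated by sorting genes or regions by speckle contact frequency, averaging splicing efficiency within each bin, and computing Pearson r across the bin means. The sketch below assumes equal-count bins, which the source does not specify, and uses hypothetical function names and a toy linear dataset.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def binned_means(pairs, n_bins):
    """Sort (contact_freq, efficiency) pairs by contact frequency and average per bin.

    Bins hold equal counts; any remainder goes into the last bin.
    """
    ranked = sorted(pairs)
    size = len(ranked) // n_bins
    if size == 0:
        raise ValueError("fewer data points than bins")
    xs, ys = [], []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(ranked)
        chunk = ranked[i * size:end]
        xs.append(mean([x for x, _ in chunk]))
        ys.append(mean([y for _, y in chunk]))
    return xs, ys


# Toy data (hypothetical): efficiency rises linearly with contact frequency.
toy_pairs = [(i / 100, 0.1 + 0.3 * i / 100) for i in range(100)]
bin_x, bin_y = binned_means(toy_pairs, n_bins=5)
toy_r = pearson_r(bin_x, bin_y)
```

Averaging within bins removes gene-to-gene noise, so a binned r is typically higher than the correlation computed over individual genes and should be interpreted as describing the trend rather than per-gene predictability.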

Together, these results indicate that the pre-mRNA

splicing efficiency is highest for speckle-associated

genes and that this splicing efficiency is achieved while

the pre-mRNA is bound at its nascent locus.

pre-mRNA organization around nuclear speckles is

sufficient to drive increased mRNA splicing

Because genes differ in multiple ways beyond their

nuclear speckle proximity (e.g., gene length, alternative

splicing patterns, and sequence-specific features), it

remains possible that the observed increase in splicing

efficiency is due to other gene-specific or genomic

DNA features (e.g., chromatin structure) that might also

correlate with speckle proximity.

To directly test whether speckle proximity drives

splicing efficiency, we designed a splicing reporter that

can be directly recruited to nuclear speckles, allowing

us to measure its splicing efficiency within individual

cells. Specifically, we generated a reporter that

produces an mRNA that is translated into GFP when

spliced, but not when unspliced (Figure 3A). Increased

GFP signal reflects increased reporter splicing and

can be quantitatively measured within each cell via

a fluorescence readout (Figure 3A). In the intron of

this reporter, we embedded an MS2 bacteriophage

RNA hairpin that binds with high affinity to the MS2

bacteriophage coat protein (MCP) [

]. We used this

system to localize the pre-mRNA reporter to specific

nuclear locations by co-expressing the splicing reporter

together with specific MCP-fusion proteins that are

known to localize at different locations within the

nucleus (Figure 3B). Specifically, we expressed SRRM1

and SRSF1, two proteins that localize within nuclear

speckles [

]. SRRM1 is primarily localized in nuclear

speckles (punctate), while SRSF1 exhibits both speckle

(punctate) and nucleoplasmic (diffuse) localization. As

controls, we expressed several non-speckle proteins,

including SRSF3 and SRSF9 (two splicing proteins

that are not enriched within nuclear speckles but are

localized throughout the nucleoplasm [

73,74

]) and LBR

(a protein that is anchored in the nuclear membrane

Figure 2: Co-transcriptional splicing efficiency varies based on proximity to nuclear speckles

(A) Nascent RNA splicing efficiency calculation. Splicing efficiency of a gene is calculated by taking the ratio of exon to total pre-mRNA counts from RNA sequencing (exons + introns).

(B) Schematic of nascent RNA sequencing and SPRITE methods used to measure splicing efficiency.

(C) SPRITE speckle hub contact frequency for a 20-Mb region on chromosome 8 (top). Nascent RNA coverage from chromatin RNA sequencing for a speckle far (Nae1) and speckle close (Aars) gene around a single 3’ splice site (bottom). Percent spliced across entire gene is 27% (Nae1) and 56% (Aars).

(D) Density plot of percent spliced for genes located within speckle close or speckle far 100-kb genomic regions (461 speckle close genes and 460 speckle far genes).

(E) SPRITE speckle hub contact frequency (x axis) and percent spliced for genes from nascent RNA sequencing within each bin (y axis) across 50 bins. Each point/bin contains at least 20 genes and reflects the average splicing for that bin. Pearson r correlation = 0.92.

(F) Density plot of percent spliced within 100-kb genomic intervals from SPRITE for speckle close and speckle far regions (312 speckle close and 311 speckle far 100-kb regions).

(G) SPRITE speckle hub contact frequency (x axis) and percent spliced within genomic bins from SPRITE (y axis) across 50 bins. Each point/bin contains at least 20 regions and reflects the average splicing for that bin. Pearson r correlation = 0.91.


and associates with the transcriptionally inactive

nuclear lamina [

]).

We transfected each of these proteins fused to MCP

and mCherry (to directly visualize localization) and,

using fluorescence microscopy, confirmed that each

protein localized in the nucleus as expected (Figure

3B, Supplemental Figure 2A-E). We observed that

SRRM1-MCP co-localized with endogenous SC35, a

well-characterized marker of nuclear speckles (Figure

3C), while SRSF3 and SRSF9 localized diffusely

throughout the nucleus and LBR localized to the

periphery of the nucleus (Figure 3B, Supplemental Figure

2A-E). Next, we confirmed that the MS2-containing

reporter RNA co-localized along with the MCP fusion

protein using RNA FISH coupled with fluorescence

microscopy of mCherry (Figure 3D-E). We observed

that the MS2-RNA localizes within nuclear speckles

when co-expressed with SRRM1-MCP and localizes

at the nuclear periphery when co-expressed with

LBR-MCP. As expected, cells that express higher

Figure 3: pre-mRNA organization around nuclear speckles drives splicing efficiency

(A) Schematic of pre-mRNA splicing assay via a fluorescence-based readout. Individual proteins of interest are mCherry-tagged with (shown) or without (not shown) an MCP tag. MCP protein binds to the complementary MS2 stem loop embedded within the intron of the pre-mRNA reporter. GFP is expressed only when the reporter is spliced and measured via FACS.

(B) Schematic of specific nuclear locations (speckle, nuclear periphery, nucleoplasm, top) and mCherry fluorescence of their corresponding proteins (SRRM1, SRSF1; LBR; SRSF3, SRSF9, bottom). Nucleus is outlined in white. Scale bar is 5 μm.

(C) Fluorescence microscopy for mCherry-SRRM1 (top left), co-immunofluorescence for SC35 (top middle), and merge (top right). Scale bar is 5 μm.

(D) Localization of SRRM1+MCP with mCherry reporter and single-molecule RNA FISH. Nucleus is outlined in white. Scale bars, 5 μm (top). GFP levels (x axis) versus fluorescence intensity (levels) of SRRM1 (y axis) (bottom). Error bars are S.E.M. for three replicates.

(E) Localization of LBR+MCP with mCherry reporter and single-molecule RNA FISH. Nucleus is outlined in white. Scale bars, 5 μm (top). GFP levels (x axis) versus fluorescence intensity (levels) of LBR (y axis) (bottom). Error bars are S.E.M. for three replicates.

(F) Difference of GFP expression between constructs with MCP and no MCP (y axis) versus mCherry fluorescence intensity (x axis) for all constructs tested. Error bars are S.E.M. for three replicates.

(G) Fluorescence microscopy for mCherry-SRRM1-∆NS (bottom left), co-immunofluorescence for SC35 (bottom middle), and merge (bottom right). Error bars are S.E.M. for three replicates. Scale bar is 5 μm.

(H) Difference of GFP expression between SRRM1 full length and SRRM1 ∆NS constructs with MCP and no MCP (y axis) versus mCherry fluorescence intensity (x axis). Error bars are S.E.M.
