Articles
https://doi.org/10.1038/s41587-021-01006-2
1
Wellcome-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.
2
Department of Physiology, Development
and Neuroscience, University of Cambridge, Cambridge, UK.
3
Epigenetics Programme, Babraham Institute, Cambridge, UK.
4
Cancer Research UK
Cambridge Institute, University of Cambridge, Cambridge, UK.
5
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome
Genome Campus, Cambridge, UK.
6
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
7
Developmental
Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
8
Department of Physiology, Anatomy and
Genetics, University of Oxford, Oxford, UK.
9
Department of Haematology, University of Cambridge, Cambridge, UK.
10
Sahlgrenska Center for Cancer
Research, Department of Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden.
11
The Francis Crick Institute, London, UK.
12
The Wellcome/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK.
13
Department of Applied Mathematics and Theoretical
Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, UK.
14
Centre for Trophoblast Research, University of Cambridge,
Cambridge, UK.
15
Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
16
Present address: Genomics Plc, Cambridge, UK.
17
These authors contributed equally: T. Lohoff, S. Ghazanfar.
✉
e-mail: wolf.reik@babraham.ac.uk; jn270@cam.ac.uk; lcai@caltech.edu; marioni@ebi.ac.uk
L
ineage priming, cell fate specification and tissue patterning
during early mammalian development are complex processes
involving signals from surrounding tissues, mechanical con
-
straints, and transcriptional and epigenetic changes, which together
prompt the adoption of unique cell fates
1–7
. All these factors play
key roles in gastrulation, the process by which the three germ lay
-
ers emerge and the body axis is established. Subsequently, the germ
layer progenitors, formed during gastrulation, will give rise to all
major organs in a process known as organogenesis.
Recently, scRNA-seq and other single-cell genomic approaches
have been used to investigate how the molecular landscape of cells
within the mouse embryo changes during early development. These
methods have provided insights into how symmetry breaking
of the epiblast population leads to commitment to different fates
as the embryo passes through gastrulation and on to organogen
-
esis
1–3,6–14
. By computationally ordering cells through their differen
-
tiation (‘pseudotime’), an understanding of the molecular changes
that underpin cell-type development has been obtained, provid
-
ing insight into the underlying regulatory mechanisms, includ
-
ing the role of the epigenome. Recently, technological advances
have enabled scRNA-seq to be performed alongside CRISPR–Cas9
scarring, thus simultaneously documenting a cell’s molecular state
and lineage. Such approaches have been applied to track zebraf
-
ish development
15–17
and more recently mouse embryogenesis
9,18
.
Together, these experimental strategies have enhanced our under
-
standing of developmental lineage relationships and the associated
molecular changes.
However, to date, single-cell genomics studies of early mam
-
malian development have focused on profiling dissociated popula
-
tions of cells, where spatial information is lost. Although regions of
the embryo have been microdissected and profiled using small cell
number RNA-sequencing protocols, these approaches neither scale
to later stages of development nor do they provide single-cell reso
-
lution, which may be critical given the role of local environmen
-
tal cues in conditioning cell fate and patterning
8,13,19
. By contrast,
in situ hybridization, single-molecule RNA FISH (smFISH) and
other related approaches allow gene expression levels to be mea
-
sured within a defined spatial context. However, these approaches
are typically limited to either quantifying expression patterns in
broad domains
20,21
or to studying a limited number of genes, thus
precluding the generation of comprehensive cell resolution maps of
expression across an entire embryo. Recent technological advances
Integration of spatial and single-cell
transcriptomic data elucidates mouse
organogenesis
T. Lohoff
1,2,3,17
, S. Ghazanfar
4,17
, A. Missarova
4,5
, N. Koulena
6
, N. Pierson
6
, J. A. Griffiths
4,16
,
E. S. Bardot
7
, C.-H. L. Eng
6
, R. C. V. Tyser
8
, R. Argelaguet
3,5
, C. Guibentif
1,9,10
, S. Srinivas
8
,
J. Briscoe
11
, B. D. Simons
1,12,13
, A.-K. Hadjantonakis
7
, B. Göttgens
1,9
, W. Reik
1,3,14,15
✉
,
J. Nichols
1,2
✉
, L. Cai
6
✉
and J. C. Marioni
4,5,15
✉
Molecular profiling of single cells has advanced our knowledge of the molecular basis of development. However, current
approaches mostly rely on dissociating cells from tissues, thereby losing the crucial spatial context of regulatory processes.
Here, we apply an image-based single-cell transcriptomics method, sequential fluorescence in situ hybridization (seqFISH), to
detect mRNAs for 387 target genes in tissue sections of mouse embryos at the 8–12 somite stage. By integrating spatial context
and multiplexed transcriptional measurements with two single-cell transcriptome atlases, we characterize cell types across
the embryo and demonstrate that spatially resolved expression of genes not profiled by seqFISH can be imputed. We use this
high-resolution spatial map to characterize fundamental steps in the patterning of the midbrain–hindbrain boundary (MHB)
and the developing gut tube. We uncover axes of cell differentiation that are not apparent from single-cell RNA-sequencing
(scRNA-seq) data, such as early dorsal–ventral separation of esophageal and tracheal progenitor populations in the gut tube.
Our method provides an approach for studying cell fate decisions in complex tissues and development.
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
74
Articles
NAtuRE BIOtEcHNOlOgy
promise to overcome these limitations; approaches that exploit
highly multiplexed RNA FISH
22–27
, that perform sequencing on
intact tissues
28–30
, or that hybridize tissue sections to spatially bar
-
coded microarrays
31,32
promise to simultaneously profile the expres
-
sion of hundreds or thousands of genes within single cells whose
spatial location is preserved.
Here, using an existing scRNA-seq atlas covering stages of
mouse development from gastrulation to early organogenesis
6
(‘Gastrulation atlas’), we designed probes against a panel of 387
genes and spatially localized their expression in multiple embryo
sections at the 8–12 somite stage (ss) using a version of the seq
-
FISH method modified to allow highly effective cell segmentation.
Assigning each cell in the seqFISH-profiled embryos a distinct
cell-type identity revealed different patterns of colocalization of
cells within and between cell types. Integrating scRNA-seq and
seqFISH data enabled the genome-wide imputation of expression,
thus generating a complete quantitative and spatially resolved map
of gene expression at single-cell resolution across the entire embryo.
To illustrate the power of this resource, we used these imputed
data to perform a virtual dissection of the midbrain and hindbrain
region of the embryo, uncovering spatially resolved patterns of
expression associated with both the dorsal–ventral and rostral–cau
-
dal axes. Finally, by integrating a second independent scRNA-seq
dataset that characterized cell types within the developing gut tube
2
,
we resolved the position of two clusters of cells that were both previ
-
ously assigned a lung precursor identity using the scRNA-seq data
2
.
Our spatial data revealed that these two clusters were exclusively
located on either the dorsal or ventral side of the gut tube, with cor
-
responding transcriptional differences indicating that the dorsal
cells give rise to the esophagus, while the ventral cells give rise to
the lung and trachea.
Results
Single-cell spatial expression of mouse organogenesis.
We per
-
formed seqFISH
10,11
on sagittal sections from three mouse embryos
at the 8–12 ss, corresponding to embryonic day (E)8.5–8.75
(Fig.
1a–c). The sections analyzed were chosen to correspond as
closely as possible to the midline of the embryo, albeit some varia
-
tion along the left–right axis could be observed due to embryo
tilt (Fig. 1b). Notably, we observed in embryo 2 considerable
tilt of the tail region, suggesting depletion of mesodermal and
tail-specific populations. In each section, we probed the expres
-
sion of 351 barcoded genes specifically chosen to distinguish
distinct cell types at these developmental stages (Extended Data
Fig. 1 and Supplementary Tables 1 and 2). To do this, we exploited
a recently published single-cell molecular map of mouse gastrula
-
tion and early organogenesis
6
and determined computationally a
set of lowly expressed to moderately expressed genes that were best
able to recover the cell-type identities (Methods and Extended Data
Fig. 1). Lowly expressed to moderately expressed genes were selected
because low overall expression of the library is needed to reduce the
optical density of detected transcripts in a cell so that crowding does
not prevent single mRNA spots from being resolved reliably.
To obtain a good signal-to-noise ratio for the mRNA spots, we
performed tissue clearing to reduce the tissue background signal,
as introduced before
25,33
. Briefly, the tissue sections were embedded
into a hydrogel scaffold, RNA molecules were crosslinked into the
hydrogel and lipid and protein were removed to achieve optimal
tissue transparency for seqFISH (Methods). One consequence of
depleting proteins is that delineating the cell membrane, and hence
cell segmentation, becomes challenging. To address this, before tis
-
sue embedding, we performed immunodetection for selected sur
-
face antigens, pan-cadherin, N-cadherin,
β
-catenin and E-cadherin,
which could in turn be recognized by a secondary antibody con
-
jugated to a unique DNA sequence. We then hybridized a tertiary
probe to the DNA sequence of the secondary antibody, which had
a unique smFISH readout sequence and an acrydite group. The
acrydite group becomes crosslinked into the hydrogel scaffold and
remains in position, even after protein degradation
34
. The unique
smFISH readout sequence can subsequently be hybridized with a
readout probe conjugated to a fluorophore, allowing the cell mem
-
brane to be visualized (Fig.
1d) and enabling segmentation using the
interactive learning and cell segmentation tool Ilastik
35
. To validate
this strategy, we applied it to a 10-
μ
m thick transverse section of
an E8.5 mouse embryo, which confirmed labeling of the cell mem
-
brane (Fig. 1e and Extended Data Fig. 2). Before imaging samples
for seqFISH, overall RNA integrity was examined by ensuring colo
-
calization of two
Eef2
probe sets, each detected by a unique readout
probe conjugated to a different fluorophore (Extended Data Fig. 2
and Supplementary Tables 1 and 3).
Following imaging, the resulting data were segmented as detailed
above, and individual mRNA molecules were detected by decoding
barcodes over the multiple rounds of imaging. To guarantee high
sample quality, the first round of hybridization was repeated follow
-
ing all intervening hybridization rounds, allowing for consistency
of mRNA signal intensity to be assessed (Supplementary Fig. 1). In
total, following cell-level quality control, we identified 57,536 cells
across three embryos with a combined total of 11,004,298 individ
-
ual mRNA molecules detected. In the embryo tissue sections, each
cell contained an average of 196
±
19.3 (mean
±
s.e.) mRNA tran
-
scripts from 93.2
±
6.6 (mean
±
s.e.) genes (Supplementary Fig. 2),
corresponding to an average of 26.6% of all genes profiled. The
set of genes expressed was not biased toward a specific germ layer,
and an average of 21.0%
±
1.1% (mean
±
s.e.) of genes most associ
-
ated with a mesoderm identity in the E8.5 Gastrulation atlas was
expressed per seqFISH cell, 25.9%
±
2.1% of genes were associated
Fig. 1 |
Single-cell spatial transcriptomics map of mouse organogenesis using seqFISH.
a
, Illustration of 8–12 ss mouse embryo. Dotted lines indicate the
estimated position of the sagittal tissue section shown in
b
; D, dorsal; V, ventral; R, right; L, left; A, anterior; P, posterior.
b
, Tile scan of a 20-
μ
m sagittal
section of three independently sampled 8–12 ss embryos stained with nuclear dye DAPI (white). Red boxes indicate the selected field of view (FOV)
imaged using seqFISH.
c
, Illustration of the experimental overview for spatial transcriptomics using seqFISH for 351 selected genes in 16 sequential
rounds of hybridization and 12 non-barcoded sequential smFISH hybridization rounds for 36 genes. For each targeted gene, 17–48 unique probes were
used to capture the mRNA; UMAP, uniform manifold approximation and projection.
d
, Cell segmentation strategy using a combination of E-cadherin
(E-cad), N-cadherin (N-cad), pan-cadherin (Pan-cad) and
β
-catenin antibody (AB; green) staining detected by an oligo-conjugated anti-mouse
IgG secondary antibody (orange) that gets recognized by a tertiary probe sequence. The acrydite group (blue star) of the tertiary probe (blue) gets
crosslinked into a hydrogel scaffold and stays in place even after protein removal during tissue clearing. The cell segmentation labeling can be read by a
fluorophore-conjugated readout probe (red); AB1, antibody 1; AB2, antibody 2.
e
, Cell segmentation staining of a 10-
μ
m thick transverse section of an E8.5
mouse embryo using the strategy introduced in
d
. Cell segmentation signal was used to generate a cell segmentation mask using Ilastik (right). This was
repeated independently for all
N
=
3 embryos with similar results.
f
, Representative visualization of normalized log expression counts of 12 selected genes
measured by seqFISH to validate performance. This experiment was repeated independently for all
N
=
3 embryos with similar results.
g
, Highly resolved
‘digital in situ
’
of the cardiomyocyte marker titin (
Ttn
),
Tbx5
,
Cdh5
and
Dlk1
, colored in red, cyan, green and orange, respectively. Dots represent individually
detected mRNA spots, and the box represents an area that was magnified for better visualization. This experiment was repeated independently for all
N
=
3 embryos with similar results.
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
75
Articles
NAtuRE BIOtEcHNOlOgy
A
P
A
P
250
μ
m
250
μ
m
250
μ
m
A
L
R
P
A
D
V
P
Brain
Heart
Allantois
Gut tube
Somites
Embryo 1
Embryo 2
Embryo 3
A
P
D
V
D
V
D
V
a
b
c
d
e
50
μ
m
50
μ
m
β
-catenin
E-cad
N-cad
Pan-cad
Embryo 1
Embryo 2
Embryo 3
Read-out
probe
Fluorophore
Hydrogel
Tertiary
probe
AB2
AB1
Targeted
protein
UMAP
Physical
Downstream analysis
Image processing
Cell
segmentation
mRNA dot
calling
Imaging
Tissue samples
Embryo 3
Embryo 3
Embryo 2
Embryo 2
Embryo 1
Embryo 1
Sample preparation
seqFISH
×16
×12
smFISH
28–48×
17–27×
A
B
C
D
E
E
E
E
W
a
s
h
H
y
b
r
i
d
i
z
e
S
t
r
i
p
I
m
a
g
e
W
a
s
h
H
y
b
r
i
d
i
z
e
S
t
r
i
p
I
m
a
g
e
Ttn
Pou3f1
Otx2
Gata5
Six3
Lhx2
Cldn4
Hand1
Foxf1
Foxa1
Sox2
Popdc2
f
g
High
Low
250
μ
m
50
μ
m
50
μ
m
Ttn Tbx5 Cdh5 Dlk1
A
P
D
V
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
76
Articles
NAtuRE BIOtEcHNOlOgy
with the endoderm, 28.6%
±
1.3% of genes were identified as extra
-
embryonic and 31.6%
±
3.3% (mean
±
s.e.) of genes were associated
with the ectoderm.
Next, to confirm the quality of our data, we examined the expres
-
sion of 12 genes (Fig. 1f
) with well-characterized expression pat
-
terns. As expected, the cardiomyocyte markers
Ttn
36
and
Popdc2
(ref.
37
) showed the highest expression in the region of the develop
-
ing heart tube, while
Hand1
(refs.
38,39
) and
Gata5
(ref.
40
) showed
expression in the heart as well as the more posterior lateral plate
mesoderm. Similarly, the expression of four known brain mark
-
ers,
Six3
(ref.
41
),
Lhx2
(ref.
42
),
Otx2
(refs.
43–45
) and
Pou3f1
(ref.
46
)
was strongest in the developing brain. Turning to genes that mark
broader territories, the neural tube marker
Sox2
showed strong
expression in the brain and along the dorsal side of the embryo
47,48
.
Additionally, expression of the mesoderm marker
Foxf1
was local
-
ized to mesodermal cells outlining the developing gut tube, the
lateral plate mesoderm and the extraembryonic mesoderm of the
allantois
49
. Lastly, two gut endoderm markers
Foxa1
(ref.
50
) and
Cldn4
(refs.
51,52
) marked the developing gut tube along the ante
-
rior–posterior axis of the embryo. The tissue-specific expression
profile of these genes was consistent with both the Gastrulation
atlas
6
(Supplementary Fig. 2) as well as the broad expression terri
-
tories defined in the EMAGE database
20
. As a further confirmation
of the quality of our data, we confirmed the positional expression
profiles of the measured Hox gene family members, which fol
-
lowed the described ‘Hox code’ along the anterior–posterior axis
53,54
(Supplementary Fig. 3). Finally, the high-resolution of seqFISH
allows for visualization of mRNA molecules at subcellular resolu
-
tion, enabling the generation of high-quality digital in situ images
(Fig.
1g). Taken together, these analyses demonstrate that we can
reliably record the expression profiles of hundreds of genes across
an entire embryo cross-section at single-cell resolution.
Cell-type identity and spatial transcriptional heterogeneity.
Thus far, we have focused on the expression of individual genes.
However, the real power of the data derives from the ability to study
coexpression of hundreds of genes within their spatial context. To
develop this potential, as a first step, we assigned each cell within
the seqFISH-profiled embryos a distinct cell-type identity using
cell-type mapping. To make this assignment, we integrated each
cell’s expression profile from seqFISH with the E8.5 cells from the
Gastrulation atlas
6
using batch-aware dimension reduction and
mutual nearest neighbors (MNN) batch correction
55
(Extended
Data Fig. 3) before annotating seqFISH cells based on their near
-
est neighbors in the Gastrulation atlas (Fig. 2a and Extended Data
Fig. 3). We further manually refined this automated cell-type clas
-
sification using a cell type’s anatomical location and by perform
-
ing joint clustering of both datasets and comparing their relative
cell-type contribution and gene expression profiles (Extended Data
Fig. 3 and Methods). The assigned cell-type identities were consis
-
tent with known anatomy as well as with the expression of distinct
marker genes (Figs. 1f and 2b,c and Supplementary Figs. 4–6).
As an alternative, we performed direct clustering of the seqFISH
data, which revealed similar groupings of cells (Extended Data Fig. 4),
indicating that a small number of carefully chosen genes can pro
-
vide enough information to accurately group cells. However, we
note that assigning cell-type identity using only a small number
of marker genes is likely to be less reliable than inferring identity
through reference to the Gastrulation atlas. Indeed, upon per
-
forming a further simulation on randomly selected subsets of the
seqFISH gene panel, we observed decreasing cell-type recovery
accuracy, more so for the imaging data than for the Gastrulation
atlas or even for independent wild-type (WT) chimera control
scRNA-seq cells (Methods and Supplementary Fig. 7), suggesting
that it may be prudent to select more cell-type marker genes than
would be suggested by computational analysis of scRNA-seq data.
Next, to study when boundaries between emerging tissue
compartments are established in the developing embryo, we
statistically quantified whether cells assigned to the same type
were spatially coherent within the embryo and determined the
extent to which pairs of cell types were colocated (Fig. 2d,e and
Methods). We used a permutation strategy to evaluate the relative
enrichment or depletion of direct cell–cell contact events between
each cell type resulting in a cell–cell contact map (Fig. 2d and
Extended Data Fig. 5). Certain cell types, such as cardiomyocytes
and the gut tube, were spatially and morphologically distinct,
while others, like the endothelium, were interspersed and spread
across the entire embryo space.
More generally, while most cell types are characterized using
prior knowledge of expression markers and lineage inference, other
populations, such as the mixed mesenchymal mesoderm, represent
a cell state rather than a defined cell type. Mesenchyme represents
a state in which cells express markers characteristic of migratory
cells loosely dispersed within an extracellular matrix
56
. This strong
overriding transcriptional signature of mesenchyme, irrespective of
location, makes it challenging to distinguish which cell types this
mixed mesenchymal mesoderm population represents using clas
-
sical scRNA-seq data. By contrast, our integrated spatial expression
map allowed us to resolve five transcriptionally distinct subpopula
-
tions (clusters 1–5) that were spatially defined (Extended Data Fig. 6
and Methods).
Based on its anatomical position overlaying the developing
heart, we infer that cluster 1 reflects cells with a cardiac mesoderm
and pericardium identity. Clusters 2 and 3 are located in the septum
transversum, in the region of the forming hepatic plate and proepi
-
cardium. At this developmental stage, bone morphogenetic protein
(BMP) signaling from the developing heart and fibroblast growth
factor (FGF) signaling from the septum transversum mesenchyme
are critical for the induction of hepatic fate specification in the fore
-
gut
57,58
. Consistent with this, we observed enrichment for BMP sig
-
naling in cluster 1 (Extended Data Fig. 6). Additionally, in cluster
3, we observed the coexpression of proepicardial markers
Tbx18
and
Wt1
(refs.
59,60
) whose deletion results in heart
61
and liver
62
defects (Extended Data Fig. 6). Our ability to spatially map clus
-
ter 3 revealed its position caudal to the forming heart, correspond
-
ing with the known location of the proepicardium. Together, their
location and expression profiles indicate that the cells from clus
-
ters 2 and 3 will contribute to the hepatic mesenchyme (important
for hepatoblast specification) and the proepicardium, respectively.
Lastly, clusters 4 and 5 are located toward the body wall, suggesting
a somatic mesoderm identity that will contribute to the dermis
63
.
To characterize additional spatially driven transcriptional het
-
erogeneity, we used a linear model to identify genes that show
a strong spatial expression pattern within each cell type (Fig. 2e
,
Supplementary Table 4 and Methods). This indicated that residual
transcriptional heterogeneity in the forebrain/midbrain/hindbrain
cluster can be explained by localized patterns of expression, most
likely resulting from the presence of regionally specific develop
-
ing brain subtypes (Supplementary Table 5). To investigate this, we
performed a focused reclustering of forebrain/midbrain/hindbrain
cells, recovering four major brain subregions and seven subclusters
(Fig.
2f,g). Cross-referencing spatial location and underlying gene
expression signatures allowed us to identify subclusters associated
with the prosencephalon, mesencephalon, rhombencephalon and
the tegmentum (Fig. 2g,h and Extended Data Fig. 5).
A 10,000-plex spatial map of inferred gene expression.
By design,
our seqFISH library allowed us to probe the expression of specific
genes associated with cell-type identity. Additionally, we directly
measured the expression of a number of genes associated with key
signaling cascades, for example, Notch
64
a n d Wnt
65
. Nevertheless,
a full, unbiased view of the interplay between a cell’s spatial
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
77
Articles
NAtuRE BIOtEcHNOlOgy
a
Gastrulation atlas
UMAP2
seqFISH
b
UMAP1
250
μ
m
c
d
Neural crest
Neural crest
Dermomyotome
Cranial mesoderm
Anterior somitic tissues
Sclerotome
Endothelium
Erythroid
Hematoendothelial progenitors
Intermediate mesoderm
Allantois
Lateral plate mesoderm
Surface ectoderm
Mixed mesenchymal mesoderm
Splanchnic mesoderm
Splanchnic mesoderm
Presomitic mesoderm
Definitive endoderm
NMP
Spinal cord
Spinal cord
Cardiomyocytes
Forebrain/midbrain/hindbrain
Forebrain/midbrain/hindbrain
Gut tube
Blood progenitors
Dermomyotome
Cranial mesoderm
Anterior somitic tissues
Sclerotome
Endothelium
Mesenchyme
Erythroid
Hematoendothelial progenitors
Intermediate mesoderm
Allantois
Lateral plate mesoderm
Surface ectoderm
Presomitic mesoderm
Definitive endoderm
NMP
Cardiomyocytes
Gut tube
e
f
g
250
μ
m
h
50
μ
m
Integrated
Segregated
Rhombencephalon 1
Rhombencephalon 1
Rhombencephalon 1
Mesencephalon
Mesencephalon
Rhombencephalon 2
Rhombencephalon 2
Rhombencephalon 2
Rhombencephalon 3
Rhombencephalon 3
Rhombencephalon 3
Prosencephalon 1
Prosencephalon 1
Prosencephalon 2
Mesencephalon
Prosencephalon 1
Prosencephalon 2
Prosencephalon 2
Tegmentum
Tegmentum
Tegmentum
120
90
60
30
0
t
–statistic
Integrated
Segregated
Neural crest
Splanchnic mesoderm
Spinal cord
Forebrain/midbrain/hindbrain
Blood progenitors
Dermomyotome
Cranial mesoderm
Definitive endoderm
Caudal mesoderm
Anterior somitic tissues
Sclerotome
Endothelium
Mixed mesenchymal mesoderm
ExE endoderm
Erythroid
Hematoendothelial progenitors
Intermediate mesoderm
Allantois
Lateral plate mesoderm
Surface ectoderm
Presomitic mesoderm
NMP
Cardiomyocytes
Gut tube
Ectoderm
Mesoderm
Endoderm
A
P
D
V
250
μ
m
Fig. 2 |
Cell-type annotation and neighborhood characterization.
a
, Projection of seqFISH spatial and Gastrulation atlas cells in joint reduced dimensional
space to annotate seqFISH cells based on their nearest neighbors in the mouse Gastrulation atlas.
b
, Real position of annotated seqFISH cells in an embryo
tissue section. Colors represent refined cell-type classification; ExE endoderm, extraembryonic endoderm; NMP, neuromesodermal progenitor.
c
, Cell-type
maps separated by the three germ layers (ectoderm, mesoderm and endoderm).
d
, Cell–cell contact map displaying the relative enrichment toward
integration and segregation of pairs of cell types in space. Cell types are clustered by their relative integration with others.
e
, Violin plots showing the
t
-statistic for each gene and cell type corresponding to a measure of the degree of residual transcriptional heterogeneity explained by space.
f
, Reclustering
of forebrain/midbrain/hindbrain cell types into seven spatially distinct clusters.
g
, Zoom in of the brain region to visualize four major brain regions and
seven subclusters identified in
f
.
h
, Cell–cell contact map of brain subclusters in space, ordered roughly anatomically from hindbrain to forebrain.
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
78
Articles
NAtuRE BIOtEcHNOlOgy
location and its molecular profile and how this influences devel
-
opment would benefit from measuring expression of the entire
transcriptome, which is not straightforward with existing highly
multiplexed RNA FISH protocols.
To overcome these limitations, we built upon the MNN map
-
ping approach (Fig. 2
and Extended Data Fig. 3) and inferred the
full transcriptome of each seqFISH cell by considering the weighted
expression profile of the cells to which it is most transcriptionally
similar to in the Gastrulation atlas (Fig.
3a, Extended Data Fig. 7 and
Methods). To test the integrity of this strategy, for each gene probed
in our seqFISH experiment (excluding
Xist
, as it is sex specific),
we used the remaining 349 measured genes to map all cells to the
Gastrulation atlas and imputed the expression of the withheld gene.
To evaluate performance, we calculated for each gene and across
all cells the Pearson correlation (‘performance score’) between the
imputed expression counts and the measured seqFISH expression
levels. To estimate an upper bound on the performance score (that is,
the maximum correlation we might expect to observe), we exploited
the four independent batches of E8.5 cells that were processed in the
scRNA-seq Gastrulation atlas. We treated one of the four batches
as the query set and used the leave-one-out approach described
above to impute the expression of the 350 genes of interest by map
-
ping cells onto a reference composed of the remaining three batches
before computing the Pearson correlation between the imputed and
true expression counts (‘prediction score’; Methods). Computing
the ratio of the performance (seqFISH–scRNA-seq) and prediction
(scRNA-seq–scRNA-seq) scores yields a normalized performance
score. Across genes, we observed a median normalized performance
score of 0.73 (lower quartile, 0.32; upper quartile, 1.09) (Extended
Data Fig. 7), suggesting that our ability to infer gene expression is
comparable to what might be expected when combining independent
scRNA-seq datasets. While we observed a high level of consistency
among the independently captured genes, we identified a subset of
genes that did not perform as well (Methods). These nine genes were
either lowly or rarely expressed in the independent smFISH data or
were variably expressed between replicates (Extended Data Fig. 7 and
Supplementary Table 6). Consequently, care must be taken in inter
-
preting imputed expression patterns for such genes.
To further validate our imputation strategy, we used
non-barcoded sequential smFISH to measure the expression of
36 additional genes in the embryo sections probed by seqFISH
and contrasted the true expression profile with the imputed values
(Fig.
3b). This independent validation (these smFISH genes were
not used in the MNN mapping) confirmed that imputation reli
-
ably recovered gene expression profiles (Fig. 3b and Supplementary
Figs. 8–12). For example, we observed a strong overlap between
measured and imputed expression for
Dlx5
(ref.
66
), an essential
and spatially restricted regulator of craniofacial structures, in the
anterior surface ectoderm and first branchial arch. Additionally, we
noted that
Tmem54
was inferred to be specifically expressed in the
anterior surface ectoderm and along the gut tube,
Nkx2-5
(refs.
67,68
)
was inferred to be expressed in the developing heart, and
Mesp1
was
inferred to be expressed in the posterior presomitic mesoderm
69,70
.
Finally, the ubiquitous expression profile of
Basp1
and the absence
of expression of the germ line marker
Utf1
(ref.
71
) was also recapitu
-
lated in the imputed expression maps.
Reconstruction of MHB formation.
To illustrate the utility of the
imputed data, we focused on a well-described developmental pro
-
cess that takes place at this embryonic stage, the formation of the
MHB, also known as the isthmus organizer. The MHB acts as a sig
-
naling hub that is essential for patterning of the adjacent midbrain
and hindbrain regions by inducing two distinct transcriptional
programs via defined signaling cascades (reviewed in
72–
74
). Thus,
the MHB presents an important dividing point in the develop
-
ing brain, functioning both as a signaling center and as a physical
barrier of the developing brain ventricles
75
. We observed expres
-
sion of the mesencephalon and prosencephalon marker
Otx2
(refs.
43,
76
) and the rhombencephalon marker
Gbx2
(refs.
76,
77
) in
the brain region of all three embryos, albeit the sagittal section for
embryo 2 appeared to capture this region most comprehensively
(Supplementary Fig. 13). Focusing on this region of embryo 2,
we used the expression of
Gbx2
and
Otx2
to identify the precise
boundary between the two subclusters (Fig.
3c,d). Subsequently,
we virtually dissected the
Otx2
-positive midbrain region and the
Gbx2
-positive hindbrain region (Supplementary Fig. 13) and
performed a differential expression analysis (using the imputed
expression profiles) to identify additional genes that distinguish
the two regions (Fig.
3e). This identified 66 genes (false discov
-
ery rate (FDR)-adjusted
P
value of
<
0.05; absolute log fold change
(LFC)
>
0.2) with spatially distinct expression profiles between the
two regions (Supplementary Table 7).
To further understand the spatial distribution of gene expres
-
sion at the MHB, we investigated whether further local differences
in spatial expression patterns were present. Using a diffusion-based
transcriptional embedding
78
, we observed smoothness of the esti
-
mated diffusion components in physical space, with an extreme cor
-
responding to the MHB itself (Fig. 3f,g and Methods). Using a spatial
vector field to capture local magnitude and direction of changes in
DC1 in space, we observed an outward radiation of signaling gra
-
dients from the MHB region, corresponding to the rostral–caudal
axis (Fig. 3g), with strong enrichment for
Lmo1
(ref.
79
) in the mid
-
brain and
Pax8
(ref.
80
) in the hindbrain (Fig.
3i). Additionally, we
observed that DC2 corresponds to an emerging dorsal–ventral axis
(Fig.
3h), demonstrating that the coordinate space of the brain is
established at this stage of development.
Fig. 3 |
Creating and using a 10,000-plex spatial map.
a
, Schematic representation of the imputation strategy.
b
, Independent validation of imputation
performance by comparing normalized gene expression profiles of selected genes measured by smFISH with the corresponding imputed gene expression
profiles.
c
, Visualization of brain subclusters in embryo 2 and virtual dissection of the MHB, highlighted by the red rectangle and inset zoom; C, caudal;
R, rostral; D, dorsal; V, ventral.
d
, ‘Digital in situ’ showing detected mRNA molecules of a mesencephalon and prosencephalon marker
Otx2
(orange dots)
and a rhombencephalon marker
Gbx2
(purple dots) to identify the MHB; scale bar, 50
μ
m.
e
, MA (log ratio and mean average) plot showing differential
gene expression analysis using a two-sample
t
-test between the virtually dissected hindbrain region (orange; 48 genes significantly upregulated; absolute
LFC
>
0.2, FDR-adjusted
P
value of
<
0.05) and virtually dissected midbrain region (purple; 18 genes significantly upregulated; absolute LFC
>
0.2,
FDR-adjusted
P
value of
<
0.05) using the imputed transcriptome.
f
, Diffusion pseudotime analysis of the virtually dissected region to understand the
dynamics of gene expression at the MHB. The scatter plot of diffusion-based embedding of virtually dissected cells displays diffusion components (DC) 1
and 2. Cell colors correspond to inferred diffusion pseudotime.
g
, Spatial graph showing virtually dissected cells colored by inferred diffusion pseudotime
dominated by DC1. Arrow sizes correspond to the magnitude of change of the pseudotime value within the region in the direction from large to small
pseudotime values. The highest pseudotime values are observed along the MHB region, smoothly diffusing outward to the midbrain and hindbrain regions.
h
, Spatial graph showing virtually dissected cells colored by DC2. Arrow sizes correspond to the magnitude of change of the DC2 value within the region.
The most extreme DC2 values are observed perpendicular to the MHB region, smoothly diffusing outward to the floor plate and roof plate regions.
i
, Visualization of normalized log expression counts of important regulators of midbrain/hindbrain formation. Gene names shown in red font indicate
imputed expression, while gene names shown in black font indicate measured expression.
NATuRE BIo
TECHNoL
oG
y
| VOL 40 |
JANUARY 2022 | 74–85 |
www.nature.com/naturebiotechnology
79