Highly multiplexed spatially resolved gene expression profiling of mouse organogenesis
T. Lohoff1,2,3,*, S. Ghazanfar4,*, A. Missarova4,5,‡, N. Koulena6,‡, N. Pierson6,‡, J. A. Griffiths4,^, E. S. Bardot7, C.-H. L. Eng6, R. C. V. Tyser8, R. Argelaguet5, C. Guibentif1,9,10, S. Srinivas8, J. Briscoe11, B. D. Simons1,12,13, A.-K. Hadjantonakis7, B. Göttgens1,9, W. Reik1,3,14,15,†, J. Nichols1,2,†, L. Cai6,†, J. C. Marioni4,5,15,†,*
1 Wellcome-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
2 Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
3 Epigenetics Programme, Babraham Institute, Cambridge, UK
4 Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
5 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
6 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
7 Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
8 Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
9 Department of Haematology, University of Cambridge, Cambridge, UK
10 Sahlgrenska Cancer Center, Department of Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
11 The Francis Crick Institute, London NW1 1AT, UK
12 The Wellcome/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
13 Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
14 Centre for Trophoblast Research, University of Cambridge, Cambridge, UK
15 Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
bioRxiv preprint doi: https://doi.org/10.1101/2020.11.20.391896; this version posted November 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
† Correspondence to: W.R.: wolf.reik@babraham.ac.uk; J.N.: jn270@cam.ac.uk; L.C.: lcai@caltech.edu; J.C.M.: marioni@ebi.ac.uk
* These authors contributed equally
‡ These authors contributed equally
^ Current address: Genomics Plc, 50-60 Station Road, Cambridge, CB1 2JH, UK
Abstract
Transcriptional and epigenetic profiling of single cells has advanced our knowledge of the molecular bases of gastrulation and early organogenesis. However, current approaches rely on dissociating cells from tissues, thereby losing the crucial spatial context that is necessary for understanding cell and tissue interactions during development. Here, we apply an image-based single-cell transcriptomics method, seqFISH, to simultaneously and precisely detect mRNA molecules for 387 selected target genes in 8-12 somite stage mouse embryo tissue sections. By integrating spatial context and highly multiplexed transcriptional measurements with two single-cell transcriptome atlases, we accurately characterize cell types across the embryo and demonstrate how spatially-resolved expression of genes not profiled by seqFISH can be imputed. We use this high-resolution spatial map to characterize fundamental steps in the patterning of the midbrain-hindbrain boundary and the developing gut tube. Our spatial atlas uncovers axes of resolution that are not apparent from single-cell RNA sequencing data: for example, in the gut tube we observe early dorsal-ventral separation of esophageal and tracheal progenitor populations. In sum, by computationally integrating high-resolution spatially-resolved gene expression maps with single-cell genomics data, we provide a powerful new approach for studying how and when cell fate decisions are made during early mammalian development.
Introduction
Lineage priming, cell fate specification and tissue patterning during early mammalian development are complex processes involving signals from surrounding tissues, mechanical constraints, and transcriptional and epigenetic changes, which together prompt the adoption of unique cell fates [1–7]. All of these factors play key roles in gastrulation, the process by which the three germ layers emerge and the body axis is established. Subsequently, the germ layer progenitors formed during gastrulation will give rise to all major organs in a process known as organogenesis.
Recently, single-cell RNA-sequencing (scRNA-seq) and other single-cell genomic approaches have been used to investigate how the molecular landscape of cells within the mouse embryo changes during early development. In particular, these methods have provided insights into how symmetry breaking of the epiblast population leads to commitment to different fates as the embryo passes through gastrulation and on to organogenesis [1–3,6–14]. By computationally ordering cells through their differentiation (“pseudotime”), an understanding of the molecular changes that underpin cell type development has been obtained, providing insight into the underlying regulatory mechanisms, including the role of the epigenome. Recently, technological advances have enabled scRNA-seq to be performed alongside CRISPR/Cas9 scarring, thus simultaneously documenting a cell’s molecular state and lineage. Such approaches have been applied to track zebrafish development [15–17] and more recently mouse embryogenesis [9,18]. Together, these experimental strategies have enhanced our understanding of developmental lineage relationships and the associated molecular changes.
However, to date, single-cell genomics studies of early mammalian development have focused on profiling dissociated populations of cells, where spatial information is lost. Although regions of the embryo have been micro-dissected and profiled using small cell-number RNA-sequencing protocols, these approaches neither scale to later stages of development, where tens of thousands of cells are present within an embryo, nor do they yet provide single-cell resolution, which may be critical given the role of local environmental cues in conditioning cell fate and patterning at these developmental stages [13,19,20].
By contrast, in situ hybridization, single-molecule RNA FISH and other related approaches allow gene expression levels to be measured within a defined
spatial context. However, these approaches are typically limited to either quantifying expression patterns in broad domains [21,22] or to studying a limited number of genes in an experiment, thus precluding generation of comprehensive cell-resolution maps of expression across an entire embryo, which is key for understanding complex processes such as gastrulation and organogenesis.
Recent technological advances promise to overcome these limitations: approaches that exploit highly-multiplexed RNA FISH [23–28], sequencing on intact tissues [29–31], or that hybridize tissue sections to spatially-barcoded microarrays [32,33] promise to simultaneously profile the expression of hundreds or thousands of genes within single cells whose spatial location is preserved.
Here, using an existing scRNA-seq atlas covering stages of mouse development from gastrulation to early organogenesis [6] (‘Gastrulation atlas’), we designed probes against a panel of 387 genes and spatially localized their expression in multiple 8-12 somite stage embryo sections using a version of the seqFISH (sequential fluorescence in situ hybridization) method modified to allow highly-effective cell segmentation. Assigning each cell in the seqFISH-profiled embryos a distinct cell type identity revealed different patterns of co-localization of cells within and between cell types. Integrating scRNA-seq and seqFISH data enabled the genome-wide imputation of expression, thus generating a complete quantitative and spatially-resolved map of gene expression at single-cell resolution across the entire embryo. To illustrate the power of this resource, we used these imputed data to perform a virtual dissection of the mid- and hind-brain region of the embryo, uncovering spatially resolved patterns of expression associated with both the dorsal-ventral and rostral-caudal axes. Finally, by integrating a second, independent scRNA-seq dataset that characterized cell types within the developing gut tube [2], we resolved the position of two clusters of cells that were both previously assigned a lung precursor identity using the scRNA-seq data [2]. Our spatial data revealed that these two clusters were exclusively located on either the dorsal or ventral side of the gut tube, with corresponding transcriptional differences indicating that the dorsal cells give rise to the esophagus, while the ventral cells give rise to the lung and trachea.
Results
Single-cell spatial expression profiling of mouse organogenesis
We performed seqFISH [10,11] on sagittal sections from three mouse embryos at the 8-12 somite stage, corresponding to embryonic day (E)8.5-8.75 (Figure 1A-C). The sections analyzed were chosen to correspond as closely as possible to the midline of the embryo, although some variation along the left-right axis could be observed due to embryo tilt (Figure 1B). In each section we probed the expression of 351 barcoded genes specifically chosen to distinguish distinct cell types at these developmental stages (Supplementary Figure 1; Supplementary Tables 1-2). To do this, we exploited a recently published single-cell molecular map of mouse gastrulation and early organogenesis [6], and computationally determined a set of lowly- to moderately-expressed genes that were best able to recover the cell type identities (Methods; Supplementary Figure 1). Lowly- to moderately-expressed genes were selected since low overall expression of the library is needed to reduce the optical density of detected transcripts in a cell so that crowding does not prevent single mRNA spots from being resolved reliably.
To obtain a good signal-to-noise ratio for the mRNA spots, we performed tissue clearing to reduce the tissue background signal, as introduced previously [26,34]. Briefly, the tissue sections were embedded into a hydrogel scaffold, RNA molecules were cross-linked into the hydrogel, and lipids and proteins were removed to achieve optimal tissue transparency for seqFISH (Methods). One consequence of depleting proteins is that delineating the cell membrane, and hence cell segmentation, becomes challenging. To address this, prior to tissue embedding we performed immunodetection for selected surface antigens, Pan-cadherin, N-cadherin, β-Catenin, and E-cadherin, which could in turn be recognized by a secondary antibody conjugated to a unique DNA sequence. We then hybridized a tertiary probe to the DNA sequence of the secondary antibody, which had a unique single-molecule FISH (smFISH) readout sequence and an acrydite group. The acrydite group becomes cross-linked into the hydrogel scaffold and remains in position, even after protein degradation [35]. The unique smFISH readout sequence can subsequently be hybridized with a read-out probe conjugated to a fluorophore, allowing the cell membrane to be visualized (Figure 1D) and enabling segmentation using the interactive learning
and cell segmentation tool Ilastik [36]. To validate this strategy, we applied it to a 10 μm thick transverse section of an E8.5 mouse embryo, which confirmed labeling of the cell membrane (Figure 1E; Supplementary Figure 2). Before imaging samples for seqFISH, overall RNA integrity was examined by ensuring co-localization of two Eef2 probe sets, each detected by a unique read-out probe conjugated to a different fluorophore (Supplementary Figure 2; Supplementary Tables 1 and 3).
Following imaging, the resulting data were segmented as detailed above and individual mRNA molecules were detected by decoding barcodes over the multiple rounds of imaging. To guarantee high sample quality, the first round of hybridization was repeated following all intervening hybridization rounds, allowing consistency of mRNA signal intensity to be assessed (Supplementary Figure 3). In total, following cell-level quality control, we identified 57,536 cells across three embryos with a combined total of 11,004,298 individual mRNA molecules detected. In the embryo tissue sections, each cell contained on average 196 ± 19.3 (mean ± s.e.) mRNA transcripts from 93.2 ± 6.6 (mean ± s.e.) genes (Supplementary Figure 4), corresponding to an average of 26.6% of all genes profiled. The set of genes expressed was not biased towards a specific germ layer, with an average of 21.0% ± 1.1% (mean ± s.e.) of genes most associated with a mesoderm identity in the E8.5 Gastrulation atlas being expressed per seqFISH cell, through to 31.6% ± 3.3% (mean ± s.e.) of ectoderm genes.
Next, to confirm the quality of our data, we examined the expression of twelve genes (Figure 1F) with well-characterized expression patterns. As expected, the cardiomyocyte markers Ttn [37] and Popdc2 [38] showed the highest expression in the region of the developing heart tube, while Hand1 [39,40] and Gata5 [41] showed expression in the heart, as well as the more posterior lateral plate mesoderm. Similarly, the four known brain markers Six3 [42], Lhx2 [43], Otx2 [44–46] and Pou3f1 [47] were most strongly expressed in the developing brain. Turning to genes that mark broader territories within the embryo, the neural tube marker Sox2 showed strong expression in the brain and along the dorsal side of the embryo [48,49]. Additionally, expression of the mesoderm marker Foxf1 was localized to mesodermal cells outlining the developing gut tube, the lateral plate mesoderm and extraembryonic mesoderm of the allantois [50]. Lastly, two gut endoderm markers, Foxa1 [51] and Cldn4 [52,53], marked the developing gut tube along
the anterior-posterior axis of the embryo. The tissue-specific expression profile of these genes was consistent with both the Gastrulation atlas [6] (Supplementary Figure 4) and the broad expression territories defined in the EMAGE database [21]. As a further confirmation of the quality of our data, we confirmed the positional expression profiles of the measured Hox gene family members, which followed the described ‘Hox code’ along the anterior-posterior axis [54,55] (Supplementary Figure 5). Finally, the high resolution of seqFISH allows visualization of mRNA molecules at sub-cellular resolution, enabling the generation of high-quality digital in situs (Figure 1G). Taken together, these analyses demonstrate that we can reliably record the expression profiles of hundreds of genes across an entire embryo cross-section at single-cell resolution.
Cell type identity and spatial transcriptional heterogeneity
Thus far we have focused on the expression of individual genes. However, the real power of the data derives from the ability to study co-expression of hundreds of genes within their spatial context. To develop this potential, as a first step, we assigned each cell within the seqFISH-profiled embryos a distinct cell type identity using cell type mapping. To make this assignment we integrated each cell’s expression profile from seqFISH with the E8.5 cells from the Gastrulation atlas [6] using batch-aware dimension reduction and Mutual Nearest Neighbours (MNN) batch correction [56] (Supplementary Figure 6), before annotating seqFISH cells based on their nearest neighbors in the Gastrulation atlas (Figure 2A; Supplementary Figure 6). We further refined this automated cell type classification by performing joint clustering of both datasets and comparing their relative cell type contribution and gene expression profiles (Supplementary Figure 6; Methods). We observed that the assigned cell type identities were consistent with known anatomy as well as with the expression of distinct marker genes (Figure 1F; Figure 2B-C; Supplementary Figures 7-9).
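The nearest-neighbour annotation step can be illustrated with a minimal sketch. It assumes that seqFISH and atlas cells have already been placed in a shared, batch-corrected low-dimensional space (the MNN correction itself is not shown) and uses a simple majority vote over the k closest atlas cells. The function name `knn_label_transfer`, the choice of k, the Euclidean metric and the toy coordinates are illustrative assumptions, not the procedure detailed in Methods.

```python
import math
from collections import Counter


def knn_label_transfer(query, reference, ref_labels, k=3):
    """Give each query cell the majority label among its k nearest reference cells.

    Hypothetical helper: assumes both datasets already share one embedding.
    """
    assigned = []
    for q in query:
        # Euclidean distance from this query cell to every reference cell
        dists = sorted((math.dist(q, r), i) for i, r in enumerate(reference))
        nearest_labels = [ref_labels[i] for _, i in dists[:k]]
        assigned.append(Counter(nearest_labels).most_common(1)[0][0])
    return assigned


# Toy shared embedding: two well-separated atlas populations (made-up coordinates)
atlas_cells = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
atlas_labels = ["Cardiomyocytes"] * 3 + ["Gut tube"] * 3
seqfish_cells = [(0.05, 0.1), (5.0, 5.1)]

predicted = knn_label_transfer(seqfish_cells, atlas_cells, atlas_labels, k=3)
print(predicted)
```

In this toy case each seqFISH cell inherits the label of the atlas population it sits next to; with real data the quality of the shared embedding largely determines how trustworthy such a vote is.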
As an alternative, we performed direct clustering of the seqFISH data, which revealed similar groupings of cells (Supplementary Figure 10), indicating that a small number of carefully-chosen genes can provide enough information to accurately group cells. However, we note that
assigning cell type identity using only a small number of marker genes is likely to be less reliable than imputing identity through reference to the Gastrulation atlas.
Next, to study when boundaries between emerging tissue compartments are established in the developing embryo, we statistically quantified whether cells assigned to the same type were spatially coherent within the embryo, and determined the extent to which pairs of cell types were co-located (Figure 2D-E, Methods). We used a permutation strategy to evaluate the relative enrichment or depletion of direct cell-cell contact events between each pair of cell types (compared to a random distribution of cell types), resulting in a cell-cell contact map (Figure 2D, Supplementary Figure 11). Certain cell types, such as cardiomyocytes and the gut tube, were spatially and morphologically distinct within the embryo, while others, like the endothelium, were interspersed and spread across the entire embryo space.
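A permutation test of this kind can be sketched as follows. The sketch assumes that direct cell-cell contacts are available as a list of index pairs (a neighbour graph derived from the segmentation) and that the null distribution is obtained by shuffling cell type labels over fixed cell positions. The names `count_contacts` and `contact_enrichment`, the number of permutations and the toy chain of cells are hypothetical; the exact statistic used in Methods may differ.

```python
import random


def count_contacts(edges, labels, type_a, type_b):
    """Count contacting cell pairs whose two types are type_a and type_b (any order)."""
    n = 0
    for i, j in edges:
        pair = (labels[i], labels[j])
        if pair == (type_a, type_b) or pair == (type_b, type_a):
            n += 1
    return n


def contact_enrichment(edges, labels, type_a, type_b, n_perm=1000, seed=0):
    """Compare observed contacts with contacts after shuffling cell type labels."""
    rng = random.Random(seed)
    observed = count_contacts(edges, labels, type_a, type_b)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random redistribution of cell types over cells
        null.append(count_contacts(edges, shuffled, type_a, type_b))
    mean_null = sum(null) / n_perm
    # One-sided empirical p-value for enrichment (with the usual +1 correction)
    p_enriched = (1 + sum(c >= observed for c in null)) / (n_perm + 1)
    return observed, mean_null, p_enriched


# Toy tissue: a chain of 20 cells, the first 10 of type "A" and the last 10 of type "B"
edges = [(i, i + 1) for i in range(19)]
labels = ["A"] * 10 + ["B"] * 10

observed, mean_null, p_enriched = contact_enrichment(edges, labels, "A", "A")
print(observed, round(mean_null, 2), p_enriched)
```

Because the "A" cells form one contiguous block, A-A contacts are far more frequent than under shuffling; the same function applied to ("A", "B") would instead show depletion, which is the kind of contrast a contact map summarizes across all pairs of cell types.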
More generally, while most cell types are characterized using prior knowledge of expression markers and lineage inference, other populations such as the mixed mesenchymal mesoderm represent a cell state rather than a defined cell type. Mesenchyme represents a state in which cells express markers characteristic of migratory cells loosely dispersed within an extra-cellular matrix [57]. This strong overriding transcriptional signature of mesenchyme, irrespective of location, makes it challenging to distinguish which cell types this mixed mesenchymal mesoderm population represents using classical scRNA-seq data. In contrast, our integrated spatial expression map allowed us to resolve five transcriptionally distinct subpopulations (clusters 1-5) that were spatially defined (Supplementary Figure 12; Methods). Based on its anatomical position overlaying the developing heart, we infer that cluster 1 reflects cells with a cardiac mesoderm and pericardium identity. Clusters 2 and 3 are located in the septum transversum, in the region of the forming hepatic plate and proepicardium. At this developmental stage BMP signaling from the developing heart and FGF signaling from the septum transversum mesenchyme are critical for the induction of hepatic fate specification in the foregut [58,59]. Consistent with this we observed enrichment for BMP signaling in cluster 1 (Supplementary Figure 12). Additionally, in cluster 3 we observed the co-expression of proepicardial markers Tbx18 and Wt1 [60,61] whose deletion results in heart [62] and liver [63] defects
(Supplementary Figure 12). Our ability to spatially map cluster 3 revealed its position caudal to the forming heart, corresponding with the known location of the proepicardium, thereby allowing us to characterize this cluster. Together, their location and expression profiles indicate that the cells from clusters 2 and 3 will contribute to the hepatic mesenchyme (important for hepatoblast specification) and the proepicardium, respectively. Lastly, clusters 4 and 5 are located toward the body wall, suggesting a somatic mesoderm identity that will contribute to the dermis [64].
To assess additional, more subtle, spatially-driven transcriptional heterogeneity, we used a linear model to identify genes that show a strong spatial expression pattern within each cell type (Figure 2E; Supplementary Table 4; Methods). This indicated that residual transcriptional heterogeneity in the Forebrain/Midbrain/Hindbrain cluster can be explained by localized patterns of expression, most likely resulting from the presence of regionally-specific developing brain subtypes (Supplementary Table 5). To investigate this further, we performed a focused re-clustering of Forebrain/Midbrain/Hindbrain cells, recovering four major brain subregions and seven subclusters (Figure 2F-G). Cross-referencing spatial location and underlying gene expression signature allowed us to identify subclusters associated with the prosencephalon, mesencephalon, rhombencephalon and the tegmentum (Figure 2G-H; Supplementary Figure 11).
A single-cell 10,000-plex spatial map of inferred gene expression in the mouse embryo
By design, our seqFISH library allowed us to probe the expression of specific genes associated with cell type identity. Additionally, we directly measured the expression of a number of genes associated with key signaling cascades, e.g. Notch [65] and Wnt [66]. Nevertheless, a full, unbiased view of the interplay between a cell’s spatial location and its molecular profile, and how this influences development, would benefit from measuring expression of the entire transcriptome, something that is not straightforward with existing highly-multiplexed RNA FISH protocols.
To overcome these limitations, we built upon the MNN mapping approach described earlier (Figure 2, Supplementary Figure 6) and inferred the full transcriptome of each seqFISH cell by considering the weighted expression profile of the cells to which it is most transcriptionally similar in the Gastrulation atlas (Figure 3A; Supplementary Figure 13; Methods). To test the
integrity of this strategy, for each gene probed in our seqFISH experiment (excluding Xist, as it is sex-specific), we used the remaining 349 measured genes to map all cells to the Gastrulation atlas and imputed the expression of the withheld gene. To evaluate performance, we calculated, for each gene and across all cells, the Pearson correlation (‘performance score’) between the imputed expression counts and the measured seqFISH expression levels.
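The leave-one-gene-out evaluation can be sketched in a few lines. The sketch makes several simplifying assumptions that are not taken from Methods: neighbours are found by Euclidean distance on raw expression values rather than in an MNN-corrected space, the k neighbours are weighted equally, and the data are a tiny made-up matrix of three genes. The helper names `impute_gene` and `performance_scores` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def impute_gene(query, atlas, held_out, k=2):
    """Impute one withheld gene for every query cell from its k nearest atlas cells."""
    keep = [g for g in range(len(atlas[0])) if g != held_out]
    imputed = []
    for q in query:
        q_vec = [q[g] for g in keep]
        # Rank atlas cells by distance computed WITHOUT the withheld gene
        ranked = sorted(atlas, key=lambda a: math.dist(q_vec, [a[g] for g in keep]))
        # Equal weights over the k neighbours (an assumption of this sketch)
        imputed.append(sum(a[held_out] for a in ranked[:k]) / k)
    return imputed


def performance_scores(query, atlas, k=2):
    """Per-gene Pearson correlation between imputed and measured query expression."""
    scores = []
    for g in range(len(query[0])):
        measured = [q[g] for q in query]
        scores.append(pearson(impute_gene(query, atlas, g, k), measured))
    return scores


# Toy data: rows are cells, columns are three genes (values are made up)
atlas = [[5, 4, 0], [6, 5, 0], [5, 5, 1], [0, 1, 6], [1, 0, 5], [0, 0, 6]]
query = [[5, 5, 0], [6, 4, 1], [0, 1, 5], [1, 0, 6]]

scores = performance_scores(query, atlas, k=2)
print([round(s, 3) for s in scores])
```

The key design point mirrored here is that the withheld gene never contributes to the neighbour search, so the correlation measures how well the remaining genes predict it.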
To estimate an upper bound on the performance score (i.e., the maximum correlation we might expect to observe) we exploited the four independent batches of E8.5 cells that were processed in the scRNA-seq Gastrulation atlas. We treated one of the four batches as the query set and used the leave-one-out approach described above to impute the expression of the 350 genes of interest by mapping cells onto a reference composed of the remaining three batches, before computing the Pearson correlation between the imputed and true expression counts (‘prediction score’; Methods). Computing the ratio of the Performance (seqFISH–scRNA-seq) and Prediction (scRNA-seq–scRNA-seq) scores yields a normalized performance score. Across genes, we observed a median normalized performance score of 0.73 (lower quartile 0.32, upper quartile 1.09) (Supplementary Figure 13), suggesting that our ability to infer gene expression is comparable to what might be expected when combining independent scRNA-seq datasets and providing confidence in our approach.
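Written out, the normalized score for a gene g is simply the ratio of the two correlations defined above. The symbols below are introduced here for illustration only and do not appear in the original text: the hat denotes imputed expression, and the superscripts indicate which dataset served as the query.

```latex
\[
\text{normalized performance score}_g
  = \frac{\rho_g^{\mathrm{perf}}}{\rho_g^{\mathrm{pred}}},
\qquad
\rho_g^{\mathrm{perf}} = \operatorname{cor}\!\bigl(\hat{x}_g^{\mathrm{seqFISH}},\, x_g^{\mathrm{seqFISH}}\bigr),
\quad
\rho_g^{\mathrm{pred}} = \operatorname{cor}\!\bigl(\hat{x}_g^{\mathrm{scRNA}},\, x_g^{\mathrm{scRNA}}\bigr)
\]
```

Under this definition a value of 1 means that imputation into seqFISH cells performs as well as imputation between scRNA-seq batches, and a value above 1 (as for the upper quartile of 1.09) means that, for that gene, the seqFISH correlation exceeded the scRNA-seq benchmark.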
To further validate our imputation strategy, we used non-barcoded sequential smFISH to measure the expression of 36 additional genes in the embryo sections probed by seqFISH and contrasted the true expression profile with the imputed values (Figure 3B). This independent validation (these smFISH genes were not used in the MNN mapping) confirmed that imputation reliably recovered gene expression profiles (Figure 3B; Supplementary Figures 14-18). For example, we observed a strong overlap between measured and imputed expression for Dlx5 [67], an essential and spatially-restricted regulator of craniofacial structures, in the anterior surface ectoderm and first branchial arch. Additionally, we noted that Tmem54 was inferred to be specifically expressed in the anterior surface ectoderm and along the gut tube, Nkx2-5 [68,69] was inferred to be expressed in the developing heart, and Mesp1 was inferred to be expressed in the posterior presomitic mesoderm (PSM) [70,71]. Finally, the ubiquitous expression profile of Basp1 and the absence of expression of the germ line marker Utf1 [72] were also recapitulated in the imputed expression maps.