1
Combinatorial expression motifs in signaling pathways
1
Alejandro A. Granados
1
,2,3
†
, Nivedita Kanrar
1
,2
†
, Michael B. Elowitz
1
,2,
*
2
3
1
D
ivision of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
4
91125,USA
5
2
H
oward Hughes Medical Institute and Department of Applied Physics, California Institute of
6
Technology, Pasadena, CA 91125, USA
7
3
P
resent address: Chan-Zuckerberg Biohub, San Francisco, CA 94158, USA
8
† Equal contribution
9
*
Correspondence:
melowitz@caltech.edu
(M.B.E.)
10
A
bstract
11
12
Cell-cell signaling pathways comprise sets of variant recept
ors that are expressed in different
13
combinations in different cell types. This architecture al
lows one pathway to be used in a variety
14
of configurations, which could provide distinct function
al capabilities, such as responding to
15
different ligand variants. While individual pathways ha
ve been well-studied, we have lacked a
16
comprehensive understanding of what receptor combinations
are expressed and how they are
17
distributed across cell types. Here, combining data from m
ultiple single-cell gene expression
18
atlases, we analyzed the expression profiles of core signali
ng pathways, including TGF-β,
19
Notch, Wnt, and Eph-ephrin, as well as non-signaling
pathways. In many pathways, a limited set
20
of receptor expression profiles are used recurrently in many dist
inct cell types. While some
21
recurrent profiles are restricted to groups of closely related
cells, others, which we term pathway
22
expression motifs, reappear in distantly related cell type
s spanning diverse tissues and organs.
23
Motif usage was generally uncorrelated between pathways, re
mained stable in a given cell type
24
during aging, but could change in sudden punctuated tra
nsitions during development. These
25
results suggest a mosaic view of pathway usage, in which
the same core pathways can be
26
active in many or most cell types, but operate in one of
a handful of distinct modes.
27
28
29
30
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
Introduction
31
In metazoans, a handful of core cell-cell communication pathw
ays such as TGF-β, Notch, Eph-
32
ephrin, and Wnt play critical roles in diverse development
al and physiological processes
33
(
Antebi, Nandagopal, et al., 2017; Gerhart, 1999; Li & Elowitz, 2019; Lim et al., 2015)
. Each of
34
these pathways includes multiple, partly redundant, receptor
variants that are expressed in
35
distinct combinations in different cell types and intera
ct in a many-to-many, or promiscuous,
36
manner with corresponding sets of ligand variants (Figure 1A
)
(Derynck & Budi, 2019;
3
7
Massagué, 2012; Okigawa et al., 2014; Rohani et al., 2014; Verkaar & Zaman, 2010; Wang et
38
al., 2016)
. Within a given cell, the function of the pathway—which
ligands it responds to, or
39
which intracellular targets it activates—in general depends o
n which combination of components
40
a cell expresses. For example, the TGF-β pathway, which
plays pivotal roles in diverse
4
1
developmental and physiological processes (David & Massagué, 2018)
, comprises 7 type I and
42
5 type II receptor subunits that combine to form heterotet
rameric receptors composed of two
43
type I and two type II subunits
(Wrana et al., 1992)
. Cell types with distinct receptor expression
44
profiles preferentially respond to distinct combination
s of BMP ligands
(Antebi, Linton, et al.,
4
5
2017; Vilar et al., 2006)
, suggesting that different receptor combinations could p
rovide distinct
46
ligand specificities. Similarly, in mice, the Wnt pathway c
omprises a set of 10 Frizzled receptor
47
variants that interact with 2 different LRP co-receptors,
all of which are expressed in different
48
combinations, and collectively control the cell’s response to
combinations of Wnt ligand variants
49
(
Eubelen et al., 2018; Goentoro & Kirschner, 2009; Voloshanenko et al., 2017)
. The theme
50
continues in the juxtacrine Notch and Eph-ephrin pathway
s where different membrane-bound
51
ligand and receptor variants are expressed in diverse combinat
ions and interact promiscuously
52
to control which cells can signal to which others
(Groot et al., 2014; Kania & Klein, 2016; Klein,
5
3
2012; Lafkas et al., 2015; LeBon et al., 2014; Sprinzak et al., 2010)
. Despite the prevalence of
54
these promiscuous combinatorial architectures, it has genera
lly remained unclear what pathway
55
expression profiles exist and how they are distributed ac
ross cell types and tissues.
56
I
n principle, pathway expression profiles could be distributed across cell types in three
57
qualitatively different ways. At one extreme, each cell type could express its own, completely
58
unique, profile of pathway components (Figure 1B, left). In this case, one would observe as
59
many distinct pathway profiles as cell types. Alternatively, sets of closely related
60
(transcriptionally similar) cell types could share the same pathway expression profile (Figure 1B,
61
center). This would result in fewer pathway profiles than cell types, and a correlation between
62
the similarity of pathway profiles and the similarity of the overall transcriptomes of the cells in
63
which they appear. Finally, a third possibility would be to observe a limited number of recurrent
64
pathway profiles (as in the second case), but with individual profiles dispersed across multiple,
65
distantly related cell types, rather than confined to sets of closely related cell types (Figure 1B,
66
right). In this regime, otherwise similar cell types could exhibit divergent profiles for the pathway
67
of interest, while, conversely, more distantly related cell types would converge on similar
68
pathway profiles. In this last regime, a limited repertoire of profiles, which we term “pathway
69
expression motifs,” are re-used in diverse cell contexts. Assuming that differences in pathway
70
profile confer corresponding differences in ligand responsiveness or other properties, each of
71
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
3
these regimes implies something different about the number and distribution of functionally
72
distinct signaling modes for a pathway of interest.
73
Previously, systematically distinguishing among these potenti
al classes of behavior would be
74
difficult. Recently, however, single-cell RNA sequencing
(scRNA-seq) cell atlases have begun to
75
provide comprehensive gene expression profiles across most
or all cell types in embryos and
76
adult organisms. For example, the Tabula Muris project
provided expression profiles for
77
~100,000 cells across 20 organs in adult mice
(Tabula Muris Consortium et al., 2018)
. This data
78
set was later augmented with studies of mice at two add
itional ages
(Tabula Muris Consortium,
7
9
2020)
. In parallel, scRNA-seq studies of embryonic development
have similarly provided
80
transcriptional profiles for the cell states in the ea
rly embryo
(Grosswendt et al., 2020)
and
81
specific organs later in organogenesis
(He et al., 2020)
. Collectively, these data provide an
82
opportunity to determine the combinatorial structure of
pathway expression.
83
Here, we introduce a statistical framework to identif
y pathway expression profiles and
84
characterize their distribution across cell types in an agg
regated data set spanning multiple
85
atlases. This approach allowed us to identify the pathway
expression motifs described above
86
(Figure 1B, right) as well as “private” profiles that are
limited to sets of closely related cell types
87
(Figure 1B, middle) in core communication pathways includ
ing TGF-β, Notch, Eph-ephrin, and
88
Wnt. These results suggest that each pathway can operate in
a handful of distinct “modes.”
89
Further, the mode used by one pathway appears to be ind
ependent of those used by other
90
signaling pathways. Dynamically, pathway modes can remain r
emarkably stable during aging,
91
or change suddenly as cells progressively differentiate duri
ng development. Together, these
92
results provide a combinatorial view of signaling pathwa
y states and suggest that many of the
93
most central pathways can exist in a handful of different
modes, which, in the future, may be
94
studied independently of the cell types in which they app
ear.
95
R
esults
96
Integration of cell atlas data sets
97
T
o analyze pathway expression profiles across a broad diversity of cell types, we first compiled
98
data from multiple adult and developmental cell atlas data sets (Figure 2A, Table 1). These
99
included the Tabula Muris cell atlas (Tabula Muris Consortium et al., 2018), which comprises
100
40,000 cells distributed across 18 organs from a 3 month old mouse, as well as Tabula Senis
101
(Tabula Muris Consortium, 2020), which augmented these data with ~100,000 additional cells
102
from mice aged 1, 18, 24, and 32 months. We also included two early developmental whole
103
embryo atlases from E6.5 to E8.5 (Grosswendt et al., 2020; Pijuan-Sala et al., 2019a), and a
104
forelimb organogenesis atlas from E10.5 to E15 (He et al., 2020). Each of these data sets also
105
contained a cell type annotation for each cell based on expression of known markers.
106
Altogether, the aggregated data set included expression profiles and cell type annotations for
107
~700,000 individual cells.
108
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
4
To allow a unified analysis of these data, we clustered the global transcriptional profiles from
109
each dataset independently. This procedure resulted in 1206 clusters, spanning 917 unique cell
110
type annotations (e.g. “Organ: Lung, cell type:endothelial, age: 3m”), providing a unified data set
111
for further analysis (Figure 2B, Methods). For simplicity, in this work, we will refer to each global
112
gene expression cluster as a “cell state” and not distinguish between formal “cell types” and
113
other levels of variation. This clustering procedure and the cell states recovered from each
114
dataset matched previous published analyses (Fig. 2–figure supplement 1A).
115
To focus on expression differences between cell states, reduce the complexity of the data set,
116
and minimize the impact of measurement noise, we computed the average transcriptome profile
117
of each one of the 1206 clusters (Methods), similar to other recent integration approaches (Qiu
118
et al., 2021). Similar cell states in different data sets shared similar expression profiles, including
119
for the specific pathways discussed below (Figure 2–figure supplement 1B). A UMAP projection
120
displays the variety of cell classes comprising the integrated atlas (Figure 2B, right). We note
121
that cluster averaging potentially eliminates biologically meaningful gene expression variability
122
within a cluster. However, pairs of genes that were highly expressed within a cluster average
123
also showed significant co-expression in single cells (p < 0.001; Figure 2–figure supplement
124
1C). The integrated, cluster-averaged dataset provides a basis for analyzing systematic
125
changes in pathway gene expression between cell states in embryonic and adult contexts.
126
TGF-β receptors exhibit recurrent expression profiles
127
U
sing the integrated data set, we first focused on the TGF-β pathway. A functional TGF-β
128
pathway requires expression of at least one type I and one type II receptor subunit. Across the
129
1206 cell states, approximately half met this criteria, expressing at least one receptor of each
130
type above a minimum threshold (Figure 2C, Methods). The most prevalent receptors, Bmpr1a
131
and Acvr2a, were expressed in ~10 times more cell types than the least prevalent, Acvr1c and
132
Bmpr1b (Figure 3–figure Supplement 1A
).
Nearly every receptor subunit was co-expressed with
133
each other receptor subunit in at least some cell types (Figure 3–figure supplement 1C). Even
134
Acvrl1 and Bmpr1a, which were mainly expressed in endothelial and epithelial cells,
135
respectively, were also co-expressed in mesenchymal cells (Figure 3–source data 1).
136
Exceptions included Bmpr1b and Acvr1c, which were less prevalent overall and were co-
137
expressed with a more limited set of other subunits (Figure 3–figure supplement 1C). Overall,
138
these results provided TGF-β transcriptional expression profiles across cell types and revealed
139
that they were strongly combinatorial.
140
To test whether certain receptor profiles recurred across cell types as in Figure 1B, middle and
141
right panels, we clustered cell types based only on their TGF-β pathway expression profiles
142
(Figure 3A). To detect recurrent profiles, we computed the silhouette score, which compares the
143
separation of points between clusters to the separation of points within a cluster, and penalizes
144
for both over- and under-clustering (Figure 3–figure supplement 2A) (Rousseeuw, 1987). The
145
silhouette score provides a metric to quantify the approximate number of distinct clusters in a
146
dataset. We compared the silhouette from actual profiles to those determined from randomized
147
data sets in which the expression level of each receptor was independently scrambled among
148
cell types (Figure 3–figure supplement 2B, black and gray lines). Subtracting the randomized
149
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
5
silhouette score from that of the actual profile, and dividing by the standard deviation of
150
randomized data, we obtained a z-score that quantifies how much the silhouette score from the
151
actual profiles deviates that observed in the randomized control data, for a given number of
152
clusters
k
. Finally, we selected the optimal number of clusters,
푘
௧
, that maximized this z-score
153
(Figure 3—figure supplement 2B, blue). Altogether, this analysis revealed that 622 cell states
154
expressing TGF-β receptors, collectively exhibit only about 30 distinct, recurrent pathway
155
expression profiles (Figure 3A). Critically, every receptor subunit was expressed in at least one
156
of these profiles, consistent with a combinatorial view of receptor utilization.
157
TGF-β pathway expression motifs appeared in diverse cell ty
pes
158
H
aving identified recurrent pathway expression profiles, we next asked how they were
159
distributed across cell types, as in Figure 1B. To answer this question, we first visualized TGF-β
160
pathway expression profiles on the dendrogram of global cell types (Figure 3B—supplementary
161
file 1). We color-coded each profile in Figure 3A and then annotated each cell state on the
162
global dendrogram with the color corresponding to its TGF-β profile (Figure 3B). Strikingly,
163
many profiles were broadly distributed over diverse cell types (Figure 3B, colored arrows). For
164
example, profile 10 (mint green) appeared in adult macrophages and leukocytes as well as
165
mesenchymal adipose stem cells. On the other hand, a smaller number of pathway profiles
166
showed the opposite behavior. They were restricted exclusively to a particular clade of closely
167
related cell states (Figure 3B, colored asterisks). These results suggest that TGF-β could exhibit
168
both pathway motifs and private profiles.
169
One potential explanation for the dispersion of recurrent pathway profiles could be if general
170
classes of cell types, such as macrophages, fibroblasts, epithelial cells, or endothelial cells each
171
adopted a particular, characteristic profile, irrespective of their tissue or organ context. For
172
example, a pathway profile could appear dispersed if it occurred in a broad set of otherwise
173
diverse macrophage cell types. We therefore used a Sankey diagram to visualize the
174
relationship between each of these four cell type classes, based on cell type annotations in the
175
atlas, and the full set of TGF-β profiles (Figure 3C). Some classes, such as epithelial cells, used
176
more diverse TGF-β profiles than others, such as endothelial cells. Nevertheless, each of the
177
four cell type classes mapped onto multiple TGF-β profiles. Conversely, most of the profiles
178
appeared in multiple cell type classes or cell types (Figure 3C, inset). These results rule out
179
these cell type classes as an explanation for dispersed use of recurrent pathway profiles, and
180
suggest that pathway profile usage is based on other aspects of cell states.
181
To more systematically and quantitatively characterize the distribution of each pathway profile,
182
we defined the “dispersion” of a given TGF-β profile as the mean value of the pairwise euclidean
183
transcriptome distances among all cell types that express it, computed in the space of the 100
184
most significant principal components (Figure 4A). About 60% of TGF-β profiles were
185
predominantly observed in specific sets of closely related cell types (Figure 4B, points between
186
dashed lines). By contrast, 40% of
TGF-β
profiles were dispersed more broadly, often spanning
1
87
distantly related cell types (Figure 4B, points above expected range). In fact, this subset of
TGF-
188
β
profiles exhibited cell type dispersion levels approachin
g those expected if
TGF-β
profiles
1
89
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
6
were assigned to cell types randomly (Figure 4C, blue versus black lines). Based on this
190
analysis, we defined pathway expression motifs as profiles whose mean cell type dispersion
191
exceeded a cutoff. For most analysis here, we set this cutoff at the 90th percentile of
192
dispersions among groups of globally similar cell types (Figure 4D, Methods). Alternative
193
dispersion metrics produced broadly similar, but not identical, motif sets, indicating some
194
sensitivity to the definition of dispersion (Figure 4–figure supplement 1C). Finally, we note that
195
this criteria is sensitive to an arbitrary threshold, the motif cutoff, here chosen at the 90th
196
percentile. Reducing the motif cutoff would allow less dispersed profiles to be classified as
197
motifs.
198
To better understand the structure of motifs, we also examined expression correlations among
199
individual BMP receptors. Among cell states expressing pathway motifs, almost half of the
200
receptor pairs (25/55) showed no significant correlation, with the remaining pairs exhibiting a
201
mix of positive and negative pairwise correlations (Figure 4–figure supplement 1A). For
202
example, Bmpr1a was positively correlated with Acvr1 and Acvr2a, while Acvrl1 and Tgfbr2
203
were strongly correlated, with Acvrl1 expressed in a subset of cell types that expressed Tgfbr2.
204
Acvrl1 and Tgfbr2, which were previously shown to mediate signaling by BMP9, could also
205
function together as a module in this context (Chen et al., 2013).
206
TGF-β pathway motifs exhibited several interesting features. First, they were enriched for
207
expression of the type I receptors Bmpr1a and Acvr1, as well as the type II receptor Acvr2a. In
208
fact, almost all motifs co-expressed all three of these receptor subunits (Figure 4D). On the
209
other hand, Bmpr1b, Acvrl1 and Acvr1c were the least represented receptor subunits, appearing
210
in only 3, 3, or 4 of the motifs, respectively. The most prevalent motif, 8, was expressed in 9
211
different mouse organs and is similar to the profile of NMuMG mammary epithelial cells, which
212
were shown to compute complex responses to ligand combinations (Antebi, Linton, et al., 2017;
213
Klumpe et al., 2020) (Figure 4D, rows). Motif 8 included the type 1 subunits Bmpr1a, Acvr1, and
214
Tgfbr1, as well as the type II subunits Acvr2a, and Tgfbr2. Motif 15, which is similar to motif 8
215
but with more Bmpr1b, was shown to exhibit reduced complexity of combinatorial ligand
216
responsiveness (Klumpe et al., 2020), suggesting that even a change in a single receptor
217
between profiles could be functionally significant.
218
Motifs were broadly distributed across the organism, with some appearing in as many as 9
219
different mouse organs (Figure 4E, rows). Conversely, multiple motifs appeared in the same
220
organ. For example, the adult kidney included cell states with 9 different TGF-β receptor
221
expression motifs (Figure 4E, columns). These results underscore the breadth of the dispersion
222
of the pathway motifs.
223
In contrast to motifs, other TGF-β profiles recurred in multiple cell types but exhibited low
224
dispersion, as in Figure 1B, middle panel (Figure 4–figure supplement 1B). One of these
225
groups, consisting of profiles 1,2, and 5, was in fact dispersed among diverse developmental
226
cell types, including the primitive streak, ectoderm derivatives, and mesodermal tissues.
227
However, it received a lower dispersion score due to the relative similarity of early embryonic
228
cell types compared to adult cell types. We therefore classified these profiles as a
229
developmental motif (Figure 3B, hot pink). These three profiles expressed a combination of
230
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
7
Bmpr1a and Acvr2b, and resembled the BMP receptor profile previously identified in mouse
231
embryonic stem cells, suggesting that the early embryonic receptor profile is stably maintained
232
during early germ layer cell fate diversification (Klumpe et al., 2020).
233
By contrast, profiles 29 and 30 were each confined to a single set of closely related cell types:
234
chondrocytes (E13.5-E15.0) and macrophages, respectively. Because they were tightly
235
associated with a particular set of cell types, these profiles are effectively the opposite of a
236
motif, and we refer to them as “private” profiles. Notably, these private profiles both expressed
237
Bmpr2, which is less prevalent compared to other receptors. Nevertheless, Bmpr2 is not a
238
marker of private profiles, as it is also expressed in dispersed motifs, such as motifs 8, 9, 10, 13,
239
and 27 (Figure 4D). Together, these results suggest that the TGF-β pathway exhibits a set of
240
recurrent and dispersed expression motifs, as well as a relatively small number of private
241
profiles.
242
Additional signaling pathways also exhibit pathway expressi
on motifs.
243
O
ther signaling pathways also exhibited recurrent expression profiles (Figure 5). Using the
244
PathBank database of biological pathways (Wishart et al., 2020), we identified 56 different
245
annotated biological pathways involved in signaling and other functions (Figure 5–source data
246
1). For each pathway, we assembled a corresponding list of genes, normalized their expression,
247
clustered the resulting profiles, computed silhouette scores, and compared them to a null
248
hypothesis in which the expression levels of each gene were independently and randomly
249
reassigned to different cell types as described previously (Figure 5–figure supplement 1A). As
250
with TGF-β, we identified the optimal number of clusters for each pathway by determining the
251
peak value of the silhouette z-score.
252
To classify pathways as recurrent or cell type-specific, we generated, for each pathway, a
253
corresponding ensemble of ~100 pseudo-pathways of the same size but composed of randomly
254
selected genes (Figure 5A, black; Figure 5–figure supplement 1B, black). By clustering
255
expression for each pseudo-pathway, we computed a null hypothesis distribution of
푘
௧
for
256
each pathway of interest (Figure 5A, blue; Figure 5–figure supplement 1B, blue). We then
257
calculated the difference between the observed number of clusters in the real pathway and the
258
mean number of clusters found in the corresponding ensemble of pseudo-pathways (Figure 5B).
259
Similar to TGF-β, several pathways exhibited fewer clusters than expected given their number
260
of genes, indicating recurrent expression profiles (Figure 5B, right). These included core cell-cell
261
communication pathways such as Notch, Ephrin, as well as the Srsf splicing protein family,
262
including all 11 SR family splice regulatory proteins, and a protein degradation pathway defined
263
at Pathbank consisting predominantly of different proteasome subunits
(Wishart et al. 2020)
.
264
W
e also observed the opposite behavior: in some cases, pathway expression profiles in
265
different cell states differed even more from one another than the expression levels of randomly
266
chosen sets of genes. These pathways were thus the opposite of recurrent, or equivalently,
267
highly cell type-specific, in their expression. They included CXCR4 (Figure 5A, right), Rac1, and
268
Lysophosphatidic acid (LPA6) signaling. In each of these cases, the silhouette z-score exhibited
269
no clearly defined peak and remained elevated compared to the null distribution, even as the
270
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
8
number of clusters was increased (Figure 5B, left; Figure 5–figure supplement 1AB). Non-
271
recurrent pathways may allow cells to fine tune a pathway to highly individualized requirements
272
of each cell type. For example, in the CXCR4 or LPA6 pathways, this mechanism could allow
273
each cell state to respond with a distinct amplitude and specificity to different sets of cytokines
274
or LPA variants. These results indicate that some pathways have a non-recurrent structure
275
dominated by private profiles.
276
Under our null hypothesis, signaling pathways were compared against a distribution of pseudo-
277
pathways composed of randomly selected genes across the transcriptome (Figure 5B,
p
-
278
values). We noted that this null distribution could underestimate the signal of the silhouette
279
score since randomly selected genes exhibit different expression statistics compared to real
280
pathways. Comparing against other randomized controls could increase the signal-to-noise for
281
some pathways.
282
Notch, Eph, and Wnt pathways exhibit dispersed expressio
n motifs.
283
Notch Signaling
284
W
e next asked whether other developmental signaling pathways similarly exhibited
285
combinatorial expression patterns with recurrent, dispersed profiles. Based on their status as
286
core signaling pathways and their recurrence scores (Figure 5B), we focused on Notch, Eph-
287
ephrin, and Wnt.
288
In contrast to TGF-β and Wnt, which both use secreted ligands, the Notch pathway involves
289
juxtacrine interactions between a set of membrane anchored ligands, including Dll1, Dll4, Jag1,
290
Jag2, and the cis-inhibitor Dll3, and a set of four Notch receptors, Notch1-4 (Artavanis-
291
Tsakonas et al., 1999; D’Souza et al., 2008; Siebel & Lendahl, 2017). Further, a set of three
292
Fringe proteins (M-, R-, and L-Fng) modulates cis and trans ligand-receptor interaction
293
strengths, both between adjacent cells (trans) as well as within the same cell (cis) (Kakuda et
294
al., 2020; Kakuda & Haltiwanger, 2017). We therefore defined a minimal Notch pathway
295
comprising 11 ligands, receptors, and Fringe proteins (Figure 5C). This definition excludes
296
ADAM family metalloproteases, γ-secretase, the CSL complex, and other components, in order
297
to focus specifically on ligands, receptors, and the Fringe proteins that directly modulate their
298
interactions, all of which exist in multiple variants. We classified pathway expression as “on” if at
299
least 2 of these genes were expressed above a minimum threshold of 20% of the maximum
300
observed expression level across all cell types. With these criteria, the Notch pathway was “on”
301
in 37% of cell states (450 out of 1200) (Figure 2C).
302
As with TGF-β, the Notch pathway exhibited combinations of co-expressed components,
303
including receptors, ligands and Fringe proteins (Figure 5C). The pathway exhibited a peak
304
Silhouette score at ~31 cell clusters (Figure 5–figure supplement 1A), 16 of which qualified as
305
motifs based on their dispersion scores (Figure 5-figure supplement 2, Figure 5C).
306
These profiles agreed with previous observations. For example, B cells (Notch motif 19) are
307
known to express the Notch2 receptor and no ligands (Saito et al. 2003; Yoon et al. 2009). The
308
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
9
combination of Notch1, Notch2 and Jag1 was prevalent, occurring in most of the motifs, which
309
were distinguished by expression of other components (Figure 5C). Nevertheless, even among
310
motifs expressing both Notch1 and Notch2, the ratio of the two receptors varied (compare Notch
311
motifs 19 and 28, Figure 5C). Among the Fringe proteins, R-fng was expressed in all motifs,
312
while L-fng and M-fng were restricted to a limited subset (Figure 5C). Nearly all motifs, with the
313
exception of motif 26, which is expressed in cell types that comprise the blood vessels, co-
314
expressed both ligands and receptors. Notch ligands and receptors are known to exhibit
315
inhibitory (cis-inhibition) and activating (cis-activation) same-cell interactions that can generate
316
complex interaction specifiities with other cell types expressing similar or different ligand and
317
receptor combinations. The prevalence of multi-component Notch motifs could help explain
318
complex Notch behaviors with the potential to send or receive signals to or from specific cell
319
types (del Álamo et al., 2011; LeBon et al., 2014; Li & Elowitz, 2019; Nandagopal et al., 2019).
320
In addition to its expression motifs, Notch also exhibited a smaller set of ‘private’ expression
321
profiles limited to closely related cell types (Figure 5–figure supplement 3A). Private motifs were
322
used by muscle cells during forelimb development (profile 25), basal cells of the mammary
323
gland (profile 21), mesodermal lineages at E7.0-E8.0, and the adult endothelium (profile 8). The
324
private profiles exhibited greater expression of M-fng, and the Delta family ligands Dll1, 3, and 4
325
compared to the motifs (Figure 5–figure supplement 3A). Taken together, these results reveal
326
that the Notch pathway uses a set of recurrent and dispersed combinatorial expression motifs,
327
as well as private expression profiles in some lineages.
328
Eph-ephrin signaling
329
T
he most recurrent core signaling pathway in our panel was Eph-ephrin (Figure 5B, rightmost
330
blue point), another juxtacrine signaling pathway that plays key roles in development, including
331
tissue boundary formation, axon guidance, bone development, and vasculogenesis, among
332
many other processes (Arthur & Gronthos, 2021; Cramer & Miko, 2016; Kania & Klein, 2016;
333
Klein, 2012). Eph-ephrin signaling has also been implicated in numerous cancers (Astin et al.,
334
2010; Merlos-Suárez & Batlle, 2008). The pathway implements juxtacrine communication
335
bidirectionally between adjacent cells through combinations of Eph receptors and ephrin
336
ligands, which are grouped into A and B families based on the specificity of their signaling
337
interactions. Like Notch, Eph-ephrin interactions occur both in
cis
and in
trans
, and can also
338
involve the formation of multi-component clusters (Dudanova & Klein, 2011). Furthermore,
339
since the same ephrin ligand signaling through different Eph receptors can produce different
340
and even opposite physiological responses (Seiradake et al., 2013), these features are
341
consistent with the idea that component combinations could dictate signaling specificity.
342
Here, we tabulated the expression of 11 Eph variants and 8 ephrin variants, spanning both type
343
A and B families (19 genes total). Silhouette analysis revealed a broad peak with a maximum at
344
54 clusters for the combined Eph-ephrin pathway (Figure 5–figure supplement 1A). Strikingly, all
345
of these clusters exhibited co-expression of multiple Eph and ephrin variants (Figure 5D and
346
Figure 5–figure supplements 2B, 3B). While Ephs and ephrins were generally not expressed in
347
blood cell types (Figure 5–source data 2), they were broadly expressed in many others (Figure
348
5D). The Eph receptor expression profiles were also broadly distributed across these cell states,
349
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
0
generating a set of motifs (Figure 5D). Inspection of the motifs revealed highly combinatorial
350
expression patterns, co-expressing 3.67±1.88 and 2.89±1.23 Eph and ephrin variants,
351
respectively, and nearly always expressing components from both A and B families. As with
352
TGF-β and Notch, individual motifs often occurred in multiple organs and, conversely, individual
353
organs often contained multiple motifs (Figure 5D, right). However, tissue coverage was more
354
sparse than the other two pathways, possibly reflecting the greater number of distinct motifs
355
(Figure 5D, left). These observed motifs agree with established signaling interactions observed
356
in vivo
. For example, an EphB4-EfnB2 signaling complex is known to regulate vasculature
357
formation and maintenance in developing and adult mice (Salvucci & Tosato, 2012). Endothelial
358
cells (motifs 24 and 47) notably co-expressed these components, in addition to other Eph
359
receptor and ephrin ligand components.
360
The pathway also exhibited private profiles, which notably co-expressed a greater number of
361
distinct components than the motifs (Figure 5–figure supplement 3B). Private profiles appeared
362
in a variety of developmental tissues (profiles 17, 10, 7, 2, 1, and 14), as well as adult cell types
363
(Figure 5–source data 2). Together, these results indicate that Eph-ephrin components are
364
expressed in a combinatorial fashion with a mixture of motifs and private profiles, each broadly
365
distributed across embryonic and adult tissues.
366
Wnt Signaling
367
F
inally, as a fourth signaling pathway, we also analyzed Wnt, which plays critical roles in a vast
368
range of developmental and physiological processes. Wnts can function as morphogens and
369
are involved in regeneration, cancer, and disease (Grigoryan et al., 2008). Extracellular
370
interactions between Wnt ligand and receptor variants exhibit promiscuity, with each ligand
371
typically interacting with many receptor variants (Voloshanenko et al., 2017). Signaling involves
372
Wnt ligands binding to Frizzled (Fzd1-10) receptors and low-density lipoprotein related co-
373
receptors 5/6 (LRP5/6) to stabilize β-Catenin, allowing it to activate transcription of target genes
374
(Goentoro & Kirschner, 2009; MacDonald & He, 2012; Mikels & Nusse, 2006). Wnt signaling
375
has also been shown to have combinatorial features (Buckles et al., 2004).
376
The recurrence score for Wnt was slightly less than that of TGF-β and nitric oxide signaling
377
(Figure 5B, red asterisks). Nonetheless, the pathway exhibited recurrent profiles. Silhouette
378
score analysis showed a peak elevation at
푘
௧
= 30
profiles, similar to TGF-β, and was
379
elevated compared to a null model of randomly scrambled pathways constructed from the same
380
genes (Figure 5–figure supplement 1A). Strikingly, these profiles all exhibited co-expression of
381
multiple Fzd variants, and all but two co-expressed both the Lrp5 and Lrp6 co-receptors (Figure
382
5–figure supplement 2C).
383
A subset of Wnt pathway expression profiles were broadly dispersed (Figure 5–figure
384
supplement 3D). All of these high dispersion profiles co-expressed multiple Frizzled variants
385
(Figure 5–figure supplement 3D). Conversely, most Frizzled variants were expressed in multiple
386
high dispersion profiles. The exceptions were Fzd9 and Fzd10, which were expressed at much
387
lower levels in most cell types, although Fzd9 was highly expressed in profile 28, along with
388
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
1
other receptors (Figure 5–figure supplement 3C). These results show that the Wnt pathway also
389
exhibits combinatorial expression motifs.
390
Inter-pathway correlations reveal independent profile us
age
391
I
dentifying combinatorial expression profiles in multiple pathways provokes the question of
392
whether component configurations are correlated between pathways. For example, in the limit
393
of tight coordination, cells expressing one TGF-β profile might always express a corresponding
394
Notch profile. In the opposite limit, profiles from one pathway might be used independently of
395
those from another pathway, suggesting a more mosaic cellular organization.
396
To quantify the correlation between expression profiles of different pathways, we computed the
397
pairwise adjusted mutual information (AMI) between the profile labels of each pair of pathways
398
across all cell types (numbers, Figures 2A, Figure 5–figure supplement 1A-C). The AMI metric
399
quantifies the degree of statistical dependence between the two clusterings, controlling for
400
correlations expected in a null, or completely independent, model. The full dataset of 1206 cell
401
states was used for computing the pairwise AMI, assigning the profile label ‘0’ to cell states that
402
do not express a given pathway. We visualized the results with a heatmap showing the pairwise
403
AMI values across the main recurrent pathways (Figure 5E).
404
In general, most pathway-pathway correlations were weak (AMI < 0.4) (Figure 5E). To ensure
405
that the AMI was indeed capable of capturing correlations, we included a subset of the TGF-β
406
receptors (the 7 BMP receptors) as a separate pathway (“BMP receptors”). Given their
407
overlapping components, TGF-β and BMP showed elevated AMI values of ~0.6, as expected
408
(Figure 5E). A notable exception was the strong correlation between the Ubiquitin-Proteasome
409
pathway and SRSF splice regulators, which arose predominantly from developmental cell states
410
expressing Ubiquitin-Proteasome profile 1 with SRSF profiles 1 and 2 (Figure 5–source data 2).
411
Other pathway pairs, consisting of TGF-β, Wnt, or Eph-ephrin exhibited weaker relationships,
412
whereas the Notch pathway showed little correlation with almost all other pathways. These
413
results suggest that, at least for the limited set of components considered here, different
414
pathways seem to adopt profiles largely independently of one another.
415
Pathway profiles exhibit distinct dynamic behaviors during
differentiation
416
T
he relative independence of profile selection between pathways provokes the dynamic
417
question of when and how pathways switch profiles during development. At one extreme,
418
profiles could switch in a stepwise fashion, changing one component at a time. At the opposite
419
extreme, they could change multiple components simultaneously, directly switching from one
420
profile to another. Further, either type of change could occur gradually or suddenly, and could
421
be temporally synchronized or unsynchronized between different pathways.
422
Neural crest differentiation provides a well-characterized developmental process to address
423
these questions. The neural crest is responsible for diverse cell types, including sensory
424
neurons, autonomic cell types, and mesenchymal stem cells (Kléber et al., 2005; Simões-Costa
425
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
2
& Bronner, 2015). Further, TGF-β, Notch, Eph-ephrin, and Wnt, all play key roles in its
426
differentiation (Bhatt et al., 2013).
427
Soldatov et al. performed deep scRNA-seq analysis of neural crest development from
428
embryonic day 9.5 cells using SMART-seq2 (Soldatov et al., 2019). We used the Slingshot
429
package (Street et al., 2018) to construct pseudotime trajectories from these data and further
430
identified 7 distinct pseudotime stages (Figure 6A). All expression counts were scaled to match
431
the normalization used in the integrated atlas (Figure 2, Methods). This reconstruction
432
recapitulated known cell fate trajectories, with neural crest progenitors differentiating into
433
sensory neurons, autonomic neurons, and mesenchymal cells (Figure 6A). Except for a
434
transient upregulation of Bmpr1b early on, the TGF-β profile was remarkably stable during the
435
trajectory from progenitors to more differentiated cell types. The profile was dominated by
436
Bmpr1a, Tgfbr1, Acvr2a, and Acvr2b (Figure 6B, first panel), closely matching profile 6 (Figure
437
3A), which occurs in the developing forebrain and spinal cord, adult mesenchymal, and adult
438
podocyte cell types. This profile is potentially functional, as TGF-β pathway inhibition in neural
439
crest stem cells leads to cardiovascular defects (Wurdak, 2005). These results indicate that a
440
developmental pathway can retain a stable profile along a differentiation trajectory.
441
In contrast to the stability of TGF-β along this trajectory, Notch components exhibited a step-like
442
transition at the end of the pseudotime trajectory (Figure 6B, second panel). Progenitors
443
predominantly express the receptors Notch1 and Notch2; the ligands Dll1 and Jag1; and high
444
levels of Rfng. This profile resembles Notch motif 16 (Figure 5C). Upon differentiation into
445
sensory neurons, they switch on expression of Notch1, Dll3, and Mfng, as well as a lower level
446
of Jag2, while down regulating Notch2, thus changing to private profile 27 (Figure 5C).
447
Consistent with this analysis, profile 27 was independently derived from neural crest cells in the
448
integrated data set (Figure 5–source data 2). A similar pattern of discrete change also occurred
449
in the Wnt pathway, where expression shifted in ~2 steps from profile 11 to profile 10 (Figure
450
6B, fourth panel). Thus, the transition to the sensory neural fate involves an abrupt multi-gene
451
alteration of Notch and Wnt pathway components, neither of which was synchronized with
452
changes in TGF-β.
453
By contrast, the dynamics of the Eph-ephrin pathway were more complex and gradual, with
454
changes occurring in the expression of individual receptors at nearly every pseudotime stage.
455
Eph-ephrin expression initially resembled profile 19 (Figure 5-figure supplement 2B), then
456
switched more gradually to profile 11, before diverging slightly from it in the last pseudotime
457
point (Figure 6B, third panel). Collectively, these results show that during neural crest
458
development, different pathways can exhibit both stability and multi-step changes in their
459
expression profiles.
460
As a second case, we analyzed hematopoiesis, which occurs in temporally and spatially
461
overlapping waves in close proximity to blood vascular endothelial cells (Canu & Ruhrberg,
462
2021). Mesodermal hematoendothelial progenitors differentiate into both endothelium and
463
erythroid cells (E7.5-E8.5), allowing analysis of how pathway profiles change during a branched
464
differentiation trajectory (Figure 6C). Endothelial cells exhibit ‘private’ TGF-β profiles,
465
characterized by expression of ACVRL1. Thus, this process provides an opportunity to analyze
466
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
3
how pathway profiles change during a branched transition and how private profiles are acquired
467
dynamically.
468
We clustered the subset of haemato-endothelial lineages from (Pijuan-Sala et al., 2019b)
469
(15,645 single-cells), applied Slingshot to reconstruct bran
ching pseudotime trajectories (Figure
470
6C), and then analyzed changes in TGF-β receptor expression profiles over these trajectories.
471
In contrast to its stability during neural crest differentiation, the TGF-β profiles exhibited
472
complex, dynamic changes during vascular differentiation. Mesodermal cells predominantly
473
express Bmpr1a, Acvrl1, Tgfbr1, Acvr2a and Acvr2b, and Acvr2b, resembling profile 5, which is
474
prevalent in early development (Figure 3A). Along the erythroid lineage, cells exhibited a
475
gradual reduction in expression of all TGF-β receptors. Similar decreases in expression were
476
also observed for receptors and ligands in other pathways (Figure 6D, upper row), and may
477
reflect preparations for the dramatic events of erythropoiesis. By contrast, cells differentiating
478
into endothelial fates maintained Bmpr1a and Acvr2b expression and additionally up-regulated
479
Acvrl1, an endothelial-specific BMP receptor known to mediate signaling by BMP9 and BMP10,
480
and required for angiogenesis (Tual-Chalot et al., 2014). Thus, while one lineage gradually turns
481
off receptor expression, the other activates a distinct endothelial specific receptor profile.
482
Looking more broadly at the four pathways during differentiation to endothelium, we see similar
483
themes as observed in the neural crest differentiation: unsynchronized transitions to different
484
profiles in different pathways. Together, these results show how pathways discretely and
485
independently alter their expression profiles during different developmental lineages.
486
Discussion
487
In multicellular organisms, a core set of molecular signaling pathways mediate a huge variety of
488
developmental and physiological events. How can such a limited set of pathways play such a
489
broad range of different roles? At a coarse level, each pathway may be considered competent
490
for signaling in a given cell type if its receptors and other components are expressed and not
491
inhibited by other cellular components. However, examining pathway expression patterns
492
globally, as we did here, reveals a more subtle situation, in which pathways can be expressed in
493
a finite number of distinct configurations, characterized by different expression levels for its
494
components, all potentially competent to signal in response to suitable inputs. Each
495
configuration could be functional in some contexts but nevertheless differ from other
496
configurations in the specific input ligands it senses, or the downstream effectors it activates
497
within the cell (Antebi, Nandagopal, et al., 2017; Buckles et al., 2004; Klumpe et al., 2020;
498
LeBon et al., 2014; Li & Elowitz, 2019; Rohani et al., 2014; Su et al., 2020; Verkaar & Zaman,
499
2010).
500
To find out what configurations exist, we focused on cell-cell signaling pathways known to use
501
sets of partially redundant component variants. Each of these pathways was already known to
502
adopt multiple expression configurations in specific biological contexts. However, cell atlas data
503
permit a systematic analysis of expression profiles in a broad set of cell and tissue contexts
504
(Figures 2-5), revealing what pathway profiles are expressed, how they correlate with one
505
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
4
another between pathways (Figure 5G), and how they change dynamically during aging and
506
development (Figure 6).
507
The expression profiles of pathways are strikingly combinatorial. Across each of the four major
508
pathways studied here, no two components exhibited identical expression patterns, and all were
509
differentially regulated in some cell types. Further, almost all motifs comprised multiple receptor
510
and/or ligand variants. The number of distinct expression profiles for each pathway was much
511
smaller than one would expect if individual components varied independently. For instance, the
512
Eph-ephrin pathway with 19 components exhibits ~54 profiles, which is less than two-fold
513
greater than the ~30 profiles observed for the 11 TGF-β receptors, and far less than the
514
2
19
=524,288 pathway profiles one would expect if each of its 19 genes could independently vary
515
between low and high expression states. Assuming that the pathway profile plays a key role in
516
controlling pathway function, this finding suggests that analysis of a limited number of profiles
517
could potentially explain pathway behavior in a much larger number of cell types.
518
Expression profiles for different pathways appeared to vary independently across cell types
519
(Figure 5G). This observation argues against tight coupling of specific expression receptor
520
profiles in one pathway with those in another. However, it does not rule out the possibility that
521
signaling through combinations of pathways could play special roles in some cases (Muñoz
522
Descalzo & Martinez Arias, 2012). Analysis of pseudotime trajectories also revealed that
523
different pathways sometimes switch among motifs in a punctuated manner, and largely
524
independently of one another. While we focused on the pathways that show strong motif
525
signatures, it is equally important to note that other pathways predominantly used cell type
526
specific, or private, profiles (Figure 5B), and even the pathways that we focused on here also
527
contained some private profiles. Nevertheless, these results suggest a “mosaic” view of cells, in
528
which each cell type adopts a particular motif or private profile for each of its general purpose
529
pathways (Figure 6E).
530
Why use motifs? Motifs could provide a rich but limited repertoire of distinct functional behaviors
531
for each pathway (Su et al., 2020). One appealing possibility is that each motif has a distinct but
532
related signaling function that is retained in some way even in different cell types or contexts.
533
For example, in a “combinatorial addressing” system, different ligand combinations could
534
selectively activate sets of cell types based on their receptor expression profiles, to achieve
535
greater cell type specificity in signaling (Klumpe et al., 2020; Su et al., 2020). A similar principle
536
could apply to juxtacrine signaling pathways such as Notch and Eph-ephrin, where the
537
combination of components expressed in a given cell type could control which other cell types it
538
can communicate with, based on their own pathway expression profiles. To test this possibility,
539
it will be important to determine what inputs each motif can respond to, and whether that
540
specificity is retained across different cell contexts.
541
Several limitations apply to the findings reported here. First, pathway definition starts with a
542
human-curated list of receptors, ligands, or other components or previously annotated pathway
543
definitions. Different pathway definitions could potentially alter these results. Second, while
544
comprehensive, the data sets used here are likely incomplete, and could miss profiles used only
545
by rare cell types or could inaccurately report expression levels for weakly expressed genes.
546
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
5
Third, clustering is an imperfect representation of expression variation, potentially averages over
547
subtle quantitative differences in individual component levels between cells. In particular,
548
unsynchronized single cell dynamics, such as those that occur during Notch-dependent fate
549
determination (Kageyama et al., 2018), could therefore be missed. Moreover, we explored
550
signaling dynamics in only a few developmental trajectories. A broader exploration of more
551
developmental processes could potentially reveal other types of dynamic behaviors beyond
552
those shown here. Finally, subcellular localization patterns, post-translational modifications,
553
alternative splice forms, and other types of regulation could diversify the functional modes of the
554
pathway beyond what can be detected by scRNA-seq. However, as single cell technology
555
continues to improve and expand to the protein level, we anticipate that it should be possible to
556
obtain more precise views of pathway states.
557
The combinatorial nature of pathways makes it infeasible to experimentally characterize all
558
possible configurations. Fortunately, however, a handful of motifs account for a large fraction of
559
cell types, potentially enabling one to understand most of the functional repertoire of a pathway
560
from a limited number of motifs and private profiles. While we focused on signaling here, the
561
approach could be applied more generally to non-signaling pathways, such as splice regulation
562
or protein degradation (Figure 5B). In the future, we anticipate that a functional understanding of
563
pathway motifs could enable one to predict and control the activities of pathways in cell types
564
based on their expression profiles.
565
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
6
Acknowledgements
566
We would like to thank Heidi Klumpe, Rachael Kuintzle, Matthew Langley, James Linton,
567
Benjamin Emert, Nicolas Pelaez-Restrapo, and other members of the Elowitz lab for
568
suggestions and critical feedback on this work, as well as critical feedback from Matt Thomson,
569
Kai Zinn, Yaron Antebi, and Miri Adler. This research was supported by the Allen Discovery
570
Center program under Award No. UWSC10142, a Paul G. Allen Frontiers Group advised
571
program of the Paul G. Allen Family Foundation, and by the National Institutes of Health grant
572
R01 HD075335A. N.K. was supported in the summer of 2020 by the Samuel N. Vodopia and
573
Carol J. Hasson SURF Endowment.
574
575
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
7
Figures
576
577
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
8
Figure 1. Pathway expression profiles could be distribute
d across cell types
578
in different ways (schematic)
579
A
. Cell-cell signaling pathways comprise multiple variants of key components such as
580
receptors (cartoons, R
n
). These variants can be expressed in different combinations in
581
different cell types. Colored dots identify receptor profiles for comparison with B.
582
B. Cell types can be arranged hierarchically based on similarities among their global
583
(genome-wide) gene expression profiles (dendrogram). A hypothetical signaling pathway
584
profile for each cell type is indicated by the gray intensity in the corresponding row of
585
squares. In principle, each cell type could have a unique signaling pathway profile
586
(unique, left); exhibit a smaller set of recurrent profiles, each used by a set of related cell
587
types (recurrent and clustered, middle); or exhibit signaling pathway profiles that recur
588
even among otherwise distantly related cell types (recurrent and dispersed, right). These
589
possibilities are not exclusive and it is possible that some pathways or subsets of cell
590
types might operate in different regimes.
591
592
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
1
9
Figure 2. Integration of scRNA-seq atlas data reveals w
idespread
593
expression of signaling pathway components
594
5
95
A. We integrated 14 published developmental and adult scRNA-seq datasets spanning
596
different stages in the mouse lifespan from embryonic development to old age. These
597
data sets differ in their representation of organs and cell type classes (colors).
598
B. To generate an integrated cell state atlas, we first independently clustered each scRNA-
599
seq dataset, treating distinct time-points in the data set separately (Methods). We then
600
averaged expression over all cells in each cluster to yield a “cell state” profile for that
601
cluster, and represented each cluster by a single dot in an integrated cell state atlas data
602
set (UMAP on right). Colors are consistent with the legend in (A). Notably, this
603
integration captures cell type similarity across different datasets and sequencing
604
technologies.
605
C. Components of core signaling pathways are broadly expressed. Black or gray dots show
606
clusters whose pathway components are expressed above or below threshold,
607
respectively.
608
609
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
0
Figure 2, Supplement 1
610
A
. Analysis of scRNA-seq datasets using the standard Scanpy pipeline recapitulates
611
published analyses, including (He et al., 2020). Independent analysis of mouse forelimb
612
over days E10.5-E15.0 shows similar cell types (colors, left) and gene expression (right).
613
B. The integrated atlas captures cell type similarity across datasets. Cell clusters with
614
similar annotations in different data sets remain similar to each other in the integrated
615
atlas.
616
C. Cluster-averaged profiles reflect co-expression in single-cells. Shown is an example of a
617
single cluster from the forelimb epithelial tissue data set at day E15. Left, expression of
618
TGF-β receptor genes averaged over all cells in the cluster corresponding to forelimb
619
epithelial tissue at day E15.0. Right, pairwise conditional probability in single cells of
620
gene 2 expression conditioned on gene 1 expression. Pairs of genes with significant
621
entries (**) are co-expressed in the cluster-averaged profile. Higher-order conditional
622
probabilities were not computed due to dropout effects in scRNA-seq data.
623
624
625
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
1
Figure 3. TGF-β receptors exhibit recurrent and dispersed
pathway
626
expression profiles.
6
27
A. Silhouette score analysis (Figure 3-figure supplement 2A) identified approximately 30
628
TGF-β receptor expression profiles, indicated as color-labeled groups of rows. Colored
629
arrows indicate examples of dispersed profiles highlighted on the global cell fate
630
dendrogram in B. Asterisks indicate private profiles, also shown in B. Dendrogram at left
631
represents similarity among different profiles. Each gene is standardized to a range of 0-
632
1 across all cell types (grayscale).
633
B. Distribution of TGF-β receptor expression profiles across cell types. The global cell type
634
dendrogram was computed using a cosine distance metric applied to the integrated
635
transcriptome data set in a 20-component PCA space constructed from 4,000 highly
636
variable genes (HVGs). Arrows indicate featured TGF-β profiles that are broadly
637
dispersed across cell types, while asterisks indicate examples of private profiles. Cell
638
types that do not express TGF-β receptors have no color (white). Colors match those in
639
A. Note that blood cell types are relatively lacking in expression of TGF-β receptors.
640
C. Key cell type classes, including epithelial, macrophage, fibroblast, and endothelial cell
641
types, each span multiple TGF-β profiles. The white bar (top right) indicates the non-
642
expressing profile. Profiles are ordered to maximize the similarity of adjacent profiles.
643
Each cell class mapped to multiple distinct pathway profiles, yet differed in their profile
644
diversity. For example, epithelial cells comprise a broad spectrum of 18 distinct profiles,
645
whereas macrophages and endothelial cells are primarily restricted to smaller subsets of
646
more closely related profiles. Inset, cell type composition of each TGF-β profile, where
647
“other” includes all cell states in the atlas that do not fall into the epithelial, macrophage,
648
fibroblast or endothelial cell types.
649
650
651
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
2
Figure 3, Supplement 1. Further analysis of TGF-β pathway
expression
652
profiles.
653
A
. Histogram showing the number of cell types in the integrated atlas with normalized
654
expression of TGF-β receptors above a threshold of 0.2 in standardized expression
655
units.
656
B. Number of TGF-β receptor components simultaneously expressed for different values of
657
the minimum expression threshold (colors).
658
C. Pairwise co-expression of TGF-β receptor expression reveals broad receptor co-
659
expression patterns. Off-diagonal elements indicate the number of cell states co-
660
expressing, above threshold, the indicated pair of components. Diagonal elements
661
indicate the number of cell states expressing the corresponding individual gene.
662
663
664
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
3
Figure 3, Supplement 2. Silhouette analysis can be use
d to identify optimal
665
clustering thresholds.
666
A
. The silhouette score quantifies clustering quality (schematic). For a given clustering, we
667
compute the silhouette score on every data point
푖
. We compute
푎(푖)
, the mean distance
668
between
푖
and every other point in the same cluster, and
푏(푖)
, the mean distance
669
between
푖
and the nearest neighboring cluster. The silhouette score for data point
푖
is
670
then defined as the difference between the inter- and intra-cluster distances, normalized
671
to the maximum of the two (equations). A silhouette score value close to 1 corresponds
672
to well-defined clusters, where data point
푖
is similar to other members of its cluster and
673
dissimilar to other clusters, while a value close to -1 suggests poor cluster assignment.
674
The silhouette score for a given clustering, is taken as the average of the individual
675
scores for all data points.
676
B. The silhouette score identifies the approximate number of unique TGF-β receptor
677
expression profiles. We computed the silhouette score across expression values of the
678
pathway genes (black), as well as for 100 random gene sets (gray) where pathway gene
679
expression was independently scrambled for each gene. We then computed the z-score
680
(blue), defined as the silhouette score for pathway genes normalized to the silhouette
681
score for randomized gene sets. We defined the optimal number of receptor profiles
푘
௧
682
as the number of clusters that produced the peak z-score value (dashed line).
683
684
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
4
Figure 4. TGF-β expression motifs are dispersed across cell types
and
685
organs.
686
A
. We defined the dispersion of a receptor expression profile to be the within-class pairwise
687
distance computed in a 100 dimensional PCA space constructed from the top 4,000
688
highly variable genes (HVGs) (left). Dispersed profiles (black) show high cell type
689
diversity, whereas non-dispersed profiles (gray) are closer together in PCA space.
690
B. The dispersion of actual TGF-β expression profiles. Dashed lines indicate the range of
691
dispersions obtained for scrambled profiles. Note the large number of profiles with larger
692
dispersions than expected from random profiles.
693
C. Empirical cumulative distribution functions of TGF-β profile dispersion. The observed
694
dispersion distribution (turquoise) lies between the extremes of cell type-specific profiles
695
(gray) and profiles obtained by randomizing cell type distances by shuffling cell type
696
labels (black). We classified motifs in the shaded region, defined as being in at least the
697
90th percentile of the related cell type dispersion distribution (gray) as motifs.
698
D. We identified 14 TGF-β motifs, displayed in ranked order of dispersion from most (top) to
699
least (bottom) dispersed. For each motif, the number of cell states in which it appears is
700
indicated by the histogram at right.
701
E. TGF-β motifs (rows) are broadly distributed across different tissues and organs
702
(columns). Each matrix element represents the number of cell states in the indicated
703
tissue or organ expressing the corresponding motif. Note that most motifs are expressed
704
in multiple tissues or organs and most tissues or organs contain multiple motifs.
705
706
707
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
5
Figure 4, Supplement 1. Pairwise correlations among T
GF-β receptors and
708
identification of private profiles.
709
A
. TGF-β profiles exhibit unique pairwise receptor correlations. Each matrix represents the
710
correlation coefficient for each pair of receptors across all cell states (left), cell states
711
associated with motifs (middle), cell states associated with private profiles (right).
712
B. TGF-β profiles with less than 30 percentile cell type dispersion were classified as
private
713
profiles
. We identified 5 such profiles for TGF-β. Profiles 1, 2, and 5 come from
714
developmental states, while 29 and 30 represent adult cell types.
715
C. Alternative definitions of the dispersion metric recover similar sets of motifs. The mean of
716
intra-class pairwise distances was used as the dispersion metric throughout this work,
717
but we tested two additional dispersion metrics, one that uses the maximum of intra-
718
class pairwise distances, the second that uses the top 10th percentile. The Venn
719
diagram shows profiles identified as motifs from these three distinct definitions of the
720
dispersion metric. The majority of profiles (shown in the intersection of the three circles)
721
are robust to the definition of dispersion. Notably, the dispersion metric that utilizes the
722
maximum of pairwise distances only captures profiles in this intersection. The mean
723
pairwise distance, however, captures two additional profiles as motifs, profiles 21 and
724
24. Profile 24 contains only two cell states, liver B cells and bone marrow NK cells. The
725
top 10th percentile of pairwise distances captures the adult endothelium-specific profile,
726
25, as a motif. However, the maximum metric omits profiles 13 and 15, even though they
727
appear to be motifs, since they are both dispersed across the adult smooth muscle and
728
adult kidney epithelium.
729
730
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
6
Figure 5: Expression motifs occur in multiple pathways.
731
A
. In order to classify a pathway as cell type-specific or recurrent, we compared the number
732
of distinct profiles for a pathway (blue line) against a null distribution of the numbers of
733
distinct profiles identified in random gene sets (black line). We computed these null
734
distributions specific to the number of components in a pathway to avoid confounding
735
the number of distinct profiles with pathway size, i.e., we would expect more
736
combinatorial profiles for a pathway containing more genes. Left: examples of recurrent
737
pathways (TGF-β and SRSF splice regulators), which have fewer clusters than expected
738
from the null distribution. Right: example of pathway with more clusters than expected
739
from the null distribution.
740
B. Deviations of pathways from random gene sets. We curated 56 gene sets from the
741
PathBank database and generated corresponding null distributions, analyzing each
742
pathway for cell type-specific or recurrent behavior as in A. We normalized the number
743
of identified clusters to the number of pathway components and computed the deviation
744
of this ratio from the null distribution (y-axis). Negative deviations show that a signaling
745
pathway has fewer clusters than expected for a given pathway size, indicating
746
recurrence. By contrast, positive deviations occur when there are more clusters than
747
expected, indicating strong cell type specificity. Pathways with significant deviations from
748
the null distribution (adjusted p-value < 0.05) are highlighted in blue. Red asterisks
749
indicate recurrent pathways that have strong, but not statistically significant, deviation
750
from the null distribution.
751
C. Motifs in the Notch pathway and their distribution across tissues and organs, similar to
752
Figure 4D,E.
753
D. Motifs in the Eph-ephrin pathway and their distribution across tissues and organs, similar
754
to Figure 4D,E.
755
E. Correlations in profile usage between pathways were quantified by the adjusted mutual
756
information between their respective profile labels.
757
758
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
7
Figure 5, Supplement 1. Silhouette profiles for various
pathways.
759
A
. Silhouette analysis of indicated pathways, as in Figure 3-figure supplement 2B.
760
B. Gene set null distributions for various pathways, as in Figure 5A.
761
762
763
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
8
Figure 5, Supplement 2. Pathway profiles for Notch, Ep
h-ephrin, and Wnt
764
receptor receptors.
765
A
-C. For each pathway, all pathway profiles are indicated with corresponding labels, as in
766
Figure 2A.
767
768
769
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
2
9
Figure 5, Supplement 3. Pathway component prevalence an
d private
770
profiles for Notch, Eph-ephrin, and Wnt pathways.
771
A
-C. Left: Histogram showing the number of cell types in the integrated atlas with normalized
772
expression of Notch (A), Eph-ephrin (B), or Wnt (C) components above a threshold of 0.2 in
773
standardized expression units. Center: Pairwise co-expression analysis of indicated pathway
774
components. Off-diagonal elements indicate the number of cell states co-expressing, above
775
threshold, the indicated pair of components. Diagonal elements indicate the number of cell
776
states expressing the corresponding individual gene. Right: Private profiles for each pathway.
777
Each profile is shown alongside the number of cell states in which it appears (histogram, far
778
right).
779
780
D. Wnt pathway motifs and their distribution across tissues and organs. These plots are similar
781
to Figure 4D,E and 5C,D but for the Wnt pathway.
782
783
784
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
3
0
Figure 6. Developmental transitions of pathway profiles.
785
A
. Pseudotime trajectory analysis of the trunk neural crest (Soldatov et al., 2019) captures
786
delamination of progenitors into three distinct cell fates in a ForceAtlas projection:
787
sensory neurons, autonomic neurons, and the mesenchyme. Here, we follow the
788
sensory neuron trajectory (black arrow).
789
B. Developmental pathways show distinct expression dynamics in neural crest
790
differentiation. For each pathway, corresponding mean expression profiles are shown in
791
grayscale for each of the cell states indicated in A. Colored dots indicate which
792
populations are being averaged. Profile numbers indicate the closest match to one of the
793
reference pathway profiles shown in Figures 3A and 5-figure supplement 2. Two
794
numbers are indicated for profiles that are approximately equally similar to the
795
corresponding reference profiles.
796
C. In early vascular differentiation (Pijuan-Sala et al., 2019b), mesodermal progenitors
797
differentiate into endothelial and erythroid cell fates (gray arrows in ForceAtlas
798
projection).
799
D. Dynamics of four core pathways for each of the two trajectories in C: erythroid
800
differentiation (upper row of heat maps) and endothelial differentiation (lower row).
801
Colored dots indicate cell populations in C. Profile numbers indicate closest matches in
802
reference profiles (Figure 3A, Figure 5-figure supplement 2).
803
E. Mosaic view of profile usage (schematic). Cell states can express each of their
804
pathways, using any of the distinct available profiles (indicated schematically by profile
805
ticks). In this way, cell states can be thought of, in part, as mosaics built from
806
combinations of available pathway profiles.
807
808
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
3
1
Methods
809
Clustering single cells and defining cell states
810
W
e obtained raw scRNA-seq matrices directly from the GEO repositories or specific locations
811
indicated by the authors for the data sets appearing in the table below. Clustering of single cells
812
started from the count matrices of single cells vs genes. First, we applied quality control (when
813
needed, since some datasets were already filtered) by filtering out cells with high mitochondrial
814
RNA content, low number of detected transcripts or low number of detected counts. We then
815
applied a standard pipeline for clustering scRNA-seq data. Briefly, we applied principal
816
component analysis and used the first 50 principal components as input for graph-based
817
(Leiden) clustering using Scanpy (Traag et al., 2019; Wolf et al., 2018). Finally, we labeled the
818
resulting clusters using the cell type annotations provided by the authors. All datasets analyzed
819
in this study included ground truth cell type annotations that we use throughout the manuscript.
820
All raw and processed data, along with scripts, are available at . Code can be found at
821
https://github.com/nkanrar/motifs.git.
822
Table 1. Single-cell data sets used in this work
823
824
Dataset
Time points Reference
Cells
Mice
sampled
Technology
Forelimb atlas (The
c
hanging mouse
embryo transcriptome at
whole tissue and single-
c
ell resolution)
E10.5, E11.0,
E11.5, E12.0,
E13.0, E13.5,
E14.0, E15.0
(He et al., 2020)
90,637 Pair of
forelimbs
per time
point
10X
A single-cell molecular
ma
p of mouse
gastrulation and early
organogenesis
E6.5, E6.75,
E7.0, E7.25,
E7.5, E7.75,
E8.0, E8.25,
E8.5
(Pijuan-Sala et al.,
2019b)
116,312 411 mouse
embryos
10X
The emergent
l
andscape of the mouse
gut endoderm at single-
cell resolution
E5.5
(Nowotschin et al.,
2019)
-
-
10X
Single-cell RNA-seq
a
nalysis unveils a
prevalent
epithelial/mesenchymal
hybrid state during
mouse organogenesis
E9.5-E11.5 (Dong et al., 2018)
1916 7 embryos Smart-seq2
Epigenetic regulator
f
unction through mouse
gastrulation
E6.5, E7.0,
E7.5, E8.0,
E8.5
(Grosswendt et al.,
2020)
88,779 50 embryos
10X
Tabula muris and
T
abula muris senis
1mo, 3mo,
18mo, 21mo,
24mo, 30mo
(Tabula Muris
Consortium, 2020;
Tabula Muris
450,000+
-
10X, Smart-
s
eq2
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint
3
2
Consortium et al., 2018)
Integration of multiple datasets
825
T
o integrate the above datasets into a single matrix of gene expression, we first generated a
826
pseudo-bulk expression matrix for each dataset by averaging the log-normalized gene
827
expression values of individual cells in a cluster. The resulting matrix has dimensions N x M,
828
where N is the number of cell states in the dataset and M is the number of distinct genes. To
829
account for differences in gene detection across datasets, we found the intersection of detected
830
genes across all datasets and subsampled each matrix to include only genes that appeared in
831
all data sets. The intersection of detected genes across all datasets comprised ~11,000 genes
832
that we then used for all downstream analysis. Having defined the intersection gene set, we
833
concatenated individual datasets into a global average expression matrix containing 1206
834
clusters and ~11,000 genes.
835
To normalize gene expression values from different datasets to a common scale, we applied a
836
second round of normalization to the global expression matrix. First, we transformed the log-
837
normalized matrix using the exponential function to obtain a matrix
푀
of “counts” per gene:
838
푀′
= 푒푥푝(푀
) + 1
.
We then normalized, scaled and clustered the resulting ma
trix following the
839
standard methods from Seurat v3 (total RNA counts per cell state = 1e4, 4,000 highly-variable
840
genes and 50 principal components), which resulted in the clustering and UMAP shown in
841
Figure 2. We verified that cell states from different datasets and sequencing technologies
842
clustered together (Figure 2B), as an indication that the integrated and normalized UMAP
843
recovers the biological diversity across development, adult and aging datasets.
844
Clustering pathway expression profiles across cell states
845
A
ll downstream analysis on pathway genes starts from the normalized pseudo-bulk gene
846
expression matrix described above. We noticed that pathway genes showed different dynamic
847
ranges in their expression across cell states. To give each pathway gene equal weight during
848
clustering of pathway profiles, we applied a MinMax scaling for each gene, using the 95%
849
percentile observed across all 1206 cell states as the maximum value. After scaling, each gene
850
in the pathway had a dynamic range from 0 to 1, corresponding to the range of 0-95% of the
851
maximum value in the data set for that gene. For each cell state, we classified a pathway as
852
being “on” if at least two of the pathway genes showed expression above a threshold of 0.2 on
853
this scale, meaning that the gene is expressed at a level of at least 20% of its maximum
854
observed value. This threshold allowed us to filter out cell states in which all genes in the
855
pathway are zero or showed low expression compared to most other cell states, and focus
856
instead on the cell states showing combinatorial expression of multiple genes (Figure 2–figure
857
supplement 1B, C). We computed all pairwise cosine distances between cell states with an “on”
858
pathway profile, considering only the pathway genes, and applied hierarchical clustering to the
859
resulting distance matrix (Figure 3A).
860
.
CC-BY-NC-ND 4.0 International license
available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint
this version posted August 22, 2022.
;
https://doi.org/10.1101/2022.08.21.504714
doi:
bioRxiv preprint