Multi-modular structure of the gene regulatory network for specification and commitment of murine T cells
Boyoung Shin* and Ellen V. Rothenberg*
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena,
CA, United States
T cells develop from multipotent progenitors by a gradual process dependent on
intrathymic Notch signaling and coupled with extensive proliferation. The stages
leading them to T-cell lineage commitment are well characterized by single-cell
and bulk RNA analyses of sorted populations and by direct measurements of
precursor-product relationships. This process depends not only on Notch
signaling but also on multiple transcription factors, some associated with
stemness and multipotency, some with alternative lineages, and others
associated with T-cell fate. These factors interact in opposing or semi-independent T cell
gene regulatory network (GRN) subcircuits that are increasingly well defined. A newly
comprehensive picture of this network has emerged. Importantly, because key factors in the
GRN can bind to markedly different genomic sites at one stage than they do at other stages,
the genes they significantly regulate are also stage-specific. Global transcriptome analyses
of perturbations have revealed an underlying modular structure to the T-cell commitment GRN,
separating decisions to lose "stem-ness" from decisions to
block alternative fates. Finally, the updated network sheds light on the intimate
relationship between the T-cell program, which depends on the thymus, and the
innate lymphoid cell (ILC) program, which does not.
KEYWORDS
gene regulatory network, early T cell development, transcription factors, gene expression
program, cell fate decision, thymus, epigenetic control, multipotency
OPEN ACCESS
TYPE: Review
EDITED BY: Mihalis Verykokakis, Alexander Fleming Biomedical Sciences Research Center, Greece
REVIEWED BY: Barbara L. Kee, The University of Chicago, United States; Avinash Bhandoola,
National Institutes of Health (NIH), United States
*CORRESPONDENCE: Boyoung Shin, boyoung@caltech.edu; Ellen V. Rothenberg, evroth@its.caltech.edu
SPECIALTY SECTION: This article was submitted to T Cell Biology, a section of the journal
Frontiers in Immunology (frontiersin.org)
RECEIVED: 26 November 2022
ACCEPTED: 11 January 2023
PUBLISHED: 31 January 2023
CITATION: Shin B and Rothenberg EV (2023) Multi-modular structure of the gene regulatory
network for specification and commitment of murine T cells. Front. Immunol. 14:1108368.
doi: 10.3389/fimmu.2023.1108368
COPYRIGHT: © 2023 Shin and Rothenberg. This is an open-access article distributed under the
terms of the Creative Commons Attribution License (CC BY). The use, distribution or
reproduction in other forums is permitted, provided the original author(s) and the copyright
owner(s) are credited and that the original publication in this journal is cited, in
accordance with accepted academic practice. No use, distribution or reproduction is permitted
which does not comply with these terms.
1 Introduction
Conventional T cells provide lifelong protection against infection and cancer by
recognizing their cognate antigens and mediating effector functions. To ensure that the
host can exert various immune responses in a context-specific manner, T cells have extensive
diversity in their sub-lineages. The unique properties of T cells stem from the intersections of
functional similarities with different types of immune cells (1, 2). Both T and B cells utilize
antigen receptors whose diversities are achieved by DNA recombination and selection
mechanisms. Distinct from B cells though, T cells can differentiate to various subsets
showing functional parallelism with distinct types of helper-like innate lymphoid cells (ILCs)
and natural killer (NK) cells, which lack recombined antigen receptors but produce effector
cytokines and/or perform cytolytic functions in response to environmental signals. In addition,
T cells possess proliferative potential and generate self-renewing long-lived populations,
which are features shared with the multipotent hematopoietic progenitor cells and stem cells,
respectively.
A recent global comparison of a wide range of hematopoietic cell types found that
lineage-specific transcription factor motif "signatures" distinguished the active chromatin
patterns for nearly every major analyzed hematopoietic lineage, suggesting the impacts of
distinct lineage-determining transcription factors, but that T-lineage cells notably failed
to show any cell type-specific "signature" (3). How, then, is the T-cell identity established
and robustly maintained, despite functions broadly overlapping with those of other immune
cells? We propose that this distinctive positioning as T cells can be supported by
combinatorial actions of transcription factors, instead of relying on a lineage-specific
"master regulator". T cells utilize many transcription factors that are commonly employed by
other types of hematopoietic cells for their own respective lineage specification and
function. However, by precisely regulating combinatorial actions of these transcription
factors over time under the influence of Notch signaling, and possibly also using epigenetic
chromatin changes to inhibit reversibility, intrathymic hematopoietic precursors launch and
lock down the specific T-lineage program. In this analytical review, we bring together recent
evidence that assembles a newly clear picture of how this comes about.
2 Gene regulatory network models as explanations of developmental pathways
2.1 Gene regulatory networks in development
Developmental progression to generate a specialized cell type requires continuous, ordered
changes in gene expression (4, 5). As a regulatory program, development must thus be distinct
both from the stable epigenetic mechanisms that maintain a mature cell type's identity (e.g.,
superenhancers) and from random transcriptional "noise", both of which have been much studied
in cell lines (6, 7). While environmental signals are often essential to trigger and support
development, the unidirectional regulatory cascade that emerges depends on the transcription
factors that are present in the cell receiving the signals and their impacts on other
transcription factors and cellular chromatin states. The genes encoding transcription factors
and signaling components collectively determine the regulatory state of the cell, and then
the genes they control (downstream genes) encoding effector molecules determine the
functional identity of a cell.
All these genes, whether they encode signaling receptors, transcription factors, or effector
proteins, are controlled by cis-regulatory modules encoded in the genomic DNA, such as
enhancers, silencers, and insulators, largely via interactions with trans-acting
sequence-specific transcription factors (8, 9). Specifically activated long noncoding RNAs
(lncRNAs) can contribute to chromatin states as well, although their actions are still only
characterized in a few cases (10–13). Therefore, the important components driving
development, the genes and their regulatory modules, transfer information in a directional
manner. For example, transcription factor X binds to the regulatory elements of target genes
Y and Z, which in turn induces expression of other regulatory factors, Y and Z. Importantly,
the activities of these cis-regulatory elements are never driven by single transcription
factors alone, but rather by combinations of factors, even though one or another factor can
be rate-limiting in a particular experimental situation (14–20). As a result, gene regulation
cascades are not explained by a linear pathway but by a hierarchical network (21, 22). These
features of developmental regulatory networks can be effectively captured by using
topological network models, in which the functional interactions are represented as inputs
and outputs. While there are some caveats about the interpretation of these networks for
hematopoietic development (23, 24), topological models are indispensable for compiling
evidence to explain cell state differences in terms of gene regulation mechanisms.
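As a minimal illustration of what a topological model with inputs and outputs can look like in practice, the sketch below stores a network as signed, directed edges. It is not the representation used by the authors. The factors X, Y and Z follow the example in the text, while the fourth node W, its edges, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical signed edges: (regulator, target, sign).
# +1 = activating input, -1 = repressive input. W is an invented node.
EDGES = [("X", "Y", +1), ("X", "Z", +1), ("Y", "W", +1), ("Z", "W", -1)]


def build_network(edges):
    """Index the edges both ways: inputs per target and outputs per regulator."""
    inputs = defaultdict(dict)
    outputs = defaultdict(dict)
    for regulator, target, sign in edges:
        inputs[target][regulator] = sign
        outputs[regulator][target] = sign
    return inputs, outputs


def downstream(outputs, source):
    """Return every gene reachable from `source`, directly or indirectly."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for target in outputs.get(node, {}):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


inputs, outputs = build_network(EDGES)
print(dict(inputs["W"]))              # combinatorial inputs converging on W
print(sorted(downstream(outputs, "X")))  # hierarchy below X
```

Keeping both an input index and an output index reflects the point made above: a target is read through its combination of inputs, whereas a regulator is read through the hierarchy of genes downstream of it.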
In this review, we will focus on the gene regulatory programs utilized in the early stages
of thymic T-cell development, in which multipotent progenitor cells undergo definitive
T-lineage commitment and establish T-cell identity. Some gene regulatory network (GRN)
circuits have been shown to promote cell type stability, as in the pluripotency state of ES
cells, while others have been shown to drive ultra-rapid, deterministic cascades of change,
as in the early Drosophila embryo (25, 26). As shown below, early T cell development falls
between these models. It includes both regulatory subcircuits that resist change and
regulatory connections that enforce change; the stochastic balance between these network
subcircuits is likely to underlie the distinctively asynchronous kinetics of T cell program
entry (27, 28). We review the major regulatory genes that are involved in different
sub-programs to establish T-cell fate, resolve coherent but distinct program modules that
need to be deployed, and propose an updated gene regulatory network (GRN) model.
2.2 Technical requirements and challenges for accurate GRN construction
There are significant challenges and caveats of experimental strategies that are utilized to
understand developmental GRNs. As transcription factors need to bind to specific sequences in
the DNA in order to work, genome-wide transcription factor occupancy data should in principle
help to map where the "direct" interactions of the putative regulatory factor occur. In the
past fifteen years it has become relatively easy to map transcription factor binding across
the genome by ChIP-seq (or CUT&RUN, or CUT&Tag) (29–31). Potential transcription factor
inputs to active regulatory elements can also be mapped in the DNA even without direct
evidence of transcription factor binding, based on the enrichment of their motifs in
accessible chromatin in a given cell type (32–34). Mapping open chromatin by DNase-seq or
ATAC-seq and using the cell type-specific enrichment of motifs predicted to be bound by given
transcription factors (35–37) in the open chromatin has become a powerful way to predict
which transcription factors may be important in that cell type (38–44). Even without any
evidence that a particular target gene actually responds to the presence of the transcription
factor itself, such predictive analysis can be a valuable step toward perceiving network
relationships at a global level (39, 45). Adding measured evidence for actual transcription
factor occupancy at sites around key genes, when possible, enables specific network
predictions to be made (46, 47).
However, actually confronting binding-based or motif-based predictions with empirical tests
of individual target gene responses to transcription factor activity perturbations has given
a more complex picture. Binding should not be overinterpreted. Transcription factor-DNA
interaction sites in a cell type are often much more numerous (order of 10^4 sites) than the
number of genes that change expression in response to the loss or gain of a given
transcription factor in that cell type (order of 10^2 genes) (48–51). This indicates that a
given transcription factor's occupancy is often dispensable for regulation of most of the
genes that it binds. Even when the transcription factor may be required for the existence of
the cell type, its binding to the promoter of a particular gene in that cell type can be
functionally irrelevant for that gene. It is thus challenging to predict which binding sites
are actually functional, or what mode of action they mediate (activation vs. inhibition),
solely based on the transcription factor occupancy pattern. In addition, it is not simple to
assign a binding site to potential target genes, especially if the binding site is surrounded
by multiple genes or when the binding site is distant from a promoter. Developmentally
important enhancers of key genes in the T-cell gene regulatory network can be hundreds of
kilobases (kb) away from the promoters (e.g. Bcl11b, Ets1, Gata3, Id2; also Myc) (52–59).
Disrupting individual regulatory elements by deletion or by mutation of specific motif
sequences can reveal the enhancer-target gene link, but functional redundancy of regulatory
elements can lead to underestimation of each element's contribution (60–62).
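The gap between occupancy and function described above can be made concrete by intersecting a factor's bound genes with the genes that respond when that factor is removed. The sketch below is illustrative only. The gene labels, the log2 fold-change values, and the threshold of 1.0 are hypothetical, and a real analysis would also apply a statistical significance criterion that is omitted here.

```python
def classify_bound_genes(bound_genes, log2fc_after_loss, min_abs_log2fc=1.0):
    """Split genes by binding and by response to loss of the factor.

    log2fc_after_loss maps gene -> log2(knockout / control). A gene that drops
    after loss of the factor is scored as normally activated by it, and a gene
    that rises is scored as normally repressed by it.
    """
    bound = set(bound_genes)
    classes = {
        "activated": set(),
        "repressed": set(),
        "bound_unresponsive": set(),
        "responsive_unbound": set(),
    }
    for gene in bound:
        change = log2fc_after_loss.get(gene, 0.0)
        if change <= -min_abs_log2fc:
            classes["activated"].add(gene)
        elif change >= min_abs_log2fc:
            classes["repressed"].add(gene)
        else:
            classes["bound_unresponsive"].add(gene)
    for gene, change in log2fc_after_loss.items():
        if gene not in bound and abs(change) >= min_abs_log2fc:
            classes["responsive_unbound"].add(gene)
    return classes


# Hypothetical toy input: three bound genes, five measured genes.
example = classify_bound_genes(
    ["g1", "g2", "g3"],
    {"g1": -2.0, "g2": 1.5, "g3": 0.1, "g4": -1.2, "g5": 0.3},
)
for label, genes in example.items():
    print(label, sorted(genes))
```

Genes in the "responsive_unbound" class are candidates for indirect regulation, while those in "bound_unresponsive" illustrate occupancy without a measurable functional consequence in the tested context.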
For identification of the gene network linkages described here, therefore, we have required
evidence from functional tests that acutely perturbed the levels of transcription factor
proteins in a specific developmental context. This has been achieved by germline/conditional
deletion using a Cre-loxP or CRISPR-Cas9 system, knocking down using RNA interference (RNAi),
repression using CRISPR interference (CRISPRi), or acute overexpression/ectopic activation of
the target gene, each of which was then followed by in vivo and/or in vitro phenotype scoring
at the cellular, transcriptomic, and epigenomic levels. Furthermore, we have given greater
weight to results from experiments where the role of a transcription factor was tested in a
precise, relevant T-cell developmental context and time window. This was important because
recent results of stage-specific perturbation tests have shown that the same transcription
factor's function may change or disappear entirely in a different context, or even at a
different stage within the same lineage (49, 51, 63–68). We have sought to apply consistent
statistical criteria to these expression differences and to emphasize relationships that have
been independently confirmed.
Precisely controlling perturbation timing can be experimentally challenging, especially in
the context of a developmental process, as the cells are constantly progressing forward to
the next stage. Inadequately defined perturbation time windows could lead cells to deviate
toward irrelevant fates prior to reaching the stages of interest, or allow cells to activate
compensatory mechanisms. Wide perturbation time windows could also span states in which the
same transcription factor plays different roles, which could complicate data interpretation.
Also, although stage-specific perturbation approaches can capture the functional consequences
of the perturbation effectively, it is difficult to dissect whether the phenotype results
from a direct effect, an indirect effect via the gene network, or an indirect effect via
differential population survival. As noted above, simply detecting transcription factor
binding in the vicinity of a possible target gene is not enough to prove a direct functional
relationship.
These limitations and challenges are not completely avoided
within the data sources that we have analyzed to construct the
update of the early T-development GRN. However, support for
specific sites through which a transcription factor could exert direct
control of a target gene can be gleaned from the recent increase in
available genome-wide transcription factor binding data together with
measurements of local chromatin states such as chromatin
accessibility, 3D chromatin structure, and changes in histone
marks, when these data are coupled with analyses of transcription
factor perturbation effects.
2.3 Construction of an updated T-cell specification GRN model
The current gene network model we present differs from previous versions (27, 69–74) in
several ways. In particular, initial models for early T-cell developmental GRNs were based
primarily on candidate gene measurements, due to technical considerations. Targeted assay
systems such as qPCR were used to examine perturbation effects on sets of only 100-150 genes
out of 10,000 expressed transcripts, focused mainly on high sensitivity monitoring of
transcription factor coding genes. It is now routine to use RNA-seq to measure the entire
transcriptome quantitatively in an unbiased manner with low numbers of input cells, both at
the bulk population level and at the single-cell level. This reveals whole batteries of genes
coregulated by a given transcription factor perturbation in the specific context, which help
to identify the changes in developmental status that have been induced. Genome-wide
transcriptome data processing pipelines also standardize accepted statistical criteria. Where
available, single-cell transcriptome analyses are also useful to separate perturbation
effects on cells within a lineage from perturbation effects on population balances between
the lineage of interest and contaminants.
Another change has been the advent of better technology for acute loss of function as well
as acute gain of function of transcription factor genes within a well-defined developmental
time window. Whereas Cre excision required a separate mouse strain to be developed for each
gene to be targeted, using Cas9-transgenic mice (75) as cell sources for in vitro T-cell
differentiation has made it possible to introduce guide RNAs (gRNAs) targeting one or several
genes anywhere in the genome, and thus to disrupt genes efficiently at any stage desired in
the same genetic background. Previous analyses of transcription factor-target linkages in
early T cell development have often depended on gain of function or ectopic expression
experiments because these techniques change transcription factor levels acutely in a specific
cell type with even faster kinetics. However, multiple recent examples have shown that
transcription factor impacts on a network can differ markedly depending on the level of
expression of the transcription factor protein (GATA3, PU.1, Runx), and levels may be hard to
control in gain of function. While loss-of-function experiments can pose other problems
(asynchronous, sometimes slow loss of targeted protein; potential viability losses in the
affected cells; potential masking by redundancy), they measure effects of factors at their
normal levels of expression. Thus, gain of function data have needed to be re-evaluated in
light of corresponding loss of function data, and the manipulated levels of factor protein
have needed to be compared to physiological levels.
To construct the network models shown below, we have compiled data from several sources
where developmentally well-defined gain or loss of function perturbations were carried out,
with all significantly responding genes from each perturbation study tabulated in Table S1.
The studies used are described in Box 1. Table S1 also compiles evidence of local binding by
the transcription factor of interest around each functional target gene, wherever this
evidence was available. Note that different perturbation experiments used as sources focused
on different developmental time windows and were more or less sensitive to loss or gain of a
given factor's activity, depending on the normal expression baseline at that timepoint. Where
there was variation between different controls or inadequate expression of a gene within the
time window tested, even repeatedly observed effects may have missed statistical significance
cutoffs. Therefore, while we have generally depended on more than one corroborating piece of
evidence for each connection shown, we have not required that all RNA-seq sources should
score the same genes as "significantly" affected. Taken together, however, these results now
provide a clearer view of the architecture of the T-cell specification gene regulatory
network, showing its modular construction, and the coordination of changes in activities of
its component subcircuits from stage to stage.
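The corroboration rule described above, in which a connection is supported by more than one piece of evidence without every source having to agree, can be sketched as a simple tally. This is a hedged illustration rather than the authors' actual pipeline. The record layout, the source labels, the factor and gene names, and the default of two supporting sources are all assumptions made for the example.

```python
from collections import defaultdict


def corroborated_edges(records, min_sources=2):
    """Keep (regulator, target, sign) connections seen in enough distinct sources.

    records: iterable of (source, regulator, target, sign) tuples, one per
    significant effect reported by a perturbation study (hypothetical layout).
    """
    support = defaultdict(set)
    for source, regulator, target, sign in records:
        support[(regulator, target, sign)].add(source)
    return {
        edge: sorted(sources)
        for edge, sources in support.items()
        if len(sources) >= min_sources
    }


# Hypothetical evidence records from three invented studies.
records = [
    ("study_1", "TF_A", "gene_1", +1),
    ("study_2", "TF_A", "gene_1", +1),
    ("study_1", "TF_A", "gene_2", -1),
    ("study_3", "TF_B", "gene_1", -1),
    ("study_2", "TF_B", "gene_1", -1),
    ("study_3", "TF_B", "gene_1", -1),  # repeat from the same study counts once
]
for edge, sources in corroborated_edges(records).items():
    print(edge, sources)
```

Counting distinct sources rather than raw records means that a repeated observation within one study does not substitute for independent confirmation.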
3 Overview of early thymic T-cell development
Thymic T-development begins as hematopoietic progenitor cells possessing lymphoid potentials
migrate into the thymus and interact with the cortical epithelial cells providing Notch
ligands, growth factors, and cytokine signaling (106–109). This drives the developmental
program shown in Figure 1A. At early stages, intrathymic precursor cells undergo extensive
cell proliferation and upregulate some of the T-lineage associated genes. However, they still
preserve multipotentiality and can differentiate into non-T lineage cells, especially if
Notch signaling is withdrawn. This uncommitted stage is referred to as "Phase 1", which
includes double-negative 1 (DN1, or Early T-cell precursor, ETP) and DN2a stages. In Phase 1,
both ETP and DN2a cells express high levels of cKit and CD44, but CD25 surface expression
distinguishes ETP (CD25−) from DN2a (CD25+) (Figure 1A). Recent single-cell transcriptomics
studies reveal heterogeneity in the Phase 1 population in both human and mouse. The pro-T
cells comprising Phase 1 actually include multiple subsets of ETPs and populations
transitioning to DN2a cells that display distinct gene regulatory programs, with patterns
mostly conserved between mouse and human based on single-cell and bulk analyses (94–98, 110).
Of interest, the intermediate-stage ETP populations, more than the most primitive ETPs,
transiently express a set of non-T lineage-associated genes (e.g. Mpo, encoding
myeloperoxidase) even though these cells are on the T-cell pathway (94). This suggests that
these genes are induced as a part of a normal developmental progression program in ETPs, and
multilineage priming occurs before T-lineage commitment.
Upon sustained exposure to Notch-ligand and other thymic microenvironmental signals,
progenitor cells intrinsically commit to a T cell fate, and the developmental plasticity to
non-T-lineages is terminally blocked. After T-lineage commitment, pro-T cells establish a
T-cell identity gene expression program and start to rearrange some forms of T cell receptor
(TCR) genes. For conventional T cells, successful gene rearrangement for expression of TCRβ
chain is assured by quality control at the β-selection checkpoint during DN3 stage. Other T
cell precursors rearrange and express genes encoding TCRγ and TCRδ instead, to become γδ T
cells. These stages from commitment to β-selection are collectively referred to as "Phase 2",
which is comprised of DN2b (cKit^int CD25+ cells) and DN3a (cKit^low CD25+ CD28− CD27^low)
stages (Figure 1A) (111–113).
Further development depends on the cells' TCR interactions. Following β-selection, most
developing T cells accumulate as CD4+ CD8+ ("DP") cells, while they complete their TCRα gene
rearrangements and express for the first time the TCRαβ that they will use forever, if they
are allowed to live. However, they undergo an ultra-stringent selection process to reject all
cells with inadequate or highly autoreactive TCR specificity. The rare surviving cells can
finally mature, undergoing divergent programs of positive selection into CD4 or CD8
single-positive cells before they emerge from the thymus, associated with "helper" or
"cytotoxic" function respectively. Notably however, most core T-identity program genes that
pro-T cells activate in Phase 2 are irreversibly maintained throughout all later stages of T
cell development and immunological responses.
Box 1: Sources of data compiled in network models
Table S1 presents the main data and sources used to establish specific connections in the GRN
models that follow. These sources constitute RNA-seq and microarray results from studies of
germline deletion of genes encoding the high mobility group factor TCF1 (encoded by Tcf7)
(28, 76, 77), the basic helix-loop-helix E proteins E2A (Tcf3) and HEB (Tcf12) (78, 79), and
the later-activated zinc finger factor associated with T-cell lineage commitment, Bcl11b
(28, 80). In addition, we used studies of acute, stage-specific deletions of the ETS family
subgroup member PU.1 (encoded by Spi1, previously called Sfpi1 in mice) (28, 48); of Lmo2
(81); of GATA3 (28, 82, 83); of Bcl11a (28); of Erg (28); of Notch1 and Notch2 together (67);
and of Runx1 and Runx3 together (51, 84). Although data for cells in the same developmental
stages with complete disruption of Ikaros (Ikzf1) were not available, differentially
expressed genes that responded to Ikaros (Ikzf1) zinc finger 4 deletion were also added (85).
Finally, we included data from studies of acute gain of function of factors at stages after
they would normally have been shut down, including PU.1 (Spi1) (48), and the transcription
factor adaptor Lmo2 (86–88). In addition, supporting results came from studies introducing
into pro-T cells acute antagonists of key transcription factors, including the natural E
protein antagonist ID2 (89) or an artificially constructed dominant repressor form of PU.1
(90). Additional supporting results came from earlier perturbation studies knocking out E
protein genes Tcf3 (E2A) and Tcf12 (HEB) or Bcl11b (71, 79) and studies utilizing progenitor
or pro-T cell lines and acute T-cell malignancies to interrogate roles of the early-acting
transcription factors Lmo2 and Hhex (81, 88, 91). For data on normal developmental expression
dynamics of these genes in pro-T cells, RNA-seq and single-cell RNA-seq datasets were used
(92–94), corroborated by highly curated microarray data (95). Data used were all from
experiments in the mouse system, but the underlying gene expression patterns involved are
largely conserved in human data (94, 96–103) (rev. in (103)). In addition to the data
incorporated into Table S1, we consulted data from other studies for TCF1, GATA3, and Bcl11b
target gene regulation as well (16, 63, 83, 90, 104, 105). Finally, note that positive and
negative regulatory connections shown in the models indicate a measurable effect in the
indicated developmental window, but usually not a pure Boolean function.
4 Gene regulatory network models for early T cell development through commitment
As the thymus-seeding precursor cells migrate from the bone marrow and first enter the
thymus, they express transcription factors inherited from their progenitor cells in the bone
marrow. Expression of many such legacy transcription factors is maintained across multiple
cell divisions. While many progenitor-associated factors are turned off eventually, some
legacy transcription factors maintain their expression throughout thymic T cell development,
but these often adopt new roles during stage transitions by occupying different sets of
genomic regions from those in hematopoietic progenitor cells. The co-existence of "inherited
transcription factors" together with "newly induced regulators" in the thymic precursor cells
generates numerous possible combinatorial inputs to different target genes which dynamically
change their expression during developmental progression in Phase 1 and Phase 2 (Figure 1B).
This review focuses on key driving regulatory factors and responders composing the Phase 1-
and Phase 2-GRNs, in which many components show dynamic changes throughout developmental
progression.
FIGURE 1
T cell development stages and transcription factor expression kinetics. (A) Diagram depicts
different stages of early thymic T cell development that T-progenitor cells sequentially go
through. Informative surface proteins that are utilized to define each stage are indicated
(cKit, CD44, CD25 for ETP to DN3; CD4 and CD8 for DP). Developmental plasticity to generate
alternative, non-T-lineage cells is shown with dotted arrows. Note that these alternative
lineage potentials are silenced after T-lineage commitment. Lineage commitment to a T cell
fate distinguishes Phase 1 (before T-lineage commitment) and Phase 2 (after T-lineage
commitment). CLP, common lymphoid progenitor; LMPP, lympho-myeloid primed progenitor.
(B) Graphs show mRNA expression kinetics of important transcription factors involved in early
T cell gene regulation programs. Left: transcription factors inherited from the bone-marrow
progenitor cells whose expression gradually declines during T-development (Spi1, Bcl11a, Erg,
Lyl1, and Lmo2). Middle: transcription factors upregulated in pro-T cells by the thymic
microenvironment (Tcf7, Gata3, and Bcl11b). Right: transcription factors expressed from the
bone-marrow progenitor cells and stably maintained during T cell development (Ikzf1, Tcf3,
Tcf12, Runx1, and Runx3). Gene expression data were plotted by utilizing publicly available
mRNA expression datasets for immune cells with curve smoothing (https://www.immgen.org) (93).
(C) Diagram illustrates transcription factors (arrows) providing distinct forces to different
gene expression program modules in individual cells.
4.1 Gene regulatory network modules in Phase 1 and Phase 2
The newly proposed GRN that we present here highlights the observation that in each stage of
early T-lineage development, subsets of target genes for a given transcription factor are
often regulated in parallel ways, collectively composing a specific module that often has
coherent biological function (28, 114). While the definition of a module is not precise, we
use this framework to describe fairly discrete components of the developmental process which
are regulated distinctly even if the cell is expressing other sets of genes at the same time.
One module (defined by expression of a group of genes) may remain active consistently
throughout a series of stages while other modules are sharply changing activity, or a module
can be affected coherently by a perturbation that does not affect genes in other modules
within the same cells. As shown below, this gives the overall process of early T cell
development an "assembled" quality. Key transcription factors often appear to play roles in
activity of an entire module, even while the individual target genes they act upon also have
other regulatory inputs. However, it is noteworthy that a transcription factor does not
define the module on its own. The same transcription factor may even work in more than one
module, as shown below. Instead, different transcription factor ensembles can be seen to
define distinct modules, working together to drive expression of shared target genes. They
establish positive and negative feedback loops within a module, for stabilization, or between
different modules, which results in dynamic gene network behavior. This modular structure is
significant because the timing of transitions that individual cells make along the pathway is
very asynchronous, with cells showing an ability to linger in any of several states for
several cell cycles before progressing (27). The slow pace of differentiation implies that
regulation of the modules that predominate in any particular stage is fairly stable; however,
developmental progression depends on eventually breaking this stability. In developmental
gene networks, a regulatory state is often stabilized when different transcription factors
with concordant effects on the same module also support each other's expression (115), and
some examples of this are shown below. In contrast, inter-module inhibition or repression
between regulators can make expression of different modules unstable and eventually
incompatible.
Mutually exclusive expression of different regulators generates
sharp cell-fate boundaries in bin
ary cell fate decisions and in
embryos, for example (
115
,
116
). However, one notable feature
of the T cell developmental networ
k linkages is that the repression
which is detected is often incomplete. Many repressive interactions
in this system cause a dampening of expression levels but not a
silencing of the target genes. This "soft" repression function
enables factors with mutually antagonistic activities to coexist in
the cells for days and multiple cell divisions, and it enables
activities of opposing modules to overlap within an individual
cell at the same time. Therefore, distinct sets of transcription
factors can pull and push activities of their target modules in
different directions in the same cell (Figure 1C), and the resultant
sum of the vectorized "forces" instructs cell fate (28). In the
following, we review both the regulatory circuits that maintain
coherent module activity, and also the sources of antagonists that
finally keep them from persisting.
The architecture of the modular subprograms within the T-cell
GRN has become much clearer through recent work. The distinct
subprograms that comprise the early T-cell GRNs can be categorized
as 1) the T-identity module, 2) the stem or progenitor module, 3)
conditional access to alternative fates as a side effect of the stem/
progenitor module, and 4) a cell survival and proliferation module.
While the T-identity module needs to be successfully installed to
provide constitutive expression of the core T-cell genes, the second
module must be silenced in order to convert multipotent precursors
to T-lineage committed cells. Finally, the cell survival and
proliferation program in Phase 1 and Phase 2 ensures the
production of a sufficient number of immature T cells to
accommodate later positive and negative selection. Because the T-
cell identity module introduces the initiators and overall structure of
the entire process, it is discussed first.
4.2 Installation and specification of the T-identity module in Phase 1 and Phase 2
Commitment to the T-cell fate involves two major events: 1)
terminal blockade of alternative lineages, concomitant with 2)
acquisition of T-identity gene expression. The gene network
instructing the constitutive expression of the T-lineage defining
genes will be referred to as the T-identity module. Successful
establishment of the T-identity module is pivotal because the core
T-identity established in early
thymic developmental phases is
robustly maintained even after these progenitor cells become
mature T cells, long after they leave the Notch ligand-rich thymic
environment. The "T-cell markers" include CD3 clusters and TCR
signaling mediators, as well as transcription factors necessary for the
induction and maintenance of these marker genes. The T-identity
module requires activities of Notch signaling, TCF1, GATA3, Runx
transcription factors, and Ikaros starting from Phase 1, then receives
critical positive inputs from the E protein complex and Bcl11b as pro-T
cells progress through Phase 2. Expression patterns of the key
factors in different immune cell populations are shown in Figure 2A
(data from www.immgen.org (93)).
4.2.1 Notch signaling as an indispensable driver
Notch signaling is an absolute requirement for initiation and
establishment of the T-identity module. In the absence of Notch
signaling, B cells develop in the thymus instead of T cells, whereas
constitutive expression of the intracellular domain of Notch (ICN) in
bone-marrow progenitor cells induces extrathymic T cell
development (118–120). In addition, many if not all thymic seeding
progenitors need to be primed by some level of Notch signaling in the
bone marrow. Whereas Jagged-class Notch ligands expressed in the
bone marrow do not signal as strongly as the Delta-class Notch
ligands in the thymic microenvironment (121–123), this prior
experience seems to be important for the cells to acquire
competence to initiate the T cell program (124). A thymic seeding
progenitor population that had received Notch signaling before
thymic entry also exists in humans, supporting a physiological
contribution of Notch signaling to T-lineage specification from the
prethymic stages in both humans and mice (97). The stromal cells in
the thymic cortex express the strongest ligands for signaling through
Notch1, mainly Delta-like 4 (Dll4), as well as Dll1 and to a lesser
extent Jagged 2 (Jag2). These engage with Notch receptors (Notch1–Notch3)
on the thymic precursor cells with varied affinities (122, 125).
This ligand-receptor interaction induces proteolytic cleavages in the
Notch receptor, releasing the ICN to the cytoplasm. Then, the Notch
ICN interacts with DNA-binding transcription factor RBPJκ and
FIGURE 2
Gene regulatory networks driving the T-identity module. (A) Heatmap shows expression patterns of key transcription factors supporting the T-identity module across different immune cell types, using the ImmGen MyGeneSet tool (https://www.immgen.org) (93). Bone marrow progenitor (BM progen), B cell precursors (B progen), B cells, T cells, γδ T cells ("gd T cells"), natural killer cells (NK), innate lymphoid cells (ILCs), dendritic cells (DC), macrophage (MF), monocyte (Mo), granulocyte (GN), mast cell (MC). See Table S2 for the number keys. (B, C) BioTapestry models (71, 117) of gene regulatory network relationships described in the text. Evidence for all connections shown is in Table S1. Structure of BioTapestry models: Genes are shown with regulatory regions (horizontal lines) distinct from their encoded outputs (bent arrow at "promoter"). Connections with arrows represent positive regulation. Connections ending in blocking lines represent negative regulation. When gene products combine to produce a functional unit, or when activity is not a simple function of transcriptional output, a "bubble" symbol is placed to represent the emergent function from their collective activities. Both negative and positive regulation can impinge on such activity bubbles, e.g. ID factors inhibiting E protein activity without necessarily inhibiting the expression of the E protein coding genes themselves. Double chevron symbol indicates ligand-receptor interactions, here used to represent Notch ligand–Notch interaction as a source of regulatory input to other genes. Arrows consisting of dotted lines show decreasing or weak activities. These conventions are used also in Figures 4B, C, 5B, C.
(B) Current model for the T-identity module regulator transcription factors in Phase 1 (top) and Phase 2 (bottom). (C) Current model for the T-identity module downstream molecules in Phase 1 (top) and Phase 2 (bottom). Expression of Il2ra (CD25) indicates developmental progression from ETP to DN2 and DN3 stages in mice. Surface markers: Thy1, Cd3 cluster genes (Cd3d, Cd3e, Cd3g), Cd247; TCR signaling molecules: Lck, Itk, Fyn, Trat1, Lat, Zap70, Ptcra; TCR rearrangement molecules: Rag1, Rag2, Dntt.
functions as a co-activator, recruiting other transcriptional coactivators
and chromatin modifying enzymes (122, 126).
Although Notch1 is most strongly expressed throughout, acute
deletion of different Notch family genes in pro-T cells shows that
Notch1 and Notch2 cooperate to induce the T-identity module in
Phase 1 cells by activating genes encoding essential transcription
factors (e.g., Tcf7, Myb, Gata3, Hes1), the core T-cell marker genes
(Cd3g, Cd3e, Thy1) and the useful DN2-DN3 stage marker Il2ra (in
mouse; not in human at this stage). Later, in Phase 2, Notch signaling
induces genes necessary for TCR rearrangement and signaling (Rag1,
Rag2, Lck, Ptcra) (67) (Figures 2B, C). Notch actions in the T-cell
identity module are not hit-and-run; the signals must be sustained
through lineage commitment and then onward to sustain most cells'
viability into the beginning of β-selection [rev. by (127)]. Notably, however,
the target genes regulated by Notch signals either positively or
negatively change markedly between Phase 1, early Phase 2, and the
end of Phase 2 (67). The shifting but essential roles of Notch signaling
in pro-T cells provide an example of context-dependent shifts in
regulator deployment which are seen for other factors as well (15, 49, 51).
4.2.2 Notch-induced effectors TCF1 and GATA3, cooperative but nonredundant
Notch-activated targets in Phase 1 cells include genes encoding
TCF1 (Tcf7) and GATA3, which are pivotal for instituting the T-identity
module (Figures 2A, B top). The activation of Tcf7 by Notch
signaling appears to be direct, although its maintenance becomes
Notch-independent in Phase 2 (67, 76). The functional importance of
TCF1 and GATA3 in T-lineage specification is demonstrated by
transgenic animal models that lack Tcf7 or Gata3 expression. Tcf7-
or Gata3-deficiency in pro-T cells abrogates T-cell development from
the earliest Phase 1 stage (76, 77, 82, 104, 128–130). Tcf7 deletion in
bone-marrow derived progenitors using Vav1-Cre caused
developmental arrest at ETP stage and allowed abnormal
transcriptome clusters to accumulate among thymocytes in steady
state, based on single-cell RNA-seq (scRNA-seq) (63). In accord with
these results, another recent single-cell transcriptome study using
dual guide-RNAs (gRNAs) to disrupt Tcf7 or Gata3 specifically in
Phase 1 cells showed that the precursor cells lacking TCF1 completely
failed to enter the normal T-cell developmental trajectory, while those
lacking GATA3 failed to progress properly (28). While Tcf7 and Gata3
both depend on inputs from Notch signaling and Runx family
transcription factors to be turned on, they also create a possible
stabilization circuit for early T-cell specification by positively
regulating each other as well (28, 77, 82) (Figure 2B).
Despite these positive feedbacks, acute Tcf7 deletion results in
different impacts on the developing pro-T cell population than acute
Gata3 deletion in the same Phase 1 developmental time window,
based on scRNA-seq data. TCF1 and GATA3 regulate distinct target
genes, indicating that these factors do not perform redundant
functions. Also, studies using gain-of-function approaches suggest
that the effect of TCF1 overexpression is completely different from that
of high-dosage GATA3 in pro-T cells. A high level of TCF1 in the mouse
bone-marrow progenitor cells can upregulate essential genes in the
T-identity module, such as Gata3 and Bcl11b, even without Notch
signaling, causing cells apparently to bypass the ETP stage (77). In
contrast, elevated GATA3 levels block T cell development, promoting
deviation to an alternative, mast-cell lineage in the absence of Notch
signaling, and killing pro-T cells if Notch signaling is sustained (79,
82, 83, 131). This difference in part involves the different impacts
these factors have on genes within proliferation and survival modules
used in Phase 1 cells, including effects on Kit, Il7r and Spi1 (see below).
In pro-T cells, TCF1 and GATA3 instruct T-cell development by
upregulating many T-program genes (Notch1, Notch3, Hes1, Gata3,
Bcl11b, Lef1, Ets2, Il2ra, all Cd3 genes, Cd247, Tcrb, Lat, Fyn, Rag1,
Rag2, and Trat1 by TCF1; Myb, Ets1, Tcf7, and Bcl11b by GATA3)
(Figure 2C). Importantly, both TCF1 and GATA3 are required
together for the initial induction of Bcl11b, which will be important
in Phase 2, suggesting that multiple transcription factor inputs are
non-redundantly required (16).
As TCF1 and GATA3 are necessary to initiate the T-lineage
specification program, the positive regulatory factors inducing these
transcription factors are also critical. TCF1 and GATA3 positively
regulate each other and Runx transcription factors also provide
supportive inputs, as described below. The critical regulatory
elements of the Tcf7 gene in T-lineage cells, 30-40 kb upstream of
its promoter regions (132), are also occupied by RBPJκ, Runx factors,
GATA3, and TCF1, consistent with direct positive regulation by these
factors (Figure 3 left, in "DN2b/DN3" samples; note that both Tcf7
and Gata3 in this figure are transcribed from right to left). Similarly,
an enhancer region 280 kb downstream of the Gata3 gene, which is
known to be necessary for Gata3 expression in the T-lineage (54), is
occupied by RBPJκ, Runx factors, TCF1, Bcl11b, and GATA3 by
Phase 2 (Figure 3 right, [i.e., "DN2b/DN3"] samples).
The strong force that TCF1 exerts to drive the T-cell program
involves reprogramming of chromatin accessibility and long-range
looping. TCF1 overexpression in fibroblasts opens the chromatin
regions near the T lineage-associated genes, which are naturally
demarcated by repressive histone marks in fibroblasts (135). Recent
studies demonstrate a potent role of TCF1 in chromatin architecture
remodeling in pro-T cells, later DP thymocytes, and mature T cells
(136, 137). In Phase 2, TCF1 occupies key sites in evolutionarily
conserved topologically associating domains (TADs), i.e. regions of
chromatin containing clusters of regulatory elements that interact
within the TAD but are usually insulated from other TADs. When
TCF1 binds to the inter-TAD sites along with CTCF (a transcription
factor that can anchor chromatin architecture), this co-occupancy
weakens insulation of the TAD boundaries and enables intermingling
of TADs, potentially allowing new enhancer-promoter interactions.
In addition, TCF1 establishes long-range looping around the T cell
genes and marks the surrounding regions with H3K27ac (136).
4.2.3 Multitasking positive roles of Runx family factors
Runx transcription factors are broadly expressed in all
hematopoietic lineage cells, but they exert context-specific functions
by switching their interaction sites across the genome. The expression
of Runx1 and Runx3 is established prior to the thymic entry, and
these two paralogs are co-expressed within individual Phase 1 and
Phase 2 pro-T cells alike (51, 93, 94). The sum of Runx1 and Runx3
activities, measured by total binding by ChIP-seq, is maintained
stably throughout the early T-lineage developmental process,
suggesting that overall Runx availability between Phase 1 and Phase