of 20
First-principles prediction of the information processing
1
capacity of a simple genetic circuit
2
Manuel Razo-Mejia
1
and Rob Phillips
1, 2, 3, *
3
1
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
4
2
Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
5
3
Department of Applied Physics, California Institute of Technology, Pasadena, CA 91125, USA
6
*
Correspondence: phillips@pboc.caltech.edu
7
8
Abstract
9
Given the stochastic nature of gene expression, genetically identical cells exposed to the same
10
environmental inputs will produce di
erent outputs. This heterogeneity has consequences for how
11
cells are able to survive in changing environments. Recent work has explored the use of information
12
theory as a framework to understand the accuracy with which cells can ascertain the state of their
13
surroundings. Yet the predictive power of these approaches is limited and has not been rigorously
14
tested using precision measurements. To that end, we generate a minimal model for a simple genetic
15
circuit in which all parameter values for the model come from independently published data sets.
16
We then predict the information processing capacity of the genetic circuit for a suite of biophysical
17
parameters such as protein copy number and protein-DNA a
nity. We compare these parameter-
18
free predictions with an experimental determination of the information processing capacity of
E.
19
coli
cells, and find that our minimal model accurately captures the experimental data.
20
As living organisms thrive in some given environment, they are faced with constant changes in their
21
surroundings. From abiotic conditions such as temperature fluctuations or changes in osmotic pressure,
22
to biological interactions such as cell-to-cell communication in a tissue or in a bacterial biofilm, living
23
organisms of all types sense and respond to external signals. Fig. 1(A) shows a schematic of this
24
process for a bacterial cell sensing a concentration of an extracellular chemical. At the molecular
25
level where signal transduction unfolds mechanistically, there are physical constraints on the accuracy
26
and precision of these responses given by intrinsic stochastic fluctuations [
1]. This means that two
27
genetically identical cells exposed to the same stimulus will not have an identical response [
2].
28
The implication of this biological noise is that cells do not have an infinite resolution to distinguish
29
signals and, as a consequence, there is a one-to-many mapping between inputs and outputs. Further-
30
more, given the limited number of possible outputs, there are overlapping responses between di
erent
31
inputs. In that sense, one might think of cells performing a Bayesian inference of the state of the
32
environment given their phenotypic response, as schematized in Fig. 1(B). The question then becomes
33
how to analyze this probabilistic rather than deterministic relationship between inputs and outputs?
34
The abstract answer to this question was worked out in 1948 by Claude Shannon who, in his seminal
35
work, founded the field of information theory [
3]. Shannon developed a general framework for how
36
to analyze information transmission through noisy communication channels. In his work, Shannon
37
showed that the only quantity that satisfies simple conditions of how a metric for information should
38
behave, was of the same functional form as the thermodynamic entropy – thereby christening his met-
39
ric the information entropy [
4]. He also gave a definition, based on this information entropy, for the
40
relationship between inputs and outputs known as the mutual information. The mutual information
41
I
(
p
;
c
)betweeninput
c
and output
p
, given by
42
I
(
p
;
c
)=
X
c
P
(
c
)
X
p
P
(
p
|
c
) log
2
P
(
p
|
c
)
P
(
p
)
,
(1)
1
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
quantifies how much we learn about the state of the input
c
given that we get to observe the output
43
p
.
44
It is natural to conceive of scenarios in which living organisms that can better resolve signals might
45
have an evolutionary benefit, making it more likely that their o
spring will have a fitness advantage [
5].
46
In recent years there has been a growing interest in understanding the theoretical limits on cellular
47
information processing [
6, 7], and in quantifying how close evolution has pushed cellular signaling
48
pathways to these theoretical limits [
8–10]. While these studies have treated the signaling pathway
49
as a “black box” explicitly ignoring all the molecular interactions taking place in them, other studies
50
have explored the role that molecular players and regulatory architectures have on these information
51
processing tasks [
11–17]. Despite the great advances in our understanding of the information processing
52
capabilities of molecular mechanisms, the field still lacks a rigorous experimental test of these ideas
53
with precision measurements on a simple system tractable both theoretically and experimentally.
54
Over the last decade the dialogue between theory and experiments in gene regulation has led to
55
predictive power not only over the mean, but the noise in gene expression as a function of relevant
56
parameters such as regulatory protein copy numbers, a
nity of these proteins to the DNA promoter,
57
as well as the extracellular concentrations of inducer molecules [
18–21]. These models based on
58
equilibrium and non-equilibrium statistical physics have reached a predictive accuracy level such that
59
for simple cases it is now possible to design input-output functions [
22, 23]. This opens the possibility
60
to exploit these predictive models to tackle the question of how much information genetic circuits
61
can process. The question lays at the heart of understanding the precision of the cellular response to
62
environmental signals. Fig. 1(C) schematizes a scenario in which two bacterial strains respond with
63
di
erent levels of precision to three possible environmental states, i.e. inducer concentrations. The
64
overlap between the three di
erent responses is what precisely determines the resolution with which
65
cells can distinguish di
erent inputs. This is analogous to how for an imaging system the point spread
66
function limits the ability to resolve two light emitting point sources.
67
In this work we follow the same philosophy of theory-experiment dialogue used to determine model
68
parameters to predict from first principles the e
ect that biophysical parameters such as transcription
69
factor copy number and protein-DNA a
nity have on the information processing capacity of a simple
70
genetic circuit. Specifically, to predict the mutual information between an extracellular chemical signal
71
(input
c
) and the corresponding cellular response in the form of protein expression (output
p
)(Eq.
1)we
72
must compute the input-output function
P
(
p
|
c
). To do so, we use a master-equation-based model to
73
construct the protein copy number distribution as a function of an extracellular inducer concentration
74
for di
erent combinations of transcription factor copy numbers and binding sites. Having these input-
75
output distributions allows us to compute the mutual information between inputs and outputs
I
(
c
;
p
)
76
for any arbitrary input distribution
P
(
c
). We opt to compute the channel capacity, i.e. the maximum
77
information that can be processed by this gene regulatory architecture, defined as Eq. 1 maximized
78
over all possible input distributions
P
(
c
). By doing so we can examine the physical limits of what cells
79
can do in terms of information processing by harboring these genetic circuits. All parameters used for
80
our model were inferred from a series of studies that span several experimental techniques [
19, 24–26],
81
allowing us to perform parameter-free predictions of this information processing capacity [
27].
82
These predictions are then contrasted with experimental data, where the channel capacity is in-
83
ferred from single-cell fluorescence distributions taken at di
erent concentrations of inducer for cells
84
with previously characterized biophysical parameters [
19, 26]. We find that our parameter-free pre-
85
dictions closely match the experiments. In this sense we demonstrate how our minimal model can
86
2
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
be used to quantify the resolution with which cells can resolve the environmental state with no free
87
parameters.
88
The reminder of the paper is organized as follows. In Section
1.1
we define the minimal theoretical
89
model and parameter inference for a simple repression genetic circuit. Section
1.2
discusses how
90
all parameters for the minimal model are determined from published datasets that explore di
erent
91
aspects of the simple repression motif. Section
1.3
computes the moments of the mRNA and protein
92
distributions from this minimal model. In Section
1.4
we explore the consequences of variability in
93
gene copy number during the cell cycle. In this section we compare experimental and theoretical
94
quantities related to the moments of the distribution. Specifically the predictions for the fold-change
95
in gene expression (mean expression relative to an unregulated promoter) and the gene expression
96
noise (standard deviation over mean). Section
1.5
follows with reconstruction of the full mRNA and
97
protein distribution from the moments using the maximum entropy principle. Finally Section
1.6
uses
98
the distributions from Section
1.5
to compute the maximum amount of information that the genetic
99
circuit can process. Here we again contrast our zero-parameter fit predictions with experimental
100
inferences of the channel capacity.
101
input
[
]
inducer concentration
signal processing
cellular response
output
protein expression
(A)
(B)
(C)
protein copy number
probability
probability
probability
protein copy number
input-output function
noisy
response
precise
response
cellular response as Bayesian inference
posterior
likelihood
prior
protein copy number
probability
low
high
medium
probability
low
high
medium
probability
protein copy number
probability
w
ell resolved input
poorly resolved input
environment state inference
[
]
[
]
[
]
stimulus
l
o
w
m
e
d
h
i
g
h
l
o
w
m
e
d
h
i
g
h
l
o
w
m
e
d
h
i
g
h
l
o
w
m
e
d
h
i
g
h
=
=
=
l
o
w
m
e
d
h
i
g
h
l
o
w
m
e
d
h
i
g
h
Figure 1. Cellular signaling systems sense the environment with di
erent degrees of precision
.
(A) Schematic representation of a cell as a noisy communication channel. From an environmental input
(inducer molecule concentration) to a phenotypic output (protein expression level), cellular signaling systems
can be modeled as noisy communication channels. (B) We treat cellular response to an external stimuli as a
Bayesian inference of the state of the environment. As the phenotype (protein level) serves as the internal
representation of the environmental state (inducer concentration), the probability of a cell being in a specific
environment given this internal representation
P
(
c
|
p
) is a function of the probability of the response given
that environmental state
P
(
p
|
c
). (C) The precision of the inference of the environmental state depends on
how well can cells resolve di
erent inputs. For three di
erent levels of input (left panel) the green strain
responds more precisely than the purple strain since the output distributions overlap less (middle panel). This
allows the green strain to make a more precise inference of the environmental state given a phenotypic
response (right panel).
3
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
1 Results
102
1.1
Minimal model of transcriptional regulation
103
We begin by defining the simple repression genetic circuit to be used throughout this work. As a
104
tractable circuit for which we have control over the parameters both theoretically and experimentally
105
we chose the so-called simple repression motif, a common regulatory scheme among prokaryotes [
28].
106
This circuit consists of a single promoter with an RNA-polymerase (RNAP) binding site and a single
107
binding site for a transcriptional repressor [
19]. The regulation due to the repressor occurs via exclusion
108
of the RNAP from its binding site when the repressor is bound, decreasing the likelihood of having
109
a transcription event. As with many important macromolecules, we consider the repressor to be
110
allosteric, meaning that it can exist in two conformations, one in which the repressor is able to bind
111
to the specific binding site (active state) and one in which it cannot bind the specific binding site
112
(inactive state). The environmental signaling occurs via passive import of an extracellular inducer
113
that binds the repressor, shifting the equilibrium between the two conformations of the repressor [
26].
114
In previous publications we have extensively characterized the mean response of this circuit under
115
di
erent conditions using equilibrium based models [
27]. In this work we build upon these models to
116
characterize the full distribution of gene expression with parameters such as repressor copy number
117
and its a
nity for the DNA being systematically varied.
118
Given the discrete nature of molecular species copy numbers inside cells, chemical master equations
119
have emerged as a useful tool to model the inherent probability distribution of these counts [
29]. In
120
Fig. 2(A) we show the minimal model and the necessary set of parameters needed to predict mRNA
121
and protein distributions. Specifically, we assume a three-state model where the promoter can be
122
found 1) In a transcriptionally active state (
A
state), 2) in a transcriptionally inactive state without
123
the repressor bound (
I
state) and 3) with the repressor bound (
R
state). We do not assume that the
124
transition between the active state
A
and the inactive state
I
happens due to RNAP binding to the
125
promoter. The transcriptional initiation kinetics involve several more steps than simple binding [
30].
126
We coarse-grain all these steps into an e
ective “on” and “o
” states for the promoter consistent with
127
experiments demonstrating the bursty nature of gene expression in
E. coli
[18]. These three states
128
generate a system of coupled di
erential equations for each of the three state distributions
P
A
(
m, p
;
t
),
129
P
I
(
m, p
;
t
) and
P
R
(
m, p
;
t
), where
m
and
p
are the mRNA and protein count per cell, respectively and
130
t
is the time. Given the rates shown in Fig. 2(A) we define the system of ODEs for a specific
m
and
131
p
. For the transcriptionally active state we have
132
dP
A
(
m, p
)
dt
=
A
!
I
z
}|
{
k
(
p
)
o
P
A
(
m, p
)+
I
!
A
z
}|
{
k
(
p
)
on
P
I
(
m, p
)
+
m
1
!
m
z
}|
{
r
m
P
A
(
m
1
,p
)
m
!
m
+1
z
}|
{
r
m
P
A
(
m, p
)+
m
+1
!
m
z
}|
{
m
(
m
+ 1)
P
A
(
m
+1
,p
)
m
!
m
1
z
}|
{
m
mP
A
(
m, p
)
+
p
1
!
p
z
}|
{
r
p
mP
A
(
m, p
1)
p
!
p
+1
z
}|
{
r
p
mP
A
(
m, p
)+
p
+1
!
p
z
}|
{
p
(
p
+ 1)
P
A
(
m, p
+ 1)
p
!
p
1
z
}|
{
p
pP
A
(
m, p
)
.
(2)
4
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
For the transcriptionally inactive state
I
we have
133
dP
I
(
m, p
)
dt
=
A
!
I
z
}|
{
k
(
p
)
o
P
A
(
m, p
)
I
!
A
z
}|
{
k
(
p
)
on
P
I
(
m, p
)+
R
!
I
z
}|
{
k
(
r
)
o
P
R
(
m, p
)
I
!
R
z
}|
{
k
(
r
)
on
P
I
(
m, p
)
+
m
+1
!
m
z
}|
{
m
(
m
+ 1)
P
I
(
m
+1
,p
)
m
!
m
1
z
}|
{
m
mP
I
(
m, p
)
+
p
1
!
p
z
}|
{
r
p
mP
I
(
m, p
1)
p
!
p
+1
z
}|
{
r
p
mP
I
(
m, p
)+
p
+1
!
p
z
}|
{
p
(
p
+ 1)
P
I
(
m, p
+ 1)
p
!
p
1
z
}|
{
p
pP
I
(
m, p
)
.
(3)
And finally, for the repressor bound state
R
we have
134
dP
R
(
m, p
)
dt
=
R
!
I
z
}|
{
k
(
r
)
o
P
R
(
m, p
)+
I
!
R
z
}|
{
k
(
r
)
on
P
I
(
m, p
)
+
m
+1
!
m
z
}|
{
m
(
m
+ 1)
P
R
(
m
+1
,p
)
m
!
m
1
z
}|
{
m
mP
R
(
m, p
)
+
p
1
!
p
z
}|
{
r
p
mP
R
(
m, p
1)
p
!
p
+1
z
}|
{
r
p
mP
R
(
m, p
)+
p
+1
!
p
z
}|
{
p
(
p
+ 1)
P
R
(
m, p
+ 1)
p
!
p
1
z
}|
{
p
pP
R
(
m, p
)
.
(4)
As we will discuss later in Section 1.4 the protein degradation term
p
is set to zero since we do
135
not consider protein degradation as a Poission process, but rather we explicitly implement binomial
136
partitioning as the cells grow and divide.
137
It is convenient to rewrite these equations in a compact matrix notation [
29]. For this we define
138
the vector
P
(
m, p
) as
139
P
(
m, p
)=(
P
A
(
m, p
)
,P
I
(
m, p
)
,P
R
(
m, p
))
T
,
(5)
where
T
is the transpose. By defining the matrices
K
to contain the promoter state transitions,
R
m
140
and
m
to contain the mRNA production and degradation terms, respectively, and
R
p
and
p
to
141
contain the protein production and degradation terms, respectively, the system of ODEs can then be
142
written as (See Appendix
S1 for full definition of these matrices)
143
d
P
(
m, p
)
dt
=(
K
R
m
m
m
m
R
p
p
p
)
P
(
m, p
)
+
R
m
P
(
m
1
,p
)+(
m
+ 1)
m
P
(
m
+1
,p
)
+
m
R
p
P
(
m, p
1) + (
p
+ 1)
p
P
(
m, p
+ 1)
.
(6)
1.2
Inferring parameters from published data sets
144
A decade of research in our group has characterized the simple repression motif with an ever
145
expanding array of predictions and corresponding experiments to uncover the physics of this genetic
146
circuit [
27]. In doing so we have come to understand the mean response of a single promoter in the
147
presence of varying levels of repressor copy numbers and repressor-DNA a
nities [
19], due to the e
ect
148
that competing binding sites and multiple promoter copies impose [
25], and in recent work, assisted by
149
5
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
the Monod-Wyman-Changeux (MWC) model, we expanded the scope to the allosteric nature of the
150
repressor [
26]. All of these studies have exploited the simplicity and predictive power of equilibrium
151
approximations to these non-equilibrium systems [
31]. We have also used a similar kinetic model to
152
the one depicted in Fig. 2(A) to study the noise in mRNA copy number [
24]. As a test case of the
153
depth of our theoretical understanding of the so-called “hydrogen atom” of transcriptional regulation
154
we combine all of the studies mentioned above to inform the parameter values of the model presented
155
in Fig. 2(A). Fig. 2(B) schematizes the data sets and experimental techniques used to measure gene
156
expression along with the parameters that can be inferred from them.
157
Appendix
S2 expands on the details of how the inference was performed for each of the parameters.
158
Briefly the promoter activation and inactivation rates
k
(
p
)
on
and
k
(
p
)
o
, as well as the transcription rate
159
r
m
were obtained in units of the mRNA degradation rate
m
by fitting a two-state promoter model
160
(no state
R
from Fig. 2(A)) [
32] to mRNA FISH data of an unregulated promoter (no repressor
161
present in the cell) [
24]. The repressor on rate is assumed to be of the form
k
(
r
)
on
=
k
o
[
R
]where
k
o
162
is a di
usion-limited on rate and [
R
] is the concentration of active repressor in the cell [
24]. This
163
concentration of active repressor is at the same time determined by the mean repressor copy number
164
in the cell, and the fraction of repressors in the active state. Existing estimates of the transition rates
165
between conformations of allosteric molecules set them at the microsecond scale [
33]. By considering
166
this to be representative for our repressor of interest, the separation of time-scales between the rapid
167
conformational changes of the repressor and the slower downstream processes such as the open-complex
168
formation processes allow us to model the probability of the repressor being in the active state as an
169
equilibrium MWC process. The parameters of the MWC model
K
A
,
K
I
and
"
AI
were previously
170
characterized from video-microscopy and flow-cytometry data [
26]. For the repressor o
rate
k
(
r
)
o
we
171
take advantage of the fact that the mean mRNA copy number as derived from the model in Fig. 2(A)
172
cast in the language of rates is of the same functional form as the equilibrium model cast in the
173
language of binding energies [
34]. Therefore the value of the repressor-DNA binding energy
"
r
174
constrains the value of the repressor o
rate
k
(
r
)
o
. These constraints on the rates allow us to make
175
self-consistent predictions under both, the equilibrium and the kinetic framework.
176
1.3
Computing the moments of the mRNA and protein distributions
177
Solving chemical master equations represent a challenge that is still an active area of research.
178
An alternative approach is to find schemes to approximate the distribution. One such scheme, the
179
maximum entropy principle, makes use of the moments of the distribution to approximate the full
180
distribution. In this section we will demonstrate an iterative algorithm to compute the mRNA and
181
protein distribution moments.
182
Our simple repression kinetic model depicted in Fig. 2(A) consists of an infinite system of ODEs
183
for each possible pair
m, p
. To compute any moment of the distribution we define a vector
184
h
m
x
p
y
i⌘
(
h
m
x
p
y
i
A
,
h
m
x
p
y
i
I
,
h
m
x
p
y
i
R
)
T
,
(7)
where
h
m
x
p
y
i
S
is the expected value of
m
x
p
y
in state
S
2
{
A, I, R
}
for
x, y
2
N
. In other words, just
185
as we defined the vector
P
(
m, p
), here we define a vector to collect the expected value of each of the
186
promoter states. By definition any of these moments
h
m
x
p
y
i
S
are computed as
187
h
m
x
p
y
i
S
1
X
m
=0
1
X
p
=0
m
x
p
y
P
S
(
m, p
)
.
(8)
6
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
2
0
4
0
6
0
0
l
l
e
c
/
A
N
R
m
probability
0.05
0.04
0.03
0.02
0.01
0.00
(A)
KINETIC MODEL AND MINIMAL PARAMETER SET
(B)
PARAMETER DETERMINATION
promoter
on
and
o
ff
rates
transcription
rate
repressor unbinding rate
repressor binding rate
fold-change
1.0
0.8
0.6
0.4
0.2
0.0
0
10
–6
10
–4
10
–2
)
M
(
G
T
P
I
10
0
10
1
10
2
10
3
number of repressors
10
0
10
–1
10
–2
10
–3
10
–4
fold-change
10
0
10
–1
10
–2
10
–3
10
–4
10
0
10
1
10
2
10
3
number of repressors
fold-change
single-site repression
(via colorimetric assay)
transcription factor competition
(via video
microscopy)
allosteric regulation
(via
fl
ow cytometry)
mRNA expression noise
(via
mRNA FISH)
IPTG dissociation
constants
free energy
active-inactive
energy di
ff
erence
STATE
R
STATE
I
STATE
A
repressor-DNA
binding energy
Figure 2. Minimal kinetic model of transcriptional regulation for a simple repression
architecture.
(A) Three-state promoter stochastic model of transcriptional regulation by a repressor. The
regulation by the repressor occurs via exclusion of the transcription initiation machinery, not allowing the
promoter to transition to the transcriptionally active state. All parameters highlighted with colored boxes
were determined from published datasets based on the same genetic circuit. (B) Data sets used to infer the
parameter values. From left to right Garcia & Phillips [
19
]isusedtodetermine
k
(
r
)
o
and
k
(
r
)
on
, Brewster et al.
[
25
]isusedtodetermine
"
AI
and
k
(
r
)
on
, Razo-Mejia et al. [
26
]isusedtodetermine
K
A
,
K
I
, and
k
(
r
)
on
and
Jones et al. is used to determine
r
m
,
k
(
p
)
on
, and
k
(
p
)
o
.
Summing over all possible
m
and
p
values in Eq.
6
results in a ODE for any moment of the
188
7
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
distribution of the form (See Appendix
S3 for full derivation)
189
d
h
m
x
p
y
i
dt
=
K
h
m
x
p
y
i
+
R
m
h
p
y
[(
m
+
1
)
x
m
x
]
i
+
m
h
mp
y
[(
m
1
)
x
m
x
]
i
+
R
p
D
m
(
x
+
1
)
[(
p
+
1
)
y
p
y
]
E
+
p
h
m
x
p
[(
p
1
)
y
p
y
]
i
.
(9)
Given that all transitions in our stochastic model are first order reactions, Eq. 9 has no moment-
190
closure problem [
13]. What this means is that the dynamical equation for a given moment only depends
191
on lower moments (See Appendix
S3 for full proof). This feature of our model implies, for example,
192
that the second moment of the protein distribution
p
2
depends only on the first two moments of the
193
mRNA distribution
h
m
i
, and
m
2
, the first protein moment
h
p
i
and the cross-correlation term
h
mp
i
.
194
We can therefore define
μ
(
x
,
y
)
to be a vector containing all moments up to
h
m
x
p
y
i
for all promoter
195
states. This is
196
μ
(
x
,
y
)
=
⇥⌦
m
0
p
0
,
m
1
p
0
,...,
h
m
x
p
y
i
T
.
(10)
Explicitly for the three-state promoter model depicted in Fig. 2(A) this vector takes the form
197
μ
(
x
,
y
)
=
⇥⌦
m
0
p
0
A
,
m
0
p
0
I
,
m
0
p
0
R
,...,
h
m
x
p
y
i
A
,
h
m
x
p
y
i
I
,
h
m
x
p
y
i
R
T
.
(11)
Given this definition we can compute the general moment dynamics as
198
d
μ
(
x
,
y
)
dt
=
A
μ
(
x
,
y
)
,
(12)
where
A
is a square matrix that contains all the numeric coe
cients that relate each of the moments.
199
We can then use Eq. 9 to build matrix
A
by iteratively substituting values for the exponents
x
and
y
200
up to a specified value. In the next section, we will use Eq. 12 to numerically integrate the dynamical
201
equations for our moments of interest as cells progress through the cell cycle.
202
1.4
Accounting for cell-cycle dependent variability in gene dosage
203
As cells progress through the cell cycle, the genome has to be replicated to guarantee that each
204
daughter cell receives a copy of the genetic material. This replication of the genome implies that
205
cells spend part of the cell cycle with multiple copies of each gene depending on the cellular growth
206
rate and the relative position of the gene with respect to the replication origin [
35]. Genes closer to
207
the replication origin spend a larger fraction of the cell cycle with multiple copies compared to genes
208
closer to the replication termination site [
35]. Fig. 3(A) depicts a schematic of this process where
209
the replication origin (
oriC
) and the relevant locus for our experimental measurements (
galK
) are
210
highlighted.
211
Since this change in gene copy number has been shown to have an e
ect on cell-to-cell variability in
212
gene expression [
24, 36], we now extend our minimal model to account for these changes in gene copy
213
number during the cell cycle. We reason that the only di
erence between the single-copy state and the
214
two-copies states of the promoter is a doubling of the mRNA production rate
r
m
. In particular the
215
promoter activation and inactivation rates
k
(
p
)
on
and
k
(
p
)
o
and the mRNA production rate
r
m
inferred in
216
8
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;
Section 1.1 assume that cells spend a fraction
f
of the cell cycle with one copy of the promoter (mRNA
217
production rate
r
m
) and a fraction (1
f
) of the cell cycle with two copies of the promoter (mRNA
218
production rate 2
r
m
). This inference was performed considering that at each cell state the mRNA
219
level immediately reaches the steady state value for the corresponding mRNA production rate. This
220
assumption is justified since the timescale to reach this steady state depends only on the degradation
221
rate
m
, which for the mRNA is much shorter (
3 min ) than the length of the cell cycle (100 min
222
for our experimental conditions) [
37]. Appendix
S2 shows that a model accounting for this gene copy
223
number variability is able to capture the experimental data from single molecule mRNA counts of an
224
unregulated (constitutively expressed) promoter.
225
Given that the protein degradation rate
p
in our model is set by the cell division time, we do
226
not expect that the protein count will reach the corresponding steady state value for each stage in
227
the cell cycle. In other words, cells do not spend long enough with two copies of the promoter for the
228
protein level to reach the steady state value corresponding to a transcription rate of 2
r
m
. We therefore
229
use the dynamical equations developed in Section 1.3 to numerically integrate the time trajectory of
230
the moments of the distribution with the corresponding parameters for each phase of the cell cycle.
231
Fig. 3(B) shows an example corresponding to the mean mRNA level (upper panel) and the mean
232
protein level (lower panel) for the case of the unregulated promoter. Given that we inferred the
233
promoter rates parameters considering that mRNA reaches steady state at each stage, we see that the
234
numerical integration of the equations is consistent with the assumption of having the mRNA reach
235
a stable value at each stage (See Fig. 3(B) upper panel). On the other hand, the mean protein level
236
does not reach a steady state at either of the cellular stages. Nevertheless it is interesting to observe
237
that after a couple of cell cycles the trajectory from cycle to cycle follows a repetitive pattern (See
238
Fig. 3(B) lower panel). Previously we have experimentally observe this repetitive pattern by tracking
239
the expression level over time with video microscopy as shown in Fig. 18 of [
27].
240
To test the e
ects of including this gene copy number variability in our model we now compare
241
the predictions of the model with experimental data. Specifically as detailed in Methods we obtained
242
single-cell fluorescence values of di
erent
E. coli
strains under twelve di
erent inducer concentrations.
243
The strains imaged spanned three orders of magnitude in repressor copy number and three distinct
244
repressor-DNA a
nities. Since growth was asynchronous, we reason that cells were randomly sampled
245
at all stages of the cell cycle. Therefore when computing statistics from the data such as the mean
246
fluorescence value, in reality we are averaging over the cell cycle. In other words, as depicted in
247
Fig. 3(B) quantities such as the mean protein copy number change over time, i.e.
h
p
i⌘h
p
(
t
)
i
.This
248
means that computing the mean of a population of unsynchronized cells is equivalent to averaging
249
this time dependent mean protein copy number over the span of the cell cycle. Mathematically this
250
is expressed as
251
h
p
i
c
=
Z
t
d
t
o
h
p
(
t
)
i
P
(
t
)
dt,
(13)
where
h
p
i
c
represents the average protein copy number over a cell cycle,
t
o
represents the start of the
252
cell cycle,
t
d
represents the time of cell division, and
P
(
t
) represents the probability of any cell being
253
at time
t
2
[
t
o
,t
d
] of their cell cycle. We do not consider cells uniformly distributed along the cell
254
cycle since it is known that cells follow an exponential distribution, having more younger than older
255
cells at any time point [
38]. All computations hereafter are therefore done by applying an averaging
256
like the one in Eq. 13 for the span of a cell cycle. We remind the reader that these time averages are
257
done under a fixed environmental state. It is the trajectory of cells over cell cycles under a constant
258
9
.
CC-BY-NC 4.0 International license
It is made available under a
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint
.
http://dx.doi.org/10.1101/594325
doi:
bioRxiv preprint first posted online Mar. 31, 2019;