First-principles prediction of the information processing capacity of a simple genetic circuit

First-principles prediction of the information processing

1

capacity of a simple genetic circuit

2

Manuel Razo-Mejia

and Rob Phillips

1, 2, 3, *

3

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA

4

Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA

5

Department of Applied Physics, California Institute of Technology, Pasadena, CA 91125, USA

6

Correspondence: phillips@pboc.caltech.edu

7

8

Abstract

9

Given the stochastic nature of gene expression, genetically identical cells exposed to the same

10

environmental inputs will produce di

↵

erent outputs. This heterogeneity has consequences for how

11

cells are able to survive in changing environments. Recent work has explored the use of information

12

theory as a framework to understand the accuracy with which cells can ascertain the state of their

13

surroundings. Yet the predictive power of these approaches is limited and has not been rigorously

14

tested using precision measurements. To that end, we generate a minimal model for a simple genetic

15

circuit in which all parameter values for the model come from independently published data sets.

16

We then predict the information processing capacity of the genetic circuit for a suite of biophysical

17

parameters such as protein copy number and protein-DNA a

nity. We compare these parameter-

18

free predictions with an experimental determination of the information processing capacity of

19

coli

cells, and find that our minimal model accurately captures the experimental data.

20

As living organisms thrive in some given environment, they are faced with constant changes in their

21

surroundings. From abiotic conditions such as temperature fluctuations or changes in osmotic pressure,

22

to biological interactions such as cell-to-cell communication in a tissue or in a bacterial biofilm, living

23

organisms of all types sense and respond to external signals. Fig. 1(A) shows a schematic of this

24

process for a bacterial cell sensing a concentration of an extracellular chemical. At the molecular

25

level where signal transduction unfolds mechanistically, there are physical constraints on the accuracy

26

and precision of these responses given by intrinsic stochastic fluctuations [

1]. This means that two

27

genetically identical cells exposed to the same stimulus will not have an identical response [

2].

28

The implication of this biological noise is that cells do not have an infinite resolution to distinguish

29

signals and, as a consequence, there is a one-to-many mapping between inputs and outputs. Further-

30

more, given the limited number of possible outputs, there are overlapping responses between di

↵

erent

31

inputs. In that sense, one might think of cells performing a Bayesian inference of the state of the

32

environment given their phenotypic response, as schematized in Fig. 1(B). The question then becomes

33

how to analyze this probabilistic rather than deterministic relationship between inputs and outputs?

34

The abstract answer to this question was worked out in 1948 by Claude Shannon who, in his seminal

35

work, founded the field of information theory [

3]. Shannon developed a general framework for how

36

to analyze information transmission through noisy communication channels. In his work, Shannon

37

showed that the only quantity that satisfies simple conditions of how a metric for information should

38

behave, was of the same functional form as the thermodynamic entropy – thereby christening his met-

39

ric the information entropy [

4]. He also gave a definition, based on this information entropy, for the

40

relationship between inputs and outputs known as the mutual information. The mutual information

41

(

;

)betweeninput

and output

, given by

42

(

;

(

)

(

) log

(

)

(

)

(1)

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

quantifies how much we learn about the state of the input

given that we get to observe the output

43

44

It is natural to conceive of scenarios in which living organisms that can better resolve signals might

45

have an evolutionary benefit, making it more likely that their o

↵

spring will have a fitness advantage [

5].

46

In recent years there has been a growing interest in understanding the theoretical limits on cellular

47

information processing [

6, 7], and in quantifying how close evolution has pushed cellular signaling

48

pathways to these theoretical limits [

8–10]. While these studies have treated the signaling pathway

49

as a “black box” explicitly ignoring all the molecular interactions taking place in them, other studies

50

have explored the role that molecular players and regulatory architectures have on these information

51

processing tasks [

11–17]. Despite the great advances in our understanding of the information processing

52

capabilities of molecular mechanisms, the field still lacks a rigorous experimental test of these ideas

53

with precision measurements on a simple system tractable both theoretically and experimentally.

54

Over the last decade the dialogue between theory and experiments in gene regulation has led to

55

predictive power not only over the mean, but the noise in gene expression as a function of relevant

56

parameters such as regulatory protein copy numbers, a

nity of these proteins to the DNA promoter,

57

as well as the extracellular concentrations of inducer molecules [

18–21]. These models based on

58

equilibrium and non-equilibrium statistical physics have reached a predictive accuracy level such that

59

for simple cases it is now possible to design input-output functions [

22, 23]. This opens the possibility

60

to exploit these predictive models to tackle the question of how much information genetic circuits

61

can process. The question lays at the heart of understanding the precision of the cellular response to

62

environmental signals. Fig. 1(C) schematizes a scenario in which two bacterial strains respond with

63

↵

erent levels of precision to three possible environmental states, i.e. inducer concentrations. The

64

overlap between the three di

↵

erent responses is what precisely determines the resolution with which

65

cells can distinguish di

↵

erent inputs. This is analogous to how for an imaging system the point spread

66

function limits the ability to resolve two light emitting point sources.

67

In this work we follow the same philosophy of theory-experiment dialogue used to determine model

68

parameters to predict from first principles the e

↵

ect that biophysical parameters such as transcription

69

factor copy number and protein-DNA a

nity have on the information processing capacity of a simple

70

genetic circuit. Specifically, to predict the mutual information between an extracellular chemical signal

71

(input

) and the corresponding cellular response in the form of protein expression (output

)(Eq.

1)we

72

must compute the input-output function

(

). To do so, we use a master-equation-based model to

73

construct the protein copy number distribution as a function of an extracellular inducer concentration

74

for di

↵

erent combinations of transcription factor copy numbers and binding sites. Having these input-

75

output distributions allows us to compute the mutual information between inputs and outputs

(

;

)

76

for any arbitrary input distribution

(

). We opt to compute the channel capacity, i.e. the maximum

77

information that can be processed by this gene regulatory architecture, defined as Eq. 1 maximized

78

over all possible input distributions

(

). By doing so we can examine the physical limits of what cells

79

can do in terms of information processing by harboring these genetic circuits. All parameters used for

80

our model were inferred from a series of studies that span several experimental techniques [

19, 24–26],

81

allowing us to perform parameter-free predictions of this information processing capacity [

27].

82

These predictions are then contrasted with experimental data, where the channel capacity is in-

83

ferred from single-cell fluorescence distributions taken at di

↵

erent concentrations of inducer for cells

84

with previously characterized biophysical parameters [

19, 26]. We find that our parameter-free pre-

85

dictions closely match the experiments. In this sense we demonstrate how our minimal model can

86

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

be used to quantify the resolution with which cells can resolve the environmental state with no free

87

parameters.

88

The reminder of the paper is organized as follows. In Section

1.1

we define the minimal theoretical

89

model and parameter inference for a simple repression genetic circuit. Section

1.2

discusses how

90

all parameters for the minimal model are determined from published datasets that explore di

↵

erent

91

aspects of the simple repression motif. Section

1.3

computes the moments of the mRNA and protein

92

distributions from this minimal model. In Section

1.4

we explore the consequences of variability in

93

gene copy number during the cell cycle. In this section we compare experimental and theoretical

94

quantities related to the moments of the distribution. Specifically the predictions for the fold-change

95

in gene expression (mean expression relative to an unregulated promoter) and the gene expression

96

noise (standard deviation over mean). Section

1.5

follows with reconstruction of the full mRNA and

97

protein distribution from the moments using the maximum entropy principle. Finally Section

1.6

uses

98

the distributions from Section

1.5

to compute the maximum amount of information that the genetic

99

circuit can process. Here we again contrast our zero-parameter fit predictions with experimental

100

inferences of the channel capacity.

101

input

[

]

inducer concentration

signal processing

cellular response

output

protein expression

(A)

(B)

(C)

protein copy number

probability

protein copy number

input-output function

noisy

response

precise

response

cellular response as Bayesian inference

posterior

likelihood

prior

protein copy number

probability

low

high

medium

probability

low

high

medium

probability

protein copy number

probability

ell resolved input

poorly resolved input

environment state inference

[

]

[

]

[

]

stimulus

Figure 1. Cellular signaling systems sense the environment with di

↵

erent degrees of precision

(A) Schematic representation of a cell as a noisy communication channel. From an environmental input

(inducer molecule concentration) to a phenotypic output (protein expression level), cellular signaling systems

can be modeled as noisy communication channels. (B) We treat cellular response to an external stimuli as a

Bayesian inference of the state of the environment. As the phenotype (protein level) serves as the internal

representation of the environmental state (inducer concentration), the probability of a cell being in a specific

environment given this internal representation

(

) is a function of the probability of the response given

that environmental state

(

). (C) The precision of the inference of the environmental state depends on

how well can cells resolve di

↵

erent inputs. For three di

↵

erent levels of input (left panel) the green strain

responds more precisely than the purple strain since the output distributions overlap less (middle panel). This

allows the green strain to make a more precise inference of the environmental state given a phenotypic

response (right panel).

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

1 Results

102

1.1

Minimal model of transcriptional regulation

103

We begin by defining the simple repression genetic circuit to be used throughout this work. As a

104

tractable circuit for which we have control over the parameters both theoretically and experimentally

105

we chose the so-called simple repression motif, a common regulatory scheme among prokaryotes [

28].

106

This circuit consists of a single promoter with an RNA-polymerase (RNAP) binding site and a single

107

binding site for a transcriptional repressor [

19]. The regulation due to the repressor occurs via exclusion

108

of the RNAP from its binding site when the repressor is bound, decreasing the likelihood of having

109

a transcription event. As with many important macromolecules, we consider the repressor to be

110

allosteric, meaning that it can exist in two conformations, one in which the repressor is able to bind

111

to the specific binding site (active state) and one in which it cannot bind the specific binding site

112

(inactive state). The environmental signaling occurs via passive import of an extracellular inducer

113

that binds the repressor, shifting the equilibrium between the two conformations of the repressor [

26].

114

In previous publications we have extensively characterized the mean response of this circuit under

115

↵

erent conditions using equilibrium based models [

27]. In this work we build upon these models to

116

characterize the full distribution of gene expression with parameters such as repressor copy number

117

and its a

nity for the DNA being systematically varied.

118

Given the discrete nature of molecular species copy numbers inside cells, chemical master equations

119

have emerged as a useful tool to model the inherent probability distribution of these counts [

29]. In

120

Fig. 2(A) we show the minimal model and the necessary set of parameters needed to predict mRNA

121

and protein distributions. Specifically, we assume a three-state model where the promoter can be

122

found 1) In a transcriptionally active state (

state), 2) in a transcriptionally inactive state without

123

the repressor bound (

state) and 3) with the repressor bound (

state). We do not assume that the

124

transition between the active state

and the inactive state

happens due to RNAP binding to the

125

promoter. The transcriptional initiation kinetics involve several more steps than simple binding [

30].

126

We coarse-grain all these steps into an e

↵

ective “on” and “o

↵

” states for the promoter consistent with

127

experiments demonstrating the bursty nature of gene expression in

E. coli

[18]. These three states

128

generate a system of coupled di

↵

erential equations for each of the three state distributions

(

m, p

;

129

(

m, p

;

) and

(

m, p

;

), where

and

are the mRNA and protein count per cell, respectively and

130

is the time. Given the rates shown in Fig. 2(A) we define the system of ODEs for a specific

and

131

. For the transcriptionally active state we have

132

(

m, p

)

{

(

)

↵

(

m, p

{

(

)

(

m, p

)

{

(

)

{

(

m, p

{

(

+ 1)

(

)

{

(

m, p

)

{

(

m, p

{

(

m, p

{

(

+ 1)

(

m, p

+ 1)

{

(

m, p

)

(2)

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

For the transcriptionally inactive state

we have

133

(

m, p

)

{

(

)

↵

(

m, p

)

{

(

)

(

m, p

{

(

)

↵

(

m, p

)

{

(

)

(

m, p

)

{

(

+ 1)

(

)

{

(

m, p

)

{

(

m, p

{

(

m, p

{

(

+ 1)

(

m, p

+ 1)

{

(

m, p

)

(3)

And finally, for the repressor bound state

we have

134

(

m, p

)

{

(

)

↵

(

m, p

{

(

)

(

m, p

)

{

(

+ 1)

(

)

{

(

m, p

)

{

(

m, p

{

(

m, p

{

(

+ 1)

(

m, p

+ 1)

{

(

m, p

)

(4)

As we will discuss later in Section 1.4 the protein degradation term

is set to zero since we do

135

not consider protein degradation as a Poission process, but rather we explicitly implement binomial

136

partitioning as the cells grow and divide.

137

It is convenient to rewrite these equations in a compact matrix notation [

29]. For this we define

138

the vector

(

m, p

) as

139

(

m, p

)=(

(

m, p

)

(

m, p

)

(

m, p

))

(5)

where

is the transpose. By defining the matrices

to contain the promoter state transitions,

140

and

to contain the mRNA production and degradation terms, respectively, and

and

141

contain the protein production and degradation terms, respectively, the system of ODEs can then be

142

written as (See Appendix

S1 for full definition of these matrices)

143

(

m, p

)

(

m, p

)

(

)+(

+ 1)

(

)

(

m, p

1) + (

+ 1)

(

m, p

+ 1)

(6)

1.2

Inferring parameters from published data sets

144

A decade of research in our group has characterized the simple repression motif with an ever

145

expanding array of predictions and corresponding experiments to uncover the physics of this genetic

146

circuit [

27]. In doing so we have come to understand the mean response of a single promoter in the

147

presence of varying levels of repressor copy numbers and repressor-DNA a

nities [

19], due to the e

↵

ect

148

that competing binding sites and multiple promoter copies impose [

25], and in recent work, assisted by

149

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

the Monod-Wyman-Changeux (MWC) model, we expanded the scope to the allosteric nature of the

150

repressor [

26]. All of these studies have exploited the simplicity and predictive power of equilibrium

151

approximations to these non-equilibrium systems [

31]. We have also used a similar kinetic model to

152

the one depicted in Fig. 2(A) to study the noise in mRNA copy number [

24]. As a test case of the

153

depth of our theoretical understanding of the so-called “hydrogen atom” of transcriptional regulation

154

we combine all of the studies mentioned above to inform the parameter values of the model presented

155

in Fig. 2(A). Fig. 2(B) schematizes the data sets and experimental techniques used to measure gene

156

expression along with the parameters that can be inferred from them.

157

Appendix

S2 expands on the details of how the inference was performed for each of the parameters.

158

Briefly the promoter activation and inactivation rates

(

)

and

(

)

↵

, as well as the transcription rate

159

were obtained in units of the mRNA degradation rate

by fitting a two-state promoter model

160

(no state

from Fig. 2(A)) [

32] to mRNA FISH data of an unregulated promoter (no repressor

161

present in the cell) [

24]. The repressor on rate is assumed to be of the form

(

)

[

]where

162

is a di

↵

usion-limited on rate and [

] is the concentration of active repressor in the cell [

24]. This

163

concentration of active repressor is at the same time determined by the mean repressor copy number

164

in the cell, and the fraction of repressors in the active state. Existing estimates of the transition rates

165

between conformations of allosteric molecules set them at the microsecond scale [

33]. By considering

166

this to be representative for our repressor of interest, the separation of time-scales between the rapid

167

conformational changes of the repressor and the slower downstream processes such as the open-complex

168

formation processes allow us to model the probability of the repressor being in the active state as an

169

equilibrium MWC process. The parameters of the MWC model

and

were previously

170

characterized from video-microscopy and flow-cytometry data [

26]. For the repressor o

↵

rate

(

)

↵

171

take advantage of the fact that the mean mRNA copy number as derived from the model in Fig. 2(A)

172

cast in the language of rates is of the same functional form as the equilibrium model cast in the

173

language of binding energies [

34]. Therefore the value of the repressor-DNA binding energy

174

constrains the value of the repressor o

↵

rate

(

)

↵

. These constraints on the rates allow us to make

175

self-consistent predictions under both, the equilibrium and the kinetic framework.

176

1.3

Computing the moments of the mRNA and protein distributions

177

Solving chemical master equations represent a challenge that is still an active area of research.

178

An alternative approach is to find schemes to approximate the distribution. One such scheme, the

179

maximum entropy principle, makes use of the moments of the distribution to approximate the full

180

distribution. In this section we will demonstrate an iterative algorithm to compute the mRNA and

181

protein distribution moments.

182

Our simple repression kinetic model depicted in Fig. 2(A) consists of an infinite system of ODEs

183

for each possible pair

m, p

. To compute any moment of the distribution we define a vector

184

i⌘

(

)

(7)

where

is the expected value of

in state

{

A, I, R

}

for

x, y

N

. In other words, just

185

as we defined the vector

(

m, p

), here we define a vector to collect the expected value of each of the

186

promoter states. By definition any of these moments

are computed as

187

⌘

(

m, p

)

(8)

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

probability

0.05

0.04

0.03

0.02

0.01

0.00

(A)

KINETIC MODEL AND MINIMAL PARAMETER SET

(B)

PARAMETER DETERMINATION

promoter

and

ff

rates

transcription

rate

repressor unbinding rate

repressor binding rate

fold-change

1.0

0.8

0.6

0.4

0.2

0.0

–6

–4

–2

)

(

number of repressors

–1

–2

–3

–4

fold-change

–1

–2

–3

–4

number of repressors

fold-change

single-site repression

(via colorimetric assay)

transcription factor competition

(via video

microscopy)

allosteric regulation

(via

ow cytometry)

mRNA expression noise

(via

mRNA FISH)

IPTG dissociation

constants

free energy

active-inactive

energy di

erence

STATE

repressor-DNA

binding energy

Figure 2. Minimal kinetic model of transcriptional regulation for a simple repression

architecture.

(A) Three-state promoter stochastic model of transcriptional regulation by a repressor. The

regulation by the repressor occurs via exclusion of the transcription initiation machinery, not allowing the

promoter to transition to the transcriptionally active state. All parameters highlighted with colored boxes

were determined from published datasets based on the same genetic circuit. (B) Data sets used to infer the

parameter values. From left to right Garcia & Phillips [

]isusedtodetermine

(

)

↵

and

(

)

, Brewster et al.

[

]isusedtodetermine

and

(

)

, Razo-Mejia et al. [

]isusedtodetermine

, and

(

)

and

Jones et al. is used to determine

(

)

, and

(

)

↵

Summing over all possible

and

values in Eq.

results in a ODE for any moment of the

188

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

distribution of the form (See Appendix

S3 for full derivation)

189

[(

)

]

[(

)

]

(

)

[(

)

]

[(

)

]

(9)

Given that all transitions in our stochastic model are first order reactions, Eq. 9 has no moment-

190

closure problem [

13]. What this means is that the dynamical equation for a given moment only depends

191

on lower moments (See Appendix

S3 for full proof). This feature of our model implies, for example,

192

that the second moment of the protein distribution

⌦

↵

depends only on the first two moments of the

193

mRNA distribution

, and

⌦

↵

, the first protein moment

and the cross-correlation term

194

We can therefore define

μ

(

)

to be a vector containing all moments up to

for all promoter

195

states. This is

196

μ

(

)

⇥⌦

↵

⌦

↵

,...,

⇤

(10)

Explicitly for the three-state promoter model depicted in Fig. 2(A) this vector takes the form

197

μ

(

)

⇥⌦

↵

⌦

↵

⌦

↵

,...,

⇤

(11)

Given this definition we can compute the general moment dynamics as

198

μ

(

)

μ

(

)

(12)

where

is a square matrix that contains all the numeric coe

cients that relate each of the moments.

199

We can then use Eq. 9 to build matrix

by iteratively substituting values for the exponents

and

200

up to a specified value. In the next section, we will use Eq. 12 to numerically integrate the dynamical

201

equations for our moments of interest as cells progress through the cell cycle.

202

1.4

Accounting for cell-cycle dependent variability in gene dosage

203

As cells progress through the cell cycle, the genome has to be replicated to guarantee that each

204

daughter cell receives a copy of the genetic material. This replication of the genome implies that

205

cells spend part of the cell cycle with multiple copies of each gene depending on the cellular growth

206

rate and the relative position of the gene with respect to the replication origin [

35]. Genes closer to

207

the replication origin spend a larger fraction of the cell cycle with multiple copies compared to genes

208

closer to the replication termination site [

35]. Fig. 3(A) depicts a schematic of this process where

209

the replication origin (

oriC

) and the relevant locus for our experimental measurements (

galK

) are

210

highlighted.

211

Since this change in gene copy number has been shown to have an e

↵

ect on cell-to-cell variability in

212

gene expression [

24, 36], we now extend our minimal model to account for these changes in gene copy

213

number during the cell cycle. We reason that the only di

↵

erence between the single-copy state and the

214

two-copies states of the promoter is a doubling of the mRNA production rate

. In particular the

215

promoter activation and inactivation rates

(

)

and

(

)

↵

and the mRNA production rate

inferred in

216

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;

Section 1.1 assume that cells spend a fraction

of the cell cycle with one copy of the promoter (mRNA

217

production rate

) and a fraction (1

) of the cell cycle with two copies of the promoter (mRNA

218

production rate 2

). This inference was performed considering that at each cell state the mRNA

219

level immediately reaches the steady state value for the corresponding mRNA production rate. This

220

assumption is justified since the timescale to reach this steady state depends only on the degradation

221

rate

, which for the mRNA is much shorter (

⇡

3 min ) than the length of the cell cycle (100 min

222

for our experimental conditions) [

37]. Appendix

S2 shows that a model accounting for this gene copy

223

number variability is able to capture the experimental data from single molecule mRNA counts of an

224

unregulated (constitutively expressed) promoter.

225

Given that the protein degradation rate

in our model is set by the cell division time, we do

226

not expect that the protein count will reach the corresponding steady state value for each stage in

227

the cell cycle. In other words, cells do not spend long enough with two copies of the promoter for the

228

protein level to reach the steady state value corresponding to a transcription rate of 2

. We therefore

229

use the dynamical equations developed in Section 1.3 to numerically integrate the time trajectory of

230

the moments of the distribution with the corresponding parameters for each phase of the cell cycle.

231

Fig. 3(B) shows an example corresponding to the mean mRNA level (upper panel) and the mean

232

protein level (lower panel) for the case of the unregulated promoter. Given that we inferred the

233

promoter rates parameters considering that mRNA reaches steady state at each stage, we see that the

234

numerical integration of the equations is consistent with the assumption of having the mRNA reach

235

a stable value at each stage (See Fig. 3(B) upper panel). On the other hand, the mean protein level

236

does not reach a steady state at either of the cellular stages. Nevertheless it is interesting to observe

237

that after a couple of cell cycles the trajectory from cycle to cycle follows a repetitive pattern (See

238

Fig. 3(B) lower panel). Previously we have experimentally observe this repetitive pattern by tracking

239

the expression level over time with video microscopy as shown in Fig. 18 of [

27].

240

To test the e

↵

ects of including this gene copy number variability in our model we now compare

241

the predictions of the model with experimental data. Specifically as detailed in Methods we obtained

242

single-cell fluorescence values of di

↵

erent

E. coli

strains under twelve di

↵

erent inducer concentrations.

243

The strains imaged spanned three orders of magnitude in repressor copy number and three distinct

244

repressor-DNA a

nities. Since growth was asynchronous, we reason that cells were randomly sampled

245

at all stages of the cell cycle. Therefore when computing statistics from the data such as the mean

246

fluorescence value, in reality we are averaging over the cell cycle. In other words, as depicted in

247

Fig. 3(B) quantities such as the mean protein copy number change over time, i.e.

i⌘h

(

)

.This

248

means that computing the mean of a population of unsynchronized cells is equivalent to averaging

249

this time dependent mean protein copy number over the span of the cell cycle. Mathematically this

250

is expressed as

251

(

)

(

)

dt,

(13)

where

represents the average protein copy number over a cell cycle,

represents the start of the

252

cell cycle,

represents the time of cell division, and

(

) represents the probability of any cell being

253

at time

[

] of their cell cycle. We do not consider cells uniformly distributed along the cell

254

cycle since it is known that cells follow an exponential distribution, having more younger than older

255

cells at any time point [

38]. All computations hereafter are therefore done by applying an averaging

256

like the one in Eq. 13 for the span of a cell cycle. We remind the reader that these time averages are

257

done under a fixed environmental state. It is the trajectory of cells over cell cycles under a constant

258

CC-BY-NC 4.0 International license

It is made available under a

(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint

http://dx.doi.org/10.1101/594325

doi:

bioRxiv preprint first posted online Mar. 31, 2019;