of 43
Reconciling Kinetic and Equilibrium Models of Bacterial
Tr a n s c r i p t i o n
Muir Morrison
1
,ManuelRazo-Mejia
2
,andRobPhillips
1, 2, *
1
Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
2
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena,
CA 91125, USA
*
Correspondence: phillips@pboc.caltech.edu
Abstract
The study of transcription remains one of the centerpieces of modern biology with
implications in settings from development to metabolism to evolution to disease. Pre-
cision measurements using a host of di
erent techniques including fluorescence and
sequencing readouts have raised the bar for what it means to quantitatively under-
stand transcriptional regulation. In particular our understanding of the simplest ge-
netic circuit is su
ciently refined both experimentally and theoretically that it has
become possible to carefully discriminate between di
erent conceptual pictures of how
this regulatory system works. This regulatory motif, originally posited by Jacob and
Monod in the 1960s, consists of a single transcriptional repressor binding to a promoter
site and inhibiting transcription. In this paper, we show how seven distinct models of
this so-called simple-repression motif, based both on equilibrium and kinetic thinking,
can be used to derive the predicted levels of gene expression and shed light on the
often surprising past success of the equilibrium models. These di
erent models are
then invoked to confront a variety of di
erent data on mean, variance and full gene
expression distributions, illustrating the extent to which such models can and cannot
be distinguished, and suggesting a two-state model with a distribution of burst sizes
as the most potent of the seven for describing the simple-repression motif.
1 Introduction
Gene expression presides over much of the most important dynamism of living organisms.
The level of expression of batteries of di
erent genes is altered as a result of spatiotemporal
cues that integrate chemical, mechanical and other types of signals. The original repressor-
operator model conceived by Jacob and Monod in the context of bacterial metabolism has
now been transformed into the much broader subject of gene regulatory networks in living or-
ganisms of all kinds [1]–[3]. One of the remaining outstanding challenges to have emerged in
the genomic era is our continued inability to predict the regulatory consequences of di
erent
regulatory architectures, i.e. the arrangement and a
nity of binding sites for transcription
1
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
factors and RNA polymerases on the DNA. This challenge stems first and foremost from our
ignorance about what those architectures even are, with more than 60% of the genes even
in an ostensibly well understood organism such as
E. coli
having no regulatory insights at
all [4]–[7]. But even once we have established the identity of key transcription factors and
their binding sites of a given promoter architecture, there remains the predictive challenge
of understanding its input-output properties, an objective that can be met by a myriad
of approaches using the tools of statistical physics [8]–[25]. One route to such predictive
understanding is to focus on the simplest regulatory architecture and to push the theory-
experiment dialogue as far and as hard as it can be pushed [26], [27]. If we demonstrate that
we can pass that test by successfully predicting both the means and variance in gene expres-
sion at the mRNA level, then that provides a more solid foundation upon which to launch
into more complex problems - for instance, some of the previously unknown architectures
uncovered in [5] and [28].
To that end, in this paper we examine a wide variety of distinct models for the simple
repression regulatory architecture. This genetic architecture consists of a DNA promoter
regulated by a transcriptional repressor that binds to a single binding site as developed
in pioneering early work on the quantitative dissection of transcription [29], [30]. All of
the proposed models coarse-grain away some of the important microscopic features of this
architecture that have been elucidated by generations of geneticists, molecular biologists
and biochemists. One goal in exploring such coarse-grainings is to build towards the future
models of regulatory response that will be able to serve the powerful predictive role needed to
take synthetic biology from a brilliant exercise in enlightened empiricism to a rational design
framework as in any other branch of engineering. More precisely, we want phenomenology
in the sense of coarse-graining away atomistic detail, but still retaining biophysical meaning.
For example, we are not satisfied with the strictly phenomenological approach o
ered by
the commonly used Hill functions. As argued in [31], Hill functions are ubiquitous precisely
because they coarse-grain away all biophysical details into inscrutable parameters. Studies
like [32] have demonstrated that Hill functions are clearly insu
cient since each new situation
requires a completely new set of parameters. Such work requires a quantitative theory of
how biophysical changes at the molecular level propagate to input-output functions at the
genetic circuit level. In particular a key question is: at this level of coarse-graining, what
microscopic details do we need to explicitly model, and how do we figure that out? For
example, do we need to worry about all or even any of the steps that individual RNA
polymerases go through each time they make a transcript? Turning the question around,
can we see any imprint of those processes in the available data? If the answer is no, then
those processes are irrelevant for our purposes. Forward modeling and inverse (statistical
inferential) modeling are necessary to tackle such questions.
Figure 1(A) shows the qualitative picture of simple repression that is implicit in the repressor-
operator model. An operator, the binding site on the DNA for a repressor protein, may
be found occupied by a repressor, in which case transcription is blocked from occurring.
Alternatively, that binding site may be found unoccupied, in which case RNA polymerase
(RNAP) may bind and transcription can proceed. The key assumption we make in this
simplest incarnation of the repressor-operator model is that binding of repressor and RNAP
in the promoter region of interest is exclusive, meaning that one or the other may bind, but
2
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
never may both be simultaneously bound. It is often imagined that when the repressor is
bound to its operator, RNAP is sterically blocked from binding to its promoter sequence.
Current evidence suggests this is sometimes, but not always the case, and it remains an
interesting open question precisely how a repressor bound far upstream is able to repress
transcription [4]. Suggestions include “action-at-a-distance” mediated by kinks in the DNA,
formed when the repressor is bound, that prevent RNAP binding. Nevertheless, our modeling
in this work is su
ciently coarse-grained that we simply assume exclusive binding and leave
explicit accounting of these details out of the problem.
The logic of the remainder of the paper is as follows. In section 2, we show how both
thermodynamic models and kinetic models based upon the chemical master equation all
culminate in the same underlying functional form for the fold-change in the average level
of gene expression as shown in Figure 1(D). Section 3 goes beyond an analysis of the mean
gene expression by asking how the same models presented in Figure 1(C) can be used to
explore noise in gene expression. To make contact with experiment, all of these models
must make a commitment to some numerical values for the key parameters found in each
such model. Therefore in Section 4 we explore the use of Bayesian inference to establish
these parameters and to rigorously answer the question of how to discriminate between the
di
erent models.
2 Mean Gene Expression
As noted in the previous section, there are two broad classes of models in play for computing
the input-output functions of regulatory architectures as shown in Figure 1. In both classes
of model, the promoter is imagined to exist in a discrete set of states of occupancy, with each
such state of occupancy accorded its own rate of transcription – including no transcription for
many of these states. The models are probabilistic with each state assigned some probability
and the overall rate of transcription given by
average rate of transcription =
X
i
r
i
p
i
,
(1)
where
i
labels the distinct states,
p
i
is the probability of the
i
th
state, and
r
i
is the rate of
transcription of that state. Ultimately, the di
erent models di
er along several key axes:
what states to consider and how to compute the probabilities of those states.
The first class of models that are the focus of the present section on predicting the mean level
of gene expression, sometimes known as thermodynamic models, invoke the tools of equi-
librium statistical mechanics to compute the probabilities [8]–[17]. As seen in Figure 1(B),
even within the class of thermodynamic models, we can make di
erent commitments about
the underlying microscopic states of the promoter. Indeed, the list of options considered
here does not at all exhaust the suite of di
erent microscopic states we can assign to the
promoter.
The second class of models that allow us to access the mean gene expression use chemical
master equations to compute the probabilities of the di
erent microscopic states [18]–[25].
3
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
(A)
(B)
(C)
(D)
repressor bound,
transcription blocked
repressor unbound,
transcription proceeds
RNA polymerase
transcriptional repressor
DNA promoter
CANONICAL TRANSCRIPTIONAL REGULATION CARTOON
CANONICAL TRANSCRIPTIONAL REGULATION CARTOON
EQUILIBRIUM MODELS
EQUILIBRIUM MODELS
NONEQUILIBRIUM MODELS
NONEQUILIBRIUM MODELS
THE MASTER CURVE FOR SIMPLE REPRESSION
THE MASTER CURVE FOR SIMPLE REPRESSION
1
state
state
weight
weight
2
1
2
state
state
weight
weight
DETAILS OF PROMOTER MODELS
DETAILS OF PROMOTER MODELS
DETAILS OF PROMOTER MODELS
DETAILS OF PROMOTER MODELS
3
4
5
1.0
0.8
0.6
0.4
0.2
0.0
fold-change
6
4
2
0
2
4
6
F
R
lo
g
(
)
(
k
B
T
)
Figure 1. An overview of the simple repression motif at the level of means.
(A)
Schematic of the qualitative biological picture of the simple repression genetic architecture. (B)
and (C) A variety of possible mathematicized cartoons of simple repression, along with the
e
ective parameter
which subsumes all regulatory details of the architecture that do not
directly involve the repressor. (B) Simple repression models from an equilibrium perspective. (C)
Equivalent models cast in chemical kinetics language. (D) The “master curve” to which all
cartoons in (B) and (C) collapse.
4
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
As seen in Figure 1(C), we consider a host of di
erent nonequilibrium models, each of which
will have its own result for both the mean (this section) and noise (next section) in gene
expression.
2.1 Fold-changes are indistinguishable across models
As a first stop on our search for the “right” model of simple repression, let us consider
what we can learn from theory and experimental measurements on the average level of gene
expression in a population of cells. One experimental strategy that has been particularly
useful (if incomplete since it misses out on gene expression dynamics) is to measure the
fold-change in mean expression. The fold-change is defined as
fold-change =
h
gene expression with repressor present
i
h
gene expression with repressor absent
i
=
h
m
(
R>
0)
i
h
m
(
R
=0)
i
(2)
where angle brackets
h
·
i
denote the average over a population of cells and mean mRNA
h
m
i
is viewed as a function of repressor copy number
R
. What this means is that the fold-change
in gene expression is a relative measurement of the e
ect of the transcriptional repressor
(
R>
0) on the gene expression level compared to an unregulated promoter (
R
=0). The
second equality in Eq. 2 follows from assuming that the translation e
ciency, i.e., the number
of proteins translated per mRNA, is the same in both conditions. In other words, we assume
that mean protein level is proportional to mean mRNA level, and that the proportionality
constant is the same in both conditions and therefore cancels out in the ratio. This is
reasonable since the cells in the two conditions are identical except for the presence of the
transcription factor, and the model assumes that the transcription factor has no direct e
ect
on translation.
Fold-change has proven a very convenient observable in past work [32]–[35]. Part of its
utility in dissecting transcriptional regulation is its ratiometric nature, which removes many
secondary e
ects that are present when making an absolute gene expression measurement.
Also, by measuring otherwise identical cells with and without a transcription factor present,
any biological noise common to both conditions can be made to cancel away.
Figure 1 depicts a smorgasbord of mathematicized cartoons for simple repression using both
thermodynamic and kinetic models that have appeared in previous literature. For each
cartoon, we calculate the fold-change in mean gene expression as predicted by that model,
deferring some algebraic details to Appendix S1. What we will find is that all cartoons
collapse to a single master curve, shown in Figure 1(D), which contains just two parameters.
We label the parameters
F
R
, an e
ective free energy parametrizing the repressor-DNA
interaction, and
, which subsumes all details of transcription in the absence of repressors.
We will o
er some intuition for why this master curve exists and discuss why at the level
of the mean expression, we are unable to discriminate “right” from “wrong” cartoons given
only measurements of fold-changes in expression.
5
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
2.1.1 The Two-State Equilibrium Model
In this simplest model, depicted as (1) in Figure 1(B), the promoter is idealized as existing
in one of two states, either repressor bound or repressor unbound. The rate of transcription
is assumed to be proportional to the fraction of time spent in the repressor unbound state.
From the relative statistical weights listed in Figure 1, the probability
p
U
of being in the
unbound state is
p
U
=
1+
R
N
NS
e
"
R
1
.
(3)
The mean rate of transcription is then given by
rp
U
,asassumedbyEq.1.Themeannumber
of mRNA is set by the balance of average mRNA transcription and degradation rates, so it
follows that the mean mRNA level is given by
h
m
i
=
r
1+
R
N
NS
e
"
R
1
,
(4)
where
r
is the transcription rate from the repressor unbound state,
is the mRNA degra-
dation rate,
R
is repressor copy number,
N
NS
is the number of nonspecific binding sites
in the genome where repressors spend most of their time when not bound to the operator,
1
/k
B
T
,and
"
R
is the binding energy of a repressor to its operator site. The derivation
of this result is deferred to Appendix S1.
The fold-change is found as the ratio of mean mRNA with and without repressor as intro-
duced in Eq. 2. Invoking that definition results in
fold-change =
1+
R
N
NS
e
"
R
1
,
(5)
which matches the form of the master curve in Figure 1(D) with
=1and
F
R
=
"
r
log(
R/N
NS
).
In fact it was noted in [35] that this two-state model can be viewed as the coarse-graining of
any equilibrium promoter model in which no transcriptionally active states have transcription
factor bound, or put di
erently, when there is no overlap between transcription factor bound
states and transcriptionally active states. We will see this explicitly in the 3-state equilibrium
model below, but perhaps surprising is that an analogous result carries over even to the
nonequilibrium models we consider later.
2.1.2 The Three-State Equilibrium Model
Compared to the previous model, here we fine-grain the repressor unbound state into two
separate states: empty, and RNAP bound as shown in (2) in Figure 1(B). This picture
was used in [33] as we use it here, and in [32] and [35] it was generalized to incorporate
small-molecule inducers that bind the repressor. The e
ect of this generalization is, roughly
speaking, simply to rescale
R
from the total number of repressors to a smaller e
ective
number of available repressors which are unbound by inducers. We point out that the same
generalization can be incorporated quite easily into any of our models in Figure 1 by simply
6
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
rescaling the repressor copy number
R
in the equilibrium models, or equivalently
k
+
R
in the
nonequilibrium models.
The mean mRNA copy number, as derived in Appendix S1 from a similar enumeration of
states and weights as the previous model, is
h
m
i
=
r
P
N
NS
e
"
P
1+
R
N
NS
e
"
R
+
P
N
NS
e
"
P
,
(6)
where the new variables are
"
P
, the di
erence in RNAP binding energy to its specific
site (the promoter) relative to an average nonspecific background site, and the RNAP copy
number,
P
.Thefold-changeagainfollowsimmediatelyas
fold-change =
P
N
NS
e
"
P
1+
R
N
NS
e
"
R
+
P
N
NS
e
"
P
1+
P
N
NS
e
"
P
P
N
NS
e
"
P
(7)
=
1+
R
N
NS
e
"
R
1+
P
N
NS
e
"
P
!
1
(8)
=(1+exp(
F
R
log
))
1
,
(9)
with
F
R
=
"
R
log(
R/N
NS
)and
=1+
P
N
NS
e
"
P
as shown in Figure 1(B). Thus far,
we see that the two thermodynamic models, despite making di
erent coarse-graining com-
mitments, result in the same functional form for the fold-change in mean gene expression. We
now explore how kinetic models fare when faced with computing the same observable.
2.1.3 The Poisson Promoter Nonequilibrium Model
For our first kinetic model, we imitate the states considered in the Two-State Equilibrium
Model and consider the simplest possible picture with only two states, repressor bound and
unbound. This is exactly the model used for the main results of [36]. In this picture, repressor
association and dissociation rates from its operator site,
k
+
R
and
k
R
,respectively,govern
transitions between the two states. When the system is in the unbound state, transcription
initiates at rate
r
, which represents a coarse-graining of all the downstream processes into
a single e
ective rate. mRNA is degraded at rate
as already exploited in the previous
models.
Let
p
R
(
m, t
) denote the joint probability of finding the system in the repressor bound state
R
with
m
mRNA molecules present at time
t
.Similarlydefine
p
U
(
m, t
)fortherepressor
unbound state
U
. This model is governed by coupled master equations giving the time
evolution of
p
R
(
m, t
)and
p
U
(
m, t
)[22],[24],[27]whichwecanwriteas
d
dt
p
R
(
m, t
)=
R
!
U
z
}|
{
k
R
p
R
(
m, t
)+
U
!
R
z
}|
{
k
+
R
p
U
(
m, t
)+
m
+1
!
m
z
}|
{
(
m
+1)
p
R
(
m
+1
,t
)
m
!
m
1
z
}|
{
p
R
(
m, t
)
d
dt
p
U
(
m, t
)=
R
!
U
z
}|
{
k
R
p
R
(
m, t
)
U
!
R
z
}|
{
k
+
R
p
U
(
m, t
)+
m
1
!
m
z
}|
{
rp
U
(
m
1
,t
)
m
!
m
+1
z
}|
{
rp
U
(
m, t
)
+
m
+1
!
m
z
}|
{
(
m
+1)
p
U
(
m
+1
,t
)
m
!
m
1
z
}|
{
p
U
(
m, t
)
,
(10)
7
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
where each term on the right corresponds to a transition between two states of the promoter
as indicated by the overbrace label. In each equation, the first two terms describe transitions
between promoter states due to repressors unbinding and binding, respectively. The final
two terms describe degradation of mRNA, decreasing the copy number by one, and the terms
with coe
cient
r
describe transcription initiation increasing the mRNA copy number by one.
We direct the reader to Appendix S1.1 for a careful treatment showing how the form of this
master equation follows from the corresponding cartoon in Figure 1.
We can greatly simplify the notation, which will be especially useful for the more complicated
models yet to come, by re-expressing the master equation in vector form [37]. The promoter
states are collected into a vector and the rate constants are collected into matrices as
~
p
(
m
)=
p
R
(
m
)
p
U
(
m
)
,
K
=
k
R
k
+
R
k
R
k
+
R
,
R
=
00
0
r
,
(11)
so that the master equation may be condensed as
d
dt
~
p
(
m, t
)=(
K
R
m
I
)
~
p
(
m, t
)+
R
~
p
(
m
1
,t
)+
(
m
+1)
I
~
p
(
m
+1
,t
)
,
(12)
where
I
is the identity matrix. Taking steady state by setting time derivatives to zero, the
mean mRNA can be found to be
h
m
i
=
r
1+
k
+
R
k
R
1
,
(13)
with the algebra details again deferred to Appendix S1. Recall
k
+
R
is proportional to the
repressor copy number, so in computing fold-change, absence of repressor corresponds to
k
+
R
!
0. Therefore fold-change in this model is simply
fold-change =
1+
k
+
R
k
R
1
,
(14)
again matching the master curve of Figure 1(D) with
=1.
2.1.4 Nonequilibrium Model Two - RNAP Bound and Unbound States
Our second kinetic model depicted in Figure 1(C) mirrors the second equilibrium model
of Figure 1(B) by fine-graining the repressor unbound state of nonequilibrium model 1,
resolving it into an empty promoter state and an RNAP-bound state. Note in this picture,
in contrast with model 4 below, transcription initiation is accompanied by a promoter state
change, in keeping with the interpretation as RNAP-bound and empty states: if an RNAP
successfully escapes the promoter and proceeds to elongation of a transcript, clearly it is no
longer bound at the promoter. Therefore another RNAP must bind before another transcript
can be initiated.
The master equation governing this model is analogous to Eqs. 11-12 for model 1 above.
The main subtlety arises since transcription initiation accompanies a promoter state change.
8
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
This can be understood by analogy to
K
.Theo
-diagonalanddiagonalelementsof
K
correspond to transitions arriving at or departing from, respectively, the promoter state
of interest. If transcription initiation is accompanied by promoter state changes, we must
have separate matrices for arriving and departing transcription events since the arriving and
departing transitions have di
erent initial copy numbers of mRNA, unlike for
K
where they
are the same (see Appendix S1). The master equation for this model is
d
dt
~
p
(
m, t
)=(
K
R
D
m
I
)
~
p
(
m, t
)+
R
A
~
p
(
m
1
,t
)+
(
m
+1)
I
~
p
(
m
+1
,t
)
,
(15)
with the state vector and promoter transition matrix defined as
~
p
(
m
)=
0
@
p
R
(
m
)
p
E
(
m
)
p
P
(
m
)
1
A
,
K
=
0
@
k
R
k
+
R
0
k
R
k
+
R
k
+
P
k
P
0
k
+
P
k
P
1
A
,
(16)
and the initiation matrices given by
R
A
=
0
@
000
00
r
000
1
A
,
R
D
=
0
@
000
000
00
r
1
A
.
(17)
The elements of
~
p
(
m
)encodetheprobabilitiesofhaving
m
mRNA present along with the
promoter having repressor bound (
R
), being empty (
E
), or having RNAP bound (
P
), re-
spectively.
R
A
describes probability flux
arriving
at the state
~
p
(
m
)fromastatewithone
fewer mRNA, namely
~
p
(
m
1), and
R
D
describes probability flux
departing
from the state
~
p
(
m
) for a state with one more mRNA, namely
~
p
(
m
+1).
K
is closely analogous to model
1.
Mean mRNA at steady state is found analogously to model 1, with the result
h
m
i
=
r
k
R
k
+
P
k
R
k
+
P
+
k
R
(
k
P
+
r
)+
k
+
R
(
k
P
+
r
)
,
(18)
and with details again deferred to Appendix S1. Fold-change is again found from the ratio
prescribed by Eq. 2, from which we have
fold-change =
k
R
k
+
P
k
R
k
+
P
+
k
R
(
k
P
+
r
)+
k
+
R
(
k
P
+
r
)
k
+
P
+
k
P
+
r
k
+
P
(19)
=
1+
k
+
R
k
R
k
P
+
r
k
+
P
+
k
P
+
r
1
(20)
=
1+
k
+
R
k
R
1+
k
+
P
k
P
+
r
1
!
1
,
(21)
which follows the master curve of Figure 1(D) with
=1+
k
+
P
/
(
k
P
+
r
)asclaimed.
9
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
2.1.5 Nonequilibrium Model Three - Multistep Transcription Initiation and
Escape
One might reasonably complain that the first two “nonequilibrium” models we have con-
sidered are straw men. Their steady states necessarily satisfy detailed balance which is
equivalent to thermodynamic equilibrium. Why is this the case? At steady state there is by
definition no net probability flux in or out of each promoter state, but since the promoter
states form a linear chain, there is only one way in or out of the repressor bound and RNAP
bound states, implying each edge must actually have a net zero probability flux, which is the
definition of detailed balance (usually phrased as equality of forward and reverse transition
fluxes).
Now we consider model 3 in Figure 1(C) which allows the possibility of true nonequilibrium
steady-state fluxes through the promoter states. We point out that this model was considered
previously in [38] where a comparison was made with model 1 as used in [36]. The authors
of [38] argued that the additional complexity is essential to properly account for the noise in
the mRNA distribution. We will weigh in on both models later when we consider observables
beyond fold-change.
The master equation governing this model is identical in form to model 2 above, namely
d
dt
~
p
(
m, t
)=(
K
R
D
m
I
)
~
p
(
m, t
)+
R
A
~
p
(
m
1
,t
)+
(
m
+1)
I
~
p
(
m
+1
,t
)
,
(22)
but with a higher-dimensional state space and di
erent matrices. The state vector and
promoter transition matrix are now
~
p
(
m
)=
0
B
B
@
p
R
(
m
)
p
E
(
m
)
p
C
(
m
)
p
O
(
m
)
1
C
C
A
,
K
=
0
B
B
@
k
R
k
+
R
00
k
R
k
+
R
k
+
P
k
P
0
0
k
+
P
k
P
k
O
0
00
k
O
0
1
C
C
A
,
(23)
with the four promoter states, in order, being repressor bound (
R
), empty (
E
), RNAP closed
complex (
C
), and RNAP open complex (
O
). Besides increasing dimension by one, the only
new feature in
K
is the rate
k
O
,representingtherateofopencomplexformationfromthe
closed complex, which we assume for simplicity to be irreversible in keeping with some [38]
but not all [39] past literature. The initiation matrices are given by
R
A
=
0
B
B
@
0000
000
r
0000
0000
1
C
C
A
,
R
D
=
0
B
B
@
0000
0000
0000
000
r
1
C
C
A
,
(24)
again closely analogous to nonequilibrium model 2.
The expression for mean mRNA is substantially more complicated now, as worked out in
Appendix S1 where we find
h
m
i
=
r
k
R
k
+
P
k
O
k
R
[(
k
+
P
(
k
O
+
r
)+
r
(
k
P
+
k
O
)] +
k
+
R
r
(
k
P
+
k
O
)
,
(25)
10
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
which can be simplified to
h
m
i
=
r
k
+
P
k
O
r
(
k
O
+
k
P
)
1+
k
+
P
(
k
O
+
r
)
r
(
k
O
+
k
P
)
+
k
+
R
k
R
.
(26)
The strategy is to isolate the terms involving the repressor, so that now the fold-change is
seen to be simply
fold-change =
k
+
P
k
O
r
(
k
O
+
k
P
)
1+
k
+
P
(
k
O
+
r
)
r
(
k
O
+
k
P
)
+
k
+
R
k
R
1+
k
+
P
(
k
O
+
r
)
r
(
k
O
+
k
P
)
k
+
P
k
O
r
(
k
O
+
k
P
)
(27)
=
1+
k
+
R
k
R
1+
k
+
P
(
k
O
+
r
)
r
(
k
O
+
k
P
)
1
!
1
,
(28)
surprisingly reducing to the master curve of Figure 1(D) once again, with
=1+
k
+
P
(
k
O
+
r
)
r
(
k
O
+
k
P
)
.
This example hints that an arbitrarily fine-grained model of downstream transcription steps
may still be collapsed to the form of the master curve for the means given in Figure 1(D), so
long as the repressor binding is exclusive with transcriptionally active states. We o
er this
as a conjecture, and we suspect that a careful argument using the King-Altman diagram
method [40], [41] might furnish a “proof.” Our focus here is not on full generality but rather
to survey an assortment of plausible models for simple repression that have been proposed
in the literature.
2.1.6 Nonequilibrium Model Four - “Active” and “Inactive” States
Model 4 in Figure 1(C) is at the core of the theory in [42]. At a glance the cartoon for this
model may appear very similar to model 2, and mathematically it is, but the interpretation
is rather di
erent. In model 2, we interpreted the third state literally as an RNAP-bound
promoter and modeled initiation of a transcript as triggering a promoter state change, making
the assumption that an RNAP can only make one transcript at a time. In contrast, in the
present model the promoter state does
not
change when a transcript is initiated. So we no
longer interpret these states as literally RNAP bound and unbound but instead as coarse-
grained “active” and “inactive” states, the details of which we leave unspecified for now. We
will comment more on this model below when we discuss Fano factors of models.
Mathematically this model is very similar to models 1 and 2. Like model 1, the matrix
R
describing transcription initiation is diagonal, namely
R
=
0
@
000
000
00
r
1
A
.
(29)
The master equation takes verbatim the same form as it did for model 1, Eq. 12. Meanwhile
the promoter transition matrix
K
is the same as Eq. 16 from model 2, although we relabel
the rate constants from
k
±
P
to
k
±
to reiterate that these are not simply RNAP binding and
unbinding rates.
11
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
Carrying out the algebra, the mean mRNA can be found to be
h
m
i
=
r
k
R
k
+
k
R
k
+
+
k
R
k
+
k
+
R
k
,
(30)
and the fold-change readily follows,
fold-change =
k
R
k
+
k
R
k
+
+
k
R
k
+
k
+
R
k
k
R
k
+
+
k
R
k
k
R
k
+
(31)
=
1+
k
+
R
k
R
1+
k
+
k
1
!
1
,
(32)
from which we see
=1+
k
+
/k
as shown in Figure 1(C).
2.1.7 Nonequilibrium Model Five - Bursty Promoter
The final model we consider shown in Figure 1(C) is an intuitive analog to model 1, with
just two states, repressor bound or unbound, and transition rates between them of
k
+
R
and
k
R
. In model 1, when in the unbound state, single mRNA transcripts are produced as a
Poisson process with some characteristic rate
r
.Thecurrentmodelbycontrastproduces,
at some Poisson rate
k
i
,
bursts
of mRNA transcripts. The burst sizes are assumed to be
geometrically distributed with a mean burst size
b
,whichwewillmotivateinSection3when
we derive this model as a certain limiting case of model 4.
From this intuitive picture and by analogy to model 1, then, it should be plausible that the
mean mRNA level is
h
m
i
=
k
i
b
1+
k
+
R
k
R
1
,
(33)
which will turn out to be correct from a careful calculation. For now, we simply note that
just like model 1, the fold-change becomes
fold-change =
1+
k
+
R
k
R
1
(34)
with
=1alsolikemodel1. Wewillalsoseelaterhowthismodelemergesasanatural
limit of model 4.
2.2 Discussion of Results Across Models for Fold-Changes in Mean
Expression
The key outcome of our analysis of the models in Figure 1 is the existence of a master curve
shown in Figure 1(D) to which the fold-change predictions of all the models collapse. This
master curve is parametrized by only two e
ective parameters:
F
R
,whichcharacterizes
the number of repressors and their binding strength to the DNA, and
,whichcharacterizes
all other features of the promoter architecture. The key assumption underpinning this result
is that no transcription occurs when a repressor is bound to its operator. Note, however,
12
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
that we are agnostic about the molecular mechanism which achieves this; steric e
ects are
one plausible mechanism, but, for instance, “action-at-a-distance” mediated by kinked DNA
due to repressors bound tens or hundreds of nucleotides upstream of a promoter is plausible
as well.
Why does the master curve of Figure 1(D) exist at all? This brings to mind the deep
questions posed in, e.g., [31] and [43], suggesting we consider multiple plausible models
of a system and search for their common patterns to tease out which broad features are
and are not important. In our case, the key feature seems to be the exclusive nature of
repressor and RNAP binding, which allows the parameter describing the repressor,
F
R
,to
cleanly separate from all other details of the promoter architecture, which are encapsulated
in
. Arbitrary nonequilibrium behavior can occur on the rest of the promoter state space,
but it may all be swept up in the e
ective parameter
,towhichtherepressormakesno
contribution. We point the interested reader to [44] and [45] for an interesting analysis of
similar problems using a graph-theoretic language.
As suggested in [35], we believe this master curve should generalize to architectures with
multiple repressor binding sites, as long as the exclusivity of transcription factor binding
and transcription initiation is maintained. The interpretation of
F
R
is then of an e
ective
free energy of all repressor bound states. In an equilibrium picture this is simply given by
the log of the sum of Boltzmann weights of all repressor bound states, which looks like the
log of a partition function of a subsystem. In a nonequilibrium picture, while we can still
mathematically gather terms and give the resulting collection the label
F
R
,itisunclear
if the physical interpretation as an e
ective free energy makes sense. The problem is that
free energies cannot be assigned unambiguously to states out of equilibrium because the free
energy change along a generic path traversing the state space is path dependent, unlike at
equilibrium. A consequence of this is that, out of equilibrium,
F
R
is no longer a simple sum
of Boltzmann weights. Instead it resembles a restricted sum of King-Altman diagrams [40],
[41]. Following the work of Hill [46], it may yet be possible to interpret this expression as
an e
ective free energy, but this remains unclear to us. We leave this an open problem for
future work.
If we relax the requirement of exclusive repressor-RNAP binding, one could imagine models
in which repressor and RNAP doubly-bound states are allowed, where the repressor’s e
ect
is to reduce the transcription rate rather than setting it to zero. Our results do not strictly
apply to such a model, although we note that if the repressor’s reduction of the transcription
rate is substantial, such a model might still be well-approximated by one of the models
in Figure 1.
One may worry that our “one curve to rule them all” is a mathematical tautology. In fact
we
agree
with this criticism if
F
R
is “just a fitting parameter” and cannot be meaningfully
interpreted as a real, physical free energy. An analogy to Hill functions is appropriate here.
One of their great strengths and weaknesses, depending on the use they are put to, is that
their parameters coarse-grain many details and are generally not interpretable in terms of
microscopic models, for deep reasons discussed at length in [31]. By contrast, our master
curve claims to have the best of both worlds: a coarse-graining of all details besides the
repressor into a single e
ective parameter
,whilesimultaneouslyretaininganinterpretation
13
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
of
F
R
as a physically meaningful and interpretable free energy. Our task, then, is to prove
or disprove this claim.
How do we test this and probe the theory with fold-change measurements? There is a
fundamental limitation in that the master curve is essentially a one-parameter function of
F
R
+log
.Worse,therearemany
a priori
plausible microscopic mechanisms that could
contribute to the value of
, such as RNAP binding and escape kinetics [38], [39], and/or
supercoiling accumulation and release [47], [48], and/or, RNAP clusters analogous to those
observed in eukaryotes [49], [50] and recently also observed in bacteria [51]. Even if
F
R
is measured to high precision, inferring the potential microscopic contributions to
,buried
inside a log no less, from fold-change measurements seems beyond reach. As a statistical
inference problem it is entirely nonidentifiable, in the language of [52], Section 4.3.
If we cannot simply infer values of
from measurements of fold-change, can we perturb some
of the parameters that make up
and measure the change? Unfortunately we suspect this
is o
-limits experimentally: most of the potential contributors to
are global processes that
a
ect many or all genes. For instance, changing RNAP association rates by changing RNAP
copy numbers, or changing supercoiling kinetics by changing topoisomerase copy numbers,
would massively perturb the entire cell’s physiology and confound any determination of
.
One might instead imagine a bottom-up modeling approach, where we mathematicize a
model of what we hypothesize the important steps are and are not, use
in vitro
data for
the steps deemed important, and
predict
what
should be. But again, because of the
one-parameter nature of the master curve, many di
erent models will likely make indistin-
guishable predictions, and without any way to experimentally perturb
in vivo
,thereisno
clear way to test whether the modeling assumptions are correct.
In light of this, we prefer the view that parameters and rates are not directly comparable
between cartoons in Figure 1. Rather, parameters in the simpler cartoons represent coarse-
grained combinations of parameters in the finer-grained models. For instance, by equating
between any two models, one can derive various possible correspondences between the two
models’ parameters. Note that these correspondences are clearly not unique, since many
possible associations could be made. It then is a choice as to what microscopic interpreta-
tions the model-builder prefers for the parameters in a particular cartoon, and as to which
coarse-grainings lend intuition and which seem nonsensical. Indeed, since it remains an open
question what microscopic features dominate
(as suggested above, perhaps RNAP binding
and escape kinetics [38], [39], or supercoiling accumulation and release [47], [48], or, some-
thing more exotic like RNAP clusters [49]–[51]), we are hesitant to put too much weight on
any one microscopic interpretation of model parameters that make up
.
One possible tuning knob to probe
that would not globally perturb the cell’s physiology is
to manipulate RNAP binding sites. Work such as [53] has shown that models of sequence-
dependent RNAP a
nity can be inferred from data, and the authors of [54] showed that
the model of [53] has predictive power by using the model to
design
binding sites of a
desired a
nity. But for our purposes, this begs the question: the authors of [53]
assumed
aparticularmodel(essentiallyour3-stateequilibriummodelbutwithouttherepressor),so
14
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
it is unclear how or if such methods can be turned around to
compare
di
erent models of
promoter function.
Another possible route to dissect transcription details without a global perturbation would
be to use phage polymerase with phage-specific promoters. While such results would carry
some caveats, e.g., whether the repression of the phage polymerase is a good analog to the
repression of the native RNAP, it could nevertheless be worthy of consideration.
We have already pointed out that the master curve of Figure 1 is essentially a one-parameter
model, the one parameter being
F
R
+log
. By now the reader may be alarmed as to how
can we even determine
F
R
and
independently of each other, never mind shedding a lens
on the internal structure of
itself. A hint is provided by the weak promoter approximation,
invoked repeatedly in prior studies [14], [32], [33] of simple repression using the 3-state
equilibrium model in Figure 1(B). In that picture, the weak promoter approximation means
P
N
NS
exp(
"
P
)
1, meaning therefore
1. This approximation can be well justified
on the basis of the number of RNAP and
factors per cell and the strength of binding of
RNAP to DNA at weak promoters. This is suggestive, but how can we be sure that
is
not, for instance, actually 10
2
and that
F
R
hides a compensatory factor? A resolution is
o
ered by an independent inference of
in the absence of repressors. This was done in [42]
by fitting nonequilibrium model 4 in Figure 1(C), with zero repressor (looking ahead, this
is equivalent to model 4 in Figure 2(A)), to single-cell mRNA counts data from [34]. This
provided a determination of
k
+
and
k
, from which their ratio is estimated to be no more
than a few 10
1
and possibly as small as 10
2
.
The realization that
1toanexcellentapproximation,
independent
of which model in Fig-
ure 1 one prefers, goes a long way towards explaining the surprising success of equilibrium
models of simple repression. Even though our 2- and 3-state models get so many details
of transcription wrong, it does not matter because fold-change is a cleverly designed ratio.
Since
subsumes all details except the repressor, and log
0, fitting these simple models
to fold-change measurements can still give a surprisingly good estimate of repressor binding
energies. So the ratiometric construction of fold-change fulfills its intended purpose of can-
celing out all features of the promoter architecture except the repressor itself. Nevertheless
it is perhaps surprising how e
ectively it does so:
a priori
,onemightnothaveexpected
to be quite so close to 1.
We would also like to highlight the relevance of [55] here. Landman et. al. reanalyzed and
compared
in vivo
and
in vitro
data on the lacI repressor’s binding a
nity to its various
operator sequences. (The
in vivo
data was from, essentially, fitting our master curve to
expression measurements.) They find broad agreement between the
in vitro
and
in vivo
values. This reinforces the suspicion that the equilibrium
"
R
repressor binding energies do
in fact represent real physical free energies. Again,
a priori
this did not have to be the case,
even knowing that
1.
In principle, if
F
R
can be measured to su
cient precision, then deviations from
=1
become a testable matter of experiment. In practice, it is probably unrealistic to measure
repressor rates
k
+
R
or
k
R
or fold-changes in expression levels (and hence
"
R
)precisely
enough to detect the expected tiny deviations from
=1. Wecanestimatetherequisite
15
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint
precision in
F
R
to resolve a given
by noting, since
1, that log(1 +
)
,so
(
F
R
)
. Suppose we are lucky and
is
0
.
1, on the high end of our range estimated
above. A determination of
"
R
/k
B
T
with an uncertainty of barely 0.1 was achieved in the
excellent measurements of [32], so this requires a very di
cult determination of
F
R
for a
very crude determination of
,whichsuggests,toputitlightly,thisisnotapromisingpath
to pursue experimentally. It is doubtful that inference of repressor kinetic rates would be
any easier.
Moving forward, we have weak evidence supporting the interpretation of
F
R
as a physically
real free energy [55] and other work casting doubt [56]. How might we resolve the confusion?
If there is no discriminatory power to test the theory and distinguish the various models with
measurements of fold-changes in means, how do we probe the theory? Clearly to discriminate
between the nonequilibrium models in Figure 1, we need to go beyond means to ask questions
about kinetics, noise and even full distributions of mRNA copy numbers over a population
of cells. If the “one-curve-to-rule-them-all” is more than a mathematical tautology, then the
free energy of repressor binding inferred from fold-change measurements should agree with
repressor binding and unbinding rates. In other words, the equilibrium and nonequilibrium
definitions of
F
R
must agree, meaning
F
R
=
"
R
log(
R/N
NS
)=
log(
k
+
R
/k
R
)
,
(35)
must hold, where
"
R
is inferred from the master curve fit to fold-change measurements,
and
k
+
R
and
k
R
are inferred in some orthogonal manner. Single molecule measurements such
as from [56] have directly observed these rates, and in the remainder of this work we explore
a complementary approach: inferring repressor binding and unbinding rates
k
+
R
and
k
R
from
single-molecule measurements of mRNA population distributions.
3 Beyond Means in Gene Expression
In this section, our objective is to explore the same models considered in the previous section,
but now with reference to the the question of how well they describe the distribution of gene
expression levels, with special reference to the variance in these distributions. To that end,
we repeat the same pattern as in the previous section by examining the models one by one.
In particular we will focus on the Fano factor, defined as the variance/mean. This metric
serves as a powerful discriminatory tool from the null model that the steady-state mRNA
distribution must be Poisson, giving a Fano factor of one.
3.1 Kinetic models for unregulated promoter noise
Before we can tackle simple repression, we need an adequate phenomenological model of
constitutive expression. The literature abounds with options from which we can choose, and
we show several potential kinetic models for constitutive promoters in Figure 2(A). Let us
consider the suitability of each model for our purposes in turn.
16
.
CC-BY-NC 4.0 International license
(which was not certified by peer review) is the author/funder. It is made available under a
The copyright holder for this preprint
this version posted June 14, 2020.
.
https://doi.org/10.1101/2020.06.13.150292
doi:
bioRxiv preprint