Reconciling Kinetic and Equilibrium Models of Bacterial Transcription

Reconciling Kinetic and Equilibrium Models of Bacterial

Tr a n s c r i p t i o n

Muir Morrison

,ManuelRazo-Mejia

,andRobPhillips

1, 2, *

Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena,

CA 91125, USA

Correspondence: phillips@pboc.caltech.edu

Abstract

The study of transcription remains one of the centerpieces of modern biology with

implications in settings from development to metabolism to evolution to disease. Pre-

cision measurements using a host of di

↵

erent techniques including fluorescence and

sequencing readouts have raised the bar for what it means to quantitatively under-

stand transcriptional regulation. In particular our understanding of the simplest ge-

netic circuit is su

ciently refined both experimentally and theoretically that it has

become possible to carefully discriminate between di

↵

erent conceptual pictures of how

this regulatory system works. This regulatory motif, originally posited by Jacob and

Monod in the 1960s, consists of a single transcriptional repressor binding to a promoter

site and inhibiting transcription. In this paper, we show how seven distinct models of

this so-called simple-repression motif, based both on equilibrium and kinetic thinking,

can be used to derive the predicted levels of gene expression and shed light on the

often surprising past success of the equilibrium models. These di

↵

erent models are

then invoked to confront a variety of di

↵

erent data on mean, variance and full gene

expression distributions, illustrating the extent to which such models can and cannot

be distinguished, and suggesting a two-state model with a distribution of burst sizes

as the most potent of the seven for describing the simple-repression motif.

1 Introduction

Gene expression presides over much of the most important dynamism of living organisms.

The level of expression of batteries of di

↵

erent genes is altered as a result of spatiotemporal

cues that integrate chemical, mechanical and other types of signals. The original repressor-

operator model conceived by Jacob and Monod in the context of bacterial metabolism has

now been transformed into the much broader subject of gene regulatory networks in living or-

ganisms of all kinds [1]–[3]. One of the remaining outstanding challenges to have emerged in

the genomic era is our continued inability to predict the regulatory consequences of di

↵

erent

regulatory architectures, i.e. the arrangement and a

nity of binding sites for transcription

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

factors and RNA polymerases on the DNA. This challenge stems first and foremost from our

ignorance about what those architectures even are, with more than 60% of the genes even

in an ostensibly well understood organism such as

E. coli

having no regulatory insights at

all [4]–[7]. But even once we have established the identity of key transcription factors and

their binding sites of a given promoter architecture, there remains the predictive challenge

of understanding its input-output properties, an objective that can be met by a myriad

of approaches using the tools of statistical physics [8]–[25]. One route to such predictive

understanding is to focus on the simplest regulatory architecture and to push the theory-

experiment dialogue as far and as hard as it can be pushed [26], [27]. If we demonstrate that

we can pass that test by successfully predicting both the means and variance in gene expres-

sion at the mRNA level, then that provides a more solid foundation upon which to launch

into more complex problems - for instance, some of the previously unknown architectures

uncovered in [5] and [28].

To that end, in this paper we examine a wide variety of distinct models for the simple

repression regulatory architecture. This genetic architecture consists of a DNA promoter

regulated by a transcriptional repressor that binds to a single binding site as developed

in pioneering early work on the quantitative dissection of transcription [29], [30]. All of

the proposed models coarse-grain away some of the important microscopic features of this

architecture that have been elucidated by generations of geneticists, molecular biologists

and biochemists. One goal in exploring such coarse-grainings is to build towards the future

models of regulatory response that will be able to serve the powerful predictive role needed to

take synthetic biology from a brilliant exercise in enlightened empiricism to a rational design

framework as in any other branch of engineering. More precisely, we want phenomenology

in the sense of coarse-graining away atomistic detail, but still retaining biophysical meaning.

For example, we are not satisfied with the strictly phenomenological approach o

↵

ered by

the commonly used Hill functions. As argued in [31], Hill functions are ubiquitous precisely

because they coarse-grain away all biophysical details into inscrutable parameters. Studies

like [32] have demonstrated that Hill functions are clearly insu

cient since each new situation

requires a completely new set of parameters. Such work requires a quantitative theory of

how biophysical changes at the molecular level propagate to input-output functions at the

genetic circuit level. In particular a key question is: at this level of coarse-graining, what

microscopic details do we need to explicitly model, and how do we figure that out? For

example, do we need to worry about all or even any of the steps that individual RNA

polymerases go through each time they make a transcript? Turning the question around,

can we see any imprint of those processes in the available data? If the answer is no, then

those processes are irrelevant for our purposes. Forward modeling and inverse (statistical

inferential) modeling are necessary to tackle such questions.

Figure 1(A) shows the qualitative picture of simple repression that is implicit in the repressor-

operator model. An operator, the binding site on the DNA for a repressor protein, may

be found occupied by a repressor, in which case transcription is blocked from occurring.

Alternatively, that binding site may be found unoccupied, in which case RNA polymerase

(RNAP) may bind and transcription can proceed. The key assumption we make in this

simplest incarnation of the repressor-operator model is that binding of repressor and RNAP

in the promoter region of interest is exclusive, meaning that one or the other may bind, but

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

never may both be simultaneously bound. It is often imagined that when the repressor is

bound to its operator, RNAP is sterically blocked from binding to its promoter sequence.

Current evidence suggests this is sometimes, but not always the case, and it remains an

interesting open question precisely how a repressor bound far upstream is able to repress

transcription [4]. Suggestions include “action-at-a-distance” mediated by kinks in the DNA,

formed when the repressor is bound, that prevent RNAP binding. Nevertheless, our modeling

in this work is su

ciently coarse-grained that we simply assume exclusive binding and leave

explicit accounting of these details out of the problem.

The logic of the remainder of the paper is as follows. In section 2, we show how both

thermodynamic models and kinetic models based upon the chemical master equation all

culminate in the same underlying functional form for the fold-change in the average level

of gene expression as shown in Figure 1(D). Section 3 goes beyond an analysis of the mean

gene expression by asking how the same models presented in Figure 1(C) can be used to

explore noise in gene expression. To make contact with experiment, all of these models

must make a commitment to some numerical values for the key parameters found in each

such model. Therefore in Section 4 we explore the use of Bayesian inference to establish

these parameters and to rigorously answer the question of how to discriminate between the

↵

erent models.

2 Mean Gene Expression

As noted in the previous section, there are two broad classes of models in play for computing

the input-output functions of regulatory architectures as shown in Figure 1. In both classes

of model, the promoter is imagined to exist in a discrete set of states of occupancy, with each

such state of occupancy accorded its own rate of transcription – including no transcription for

many of these states. The models are probabilistic with each state assigned some probability

and the overall rate of transcription given by

average rate of transcription =

(1)

where

labels the distinct states,

is the probability of the

state, and

is the rate of

transcription of that state. Ultimately, the di

↵

erent models di

↵

er along several key axes:

what states to consider and how to compute the probabilities of those states.

The first class of models that are the focus of the present section on predicting the mean level

of gene expression, sometimes known as thermodynamic models, invoke the tools of equi-

librium statistical mechanics to compute the probabilities [8]–[17]. As seen in Figure 1(B),

even within the class of thermodynamic models, we can make di

↵

erent commitments about

the underlying microscopic states of the promoter. Indeed, the list of options considered

here does not at all exhaust the suite of di

↵

erent microscopic states we can assign to the

promoter.

The second class of models that allow us to access the mean gene expression use chemical

master equations to compute the probabilities of the di

↵

erent microscopic states [18]–[25].

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

(A)

(B)

(C)

(D)

repressor bound,

transcription blocked

repressor unbound,

transcription proceeds

RNA polymerase

transcriptional repressor

DNA promoter

CANONICAL TRANSCRIPTIONAL REGULATION CARTOON

EQUILIBRIUM MODELS

NONEQUILIBRIUM MODELS

THE MASTER CURVE FOR SIMPLE REPRESSION

state

weight

state

weight

DETAILS OF PROMOTER MODELS

1.0

0.8

0.6

0.4

0.2

0.0

fold-change

–

(

⇢

)

(

)

Figure 1. An overview of the simple repression motif at the level of means.

(A)

Schematic of the qualitative biological picture of the simple repression genetic architecture. (B)

and (C) A variety of possible mathematicized cartoons of simple repression, along with the

↵

ective parameter

⇢

which subsumes all regulatory details of the architecture that do not

directly involve the repressor. (B) Simple repression models from an equilibrium perspective. (C)

Equivalent models cast in chemical kinetics language. (D) The “master curve” to which all

cartoons in (B) and (C) collapse.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

As seen in Figure 1(C), we consider a host of di

↵

erent nonequilibrium models, each of which

will have its own result for both the mean (this section) and noise (next section) in gene

expression.

2.1 Fold-changes are indistinguishable across models

As a first stop on our search for the “right” model of simple repression, let us consider

what we can learn from theory and experimental measurements on the average level of gene

expression in a population of cells. One experimental strategy that has been particularly

useful (if incomplete since it misses out on gene expression dynamics) is to measure the

fold-change in mean expression. The fold-change is defined as

fold-change =

gene expression with repressor present

gene expression with repressor absent

(

=0)

(2)

where angle brackets

denote the average over a population of cells and mean mRNA

is viewed as a function of repressor copy number

. What this means is that the fold-change

in gene expression is a relative measurement of the e

↵

ect of the transcriptional repressor

(

0) on the gene expression level compared to an unregulated promoter (

=0). The

second equality in Eq. 2 follows from assuming that the translation e

ciency, i.e., the number

of proteins translated per mRNA, is the same in both conditions. In other words, we assume

that mean protein level is proportional to mean mRNA level, and that the proportionality

constant is the same in both conditions and therefore cancels out in the ratio. This is

reasonable since the cells in the two conditions are identical except for the presence of the

transcription factor, and the model assumes that the transcription factor has no direct e

↵

ect

on translation.

Fold-change has proven a very convenient observable in past work [32]–[35]. Part of its

utility in dissecting transcriptional regulation is its ratiometric nature, which removes many

secondary e

↵

ects that are present when making an absolute gene expression measurement.

Also, by measuring otherwise identical cells with and without a transcription factor present,

any biological noise common to both conditions can be made to cancel away.

Figure 1 depicts a smorgasbord of mathematicized cartoons for simple repression using both

thermodynamic and kinetic models that have appeared in previous literature. For each

cartoon, we calculate the fold-change in mean gene expression as predicted by that model,

deferring some algebraic details to Appendix S1. What we will find is that all cartoons

collapse to a single master curve, shown in Figure 1(D), which contains just two parameters.

We label the parameters

, an e

↵

ective free energy parametrizing the repressor-DNA

interaction, and

⇢

, which subsumes all details of transcription in the absence of repressors.

We will o

↵

er some intuition for why this master curve exists and discuss why at the level

of the mean expression, we are unable to discriminate “right” from “wrong” cartoons given

only measurements of fold-changes in expression.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

2.1.1 The Two-State Equilibrium Model

In this simplest model, depicted as (1) in Figure 1(B), the promoter is idealized as existing

in one of two states, either repressor bound or repressor unbound. The rate of transcription

is assumed to be proportional to the fraction of time spent in the repressor unbound state.

From the relative statistical weights listed in Figure 1, the probability

of being in the

unbound state is

✓

◆

(3)

The mean rate of transcription is then given by

,asassumedbyEq.1.Themeannumber

of mRNA is set by the balance of average mRNA transcription and degradation rates, so it

follows that the mean mRNA level is given by

✓

◆

(4)

where

is the transcription rate from the repressor unbound state,

is the mRNA degra-

dation rate,

is repressor copy number,

is the number of nonspecific binding sites

in the genome where repressors spend most of their time when not bound to the operator,

⌘

,and

is the binding energy of a repressor to its operator site. The derivation

of this result is deferred to Appendix S1.

The fold-change is found as the ratio of mean mRNA with and without repressor as intro-

duced in Eq. 2. Invoking that definition results in

fold-change =

✓

◆

(5)

which matches the form of the master curve in Figure 1(D) with

⇢

=1and

log(

R/N

In fact it was noted in [35] that this two-state model can be viewed as the coarse-graining of

any equilibrium promoter model in which no transcriptionally active states have transcription

factor bound, or put di

↵

erently, when there is no overlap between transcription factor bound

states and transcriptionally active states. We will see this explicitly in the 3-state equilibrium

model below, but perhaps surprising is that an analogous result carries over even to the

nonequilibrium models we consider later.

2.1.2 The Three-State Equilibrium Model

Compared to the previous model, here we fine-grain the repressor unbound state into two

separate states: empty, and RNAP bound as shown in (2) in Figure 1(B). This picture

was used in [33] as we use it here, and in [32] and [35] it was generalized to incorporate

small-molecule inducers that bind the repressor. The e

↵

ect of this generalization is, roughly

speaking, simply to rescale

from the total number of repressors to a smaller e

↵

ective

number of available repressors which are unbound by inducers. We point out that the same

generalization can be incorporated quite easily into any of our models in Figure 1 by simply

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

rescaling the repressor copy number

in the equilibrium models, or equivalently

in the

nonequilibrium models.

The mean mRNA copy number, as derived in Appendix S1 from a similar enumeration of

states and weights as the previous model, is

(6)

where the new variables are

, the di

↵

erence in RNAP binding energy to its specific

site (the promoter) relative to an average nonspecific background site, and the RNAP copy

number,

.Thefold-changeagainfollowsimmediatelyas

fold-change =

(7)

(8)

=(1+exp(

log

⇢

))

(9)

with

log(

R/N

)and

⇢

=1+

as shown in Figure 1(B). Thus far,

we see that the two thermodynamic models, despite making di

↵

erent coarse-graining com-

mitments, result in the same functional form for the fold-change in mean gene expression. We

now explore how kinetic models fare when faced with computing the same observable.

2.1.3 The Poisson Promoter Nonequilibrium Model

For our first kinetic model, we imitate the states considered in the Two-State Equilibrium

Model and consider the simplest possible picture with only two states, repressor bound and

unbound. This is exactly the model used for the main results of [36]. In this picture, repressor

association and dissociation rates from its operator site,

and

,respectively,govern

transitions between the two states. When the system is in the unbound state, transcription

initiates at rate

, which represents a coarse-graining of all the downstream processes into

a single e

↵

ective rate. mRNA is degraded at rate

as already exploited in the previous

models.

Let

(

m, t

) denote the joint probability of finding the system in the repressor bound state

with

mRNA molecules present at time

.Similarlydefine

(

m, t

)fortherepressor

unbound state

. This model is governed by coupled master equations giving the time

evolution of

(

m, t

)and

(

m, t

)[22],[24],[27]whichwecanwriteas

(

m, t

{

(

m, t

{

(

m, t

{

(

+1)

(

)

{

(

m, t

)

(

m, t

{

(

m, t

)

{

(

m, t

{

(

)

{

(

m, t

)

{

(

+1)

(

)

{

(

m, t

)

(10)

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

where each term on the right corresponds to a transition between two states of the promoter

as indicated by the overbrace label. In each equation, the first two terms describe transitions

between promoter states due to repressors unbinding and binding, respectively. The final

two terms describe degradation of mRNA, decreasing the copy number by one, and the terms

with coe

cient

describe transcription initiation increasing the mRNA copy number by one.

We direct the reader to Appendix S1.1 for a careful treatment showing how the form of this

master equation follows from the corresponding cartoon in Figure 1.

We can greatly simplify the notation, which will be especially useful for the more complicated

models yet to come, by re-expressing the master equation in vector form [37]. The promoter

states are collected into a vector and the rate constants are collected into matrices as

(

✓

(

)

(

)

◆

✓

◆

✓

◆

(11)

so that the master equation may be condensed as

(

m, t

)=(

)

(

m, t

(

+1)

(

)

(12)

where

is the identity matrix. Taking steady state by setting time derivatives to zero, the

mean mRNA can be found to be

✓

◆

(13)

with the algebra details again deferred to Appendix S1. Recall

is proportional to the

repressor copy number, so in computing fold-change, absence of repressor corresponds to

0. Therefore fold-change in this model is simply

fold-change =

✓

◆

(14)

again matching the master curve of Figure 1(D) with

⇢

=1.

2.1.4 Nonequilibrium Model Two - RNAP Bound and Unbound States

Our second kinetic model depicted in Figure 1(C) mirrors the second equilibrium model

of Figure 1(B) by fine-graining the repressor unbound state of nonequilibrium model 1,

resolving it into an empty promoter state and an RNAP-bound state. Note in this picture,

in contrast with model 4 below, transcription initiation is accompanied by a promoter state

change, in keeping with the interpretation as RNAP-bound and empty states: if an RNAP

successfully escapes the promoter and proceeds to elongation of a transcript, clearly it is no

longer bound at the promoter. Therefore another RNAP must bind before another transcript

can be initiated.

The master equation governing this model is analogous to Eqs. 11-12 for model 1 above.

The main subtlety arises since transcription initiation accompanies a promoter state change.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

This can be understood by analogy to

.Theo

↵

-diagonalanddiagonalelementsof

correspond to transitions arriving at or departing from, respectively, the promoter state

of interest. If transcription initiation is accompanied by promoter state changes, we must

have separate matrices for arriving and departing transcription events since the arriving and

departing transitions have di

↵

erent initial copy numbers of mRNA, unlike for

where they

are the same (see Appendix S1). The master equation for this model is

(

m, t

)=(

)

(

m, t

(

+1)

(

)

(15)

with the state vector and promoter transition matrix defined as

(

)

(

)

(

)

(16)

and the initiation matrices given by

000

(17)

The elements of

(

)encodetheprobabilitiesofhaving

mRNA present along with the

promoter having repressor bound (

), being empty (

), or having RNAP bound (

), re-

spectively.

describes probability flux

arriving

at the state

(

)fromastatewithone

fewer mRNA, namely

(

1), and

describes probability flux

departing

from the state

(

) for a state with one more mRNA, namely

(

+1).

is closely analogous to model

Mean mRNA at steady state is found analogously to model 1, with the result

(

)

(18)

and with details again deferred to Appendix S1. Fold-change is again found from the ratio

prescribed by Eq. 2, from which we have

fold-change =

(

)

(19)

✓

◆

(20)

✓

◆

(21)

which follows the master curve of Figure 1(D) with

⇢

=1+

(

)asclaimed.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

2.1.5 Nonequilibrium Model Three - Multistep Transcription Initiation and

Escape

One might reasonably complain that the first two “nonequilibrium” models we have con-

sidered are straw men. Their steady states necessarily satisfy detailed balance which is

equivalent to thermodynamic equilibrium. Why is this the case? At steady state there is by

definition no net probability flux in or out of each promoter state, but since the promoter

states form a linear chain, there is only one way in or out of the repressor bound and RNAP

bound states, implying each edge must actually have a net zero probability flux, which is the

definition of detailed balance (usually phrased as equality of forward and reverse transition

fluxes).

Now we consider model 3 in Figure 1(C) which allows the possibility of true nonequilibrium

steady-state fluxes through the promoter states. We point out that this model was considered

previously in [38] where a comparison was made with model 1 as used in [36]. The authors

of [38] argued that the additional complexity is essential to properly account for the noise in

the mRNA distribution. We will weigh in on both models later when we consider observables

beyond fold-change.

The master equation governing this model is identical in form to model 2 above, namely

(

m, t

)=(

)

(

m, t

(

+1)

(

)

(22)

but with a higher-dimensional state space and di

↵

erent matrices. The state vector and

promoter transition matrix are now

(

)

(

)

(

)

(

)

(23)

with the four promoter states, in order, being repressor bound (

), empty (

), RNAP closed

complex (

), and RNAP open complex (

). Besides increasing dimension by one, the only

new feature in

is the rate

,representingtherateofopencomplexformationfromthe

closed complex, which we assume for simplicity to be irreversible in keeping with some [38]

but not all [39] past literature. The initiation matrices are given by

0000

000

0000

000

(24)

again closely analogous to nonequilibrium model 2.

The expression for mean mRNA is substantially more complicated now, as worked out in

Appendix S1 where we find

[(

(

)] +

(

)

(25)

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

which can be simplified to

+

(

)

+

(

)

(

)

+

(26)

The strategy is to isolate the terms involving the repressor, so that now the fold-change is

seen to be simply

fold-change =

+

(

)

+

(

)

(

)

+

+

(

)

(

)

+

(

)

(27)

✓

(

)

(

)

◆

(28)

surprisingly reducing to the master curve of Figure 1(D) once again, with

⇢

=1+

+

(

)

(

)

This example hints that an arbitrarily fine-grained model of downstream transcription steps

may still be collapsed to the form of the master curve for the means given in Figure 1(D), so

long as the repressor binding is exclusive with transcriptionally active states. We o

↵

er this

as a conjecture, and we suspect that a careful argument using the King-Altman diagram

method [40], [41] might furnish a “proof.” Our focus here is not on full generality but rather

to survey an assortment of plausible models for simple repression that have been proposed

in the literature.

2.1.6 Nonequilibrium Model Four - “Active” and “Inactive” States

Model 4 in Figure 1(C) is at the core of the theory in [42]. At a glance the cartoon for this

model may appear very similar to model 2, and mathematically it is, but the interpretation

is rather di

↵

erent. In model 2, we interpreted the third state literally as an RNAP-bound

promoter and modeled initiation of a transcript as triggering a promoter state change, making

the assumption that an RNAP can only make one transcript at a time. In contrast, in the

present model the promoter state does

not

change when a transcript is initiated. So we no

longer interpret these states as literally RNAP bound and unbound but instead as coarse-

grained “active” and “inactive” states, the details of which we leave unspecified for now. We

will comment more on this model below when we discuss Fano factors of models.

Mathematically this model is very similar to models 1 and 2. Like model 1, the matrix

describing transcription initiation is diagonal, namely

000

(29)

The master equation takes verbatim the same form as it did for model 1, Eq. 12. Meanwhile

the promoter transition matrix

is the same as Eq. 16 from model 2, although we relabel

the rate constants from

±

±

to reiterate that these are not simply RNAP binding and

unbinding rates.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

Carrying out the algebra, the mean mRNA can be found to be

(30)

and the fold-change readily follows,

fold-change =

(31)

✓

◆

(32)

from which we see

⇢

=1+

as shown in Figure 1(C).

2.1.7 Nonequilibrium Model Five - Bursty Promoter

The final model we consider shown in Figure 1(C) is an intuitive analog to model 1, with

just two states, repressor bound or unbound, and transition rates between them of

and

. In model 1, when in the unbound state, single mRNA transcripts are produced as a

Poisson process with some characteristic rate

.Thecurrentmodelbycontrastproduces,

at some Poisson rate

bursts

of mRNA transcripts. The burst sizes are assumed to be

geometrically distributed with a mean burst size

,whichwewillmotivateinSection3when

we derive this model as a certain limiting case of model 4.

From this intuitive picture and by analogy to model 1, then, it should be plausible that the

mean mRNA level is

✓

◆

(33)

which will turn out to be correct from a careful calculation. For now, we simply note that

just like model 1, the fold-change becomes

fold-change =

✓

◆

(34)

with

⇢

=1alsolikemodel1. Wewillalsoseelaterhowthismodelemergesasanatural

limit of model 4.

2.2 Discussion of Results Across Models for Fold-Changes in Mean

Expression

The key outcome of our analysis of the models in Figure 1 is the existence of a master curve

shown in Figure 1(D) to which the fold-change predictions of all the models collapse. This

master curve is parametrized by only two e

↵

ective parameters:

,whichcharacterizes

the number of repressors and their binding strength to the DNA, and

⇢

,whichcharacterizes

all other features of the promoter architecture. The key assumption underpinning this result

is that no transcription occurs when a repressor is bound to its operator. Note, however,

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

that we are agnostic about the molecular mechanism which achieves this; steric e

↵

ects are

one plausible mechanism, but, for instance, “action-at-a-distance” mediated by kinked DNA

due to repressors bound tens or hundreds of nucleotides upstream of a promoter is plausible

as well.

Why does the master curve of Figure 1(D) exist at all? This brings to mind the deep

questions posed in, e.g., [31] and [43], suggesting we consider multiple plausible models

of a system and search for their common patterns to tease out which broad features are

and are not important. In our case, the key feature seems to be the exclusive nature of

repressor and RNAP binding, which allows the parameter describing the repressor,

,to

cleanly separate from all other details of the promoter architecture, which are encapsulated

⇢

. Arbitrary nonequilibrium behavior can occur on the rest of the promoter state space,

but it may all be swept up in the e

↵

ective parameter

⇢

,towhichtherepressormakesno

contribution. We point the interested reader to [44] and [45] for an interesting analysis of

similar problems using a graph-theoretic language.

As suggested in [35], we believe this master curve should generalize to architectures with

multiple repressor binding sites, as long as the exclusivity of transcription factor binding

and transcription initiation is maintained. The interpretation of

is then of an e

↵

ective

free energy of all repressor bound states. In an equilibrium picture this is simply given by

the log of the sum of Boltzmann weights of all repressor bound states, which looks like the

log of a partition function of a subsystem. In a nonequilibrium picture, while we can still

mathematically gather terms and give the resulting collection the label

,itisunclear

if the physical interpretation as an e

↵

ective free energy makes sense. The problem is that

free energies cannot be assigned unambiguously to states out of equilibrium because the free

energy change along a generic path traversing the state space is path dependent, unlike at

equilibrium. A consequence of this is that, out of equilibrium,

is no longer a simple sum

of Boltzmann weights. Instead it resembles a restricted sum of King-Altman diagrams [40],

[41]. Following the work of Hill [46], it may yet be possible to interpret this expression as

an e

↵

ective free energy, but this remains unclear to us. We leave this an open problem for

future work.

If we relax the requirement of exclusive repressor-RNAP binding, one could imagine models

in which repressor and RNAP doubly-bound states are allowed, where the repressor’s e

↵

ect

is to reduce the transcription rate rather than setting it to zero. Our results do not strictly

apply to such a model, although we note that if the repressor’s reduction of the transcription

rate is substantial, such a model might still be well-approximated by one of the models

in Figure 1.

One may worry that our “one curve to rule them all” is a mathematical tautology. In fact

agree

with this criticism if

is “just a fitting parameter” and cannot be meaningfully

interpreted as a real, physical free energy. An analogy to Hill functions is appropriate here.

One of their great strengths and weaknesses, depending on the use they are put to, is that

their parameters coarse-grain many details and are generally not interpretable in terms of

microscopic models, for deep reasons discussed at length in [31]. By contrast, our master

curve claims to have the best of both worlds: a coarse-graining of all details besides the

repressor into a single e

↵

ective parameter

⇢

,whilesimultaneouslyretaininganinterpretation

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

as a physically meaningful and interpretable free energy. Our task, then, is to prove

or disprove this claim.

How do we test this and probe the theory with fold-change measurements? There is a

fundamental limitation in that the master curve is essentially a one-parameter function of

+log

⇢

.Worse,therearemany

a priori

plausible microscopic mechanisms that could

contribute to the value of

⇢

, such as RNAP binding and escape kinetics [38], [39], and/or

supercoiling accumulation and release [47], [48], and/or, RNAP clusters analogous to those

observed in eukaryotes [49], [50] and recently also observed in bacteria [51]. Even if

is measured to high precision, inferring the potential microscopic contributions to

⇢

,buried

inside a log no less, from fold-change measurements seems beyond reach. As a statistical

inference problem it is entirely nonidentifiable, in the language of [52], Section 4.3.

If we cannot simply infer values of

⇢

from measurements of fold-change, can we perturb some

of the parameters that make up

⇢

and measure the change? Unfortunately we suspect this

is o

↵

-limits experimentally: most of the potential contributors to

⇢

are global processes that

↵

ect many or all genes. For instance, changing RNAP association rates by changing RNAP

copy numbers, or changing supercoiling kinetics by changing topoisomerase copy numbers,

would massively perturb the entire cell’s physiology and confound any determination of

⇢

One might instead imagine a bottom-up modeling approach, where we mathematicize a

model of what we hypothesize the important steps are and are not, use

in vitro

data for

the steps deemed important, and

predict

what

⇢

should be. But again, because of the

one-parameter nature of the master curve, many di

↵

erent models will likely make indistin-

guishable predictions, and without any way to experimentally perturb

in vivo

,thereisno

clear way to test whether the modeling assumptions are correct.

In light of this, we prefer the view that parameters and rates are not directly comparable

between cartoons in Figure 1. Rather, parameters in the simpler cartoons represent coarse-

grained combinations of parameters in the finer-grained models. For instance, by equating

⇢

between any two models, one can derive various possible correspondences between the two

models’ parameters. Note that these correspondences are clearly not unique, since many

possible associations could be made. It then is a choice as to what microscopic interpreta-

tions the model-builder prefers for the parameters in a particular cartoon, and as to which

coarse-grainings lend intuition and which seem nonsensical. Indeed, since it remains an open

question what microscopic features dominate

⇢

(as suggested above, perhaps RNAP binding

and escape kinetics [38], [39], or supercoiling accumulation and release [47], [48], or, some-

thing more exotic like RNAP clusters [49]–[51]), we are hesitant to put too much weight on

any one microscopic interpretation of model parameters that make up

⇢

One possible tuning knob to probe

⇢

that would not globally perturb the cell’s physiology is

to manipulate RNAP binding sites. Work such as [53] has shown that models of sequence-

dependent RNAP a

nity can be inferred from data, and the authors of [54] showed that

the model of [53] has predictive power by using the model to

design

binding sites of a

desired a

nity. But for our purposes, this begs the question: the authors of [53]

assumed

aparticularmodel(essentiallyour3-stateequilibriummodelbutwithouttherepressor),so

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

it is unclear how or if such methods can be turned around to

compare

↵

erent models of

promoter function.

Another possible route to dissect transcription details without a global perturbation would

be to use phage polymerase with phage-specific promoters. While such results would carry

some caveats, e.g., whether the repression of the phage polymerase is a good analog to the

repression of the native RNAP, it could nevertheless be worthy of consideration.

We have already pointed out that the master curve of Figure 1 is essentially a one-parameter

model, the one parameter being

+log

⇢

. By now the reader may be alarmed as to how

can we even determine

and

⇢

independently of each other, never mind shedding a lens

on the internal structure of

⇢

itself. A hint is provided by the weak promoter approximation,

invoked repeatedly in prior studies [14], [32], [33] of simple repression using the 3-state

equilibrium model in Figure 1(B). In that picture, the weak promoter approximation means

exp(

)

⌧

1, meaning therefore

⇢

⇡

1. This approximation can be well justified

on the basis of the number of RNAP and

factors per cell and the strength of binding of

RNAP to DNA at weak promoters. This is suggestive, but how can we be sure that

⇢

not, for instance, actually 10

and that

hides a compensatory factor? A resolution is

↵

ered by an independent inference of

⇢

in the absence of repressors. This was done in [42]

by fitting nonequilibrium model 4 in Figure 1(C), with zero repressor (looking ahead, this

is equivalent to model 4 in Figure 2(A)), to single-cell mRNA counts data from [34]. This

provided a determination of

and

, from which their ratio is estimated to be no more

than a few 10

and possibly as small as 10

The realization that

⇢

⇡

1toanexcellentapproximation,

independent

of which model in Fig-

ure 1 one prefers, goes a long way towards explaining the surprising success of equilibrium

models of simple repression. Even though our 2- and 3-state models get so many details

of transcription wrong, it does not matter because fold-change is a cleverly designed ratio.

Since

⇢

subsumes all details except the repressor, and log

⇢

⇡

0, fitting these simple models

to fold-change measurements can still give a surprisingly good estimate of repressor binding

energies. So the ratiometric construction of fold-change fulfills its intended purpose of can-

celing out all features of the promoter architecture except the repressor itself. Nevertheless

it is perhaps surprising how e

↵

ectively it does so:

a priori

,onemightnothaveexpected

⇢

to be quite so close to 1.

We would also like to highlight the relevance of [55] here. Landman et. al. reanalyzed and

compared

in vivo

and

in vitro

data on the lacI repressor’s binding a

nity to its various

operator sequences. (The

in vivo

data was from, essentially, fitting our master curve to

expression measurements.) They find broad agreement between the

in vitro

and

in vivo

values. This reinforces the suspicion that the equilibrium

repressor binding energies do

in fact represent real physical free energies. Again,

a priori

this did not have to be the case,

even knowing that

⇢

⇡

In principle, if

can be measured to su

cient precision, then deviations from

⇢

become a testable matter of experiment. In practice, it is probably unrealistic to measure

repressor rates

or fold-changes in expression levels (and hence

)precisely

enough to detect the expected tiny deviations from

⇢

=1. Wecanestimatetherequisite

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint

precision in

to resolve a given

⇢

by noting, since

⇢

⇡

1, that log(1 +

⇢

)

⇡

⇢

,so

(

)

⇡

⇢

. Suppose we are lucky and

⇢

⇠

1, on the high end of our range estimated

above. A determination of

with an uncertainty of barely 0.1 was achieved in the

excellent measurements of [32], so this requires a very di

cult determination of

for a

very crude determination of

⇢

,whichsuggests,toputitlightly,thisisnotapromisingpath

to pursue experimentally. It is doubtful that inference of repressor kinetic rates would be

any easier.

Moving forward, we have weak evidence supporting the interpretation of

as a physically

real free energy [55] and other work casting doubt [56]. How might we resolve the confusion?

If there is no discriminatory power to test the theory and distinguish the various models with

measurements of fold-changes in means, how do we probe the theory? Clearly to discriminate

between the nonequilibrium models in Figure 1, we need to go beyond means to ask questions

about kinetics, noise and even full distributions of mRNA copy numbers over a population

of cells. If the “one-curve-to-rule-them-all” is more than a mathematical tautology, then the

free energy of repressor binding inferred from fold-change measurements should agree with

repressor binding and unbinding rates. In other words, the equilibrium and nonequilibrium

definitions of

must agree, meaning

log(

R/N

log(

)

(35)

must hold, where

is inferred from the master curve fit to fold-change measurements,

and

are inferred in some orthogonal manner. Single molecule measurements such

as from [56] have directly observed these rates, and in the remainder of this work we explore

a complementary approach: inferring repressor binding and unbinding rates

and

from

single-molecule measurements of mRNA population distributions.

3 Beyond Means in Gene Expression

In this section, our objective is to explore the same models considered in the previous section,

but now with reference to the the question of how well they describe the distribution of gene

expression levels, with special reference to the variance in these distributions. To that end,

we repeat the same pattern as in the previous section by examining the models one by one.

In particular we will focus on the Fano factor, defined as the variance/mean. This metric

serves as a powerful discriminatory tool from the null model that the steady-state mRNA

distribution must be Poisson, giving a Fano factor of one.

3.1 Kinetic models for unregulated promoter noise

Before we can tackle simple repression, we need an adequate phenomenological model of

constitutive expression. The literature abounds with options from which we can choose, and

we show several potential kinetic models for constitutive promoters in Figure 2(A). Let us

consider the suitability of each model for our purposes in turn.

CC-BY-NC 4.0 International license

(which was not certified by peer review) is the author/funder. It is made available under a

The copyright holder for this preprint

this version posted June 14, 2020.

https://doi.org/10.1101/2020.06.13.150292

doi:

bioRxiv preprint