Analytical solutions of the chemical master equation with bursty production and isomerization reactions

Splicing cascades that alter gene products post-transcriptionally also affect expression dynamics. We study a class of processes and associated distributions that emerge from a bursty promoter model coupled to a path graph of downstream mRNA splicing, and more generally examine the behavior of finite-activity jump drivers coupled to a directed acyclic graph of splicing with one or more roots. These solutions provide full time-dependent joint distributions for an arbitrary number of species, offering qualitative and quantitative insights about how splicing can regulate expression dynamics. Finally, we derive a set of quantitative constraints on the minimum complexity necessary to reproduce gene co-expression patterns using synchronized burst models. We validate these findings by analyzing long-read sequencing data, where we find evidence of expression patterns largely consistent with these constraints.


Introduction
Recent advances in the analysis of single-cell RNA sequencing [1] and fluorescence microscopy [2] enable the quantification of pre-mRNA molecules alongside mature mRNA. These techniques provide an opportunity to infer the topologies and biophysical parameters governing the processes of mRNA transcription, splicing, export, and degradation in living cells. In particular, they provide a novel approach to inferring and studying the dynamics of splicing cascades [3]. However, drawing mechanistic conclusions from transcriptomics requires overcoming numerous statistical and computational challenges. Living cells contain mRNA in low copy numbers, and transient nascent species are even less abundant, leading to potential pitfalls if the discrete nature of the data is not appropriately modeled [4]. One approach to modeling dynamics from count data is to utilize detailed Markov models based on the chemical master equation (CME) [5][6][7]. Such modeling can, in principle, yield analytical solutions for the dynamics of genes affected by arbitrary splicing networks, thus circumventing tractability problems arising with matrix-and simulationbased methods that can be impractical for large numbers of species [8]. Several classes of analytical solutions to the CME are available [9], but their derivation tends to be ad hoc, with limited generalization to more complex systems. Starting with the approach of Singh and Bokes to the problem of bursty transcription coupled to nuclear export and degradation of mRNA [10], we develop a class of solutions for gene dynamics affected by arbitrary splicing networks under the physiologically relevant assumption of bursty production of mRNA [11]. Fundamentally, the generating function of a Markov chain describing the evolution of molecules in a splicing cascade can be automatically computed, numerically integrated in time, and then inverted by Fourier transformation at an overall computational complexity of O(N log N ) in state space size. We begin with the example of splicing described by a path graph, where the order of intron removal is deterministic, extend the procedure to a much broader class of splicing graphs and driving burst processes, and subsequently demonstrate the existence of the solutions and their isomorphism to a class of moving average processes. Figure 1: Graph representation of the generic path graph model. The source transcript T 1 is synthesized at the gene locus in random geometrically distributed bursts (according to a distribution B with burst frequency k). Each molecule proceeds to isomerize in a chain of splicing reactions governed by successive rates β 1 , β 2 , . . . , β n−1 , until reaching the form T n , which is ultimately degraded at rate γ.

Path graph splicing
Consider the system consisting of a bursting gene coupled to a n-step birth-death process, characterized by the path graph in Figure 1, where B ∼ Geom(b), and all reactions occur after exponentiallydistributed waiting times. The bursts occur with rate k, the conversion of adjacent transcripts T i to T i+1 occurs with rate β i , and the degradation of T n occurs with rate γ := β n . We assume the rates of conversion and degradation are all distinct. The amount of species T i can be described by the non-negative discrete random variable m i . We assume no molecules are present at t = 0.

Discrete formulation and algorithm
We would like to compute the probability mass function (PMF) of the count distribution, denoted by P (m 1 , ..., m n , t); this corresponds to the probability of observing m 1 molecules of T 1 , m 2 molecules of T 2 , etc., at time t. Following a previous derivation [10], this problem can be reframed in terms of a partial differential equation involving the probability generating function (PGF) G(x 1 , ..., x n , t): G := m 1 ,...,mn x m 1 1 ...x mn n P (m 1 , ..., m n , t) where F is the PGF of the burst distribution. Applying the transformations u i := x i − 1 and φ := ln G yields the equation: where M (u) := F (1 + u). This equation can be solved using the method of characteristics, with formal solution φ = t 0 [M (U 1 (s)) − 1]ds. The characteristics U i , i < n satisfy dU i ds = β i (U i+1 − U i ), with U n (u 1 , ..., u n ; s) := U n (s) = u n e −γs . The functional form of dU i ds implies that U 1 (s) is the weighted sum of exponentials n i=1 A i e −β i s ; the explicit analytical solution is provided in Section 5. The weights A i can be computed through a simple iterative procedure, which proceeds from the terminal species and successively incorporates dependence on upstream rates: Finally, given the physiologically plausible geometric burst distribution [12], φ(t; ·) = k t 0 bU 1 (s) 1−bU 1 (s) ds. The stationary PGF is e φ evaluated at t = ∞; this quantity can be converted to a probability mass function via Fourier inversion. This is an entirely general procedure that can be used for any path graph of downstream processing, although regions of stiffness are to be expected, particularly near the degenerate cases of matching rates. In the most general case, any β i may be equal to any other, leading to a combinatorial explosion of auxiliary degenerate solutions. As the overarching motivation is the statistical inference of biophysical parameters, and the degenerate regions are all of measure zero, there appears to be little physical justification for considering them in further detail. However, we remark that these solution classes are likewise analytically tractable, with mixed polynomial-exponential functional forms entirely analogous to previous work [10].

Directed acyclic graph splicing
The simplicity of the ODEs governing the evolution of the CME lends itself to extensions to the broadest class of graphs representing the stochastic and incremental removal of discrete introns, the directed acyclic graphs.

Example: alternative splicing
Suppose the downstream dynamics are given by a directed rooted tree. The solution procedure is analogous to that used for the path graph. First, starting at the leaves, the path subgraph solutions are produced by the procedure above, yielding a sum of exponentials. Then, at a node of out-degree > 1 (i.e., molecular species with several potential products), the associated ODE has a functional form identical to that of a path graph. Therefore, the solutions are analogous. As an illustration, consider the simplest tree graph, shown in Figure 2a, where the splicing reactions occur at rates α 1 and α 2 and degradation reactions occur at rates β 1 and β 2 . Physically, this graph can be interpreted as a single source mRNA being directly and stochastically converted to one of two terminal isoforms by removal of intron 1 or intron 2. Clearly, The ODE governing the source species is dU 0 Finally, the expression for U 0 can be directly plugged into the desired burst distribution generating function.

Example: two-intron splicing with non-deterministic order
Consider the same tree graph as in the example above, and suppose T 1 and T 2 are converted to product T 12 at rates β 1 and β 2 , as shown in Figure 2b. Afterward, T 12 is degraded at rate γ. Physically, this graph can be interpreted as a single source mRNA being converted to one of two intermediate isoforms by the removal of one of two introns, then to a single terminal isoform by the removal of the other intron. Clearly, U 12 = u 12 e −γs . Setting Finally, the dynamics of the source molecule T 0 are governed by the following ODE: Yet again, the functional form affords a straightforward analytical solution: where c := α 1 + α 2 , C 1 := α 1 (u 1 − f 1 u 12 ), C 2 := α 2 (u 2 − f 2 u 12 ), and C 3 : The details of the ODE solution are described in Section 5 and the computation procedure is demonstrated in Figure 3.

Algorithm
This procedure is facile to extend to an arbitrary directed acyclic graph with a unique root. The reaction system is fully characterized by two arrays, the stoichiometric matrix matrix S and the rate vector r used in the stochastic simulation algorithm [13]. Thus, for example, the path graph can be represented by the full rate vector [k, β 1 , ..., β n−1 , γ] and the following stoichiometric matrix: A well-defined system has n species and q + 1 reactions. We assume unbounded accumulation does not occur and the graph is connected, so each species has influx and efflux reaction pathways, implying q ≥ n. Further, we suppose that each species is associated with at most one degradation reaction; if several channels are available, they can be added by the superposition property of a Poisson process. Finally, we assume that all isomerization and degradation rates are distinct. Since the underlying graph is acyclic, there exists at least one terminal species with a single efflux reaction. Thus, consider a single production reaction with rate k coupled to a set of isomerization reactions with rates c ji and degradation reactions with rates c j0 . The downstream dynamics are determined by U 1 , an exponential sum with n terms. In this generic case, lumped rate exponents r i must be computed as the net efflux rate from each molecule. A simple implementation of the routine is outlined below: Initialize coefficient array Identify reaction channels that correspond to degradation Initialize set of computed species while |C| < n do for j / ∈ C do Iterate over all species that have not yet been computed R I = {k | (S kj = −1 ∩ S ki = 1)} Identify isomerization reactions originating at j P = {i | (S ki = 1 for k ∈ R I )} Identify corresponding isomerization products if P ⊂ C then If all products have been computed for i ∈ P do For each isomerization product Apply initial condition, compute coefficient for e −r j s C ← C ∪ j Adjust set of computed species end if end for end while The terminal exponential sum is given by U 1 = n i=1 A 1i e −r i s . As before, the full time-dependent solution for any burst distribution with a well-defined moment-generating function is computable by quadrature. The case of the geometric burst distribution yet again corresponds to φ(t; u 1 , ..., u n ) = k t 0 bU 1 (s) 1−bU 1 (s) ds.

Continuous formulation
We have defined a series of discrete joint distributions induced by a graph governing splicing and degradation. However, we can equivalently recast the problem in terms of the stochastic differential equations (SDEs) governing the distributions' Poisson intensities Λ i . Specifically, the following identity can be used to relate a set of stochastic processes Λ 1 , ..., Λ n with joint distribution F Λ 1 ,...,Λn to the solution of the CME [9,[14][15][16]: As an illustration, we can consider the intensity of an n-step isomerization process driven by a finite-activity compound Poisson Lévy subordinator L t : This formulation has an intimate connection with the theory of moving average processes, which can be immediately seen by applying variation of parameters to the Poisson representation: By convention, β n := γ and Λ 0 ds := dL s , the underlying driving subordinator. Thus, each Λ i is the exponentially-weighted moving average of Λ i−1 ; Λ 1 is the moving average of the Poisson shot noise introduced by the compound Poisson subordinator. Although ample literature exists on the topic of moving average processes [17], it generally considers the problem of inference and prediction from discrete-time observations. Furthermore, discussions in the context of the chemical systems only tend to consider the first-order moving average of shot noise [9]. This formulation implies that any n-step isomerization process is identical to an (n − i)-step isomerization process driven by an order i iterated moving average process. This identity affords a route for the analytical computation of hybrid continuous-discrete system solutions merely by marginalizing the first i species. The Poisson representation enables the simultaneous discussion of the properties of F m 1 ,...,mn , the stationary distribution of the continuous-time Markov chain, and F Λ 1 ,...,Λn , the stationary distribution of the underlying series of stochastic differential equations. In fact, from standard properties of Poisson mixtures [18], the PGF G(x 1 , ..., x n ) of the former evaluated at u i = x i − 1, precisely yields the moment-generating function (MGF) of the latter. This connection also allows the evaluation of the solution for a broad class of compound Poisson driving processes. Specifically, M (u) = 1 1−bu is the MGF associated with exponentially distributed jumps. In the discrete domain, this translates to the geometric burst distribution. However, the downstream dynamics, represented by U 1 (s), are independent of the burst distribution. Therefore, the procedure affords computable solutions for any finite-activity pure-jump Lévy driving process. Extensions to generic directed acyclic graphs are entirely analogous. These solutions, identical to the CME case, can be used to exactly solve arbitrary DAGs representing continuous-valued molecular concentrations. This representation is conventional for high-abundance species [19]. This formulation occasionally enables the confirmation of CME results using standard properties of SDEs. For example, when L t is a compound Poisson subordinator with exponential jumps, Λ 1 is the gamma Ornstein-Uhlenbeck process [20][21][22][23]. Inspired by a highly general result for constitutive transcription, which states that Poisson distributions always remain Poisson for a birth-death process [14], we may reasonably ask whether equivalent results are available for bursty processes. This intuition turns out incorrect, and straightforward to disconfirm using SDE results. The distribution of the gamma Ornstein-Uhlenbeck process is not gamma for any finite t ∈ (0, ∞) -although it does approach a gamma law exponentially fast [21]; therefore, the corresponding Poisson mixture is not simply a time-varying negative binomial.

Distribution properties
Using the algorithms above, we can compute the generating functions corresponding to the stochastic processes of interest. However, it is not yet clear that these generating functions are everywhere well-defined. For example, certain physiologically plausible noise models do not have all moments [24]; their generating functions fail to converge in certain regimes. By analyzing the functional form of the downstream process and assuming geometric-distributed bursts, we demonstrate that this class of processes is guaranteed to yield convergent generating functions and finite moments.

Positivity of the exponential sum
First, we demonstrate that the downstream processes yield a strictly positive functional form of time dependence. Noting that the marginal of species i yields the functional form U 1 (u i ; s) = u i n j=1 a j e −r j s := u i ψ i (s), this condition translates to ψ i (s) > 0 for all s > 0. Consider F (x) = x, corresponding to constitutive production of the source species (i.e. a Poisson birth process), with no molecules present at t = 0. Focusing on the marginal of species i, this assumption yields , marginalizes over all j = i and yields the probability of observing zero counts of species is strictly positive. This follows from the reaction rate equations. By the continuous formulation, λ i is a weighted moving average of some set of processes {λ k }. λ 1 is a strictly increasing function governed by k r 1 (1 − e −r 1 s ). The property of being strictly increasing is retained under moving average and rescaling. Therefore, each successive moving average must be strictly increasing. Finally, P 0 ∈ (0, 1) > 0, because the solution for m i is given by a Poisson distribution, which has support on all of N 0 . Therefore, dP 0 dt is strictly negative. As the exponential term and k are positive, this implies ψ i (s) is strictly positive for all s > 0.

Existence of generating functions and moments
Next, we show that G(u i ; t), the generating function of the ith marginal, is finite for the geometric burst system. This follows from the construction of the original PGF: the marginal PGF is guaranteed to be finite if 1 − bu i ψ i is never zero. But for the relevant domain (u n ) ≤ 0, on the shifted complex unit circle, (1 − bu i ψ i ) ≥ 1, except at the degenerate initial case. The existence of the marginal moments of m i is implied by the existence of the generating function. The existence of all cross moments follows from the Cauchy-Schwartz inequality. Per standard properties, this existence property holds for both m i and Λ i . The tails of the stationary discrete marginals decay no slower than the geometric distribution. This follows immediately from the lower bound on (1 − bu i ψ i ), which in turn gives an upper bound on x i [10]. Equivalently, this follows from the existence of all moments [24]. An analytical radius of convergence has been given previously for n = 2 [10], but numerical optimization is necessary to establish rates of tail decay for n > 2.

Statistical properties
All marginals are infinitely divisible. This follows from the functional form of the PGF: the random variable corresponding to any marginal distribution can be written as as a sum of q random variables with burst frequency k/q. Only the first marginal is self-decomposable. This follows from the condition that a random variable has a self-decomposable (sd) law if and only if it offers a representation of the form Y = ∞ 0 e −t dX t , with Lévy X t [25]. However, only Λ 0 is Lévy. All downstream intensity processes have nontrivial, almost-everywhere C ∞ trajectories, which implies they cannot be represented by a Lévy triplet: the only permitted continuous Lévy processes are linear combinations of the (nondifferentiable) Brownian motion W t and the trivial process t. Therefore, Λ i is sd for i = 1 and non-sd for all i > 1.
All stationary marginals are unimodal. Multimodality in the distribution of the moving average over time is contingent on time-inhomogeneity in the trajectory process. However, the underlying driving process is defined to be time-homogeneous, with uniformly distributed jump times; therefore, each downstream process has a unimodal distribution over time. By ergodicity, this distribution is equivalent to the ensemble distribution. Therefore, all marginals of downstream species are unimodal.

Moments
The moments of the marginals can be computed directly from the derivatives of the marginal MGF e φ(u i ) at u i = 0: For the path graph system, the CME immediately yields µ i = kb r i . Considering the previously dis- which is equal to 1 r i for the path graph system. Per the standard properties of mixed Poisson distributions [26], the value of µ i is identical for the underlying continuous process and the derived discrete process. The stationary second moments can be found analogously: which is straightforward to compute, but does not appear to have an easily amenable analytical form beyond the first few cases. Naturally, the standard properties of Poisson mixtures [26] allow The covariances can be computed directly from the derivatives of the MGF e φ(u l ,u i ) , i.e., the marginalized MGF for u q = 0 for all q = l, i. The particular functional form of φ(u l , u i ) is given by where each ψ is the exponential sum corresponding to the marginal of the species in question. This yields: As above, where a j are the weights associated with ψ l and c k are the weights associated with ψ i . This form of the summation is very general; for example, in case of the path graph system with l > i, it can be equivalently represented as 2kb 2 l j=1 i k=1 a j c k r j +r k . From standard identities, the covariance of a mixed bivariate Poisson distribution with no intrinsic covariance forcing is identical to the covariance of the mixing distribution [26]. Therefore, this result holds for both the CME and the underlying SDE. The Pearson correlation coefficient follows immediately from the result above: Since mixing decreases variance but not covariance, the correlation coefficient of the discrete system will always be lower than that of its continuous or hybrid analog.

Simulation
To compare the analytical solutions with simulation, we generated a random directed acyclic graph, shown in Figure 4. The numbers of species (7) and isomerization reactions (11) were chosen arbitrarily. We enforced the existence of a single unique source node (a) and the weakly connected property to ensure only a single source mRNA would be present and all isoforms would be reachable from it, but did not impose any other conditions. The number of degraded species (3) was chosen arbitrarily; we assigned degradation reactions to the two sink species (c, e) and randomly chose a degraded intermediate (b) from a uniform distribution over the molecular species. All reaction rates were drawn from a log-uniform distribution on [10 −0.5 , 10 0.5 ]; we chose to sample them from a single order of magnitude to avoid the trivial degenerate cases that occur in cases of very slow or very fast export [10]. This process produced the parameter values k = 0. 44 Figure 4. Finally, we chose the geometric burst model with b = 10. We applied the algorithm to compute the exponents and coefficients, and computed the stationary distributions of all species. The simulated distributions match the quantitative results for the marginals, as shown in Figure 5. Furthermore, the 49 entries of the covariance matrix are likewise effectively predicted by the procedure for moment calculation. Figure 4: Graph representation of the randomly generated transcription, splicing, and degradation model. A single source isoform T a is converted to a variety of downstream isoforms T b , ..., T g , which isomerize according to a directed acyclic graph.

Multi-gene systems
Although the CME solution is a general framework, the relevance to modern transcriptomic experimental data is tempered somewhat by the simplicity of the model. The model can describe the splicing cascade of a single gene, but does not naturally extend to multi-gene networks. Yet we know that genes often belong to co-expression modules that are identifiable by similarity metrics [3,27]. Therefore, we are faced with the challenge of integrating multiple genes in a physically meaningful way. Instead of building intractable "top-down" models that encode complex networks, we may build "bottom-up" models that extend analytical solutions. For example, we can consider sets of synchronized genes that experience bursting events at the same time. This model represents the bursty limit of multiple genes with transcription rates governed by a single telegraph process, up to scaling; a conceptually similar model has previously been used to describe correlations between multiple copies of one gene [28]. This model retains the appeal of physical interpretability -for example, gene modules may be regulated by the same molecule -but does not excessively complicate the mathematics, and offers an incremental step toward more detailed descriptions.

Two-gene bursty model, no splicing
We begin by considering the instructive model of two genes influenced by the same regulator, with no downstream splicing. The burst processes are synchronized, but the burst sizes are not, and may indeed come from different distributions.
The following CME holds: where p i and q j give the PMF weights of the burst size distributions that govern B 1 and B 2 . We define the joint PGF and recognize that the degradation terms have the familiar functional forms −β 1 (x 2 − 1) ∂G ∂x 1 and −β 2 (x 2 − 1) ∂G ∂x 2 . Therefore, considering the burst term: where G is the conditional PGF assuming m 1 − i molecules of T 1 and F i are the burst distribution PGFs; the final steps exploit the interpretation of the double sums as Cauchy products. This result implies: The PDE can be solved using the method of characteristics, yielding U i = u i e −β 1 s : assuming, as before, that B i ∼ Geom(b i ).

n-gene bursty model with splicing
From the derivation above, four interesting properties stand out. Firstly, the functional form of U i depends only on the details of the downstream processes; as shown, it can be easily computed for any DAG. Secondly, the derivation holds with no loss of generality for any number of genes, so long as the burst distributions are uncoupled, by repeated application of the Cauchy product formula.
Thirdly, even if they are coupled, the derivation still holds, but with intermediate conditional PGFs F i , ultimately yielding a joint PGF F that factorizes in the independent case. Finally, even though we have considered the case of disjoint DAGs, the results still hold if the gene products can ultimately converge. This becomes self-evident by identifying T 1 with T 2 and setting u 1 = u 2 , b 1 = b 2 , and β 1 = β 2 : the synchronized loci are merely multiple copies of the same gene, and produce bursts sampled from a negative binomial distribution, the sum of two identical geometric distributions. Therefore, in the most general case of arbitrary burst distributions and downstream splicing cascades, the factorial-cumulant generating function takes the following form: where U 1 , ..., U N are now the "U 1 " functions, not necessarily distinct, corresponding to the species produced by each bursty locus, and M is the joint MGF of the burst distributions. The single-gene results are recovered by marginalization.

Examples of multi-gene systems
Uncorrelated gene loci By the Cauchy-Schwartz inequality, all moments and cross-moments exist. The correlations are straightforward to compute. For example, we can consider the stationary covariance of gene products from two co-expressed genes, with disjoint downstream splicing processes. We marginalize over all other transcripts and consider the functional forms U l = u l ψ l and U i = u i ψ i , supposing that the respective average burst sizes are b l and b i .
where a j are the weights associated with ψ l and c k are the weights associated with ψ i , much as before. The covariance of the discrete process is identical. We reiterate that this expression only applies when the splicing graphs downstream of the gene loci are disjoint. The converse case requires slightly more unwieldy notation to account for, e.g., species l accessible from N source transcripts being associated with distinct coefficients a i } and using the procedure above. With this solution in hand, we can revisit the two-gene problem with no splicing: We are interested in standard summaries, such as the Pearson correlation coefficient: where we use the fact that the absolute timescale is immaterial. The first term achieves a global maximum of 1/2 at β 1 = β 2 . The second is strictly smaller than 1, but asymptotically approaches 1 as b 1 , b 2 jointly approach infinity. All downstream processes are stochastic and desynchronize molecular observables. Therefore, 1/2 is the supremum of gene-gene correlations in this class of models.
Fully correlated gene loci Conversely, we can consider the two-gene problem assuming that the burst distributions are identical and fully correlated. Physically, this model may correspond to coupling of initiation processes, e.g. this may occur when two genes are controlled by a single promoter. This burst distribution has the following joint PGF: upon inserting the characteristics, we yield which yields the result Cov(Λ 1 , Λ 2 ) = kb(2b + 1) Therefore, the correlation is As in the case of uncoupled gene sizes reported in Equation 1, the first term is at most 1/2. The second term asymptotically approaches 2 as b → ∞. Therefore, there are no intrinsic model constraints on Pearson correlation coefficients of two gene products; constraints arise as the effect of the burst size correlation structure.
Anti-correlated gene loci With these results in mind, we can consider the problem of describing genes with high negative correlations. As an illustration, we can consider two genes driven by a single telegraph process: gene 1 is on whenever gene 2 is off and vice versa; therefore, their respective products T 1 and T 2 must have a negative correlation. However, only one of these genes has a welldefined bursty limit that is infinitesimally short on periods; the other will essentially be on all the time and transcribe constitutively, containing no mutual information about the burst timing. Nevertheless, putting aside the problem of positing a specific limiting mechanism, we can ask whether any joint burst distribution can produce negative correlations in molecule counts, despite perfect synchronization between burst events. Considering the cross moment of mRNA produced at two synchronized loci: with the partial derivatives evaluated at u 1 = u 2 = 0. The second term matches µ 1 µ 2 , and is strictly positive. The first term is the integral of an exponentially discounted burst cross moment: where the partial derivatives are yet again evaluated at u 1 = u 2 = U 1 = U 2 = 0, and B 1 and B 2 denote the SDE jump sizes at the two gene loci. By the definition of covariance: Now, supposing the correlation between the burst sizes is ρ ∈ [−1, 0), and considering the covariance between the transcripts: which achieves a minimum at ρ = −1. Thus, the covariance has a lower limit: Without constructing the joint distribution explicitly, if we suppose the marginal discrete burst distributions are geometric -i.e., the jump sizes are exponential -then µ B i = σ B i = b i , and the lower limit on covariance is zero. This means that negative correlations cannot possibly result from a model with geometrically distributed, synchronized jumps. However, other joint burst laws can produce negative correlations, as long as the population correlation coefficient is sufficiently negative and the burst distributions are sufficiently dispersed. We can demonstrate the existence of processes with negative count correlations induced by synchronized burst events. First, we suppose that the marginal burst distributions are identical and described by a gamma law with shape α and scale θ, enforcing µ B 1 = µ B 2 = αθ and σ 2 B 1 = σ 2 B 2 = αθ 2 . Therefore, the covariance of the Poisson intensities takes the following form: which achieves Cov(Λ 1 , Λ 2 ) < 0 whenever ρα + α 2 < 0. Therefore, for any ρ ∈ (−1, 0), every α ∈ (0, −ρ) meets this criterion. It remains to confirm that a bivariate gamma distribution with a negative correlation can exist. Such a distribution was constructed by Moran, and permits all ρ ∈ (−1, 1) [29,30]. Furthermore, a simple application of the Cauchy-Schwartz inequality yields [31]: therefore, the joint MGF of the correlated bivariate gamma distribution is guaranteed to exist. This demonstrates the existence of continuous moving average processes with negative stationary correlation, driven by one Poisson process arrival process. Finally, the corresponding Poisson mixture has identical covariance, and must also have a negative correlation. Therefore, a CME with marginal negative binomial burst distributions and a carefully chosen joint structure can achieve negative molecular correlations, even if the bursts are synchronized.
Multi-gene dynamics emerging from fast processing Interestingly, there is a set of singlegene systems that recapitulate the multi-gene functional form in the limit of fast splicing. Consider a source species T 0 , which is produced in geometrically distributed bursts and converted to species {T i }, with i = 1, ..., N , at rates β i . These transcripts are degraded at rates γ i . Furthermore, suppose all of the β i ∼ O(ε −1 ) for small ε, i.e., the source transcript is extremely unstable. In this limit, T i are produced with bursts of size Bβ i /r, where B is the underlying T 0 burst size, r := i β i , and the ratio β i /r is O(1). We define corresponding weights w i := β i /r; by definition, i w i = 1. This yields: where M is recognizable as the joint MGF of a perfectly correlated N -variate exponential distribution with marginal distributions B i = w i B: i.e., the univariate exponential MGF evaluated at N i=1 w i u i . The corresponding discrete PGF is: As seen above, this is not the perfectly correlated multivariate geometric distribution: the stochasticity of the reaction channel selection is non-negligible. Instead, we can construct a distribution over {0, 1} N , with the probability of state δ ij (i.e., the vector contains a one at position i) set to w i . It is easy to see that the generating function of the random variable Z ∈ {0, 1} N takes the form derived above: which amounts to F (x 1 , ..., x N ) = F (H(x 1 , ..., x N )); i.e., the effective gene dynamics are described by a compound distribution. Upon inserting the characteristics and selecting a set of two genes (arbitrarily indexed by 1 and 2), we yield: Plugging in u 1 = u 2 = 0, since marginalization with respect to any gene recovers the univariate geometric burst distribution with scale bw i , which immediately yields µ i = kbw i /γ i . This implies: From the marginal results, we still have σ 2 i = kbw i (1 + bw i )/γ i . This yields: Therefore, fast processing of the source transcript yields dynamics equivalent to a class of multi-gene models, with positive, but otherwise unconstrained, correlation between the downstream species.

Transient gene dynamics
Thus far, we have primarily focused on stationary systems with time-independent parameters. Nevertheless, there are classes of physiological phenomena, such as differentiation and cell cycling, where transient behaviors are crucial, particularly since these processes occur on timescales comparable to the mRNA lifetimes [1,32]. Usually, the regulatory events underpinning these processes are modeled by variation in DNA-localized transcriptional parameters [33][34][35][36]. By examination of the generating function relations, it is easy to see that the current framework is trivial to extend to any deterministic variation in k and M : where we have adopted the shorthand U for the set of exponential sums U 1 (s), ..., U N (s) governing each burst product of the N loci. Therefore, burst frequency, burst size, and even the number of synchronized gene loci per cell can vary, continuously or discontinuously. If the reaction rates within U change over time, the generating function PDE then becomes intractable for all but the simplest models, such as piecewise constant. The gene locus parameters b and k can also vary stochastically. In principle, certain models can be solved by defining an appropriate SDE-CME system. However, such dynamics are generally challenging to treat without recourse to numerical ODE solvers. Further, we restrict our analysis to systems tractable via the PGF, which cannot be applied to continuous-valued parameters.

Applications to sequencing data
The current framework provides quadrature-based solutions to the forward problem of PMF prediction for a broad set of transcriptional processes. More broadly, we would like to treat the inverse problem of identifying parameters from sequencing data. A wide range of statistical approaches are available; however, in practice, even the simplest, ergodic version of the the inverse problem depends on the following prerequisites: 1. Single-cell, single-molecule data for a set of cells in local equilibrium. This information permits the application of the ergodic model.

Full annotation of intermediate transcripts,
including causal relationships, such as the splicing graph and the identities of degraded molecules. This information permits mapping between experimental data and theoretical quantities.
3. Transcriptome-wide quantification of all transcripts, ideally unbiased and fully saturated -or, at the very least, imperfect quantification combined with a statistical model of sequencing. This information permits the inclusion of technical noise.
No perfect experimental solution exists. The collection of single-cell, single-molecule data is enabled by barcoding [37]. Characterization of splicing graphs has been treated in experimental [38,39] and computational [40] contexts. However, these necessarily rely on comprehensive single-molecule annotations -which distinct intron/exon combinations occur? -which have only become feasible on a genome-wide scale since the introduction of molecular barcoding. Fully saturated sequencing is infeasible due to cost, and potentially due to thermodynamic constraints. Finally, standard sequencing capture protocols are, by design, biased toward polyadenylated regions [37]. This effect has been exploited to capture natural intronic sequences [1] and synthetic antibody-conjugated oligonucleotides [41,42], but the quantitative effects of these biases are as of yet unclear. Naturally, these data quality challenges are compounded with standard statistical challenges, such as the often considerable computational expense of determining confidence regions. In spite of these challenges, we can immediately apply some of the simpler theoretical results to transcriptomic datasets. We used long-read sequencing data generated by FLT-seq (full-length transcript sequencing by sampling) [43]. In brief, the method synthesizes a cDNA library using 10X gel beads and primers, amplifies it, then applies nanopore sequencing to generate long reads with associated cell and molecule barcodes. We obtained a mouse stem cell dataset (498 barcodes), filtered for the activated subset (136 barcodes), and identified the 1000 highest-expressed genes. We used gffutils 0.10.1 to construct a database of identified intermediate and terminal isoforms, based on the accompanying annotations generated by the computational pipeline FLAMES (fulllength analysis of mutations and splicing). We used the intervaltree 2.1.0 Python package to split the transcript-specific exons into elementary intervals, then used graph tools from the NetworkX 2.5.1 Python package to identify "root" transcripts that cannot be obtained by removing regions of any other transcript. The presumed source transcript covering the entire locus was not observed for any of the genes. As their exonic patterns are mutually exclusive, we model each gene's root transcripts as products of a single rapidly processed source species, as in Section 2.6.3. The theoretical results immediately imply that the root transcripts must be distributed according to the negative binomial distribution.
Therefore, by fitting the transcript distributions, we can estimate effective burst sizes bw i and unitless efflux rates γ i /k. These marginal parameters can be plugged into the formula for Pearson correlation derived in Section 2.6.3, and compared to the empirical correlation coefficients. The theoretical correlations are to be understood as upper limits, as unobserved intermediate processing steps and technical noise inevitably reduce transcript-transcript correlations. We fit the root transcript marginal distributions using the statsmodels 0.10.2 Python package. One gene was rejected outright due to failure to construct elementary intervals. 732 transcripts were rejected due to underdispersion (mean higher than variance), as they do not possess valid maximum likelihood estimates. 1290 transcripts were rejected due to poor fit (failure to converge or excessively sparse data, with all but one rejected transcripts having average expression below 1 mRNA per cell). This analysis produced a set of 4362 transcripts and 12480 nontrivial correlation matrix entries. Theoretical and observed correlation coefficients are shown in Figure 6. 11894 of the theoretical correlations (95.3%) are higher than the observed ones, whereas 586 (4.7%) are lower. Furthermore, the theoretical correlations are clearly nontrivial; all are less than unity. The results suggest that the model is not sufficient to recapitulate the full dynamics, but does provide an effective theoretical constraint. We anticipate that the lower observed correlations emerge from a contribution of technical noise in the sequencing process, suboptimal inference from the marginals, correlationdegrading stochastic intermediates, and model misidentification. The final effect is particularly plausible for the cluster evident on the left side of the figure, as the fast-processing model cannot account for negative transcript-transcript correlations.

Results and discussion
We have described a broad extension of previous work pertaining to monomolecular reaction networks coupled to a bursty transcriptional process. In particular, by exploiting the standard properties of reaction rate equations, we have demonstrated the existence of all moments and crossmoments. Further, we have derived the analytical expressions for the generating functions and demonstrated their existence. The following expression gives the general solution for the factorialcumulant generating function: where U is a set of functions associated with the gene products, computable by the procedure in Section 2.2.3, and M is a joint generating function governing the bursting dynamics. Under the mild assumptions of finite-activity bursting and Markovian downstream processes, the expressions hold for arbitrary directed acyclic graphs of splicing and degradation, coupled to an arbitrary number of gene loci with arbitrary, and potentially correlated and time-dependent, burst dynamics. In fact, we explicitly consider the problem of modeling multiple synchronized genes and find that geometric burst size coordination is required to achieve transcript count correlations ρ > 1/2. Furthermore, we test whether negative correlations are feasible under the assumption of synchronized bursts at multiple gene loci, and find that ρ < 0 are impossible with geometric bursts, but can be achieved with negative binomial bursts. These results substantially constrain and inform the space of models that can recapitulate the combination of bursty dynamics [11] and high absolute gene-gene correlations [3,27] observed in living cells. We compared the theoretical constraints with experimental data generated by FLT-seq, a recent long-read, single-cell, single-molecule sequencing method. Investigating a set of 4362 transcripts, we found that the constraints were met for 95.3% of the transcript-transcript correlations. Nevertheless, the model was insufficient to recapitulate the precise quantitative details, suggesting that more detailed biophysical models of splicing networks and technical noise are necessary. Although we primarily focus on distributions, with an eye to inference from atemporal data, the solution is robust in several complementary dimensions. Firstly, the upper limit of the integral in Equation 2 is arbitrary, and can be evaluated up to a finite time horizon to yield a transient distribution. Secondly, the formulation of the problem presupposes m i (t = 0) = 0 for all species i. However, if nonzero molecule counts are present at t = 0, it is straightforward to compute the resulting log-PGF via φ = φ h + n i=1 m i (t = 0) ln(1 + U i ), where φ h is the result for m i (t = 0) = 0 and the U i are species-specific exponential sums. This relation produces arbitrary conditional distributions, as derived elsewhere [44]. Therefore, the current approach can be used to compute the likelihoods of entire trajectories of observations, and thus perform parameter estimation on live-cell data. The computational complexity of this procedure is O(N log N ) in state space size. However, this complexity ostensibly arises from the n-dimensional inverse Fourier transform. We expect the time complexity to be O(N ) in the practical regime, and irrelevant outside due to the space complexity of holding an array of size N in memory for the inverse Fourier transform. Curiously, this class of analytical solutions to reaction networks can be adapted to a subset of diffusion problems. General diffusion on a multidimensional lattice is not directly solvable, because it violates the assumption of acyclic graph structure. However, percolation through a directed acyclic graph coupled to a source and a set of sinks can be described using the current mathematical formalism. Further, such a percolation can represent the incremental movement of RNA polymerase along a DNA strand, integrating discrete copy number statistics with submolecular details in a single analytical framework [45,46]. We do not treat the aforementioned auxiliary degenerate solutions that arise when β i = β j for some i, j. However, Section 2.4 guarantees the existence of these solutions and all moments: the procedure does not rely on a particular functional form of ψ i , only its existence, which is implied by the test subordinator F (x) = x matching the constitutive case [14]. Finally, we briefly touch upon the class of delay chemical master equations, and survey several recent advances in the field in Section S1.1. Due to the non-Markovian nature of delayed systems, general probabilistic solutions are rare [47] and represent an open area of study. In our discussion, we motivate delays as a limit of numerous, fast isomerization processes, and clarify the challenges inherent in applying the analysis of delay CMEs to bursty systems.

Appendix: ODE solution
The solution to the generic ODE dy Using the initial condition y(0) = ξ yields K = ξ − i a i r−b i a i . Consider an efflux rate r = j β j + γ, i.e., a set of isomerization pathways and a single degradation pathway from the species in question. Care must be taken when the downstream paths converge (as in the case of the two-intron system): duplicate terms in product characteristics U j need to be aggregated, with j β j U j rewritten as i a i e −b i x . This is computationally straightforward to do by choosing appropriate data structures, as with the matrix A in the implementation of the DAG algorithm.

Code availability
Python notebooks that can be used to reproduce Figures 5 and 6 are available at https://github. com/pachterlab/GP_2021_2. S1 Supplementary Note S1.1 Delay chemical master equations In the current supplement, we detour from the Markovian framework to consider delay systems, which have deterministic, rather than stochastic, state transitions. Certain degenerate cases -for example, the problem of incremental, linear movement with identical transition rates -directly bear upon the class of delay chemical master equations (DCMEs). As an example, we can model the simple linear chain of reactions with constitutive production the total delay between production of T 1 and degradation of T n is Erlang-distributed, with shape n and rate β. As n → ∞, the Erlang distribution reduces to a point mass at β −1 := τ . This implies that we can treat an aggregated species T , produced at rate k and degraded after a deterministic delay τ : This is precisely the "linear chain trick" introduced by MacDonald in 1978 [48]. The study of delayed dynamical systems, such as delay differential equations, dates back to the eighteenth century [49], with cornerstone biological models by Lotka and Volterra [48,49]. Recent work has focused on developing exact solutions [47,50,51] and simulation methods [52,53]. In particular, studies by Lafuerza and Toral [54,55] report full analytical solutions for constitutive systems with isomerization, while a contemporary study by Jia and Kulkarni [56] reports lower moments for a system with bursty mRNA production and catalysis. Unfortunately, applying these methods to bursty systems is challenging, and all but the simplest problems are intractable. As an illustration, we consider the constitutive two-stage system described by by Lafuerza and Toral [54], and discuss the challenges of extending it to include bursty production. If we assume that no stochastic degradation reactions occur, the reaction equations and generating function relations take the following form: where G * is a conditional generating function for an auxiliary non-degrading system, initialized at m 1 − 1 molecules of the parent transcript T 1 . This auxiliary system has no degradation reactions, and allows us to incorporate the non-Markovian effects of delays. Assuming constitutive production, and using the shifted variables u i for convenience, we find: of the auxiliary systems match: which follows from the derivation of the PGF of the nascent marginal. Now, considering the full generating function relation: This PDE is not easily tractable by standard analytical or numerical methods. The form of the equation is rather complicated and not amenable to analysis by characteristics. In principle, a numerical PDE or ODE solver can be used: we may fix u 2 and solve for G(u 1 , u 2 ) over a mesh of u 1 . By repeating this for many values of u 2 , we can compute the Fourier transform of the joint distribution. However, this requires solvers that can integrate over the complex plane, as well as initial conditions G(0, u 2 ) for each u 2 . These are the very values we seek, so even numerical approaches require some ingenuity. In short, the stochastically delayed systems reduce to deterministically delayed systems in some well-studied regimes. However, in spite of the formal connection between the CME and the DCME, the former is far simpler to analyze: the DCME is non-Markovian, and generally resistant to exact analysis. Although much recent progress has been made, regulated transcriptional systems do not yet have full probabilistic solutions.