Protein Copy Number Distributions for a Self-Regulating Gene in the Presence of Decoy Binding Sites

A single transcription factor may interact with a multitude of targets on the genome, some of which are at gene promoters, others being part of DNA repeat elements. Being sequestered at binding sites, protein molecules can be prevented from partaking in other pathways, specifically, from regulating the expression of the very gene that encodes them. Acting as decoys at the expense of the autoregulatory loop, the binding sites can have a profound impact on protein abundance, both on its mean and on its cell-to-cell variability. In order to quantify this impact, we study in this paper a mathematical model for pulsatile expression of a transcription factor that autoregulates its expression and interacts with decoys. We determine the exact stationary distribution for protein abundance at the single-cell level, showing that in the case of non-cooperative positive autoregulation, the distribution can be bimodal, possessing a basal expression mode and a distinct, up-regulated mode. Bimodal protein distributions are more feasible if the rate of degradation is the same irrespective of whether protein is bound or not. In contrast, the presence of decoy binding sites that protect the protein from degradation reduces the availability of the bimodal scenario.


Master equation
The probability density function $p(x, t)$ of observing the protein at concentration $x$ at time $t$ satisfies the master equation

$$\frac{\partial p}{\partial t} = \frac{\partial}{\partial x}\bigl(c(x)\,p(x,t)\bigr) - a(x)\,p(x,t) + \frac{1}{b}\int_0^x e^{-(x-\xi)/b}\,a(\xi)\,p(\xi,t)\,d\xi, \tag{1}$$

where $b$ is the mean of the exponentially distributed burst size, and the dependences on the total protein level $x$ of the decay rate $c(x)$ and burst rate $a(x)$ are given by (2), in which (3) gives the free protein level $x_f$ as a function of the total protein level $x$; cf. Section 2 in the Main Text. The stationary solution to (1) has been determined, using the Laplace transformation of the equation, for a linear degradation rate (Friedman et al., 2006) and for a generic non-linear rate $c(x)$ (Mackey et al., 2013). For the reader's convenience, we re-derive these results using a different method. We then simplify the generic formula algebraically to obtain a closed-form expression for the stationary distribution for our specific choices of $a(x)$ and $c(x)$.
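Equation (1) can be written in the form of a continuity equation,

$$\frac{\partial p}{\partial t} + \frac{\partial J}{\partial x} = 0, \tag{4}$$

where the probability flux term is given by

$$J(x,t) = -c(x)\,p(x,t) + \int_0^x e^{-(x-\xi)/b}\,a(\xi)\,p(\xi,t)\,d\xi. \tag{5}$$

For the stationary solution $p(x, t) = p(x)$ both the flux $J$ and its derivative $dJ/dx$ must vanish, so that

$$-c(x)\,p(x) + \int_0^x e^{-(x-\xi)/b}\,a(\xi)\,p(\xi)\,d\xi = 0, \tag{6}$$

$$-\frac{d}{dx}\bigl(c(x)\,p(x)\bigr) + a(x)\,p(x) - \frac{1}{b}\int_0^x e^{-(x-\xi)/b}\,a(\xi)\,p(\xi)\,d\xi = 0. \tag{7}$$

Dividing (6) by $b$ and adding the resulting equation to (7) eliminates the integral term and yields

$$\frac{d}{dx}\bigl(c(x)\,p(x)\bigr) = \left(\frac{a(x)}{c(x)} - \frac{1}{b}\right) c(x)\,p(x), \tag{8}$$

from which

$$p(x) = \frac{\kappa}{c(x)} \exp\left(-\frac{x}{b} + \int \frac{a(x)}{c(x)}\,dx\right), \tag{9}$$

where $\kappa$ is a normalisation constant; this form has been reported in previous studies (Friedman et al., 2006; Mackey et al., 2013).

The convergence, as time $t$ increases, of time-dependent solutions to the master equation (1) to the stationary distribution (9), in the $L^1$ sense, has been established in Mackey et al. (2013) under mild conditions on the functional form of $a(x)$ and $c(x)$. Specifically, Mackey et al. (2013) require that

$$\int_0^\delta \frac{dx}{c(x)} = \infty, \qquad \int_0^\delta \frac{a(x)}{c(x)}\,dx = \infty \tag{10}$$

for a small $\delta > 0$. The former condition guarantees that the deterministic decay process does not lead to a (macroscopic) extinction, while the latter ensures that the waiting time for the next burst is finite. For our choices of $a(x)$ and $c(x)$, conditions (10) hold, since $a(x)$ is bounded from below by $a_0 > 0$ and $c(x)$ is asymptotically linear for small $x$. In addition to (10), Mackey et al. (2013) require that the integral of (9) be finite and that the mean decay rate $\int_0^\infty c(x)\,p(x)\,dx$ be finite. Both requirements hold for our choices of $a(x)$ and $c(x)$, owing to the asymptotic linearity of $c(x)$ for very small and very large $x$ and to the boundedness of $a(x)$ from below and from above by positive constants.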
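The general stationary formula (9) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `stationary_density`, the uniform grid, and the trapezoidal quadrature are illustrative assumptions. It tabulates $p(x) \propto c(x)^{-1}\exp(-x/b + \int a/c\,dx)$ for user-supplied rates and normalises the result on a truncated interval.

```python
import math

def stationary_density(a, c, b, x_min, x_max, n=20000):
    """Tabulate p(x) = kappa / c(x) * exp(-x/b + int a/c dx) on a uniform grid.

    a, c  : callables giving the burst rate and the decay rate (assumed inputs)
    b     : mean burst size
    The grid [x_min, x_max] is a hypothetical truncation of (0, infinity).
    """
    h = (x_max - x_min) / n
    xs = [x_min + i * h for i in range(n + 1)]
    ratio = [a(x) / c(x) for x in xs]
    log_w = []
    acc = 0.0  # running trapezoidal approximation of int a/c dx
    for i, x in enumerate(xs):
        if i > 0:
            acc += 0.5 * h * (ratio[i - 1] + ratio[i])
        log_w.append(acc - x / b - math.log(c(x)))
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(v - shift) for v in log_w]
    norm = h * (sum(w) - 0.5 * (w[0] + w[-1]))  # trapezoidal normalisation
    return xs, [v / norm for v in w]
```

For a constant burst rate and linear decay, $a(x) = a_0$ and $c(x) = x$, formula (9) reduces to a gamma density with shape $a_0$ and scale $b$, which gives a simple sanity check: the tabulated mean should be close to $a_0 b$.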

Explicit steady-state solution
In this section we simplify the general formula (9) for the stationary distribution, making use of the specific properties of our choices for the transcription and decay rates (2). We assume that time is measured in units of the free protein lifetime, i.e. that $\gamma_f = 1$. By (2), we obtain (11). Therefore, after changing the integration variable from $x$ to $x_f$, the integral in the exponential of (9) simplifies to (12), the integral of a rational function of $x_f$, where $dx/dx_f$ is determined by differentiating (13), which expresses the total protein level $x$ in terms of the free protein level $x_f$; cf. Section 2 in the Main Text. The integration of the rational fraction in (12) is done by partial fraction decomposition using software capable of symbolic calculation, and the coefficients multiplying the individual partial fractions are given by (14)–(17). Substituting (12) into (9), we find that the stationary distribution of the total protein concentration $x$ is given by the closed-form expression (18), in which $x_f$ is understood to be a function of $x$, as given by (3). Formula (18) is valid provided that the special cases $k_p = k_b$, $k_p = k_b + \gamma_b y$, and $\gamma_b y = 0$ are avoided; should any of these occur, the integration procedure can easily be modified to obtain a valid result, as detailed below.
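The change of variable from $x$ to $x_f$ described above can also be carried out numerically, which offers a way to cross-check a closed-form result. The sketch below rests on assumed rate functions that are not reproduced in this section: bound protein $y x_f/(x_f + k_b)$, decay rate $c = x_f + \gamma_b\, y x_f/(x_f + k_b)$ (with $\gamma_f = 1$), and burst rate $a = a_0 + a_1 x_f/(x_f + k_p)$. The symbol `a1` and the function names are hypothetical; the actual forms are those of equations (2), (3) and (13) in the paper.

```python
import math

def trapz(xs, ys):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return sum(0.5 * (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1])
               for i in range(len(xs) - 1))

def decoy_stationary_density(a0, a1, kp, kb, y, gamma_b, b,
                             xf_min=1e-2, xf_max=60.0, n=20000):
    """Tabulate the stationary density of total protein x on a grid in x_f.

    All rate forms below are ASSUMPTIONS made for illustration only.
    """
    h = (xf_max - xf_min) / n
    xf = [xf_min + i * h for i in range(n + 1)]
    xs, log_w, integrand = [], [], []
    for u in xf:
        bound = y * u / (u + kb)            # assumed bound-protein level
        x = u + bound                       # total protein level
        c = u + gamma_b * bound             # assumed decay rate, gamma_f = 1
        a = a0 + a1 * u / (u + kp)          # assumed non-cooperative burst rate
        jac = 1.0 + y * kb / (u + kb) ** 2  # dx/dx_f for the assumed x(x_f)
        xs.append(x)
        integrand.append(a / c * jac)       # (a/c) * dx/dx_f
        log_w.append(-x / b - math.log(c))
    acc = 0.0  # running trapezoidal integral of (a/c) dx/dx_f over x_f
    for i in range(1, n + 1):
        acc += 0.5 * h * (integrand[i - 1] + integrand[i])
        log_w[i] += acc
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(v - shift) for v in log_w]
    norm = trapz(xs, w)  # normalise with respect to x, not x_f
    return xs, [v / norm for v in w]
```

Working on a grid in $x_f$ avoids inverting the relation between $x$ and $x_f$; the density is still a density in $x$, so the normalisation integrates over the non-uniform grid of $x$ values.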

Special case $\gamma_b = 0$
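In this case, the integral in the exponential of (9) takes the form (19), where the coefficients are given by (20)–(23). Substituting (19) into (9), we arrive at (24), where $\kappa$ is the normalisation constant.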

Special case y = 0
If $y = 0$, then (13) implies that $x = x_f$, and hence the integral in the exponential of (9) takes the form (25). Substituting (25) into (9), we find (26), where $\kappa$ is the normalisation constant. This result can also be found in previous studies (Friedman et al., 2006; Mackey et al., 2013), which consider the model for burst-like gene expression in the absence of decoy binding sites.
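To illustrate how the decoy-free case works out, here is a hedged worked example. It assumes a non-cooperative burst rate of the form $a(x) = a_0 + a_1 x/(x + k_p)$ and linear decay $c(x) = x$ (with $\gamma_f = 1$); the symbol $a_1$ is introduced here for illustration, and the expressions are not copied from equations (25) and (26).

```latex
% Assumed rates (illustrative): a(x) = a_0 + a_1 x/(x + k_p), c(x) = x.
\begin{align*}
\frac{a(x)}{c(x)} &= \frac{a_0}{x} + \frac{a_1}{x + k_p}, \\
\int \frac{a(x)}{c(x)}\,dx &= a_0 \ln x + a_1 \ln (x + k_p), \\
p(x) &= \frac{\kappa}{x}\exp\left(-\frac{x}{b} + a_0 \ln x + a_1 \ln(x + k_p)\right)
      = \kappa\, x^{a_0 - 1} (x + k_p)^{a_1} e^{-x/b}.
\end{align*}
```

Under these assumptions the gamma-type factor $x^{a_0 - 1} e^{-x/b}$ of the unregulated model is multiplied by $(x + k_p)^{a_1}$, which is the factor through which positive feedback reshapes the distribution.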

Special case k
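In this case, the integral in the exponential of (9) takes the form (27), with accordingly modified coefficients. Substituting (27) into (9), we find (28), where $\kappa$ is the normalisation constant.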