Self-consistent theory of transcriptional control in complex regulatory architectures

Individual regulatory proteins are typically charged with the simultaneous regulation of a battery of different genes. As a result, when one of these proteins is limiting, competitive effects have a significant impact on the transcriptional response of the regulated genes. Here we present a general framework for the analysis of any generic regulatory architecture that accounts for the competitive effects of the regulatory environment by isolating these effects into an effective concentration parameter. These predictions are formulated using the grand-canonical ensemble of statistical mechanics and the fold-change in gene expression is predicted as a function of the number of transcription factors, the strength of interactions between the transcription factors and their DNA binding sites, and the effective concentration of the transcription factor. The effective concentration is set by the transcription factor interactions with competing binding sites within the cell and is determined self-consistently. Using this approach, we analyze regulatory architectures in the grand-canonical ensemble ranging from simple repression and simple activation to scenarios that include repression mediated by DNA looping of distal regulatory sites. It is demonstrated that all the canonical expressions previously derived in the case of an isolated, non-competing gene, can be generalised by a simple substitution to their grand canonical counterpart, which allows for simple intuitive incorporation of the influence of multiple competing transcription factor binding sites. As an example of the strength of this approach, we build on these results to present an analytical description of transcriptional regulation of the lac operon.

Canonical ensemble

We start in the canonical ensemble, where we consider a single gene that can bind RNAP (of which there are $P$ molecules in the cell), and a number of transcription factors A, B, ... of copy number $A, B, \ldots$, respectively. We set the effective energy of non-specific sites to 0 and consider only the binding energies of RNAP and transcription factors to their specific sites on the DNA, and interaction energies between specifically bound RNAP and transcription factors.
We do not explicitly specify the number of operator sites a transcription factor has on a specific gene; it can be 0, 1 or more. If a gene has more than one site for a single transcription factor, there are multiple ways of binding the transcription factors to these sites with the same occupation numbers. We therefore define $Z(p, a, b, \ldots)$ as the sum of the Boltzmann factors $\exp(-\beta \varepsilon_i(p, a, b, \ldots))$ over all adsorption states $i$ that have $p, a, b, \ldots$ molecules of RNAP, A, B, ... bound specifically to the gene, respectively.

The RNAP molecules and transcription factors that are not bound to a specific site are distributed over the $N_{ns}$ non-specific sites of the DNA. The number of ways to distribute these molecules over the non-specific DNA sites is given by the multinomial coefficient

$$\frac{N_{ns}!}{P!\,A!\,B!\cdots(N_{ns}-P-A-B-\cdots)!}.$$

When a transcription factor binds to a specific site in the gene, it is removed from the non-specific sites, so we remove it from the multinomial factor. For example, if a single molecule of A binds to its specific site, the number of ways the remaining RNAP and transcription factor molecules can be distributed over the non-specific DNA is given by

$$\frac{N_{ns}!}{P!\,(A-1)!\,B!\cdots(N_{ns}-P-(A-1)-B-\cdots)!}.$$

The total weight of the configuration state that has a single molecule of A bound to the gene is then given by the product of the multinomial factor and $Z$. Thus,

$$Z_{\mathrm{state}}(0, 1, 0, \ldots) = \frac{N_{ns}!}{P!\,(A-1)!\,B!\cdots(N_{ns}-P-(A-1)-B-\cdots)!}\,Z(0, 1, 0, \ldots).$$

We are not interested in the internal degrees of freedom of the different species in the system: these do not change upon specific binding of RNAP or transcription factors to the gene, and therefore only contribute a constant factor to the partition function of the system. The only configurational states that we are interested in are those that differ in the number of RNAP, A, B, ... bound specifically to the gene. We find the total effective partition function of the genome by summing $Z_{\mathrm{state}}$ over all states with $p, a, b, \ldots$ molecules of RNAP, A, B, ... bound specifically:

$$Z_{\mathrm{tot}} = \sum_{p}\sum_{a}\sum_{b}\cdots \frac{N_{ns}!}{(P-p)!\,(A-a)!\,(B-b)!\cdots\bigl(N_{ns}-(P-p)-(A-a)-(B-b)-\cdots\bigr)!}\,Z(p, a, b, \ldots).$$
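It will turn out to be useful to isolate the occupation numbers $p, a, b, \ldots$ from the multinomial factor, so that it becomes a constant that depends only on the total number of molecules. We do this by considering the definition of the multinomial coefficient, eq. (S.12), and factorising it as

$$\frac{N_{ns}!}{(P-p)!\,(A-a)!\cdots\bigl(N_{ns}-(P-p)-(A-a)-\cdots\bigr)!} = \frac{N_{ns}!}{P!\,A!\cdots(N_{ns}-P-A-\cdots)!} \times \frac{P!}{(P-p)!}\,\frac{A!}{(A-a)!}\cdots\frac{(N_{ns}-P-A-\cdots)!}{(N_{ns}-P-A-\cdots+p+a+\cdots)!}.$$

The first factor is now a multinomial coefficient that is constant and depends only on the total number of RNAP and transcription factors. The second factor still depends on the occupation numbers $p, a, b, \ldots$, but when $N_{ns}$ is sufficiently large, that is, when $N_{ns} \gg P + A + B + \cdots$, we can apply the following approximation:

$$\frac{(N_{ns}-P-A-\cdots)!}{(N_{ns}-P-A-\cdots+p+a+\cdots)!} \approx \frac{1}{N_{ns}^{\,p+a+\cdots}}.$$

The partition function then becomes

$$Z_{\mathrm{tot}} \approx \frac{N_{ns}!}{P!\,A!\,B!\cdots(N_{ns}-P-A-B-\cdots)!} \sum_{p}\sum_{a}\sum_{b}\cdots \frac{P!}{(P-p)!\,N_{ns}^{p}}\,\frac{A!}{(A-a)!\,N_{ns}^{a}}\,\frac{B!}{(B-b)!\,N_{ns}^{b}}\cdots Z(p, a, b, \ldots).$$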
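The quality of the large-$N_{ns}$ approximation can be checked numerically. The sketch below compares the exact ratio of the two multinomial coefficients with the product of factors $X!/((X-x)!\,N_{ns}^x)$. The copy numbers and the value of $N_{ns}$ are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
from math import exp, lgamma

def log_multinomial(n_sites, counts):
    """log of n_sites! / (prod(counts!) * (n_sites - sum(counts))!)."""
    free = n_sites - sum(counts)
    return (lgamma(n_sites + 1)
            - sum(lgamma(c + 1) for c in counts)
            - lgamma(free + 1))

def exact_ratio(n_sites, totals, bound):
    """Multinomial with the bound molecules removed, divided by the full one."""
    remaining = [t - b for t, b in zip(totals, bound)]
    return exp(log_multinomial(n_sites, remaining)
               - log_multinomial(n_sites, totals))

def approx_ratio(n_sites, totals, bound):
    """Product over species of X! / ((X - x)! * N_ns**x)."""
    result = 1.0
    for total, x in zip(totals, bound):
        for k in range(x):
            result *= (total - k) / n_sites
    return result

# Assumed, illustrative numbers: 5e6 non-specific sites, 1000 RNAP, 10 copies of A.
N_NS = 5_000_000
totals = [1000, 10]   # P, A
bound = [1, 1]        # p, a: one RNAP and one A bound specifically

exact = exact_ratio(N_NS, totals, bound)
approx = approx_ratio(N_NS, totals, bound)
print(exact / approx)  # close to 1 when N_ns >> P + A
```

With these numbers the relative deviation is of order $(P + A)/N_{ns}$, so the approximation only degrades when the total number of molecules becomes comparable to the number of non-specific sites.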
We have now removed the multinomial coefficient from the sum and grouped all factors that are related to RNAP, A, B, . . . . This expression for the canonical partition function shows us that the weight of a specific configurational state is given by the Boltzmann weight of the energy of the state, multiplied by a corrective factor that takes into account the redistribution of the remaining molecules on the DNA, and that this corrective factor behaves as an effective available concentration per specifically adsorbed molecule. We will discuss this role later on.
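To find the occupation number of RNAP bound to the promoter site, we explicitly write out the first sum, which runs over $p = 0$ and $p = 1$, as

$$Z_{\mathrm{tot}} \approx \frac{N_{ns}!}{P!\,A!\cdots(N_{ns}-P-A-\cdots)!}\left[\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\,\frac{B!}{(B-b)!\,N_{ns}^{b}}\cdots Z(0, a, b, \ldots) + \frac{P}{N_{ns}}\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\,\frac{B!}{(B-b)!\,N_{ns}^{b}}\cdots Z(1, a, b, \ldots)\right].$$

The occupation number of RNAP bound to the promoter can then be written as

$$\theta_P = \frac{\dfrac{P}{N_{ns}}\displaystyle\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\cdots Z(1, a, b, \ldots)}{\displaystyle\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\cdots Z(0, a, b, \ldots) + \dfrac{P}{N_{ns}}\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\cdots Z(1, a, b, \ldots)}, \tag{S.20}$$

where the multinomial factor is common to the numerator and the denominator and cancels out. This expression is the main canonical result, and we will see that the equivalent equation in the grand-canonical ensemble has an identical form. For now, we continue along this path and derive a general expression for the canonical fold-change. For this, it is easiest to introduce the shorthand

$$\Sigma^{c}_{P} = \frac{1}{x_P}\sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\,\frac{B!}{(B-b)!\,N_{ns}^{b}}\cdots Z(1, a, b, \ldots), \qquad \Sigma^{c}_{0} = \sum_{a,b,\ldots}\frac{A!}{(A-a)!\,N_{ns}^{a}}\,\frac{B!}{(B-b)!\,N_{ns}^{b}}\cdots Z(0, a, b, \ldots), \tag{S.21}$$

where $x_P = \exp(-\beta\varepsilon_P)$, and the superscript $c$ denotes that this is the canonical result. Using the shorthand, we can write

$$\theta_P(P, A, B, \ldots) = \frac{\dfrac{P}{N_{ns}}\,x_P\,\Sigma^{c}_{P}}{\Sigma^{c}_{0} + \dfrac{P}{N_{ns}}\,x_P\,\Sigma^{c}_{P}}. \tag{S.22}$$

The fold-change is equal to $\theta_P(P, A, B, \ldots)/\theta_P(P, 0, 0, \ldots)$. When $A = B = \cdots = 0$, we can see from eq. (S.21) that $\Sigma^{c}_{P} = Z(1, 0, 0, \ldots)/x_P = 1$ and $\Sigma^{c}_{0} = Z(0, 0, 0, \ldots) = 1$. Consequently,

$$\theta_P(P, 0, 0, \ldots) = \frac{\dfrac{P}{N_{ns}}\,x_P}{1 + \dfrac{P}{N_{ns}}\,x_P} \approx \frac{P}{N_{ns}}\,x_P. \tag{S.23}$$

The approximated expression is valid in the weak promoter limit, $(P/N_{ns})\,x_P \ll 1$. The fold-change is then found by dividing eq. (S.22) by eq. (S.23):

$$\text{fold-change} \approx \frac{\Sigma^{c}_{P}}{\Sigma^{c}_{0} + \dfrac{P}{N_{ns}}\,x_P\,\Sigma^{c}_{P}} = \left(\frac{\Sigma^{c}_{0}}{\Sigma^{c}_{P}} + \frac{P}{N_{ns}}\,x_P\right)^{-1}.$$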
We make one further approximation, namely

$$\text{fold-change} \approx \frac{\Sigma^{c}_{P}}{\Sigma^{c}_{0}}, \qquad \frac{P}{N_{ns}}\,x_P \ll \frac{\Sigma^{c}_{0}}{\Sigma^{c}_{P}}. \tag{S.25}$$

In the case of repressive regulatory scenarios, the fraction $\Sigma^{c}_{0}/\Sigma^{c}_{P} > 1$, which means that this condition is already taken care of by the weak promoter limit that we imposed in eq. (S.23). For activating scenarios, eq. (S.25) still holds, provided we can assume that $(P/N_{ns})\,x_P \ll \Sigma^{c}_{0}/\Sigma^{c}_{P}$, which in those cases is not automatically guaranteed by the weak promoter limit. As discussed above, $\Sigma^{c}_{0}/\Sigma^{c}_{P} \approx 1/\text{fold-change}$, so we can use the fold-change as a convenient tool to verify this assumption a posteriori.
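The a posteriori check can be made concrete with a small numerical sketch. It assumes that the promoter binds at most one RNAP, so that the fold-change before any approximation is $\Sigma^{c}_{P}(1+\rho)/(\Sigma^{c}_{0}+\rho\,\Sigma^{c}_{P})$ with $\rho = (P/N_{ns})\,x_P$. All numerical values and function names are illustrative assumptions rather than parameters from the text.

```python
from math import exp

def fold_change_exact(p_copies, n_ns, x_p, sigma_p, sigma_0):
    """theta_P with regulation divided by theta_P without, no approximations."""
    rho = p_copies / n_ns * x_p
    theta_regulated = rho * sigma_p / (sigma_0 + rho * sigma_p)
    theta_unregulated = rho / (1.0 + rho)
    return theta_regulated / theta_unregulated

def fold_change_approx(sigma_p, sigma_0):
    """Approximate fold-change Sigma_P / Sigma_0."""
    return sigma_p / sigma_0

def promoter_term_is_negligible(p_copies, n_ns, x_p, fold_change, tolerance=0.1):
    """A posteriori check that (P/N_ns) x_P is small compared with 1/fold-change."""
    return p_copies / n_ns * x_p * fold_change < tolerance

# Assumed values: 1000 RNAP, 5e6 non-specific sites, promoter energy -5 k_B T.
P, N_NS, X_P = 1000, 5_000_000, exp(5.0)

# Repression-like case (Sigma_0 > Sigma_P) and activation-like case (Sigma_P >> Sigma_0).
for sigma_p, sigma_0 in [(1.0, 3.0), (50.0, 1.0)]:
    approx = fold_change_approx(sigma_p, sigma_0)
    exact = fold_change_exact(P, N_NS, X_P, sigma_p, sigma_0)
    ok = promoter_term_is_negligible(P, N_NS, X_P, approx)
    print(f"approx={approx:.3f} exact={exact:.3f} check passed={ok}")
```

In the repression-like case the two values agree to within a few percent, while in the strongly activating case the check fails and the approximate expression overestimates the fold-change.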
Grand-canonical ensemble

We now turn to the grand-canonical ensemble. Here we consider the situation in one of the $N$ gene copies, and the non-specific sites and competing sites are included as additional reservoirs with which our system is in contact. If the gene is present at higher copy number, then there are simply multiple independent copies of this system. The gene copies are decoupled from each other and from the rest of the genome, which effectively eliminates the constraint on the total number of RNAP, A, B, .... Consequently, the weight of each state no longer depends on the combinatorial problem of how to distribute the remaining transcription factors over the non-specific DNA, but on the factor $\lambda = \exp(\beta\mu)$ of each species. Mathematically, $\lambda$ acts as a Lagrange multiplier for the constraint on the total number of molecules; physically, it is the fugacity or activity, which plays the role of an effective concentration.
The grand-canonical partition function of a single gene is given by

$$\Xi = \sum_{p}\sum_{a}\sum_{b}\cdots \lambda_P^{p}\,\lambda_A^{a}\,\lambda_B^{b}\cdots Z(p, a, b, \ldots).$$

Comparing this with the canonical partition function, we can immediately see the similarities between the two expressions. In both ensembles, we sum over the different occupation numbers of RNAP and transcription factors bound specifically to the gene, and in both cases the weight of each state is given by the product of the Boltzmann factors and an expression that acts as an effective available concentration per specifically adsorbed molecule. The canonical expression also has a multinomial coefficient that is absent from the grand-canonical expression, since the grand-canonical system is decoupled from the rest of the genome. To find the grand-canonical occupation number of RNAP bound to the promoter, we again write out the first sum explicitly:

$$\Xi = \sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(0, a, b, \ldots) + \lambda_P\sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(1, a, b, \ldots).$$
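We can then write down the occupation number of RNAP bound to the promoter,

$$\theta_P = \frac{\lambda_P\displaystyle\sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(1, a, b, \ldots)}{\displaystyle\sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(0, a, b, \ldots) + \lambda_P\sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(1, a, b, \ldots)}. \tag{S.28}$$

By comparing eqs. (S.20) and (S.28), we see that the expressions for $\theta_P$ in both ensembles have the same form when we make the substitutions

$$\frac{X!}{(X-x)!\,N_{ns}^{x}} \;\leftrightarrow\; \lambda_X^{x} \tag{S.29}$$

for all involved species X, where X = RNAP, A, B, .... Explicitly, this means that we can use the following substitutions for different powers of $\lambda_X$:

$$\lambda_X \leftrightarrow \frac{X}{N_{ns}}, \qquad \lambda_X^{2} \leftrightarrow \frac{X(X-1)}{N_{ns}^{2}}, \qquad \lambda_X^{3} \leftrightarrow \frac{X(X-1)(X-2)}{N_{ns}^{3}}, \qquad \ldots$$

The substitutions in eq. (S.29) therefore take us from the expression for $\theta_P$ in one ensemble to that in the other. Apart from these substitutions, the expressions are identical, regardless of the regulatory architecture.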
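We continue by deriving a general expression for the fold-change in the grand-canonical ensemble that is comparable to its canonical analog. We introduce the shorthands

$$\Sigma^{gc}_{P} = \frac{1}{x_P}\sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(1, a, b, \ldots), \qquad \Sigma^{gc}_{0} = \sum_{a,b,\ldots}\lambda_A^{a}\,\lambda_B^{b}\cdots Z(0, a, b, \ldots), \tag{S.31}$$

where $x_P = \exp(-\beta\varepsilon_P)$ and the superscript $gc$ denotes that this is the grand-canonical result. These can be used to write eq. (S.28) as

$$\theta_P = \frac{\lambda_P\,x_P\,\Sigma^{gc}_{P}}{\Sigma^{gc}_{0} + \lambda_P\,x_P\,\Sigma^{gc}_{P}}.$$

In the absence of transcription factors, the expression for $\theta_P$ simplifies to

$$\theta_P = \frac{\lambda_P\,x_P}{1 + \lambda_P\,x_P}.$$

To calculate the fold-change, we divide the two and obtain

$$\text{fold-change} = \frac{\Sigma^{gc}_{P}\,(1 + \lambda_P\,x_P)}{\Sigma^{gc}_{0} + \lambda_P\,x_P\,\Sigma^{gc}_{P}} \approx \frac{\Sigma^{gc}_{P}}{\Sigma^{gc}_{0}}. \tag{S.34}$$

Here we have made essentially the same assumption as in eq. (S.25): it is valid for repressive scenarios in the weak promoter limit, while for activating scenarios it becomes the strictest assumption.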
The canonical and grand-canonical expressions for the fold-change, eqs. (S.25) and (S.34), have the same dependence on $\Sigma^{c,gc}_{P}$ and $\Sigma^{c,gc}_{0}$. We can see from their definitions, eqs. (S.21) and (S.31), that the only difference between the canonical and grand-canonical fold-change is that the factor $X!/[(X-x)!\,N_{ns}^{x}]$ in the canonical result is replaced by $\lambda_X^{x}$ in the grand-canonical result, for all involved molecules RNAP, A, B, ... that have a binding site on the gene.
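The substitution can be illustrated by writing the fold-change once, with the ensemble-specific weight passed in as a function. The sketch below uses a hypothetical architecture chosen only for illustration: two repressor sites that each exclude RNAP, so that $\Sigma_P = 1$ and $\Sigma_0 = 1 + w(1)(x_1 + x_2) + w(2)\,x_1 x_2$, where $w(x)$ is either $R!/[(R-x)!\,N_{ns}^{x}]$ or $\lambda_R^{x}$. The binding energies, copy number, and the choice $\lambda_R = R/N_{ns}$ are assumptions.

```python
from math import exp

def canonical_weight(total, bound, n_ns):
    """X! / ((X - x)! * N_ns**x) for `bound` of `total` molecules on the gene."""
    weight = 1.0
    for k in range(bound):
        weight *= (total - k) / n_ns
    return weight

def grand_canonical_weight(fugacity, bound):
    """lambda_X**x, the grand-canonical counterpart of canonical_weight."""
    return fugacity ** bound

def fold_change_two_sites(weight, x1, x2):
    """Sigma_P / Sigma_0 with Sigma_P = 1 (hypothetical two-site repression)."""
    sigma_0 = 1.0 + weight(1) * (x1 + x2) + weight(2) * x1 * x2
    return 1.0 / sigma_0

# Assumed values: 10 repressors, 5e6 non-specific sites,
# site energies of -15 and -13 k_B T.
N_NS, R = 5_000_000, 10
x1, x2 = exp(15.0), exp(13.0)

fc_canonical = fold_change_two_sites(
    lambda x: canonical_weight(R, x, N_NS), x1, x2)
fc_grand = fold_change_two_sites(
    lambda x: grand_canonical_weight(R / N_NS, x), x1, x2)
print(fc_canonical, fc_grand)
```

The single-occupancy terms agree exactly under $\lambda_R = R/N_{ns}$; the two results differ only through the doubly occupied state, where $R(R-1)$ replaces $R^2$.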
As an example, for simple repression, the canonical $\Sigma^{c}_{0}$ and $\Sigma^{c}_{P}$ are given by

$$\Sigma^{c}_{0} = 1 + \frac{R}{N_{ns}}\,x_R, \qquad \Sigma^{c}_{P} = 1,$$

where $R$ is the number of repressors and $x_R = \exp(-\beta\varepsilon_R)$. The canonical fold-change is then given by $\Sigma^{c}_{P}/\Sigma^{c}_{0} = (1 + (R/N_{ns})\,x_R)^{-1}$, as was determined earlier [1]. In the grand-canonical ensemble, we have

$$\Sigma^{gc}_{0} = 1 + \lambda_R\,x_R, \qquad \Sigma^{gc}_{P} = 1,$$

which leads to the fold-change, in the form derived earlier in this article, $\Sigma^{gc}_{P}/\Sigma^{gc}_{0} = (1 + \lambda_R\,x_R)^{-1}$.

Ensemble equivalence

In actual cells the number of transcription factors can be as small as ten. With such small numbers, ensemble equivalence is an issue and we address it here. While the two ensembles are not identical, the canonical and grand-canonical expressions for $\theta_P$, as well as for the fold-change, have essentially the same form, where eq. (S.29) identifies the substitutions that make the expressions equal. In the thermodynamic limit, when $X \gg 1$ for a species X = RNAP, A, B, ... (but $X \ll N_{ns}$), the canonical expression in eq. (S.29) simplifies to

$$\frac{X!}{(X-x)!\,N_{ns}^{x}} \approx \left(\frac{X}{N_{ns}}\right)^{x}, \qquad (1 \ll X \ll N_{ns}). \tag{S.37}$$

Here, $X$ is the number of molecules of species X = RNAP, A, B, ..., and $x$ is the number of X adsorbed to the gene in the state we are interested in. For the grand-canonical ensemble we first consider the reservoir of non-specific sites. The expected number of molecules of X bound to non-specific sites, $\langle X \rangle$, is given by

$$\langle X \rangle = N_{ns}\,p_{\mathrm{bound,ns}} = N_{ns}\,\frac{\lambda_X}{1 + \lambda_X} \approx N_{ns}\,\lambda_X, \qquad (\lambda_X \ll 1), \tag{S.38}$$

as we set the binding energy of non-specific sites to 0. Rewriting gives

$$\lambda_X = \frac{\langle X \rangle}{N_{ns}}. \tag{S.39}$$

When $X$ is sufficiently large, the average number of X bound to non-specific sites becomes equal to the total number of X in the cell. In this limit, the substitution in eq. (S.29) becomes exact and

$$\lambda_X^{x} = \left(\frac{X}{N_{ns}}\right)^{x} \qquad \text{(thermodynamic limit)}. \tag{S.40}$$
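How quickly the two ensembles converge can be gauged with a short numerical sketch. It evaluates the ratio of the canonical factor $X!/[(X-x)!\,N_{ns}^{x}]$ to its limit $(X/N_{ns})^{x}$, in which $N_{ns}$ cancels, and inverts $\langle X \rangle = N_{ns}\lambda_X/(1+\lambda_X)$ without the small-$\lambda_X$ approximation. The copy numbers, the value of $N_{ns}$, and the function names are illustrative assumptions.

```python
def fugacity_from_occupancy(mean_bound, n_ns):
    """Invert <X> = N_ns * lam / (1 + lam), non-specific binding energy set to 0."""
    return mean_bound / (n_ns - mean_bound)

def canonical_over_limit(total, bound):
    """[X! / ((X - x)! N_ns**x)] / (X / N_ns)**x; independent of N_ns."""
    ratio = 1.0
    for k in range(bound):
        ratio *= (total - k) / total
    return ratio

N_NS = 5_000_000  # assumed number of non-specific sites
for copies in (10, 100, 1000):
    lam = fugacity_from_occupancy(copies, N_NS)
    # Second column: exact fugacity relative to the approximation <X>/N_ns.
    # Third column: canonical weight for x = 2 relative to (X/N_ns)**2.
    print(copies, lam * N_NS / copies, canonical_over_limit(copies, 2))
```

For a state with two bound molecules, the canonical factor falls short of its limiting value by a fraction $1/X$, so with ten copies the two ensembles differ by about ten percent in that term, and the difference shrinks as the copy number grows.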