Thermodynamic State Ensemble Models of cis-Regulation

A major goal in computational biology is to develop models that accurately predict a gene's expression from its surrounding regulatory DNA. Here we present one class of such models, thermodynamic state ensemble models. We describe the biochemical derivation of the thermodynamic framework in simple terms, and lay out the mathematical components that comprise each model. These components include (1) the possible states of a promoter, where a state is defined as a particular arrangement of transcription factors bound to a DNA promoter, (2) the binding constants that describe the affinity of the protein–protein and protein–DNA interactions that occur in each state, and (3) whether each state is capable of transcribing. Using these components, we demonstrate how to compute a cis-regulatory function that encodes the probability of a promoter being active. Our intention is to provide enough detail so that readers with little background in thermodynamics can compose their own cis-regulatory functions. To facilitate this goal, we also describe a matrix form of the model that can be easily coded in any programming language. This formalism has great flexibility, which we show by illustrating how phenomena such as competition between transcription factors and cooperativity are readily incorporated into these models. Using this framework, we also demonstrate that Michaelis-like functions, another class of cis-regulatory models, are a subset of the thermodynamic framework with specific assumptions. By recasting Michaelis-like functions as thermodynamic functions, we emphasize the relationship between these models and delineate the specific circumstances representable by each approach. Application of thermodynamic state ensemble models is likely to be an important tool in unraveling the physical basis of combinatorial cis-regulation and in generating formalisms that accurately predict gene expression from DNA sequence.


Alternate cis-Regulatory Function Forms
Recall that the cis-regulatory functions used throughout the paper employ the free concentration of each biochemical species, and are thus referred to as the "free species" form (Eq. 1, shown in the main text as Eq. 5).
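For the basal promoter, where RNA polymerase binds DNA with association constant K_P, the free species form can be sketched as below. This is a reconstruction from the definitions used in this section; the symbol P, for the probability that the promoter is polymerase-bound, is an assumed label.

```latex
P = \frac{[RNAP \cdot DNA]}{[DNA] + [RNAP \cdot DNA]}
  = \frac{K_P\,[RNAP]}{1 + K_P\,[RNAP]}
\tag{1}
```

The 1 in the denominator is the statistical weight of free DNA, and K_P[RNAP] is the weight of the polymerase-bound state.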

Free Energy Form
Because K_P is a reflection of the Gibbs free energy of an interaction, one may employ the fundamental relation between association constants and Gibbs free energy (Eq. 2) to rewrite Eq. 1 in terms of ∆G [1]. Solving for K_P and substituting the result into Eq. 1 yields Eq. 3.
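The two relations referred to here can be sketched as follows, assuming the standard relation between an association constant and the Gibbs free energy of binding, with R the gas constant and T the absolute temperature. P is an assumed label for the probability of the polymerase-bound state.

```latex
\Delta G = -RT \ln K_P
\quad\Longrightarrow\quad
K_P = e^{-\Delta G / RT}
\tag{2}

P = \frac{e^{-\Delta G / RT}\,[RNAP]}{1 + e^{-\Delta G / RT}\,[RNAP]}
\tag{3}
```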
Written in this form, the dependence of K_P on temperature is laid bare, which in turn highlights the necessity of conducting experiments under isothermal conditions. The relationship between temperature, Gibbs free energy, and enthalpy can also be leveraged to determine K_P using calorimetry and other in vitro methods [2,3].
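The temperature dependence can be illustrated with a minimal Python sketch. The function names, the choice of kcal/mol units, and the numerical values of ∆G and [RNAP] are all hypothetical. Holding ∆G fixed while changing T is itself a simplification, since ∆G generally varies with temperature.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL_PER_MOL_K = 1.987204e-3


def association_constant(delta_g, temperature_k):
    """Return K_P = exp(-dG / (R*T)), obtained from dG = -R*T*ln(K_P)."""
    return math.exp(-delta_g / (R_KCAL_PER_MOL_K * temperature_k))


def p_bound_free_energy(delta_g, rnap, temperature_k):
    """Free energy form of the basal promoter function."""
    weight = association_constant(delta_g, temperature_k) * rnap
    return weight / (1.0 + weight)


# Same hypothetical dG (kcal/mol) and free polymerase concentration (molar)
# evaluated at two temperatures.
for temperature_k in (298.15, 310.15):
    k_p = association_constant(-11.0, temperature_k)
    p = p_bound_free_energy(-11.0, 1e-8, temperature_k)
    print(f"T = {temperature_k} K: K_P = {k_p:.3e}, P = {p:.3f}")
```

For a favorable (negative) ∆G, raising the temperature lowers K_P in this sketch, so the same polymerase concentration yields a lower occupancy.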
Use of the association constant and corresponding free energy to describe binding of proteins to a single DNA site tacitly assumes that both are freely diffusing molecules at thermodynamic equilibrium. In actuality, the promoter we designate as "DNA" is a single rigid molecule in each cell, and transcription factors diffuse both freely and in one-dimensional walks along DNA (reviewed in [4]). Ackers justifies the equilibrium assumption by arguing that the protein binding reactions occur on a much faster time scale than transcription and translation [1,5]. Thus, while the association constants are "apparent" and may not match those measured in an in vitro system, they do have a physical interpretation and reflect important properties of the promoters being modeled.

State Form
It is often experimentally challenging to disentangle the contributions of the free species from their apparent association constants. In an experiment where one of these components is the independent variable (for example, if the sequence of a transcription factor binding site were modified), one may be able to discern the contributions of individual components. In most other circumstances it is convenient to introduce a simplification in which K_P and the concentrations of TFs are combined into a new variable q [6,7]. In our basal promoter example we have only two states: free DNA, represented by the 1, and one additional state, polymerase-bound DNA. Thus, we set q_1 = K_P[RNAP] and Eq. 1 simplifies to Eq. 4, which contains only one unknown variable.
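With this substitution, the state form of the basal promoter function can be sketched as below. P is an assumed label for the probability of the polymerase-bound state.

```latex
q_1 = K_P\,[RNAP]
\quad\Longrightarrow\quad
P = \frac{q_1}{1 + q_1}
\tag{4}
```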
Fitting this model does not allow one to separate the effects of the binding constants from the effects of the transcription factor concentrations. However, the simplified model has fewer free parameters to fit and still allows a physical interpretation of how a particular cis-regulatory system works.
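A short Python sketch makes the trade-off concrete. The function names and numerical values are hypothetical. Two different (K_P, [RNAP]) pairs with the same product give the same output, so only q_1 can be recovered from a measured probability.

```python
import math


def p_bound_free_species(k_p, rnap):
    """Free species form: K_P[RNAP] / (1 + K_P[RNAP])."""
    weight = k_p * rnap
    return weight / (1.0 + weight)


def p_bound_state(q1):
    """State form: q1 / (1 + q1), with q1 standing for K_P[RNAP]."""
    return q1 / (1.0 + q1)


def q_from_probability(p):
    """Invert the state form: q1 = p / (1 - p). Only the product is recovered."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


# Two hypothetical parameter pairs with the same product K_P * [RNAP].
p_a = p_bound_free_species(k_p=2.0e8, rnap=1.0e-8)
p_b = p_bound_free_species(k_p=2.0e7, rnap=1.0e-7)
print(math.isclose(p_a, p_b))   # the two pairs are indistinguishable
print(q_from_probability(p_a))  # recovers only q1, not K_P or [RNAP]
```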

Michaelis-like Functions with Basal Leak
To resolve the ambiguity surrounding basal expression, some groups [8-10] add a basal leak term, λ, to the Michaelis-like formulation (Eq. 5) so that basal transcription occurs even in the absence of transcription factors.
A second new term, c, is also needed to scale the contribution of the transcriptional regulators, such that the maximum transcription rate, k_t, is given by Eq. 6 [8-10].
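For a single activator A, the leak formulation can be sketched as below. This assumes the regulated part takes the Michaelis-like form θ_A[A]/(1 + θ_A[A]), consistent with the θ_A[A] terms discussed later in this section. The exact notation used in [8-10] may differ.

```latex
\text{production} = \lambda + c\,\frac{\theta_A [A]}{1 + \theta_A [A]}
\tag{5}

k_t = \lambda + c
\tag{6}
```

At saturating activator the fraction approaches 1, so production approaches λ + c, which is why c must be chosen relative to the maximum rate k_t.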
As with the traditional Michaelis-like functions, the leak function can also be reconstituted as a thermodynamic model. Consider a basal promoter in the Michaelis-like leak formulation. In a basal promoter, transcription is not influenced by a regulator such as an activator (c = 0). Thus, production simplifies to λ [9]. The production term for a basal promoter in the thermodynamic formulation is taken from Eq. 1.
Setting these two expressions equal to each other defines λ in thermodynamic terms (Eq. 9). Substituting this definition of λ into a one-activator model results in Eq. 10.
Solving for c in Eq. 6 and substituting that into Eq. 10 results in Eq. 11 after simplification.
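The chain of substitutions can be sketched as follows. The sketch assumes rates are normalized so that the maximum transcription rate is k_t = 1, and that the one-activator leak model has the form λ + c θ_A[A]/(1 + θ_A[A]). Both are assumptions made here for illustration, and the equation numbers follow the references in the text.

```latex
\lambda = \frac{K_P [RNAP]}{1 + K_P [RNAP]}
\tag{9}

\text{production} = \frac{K_P [RNAP]}{1 + K_P [RNAP]}
  + c\,\frac{\theta_A [A]}{1 + \theta_A [A]}
\tag{10}

% with c = k_t - \lambda = 1 - \lambda:
\text{production} =
  \frac{K_P [RNAP] + \theta_A [A] + \theta_A K_P [A][RNAP]}
       {1 + K_P [RNAP] + \theta_A [A] + \theta_A K_P [A][RNAP]}
\tag{11}
```

The intermediate step is λ + (1 − λ)x/(1 + x) = (λ + x)/(1 + x) with x = θ_A[A]. Inserting Eq. 9 and clearing the inner fraction gives Eq. 11.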
Recalling that the numerator of a thermodynamic model is a sum of terms that encode states capable of transcribing while the denominator sums terms for all possible states, Eq. 11 shows that the leak model allows all states to transcribe except the state "1", where DNA is unbound (see Eq. 1). This includes the state where activator is bound and polymerase is not bound, represented by the θ A [A] summand. When using a Michaelis-leak function one should be aware that this formulation implies that activators boost transcription even in the absence of RNA polymerase.
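The equivalence described above can be checked numerically with a small Python sketch. It assumes rates normalized to a maximum of k_t = 1 and a leak model of the form λ + c θ_A[A]/(1 + θ_A[A]). The function names and parameter values are hypothetical.

```python
import math


def leak_production(activator, theta_a, leak, k_t=1.0):
    """Michaelis-like leak function with c = k_t - leak (assumed form)."""
    c = k_t - leak
    x = theta_a * activator
    return leak + c * x / (1.0 + x)


def thermodynamic_leak_equivalent(activator, rnap, theta_a, k_p):
    """State ensemble in which every state except unbound DNA transcribes."""
    w = k_p * rnap          # weight of the polymerase-bound state
    x = theta_a * activator  # weight of the activator-bound state
    weights = {"unbound": 1.0, "rnap": w, "activator": x, "both": x * w}
    transcribing = ("rnap", "activator", "both")
    return sum(weights[s] for s in transcribing) / sum(weights.values())


# Compare the two formulations over a few hypothetical concentrations.
k_p, theta_a, rnap = 5.0e7, 2.0e6, 1.0e-8
leak = k_p * rnap / (1.0 + k_p * rnap)  # leak set to basal occupancy
for activator in (0.0, 1.0e-7, 1.0e-6, 1.0e-5):
    a = leak_production(activator, theta_a, leak)
    b = thermodynamic_leak_equivalent(activator, rnap, theta_a, k_p)
    print(activator, math.isclose(a, b))

# With no polymerase at all, the activator-only state still "transcribes".
print(thermodynamic_leak_equivalent(1.0e-6, 0.0, theta_a, k_p))
```

The last line is the point of the caveat: with [RNAP] = 0 the output is still positive, because the activator-only state sits in the numerator.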

Oligomerization with Hill Functions
Recall the Hill-modified Michaelis-like functions (Eqs. 12 and 13, which correspond to Eqs. 26 and 27, respectively, in the main text). A specific interpretation of the θ_A parameter allows Hill coefficients to encode oligomerization, an application of extreme cooperativity [9]. Using the thermodynamic model, we will demonstrate the relationship between integer-valued Hill functions and the corresponding oligomeric state of a DNA binding protein. Our example is for n = 2 (dimers) but holds for higher integer values of n.
Consider a promoter with a single binding site for a dimeric activator. Written in the same way as Eq. 1, the thermodynamic model for this situation is given in Eq. 14, where A_2 represents an activator dimer and where we make the Michaelis framework assumption that activator is absolutely required for transcription. The dimer binds to free DNA with an association constant K_A2, and assembles from monomers with dimerization constant L_A (Eqs. 15 and 16). Binding of polymerase and dimer is modeled with a macroscopic binding constant β_AP, which Eq. 17 expresses in terms of stepwise equilibrium constants. We showed previously how the [RNAP·DNA] term becomes K_P[RNAP][DNA] (see Eq. 1). Making the appropriate substitutions into Eq. 14, we obtain the complete thermodynamic model for a dimer activator (Eq. 18). The polymerase binding term can again be factored out (Eq. 19).
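The equations referred to in this paragraph can be sketched as below. This reconstruction follows the definitions in the text, and it assumes that polymerase and dimer bind independently, so that β_AP is simply the product of the stepwise constants. That independence is what permits the factorization. P is an assumed label for the probability of the transcribing state.

```latex
P = \frac{[RNAP \cdot A_2 \cdot DNA]}
         {[DNA] + [A_2 \cdot DNA] + [RNAP \cdot DNA] + [RNAP \cdot A_2 \cdot DNA]}
\tag{14}

K_{A_2} = \frac{[A_2 \cdot DNA]}{[A_2][DNA]}
\tag{15}
\qquad
L_A = \frac{[A_2]}{[A]^2}
\tag{16}

\beta_{AP} = \frac{[RNAP \cdot A_2 \cdot DNA]}{[RNAP][A]^2[DNA]} = K_P\,K_{A_2}\,L_A
\tag{17}

P = \frac{K_{A_2} L_A K_P [A]^2 [RNAP]}
         {1 + K_{A_2} L_A [A]^2 + K_P [RNAP] + K_{A_2} L_A K_P [A]^2 [RNAP]}
\tag{18}

P = \left(\frac{K_{A_2} L_A [A]^2}{1 + K_{A_2} L_A [A]^2}\right)
    \left(\frac{K_P [RNAP]}{1 + K_P [RNAP]}\right)
\tag{19}
```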
The right-most term is the basal promoter function and the term on the left is the activator function, which we can again compare directly to Eq. 12. By setting the right-hand side of Eq. 19 equal to Eq. 12, we see that θ_A^2 = K_A2 L_A, the product of the dimerization constant and the dimer-DNA association constant. Thus, while θ_A has no particular physical meaning, θ_A^2 is a macroscopic binding constant that describes the process of dimerization and DNA association. This exercise further emphasizes that the Hill/Michaelis-like functions are thermodynamic models that operate under several key assumptions.
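The correspondence between the Hill coefficient and the dimer constants can be checked with a minimal Python sketch. It assumes the Hill activator function has the form θ_A^n[A]^n / (1 + θ_A^n[A]^n) and that polymerase and dimer bind independently. The function names and parameter values are hypothetical.

```python
import math


def hill_activator(activator, theta_a, n=2):
    """Hill-modified Michaelis-like activator function (assumed form)."""
    x = (theta_a * activator) ** n
    return x / (1.0 + x)


def dimer_activator_term(activator, k_a2, l_a):
    """Activator factor of the dimer model: K_A2*L_A*[A]^2 over 1 plus that."""
    x = k_a2 * l_a * activator ** 2
    return x / (1.0 + x)


def dimer_full_model(activator, rnap, k_a2, l_a, k_p):
    """Four-state ensemble; only the polymerase-plus-dimer state transcribes."""
    x = k_a2 * l_a * activator ** 2  # dimer-bound weight
    w = k_p * rnap                   # polymerase-bound weight
    return x * w / (1.0 + x + w + x * w)


# Hypothetical constants: dimer-DNA association and monomer dimerization.
k_a2, l_a, k_p, rnap = 4.0e8, 1.0e6, 5.0e7, 1.0e-8
theta_a = math.sqrt(k_a2 * l_a)  # theta_A^2 = K_A2 * L_A

for activator in (1.0e-9, 5.0e-8, 1.0e-6):
    hill = hill_activator(activator, theta_a, n=2)
    dimer = dimer_activator_term(activator, k_a2, l_a)
    basal = k_p * rnap / (1.0 + k_p * rnap)
    full = dimer_full_model(activator, rnap, k_a2, l_a, k_p)
    print(math.isclose(hill, dimer), math.isclose(full, dimer * basal))
```

In this sketch the n = 2 Hill function and the dimer activator term coincide once θ_A is set to the square root of K_A2 L_A, and the full four-state model factors into the activator term times the basal promoter term.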