Design Principles for Ligand-Sensing, Conformation-Switching Ribozymes

Nucleic acid sensor elements are proving increasingly useful in biotechnology and biomedical applications. A number of ligand-sensing, conformational-switching ribozymes (also known as allosteric ribozymes or aptazymes) have been generated by some combination of directed evolution or rational design. Such sensor elements typically fuse a molecular recognition domain (aptamer) with a catalytic signal generator (ribozyme). Although the rational design of aptazymes has begun to be explored, the relationships between the thermodynamics of aptazyme conformational changes and aptazyme performance in vitro and in vivo have not been examined in a quantitative framework. We have therefore developed a quantitative and predictive model for aptazymes as biosensors in vitro and as riboswitches in vivo. In the process, we have identified key relationships (or dimensionless parameters) that dictate aptazyme performance, and in consequence, established equations for precisely engineering aptazyme function. In particular, our analysis quantifies the intrinsic trade-off between ligand sensitivity and the dynamic range of activity. We were also able to determine how in vivo parameters, such as mRNA degradation rates, impact the design and function of aptazymes when used as riboswitches. Using this theoretical framework we were able to achieve quantitative agreement between our models and published data. In consequence, we are able to suggest experimental guidelines for quantitatively predicting the performance of aptazyme-based riboswitches. By identifying factors that limit the performance of previously published systems we were able to generate immediately testable hypotheses for their improvement. The robust theoretical framework and identified optimization parameters should now enable the precision design of aptazymes for biotechnological and clinical applications.


Introduction
Nucleic acid binding species (aptamers) have emerged as a powerful tool for molecular recognition, and have begun to be widely adapted as biosensors, in drug-delivery systems, and as regulatory elements that control gene expression [1][2][3][4]. Naturally occurring nucleic acid regulatory elements, riboswitches, have been discovered in a variety of organisms and control the expression of a wide range of genes [5].
One of the major advantages of aptamers over their protein counterparts is that they can be easily coupled to other functional RNAs based largely on secondary structural considerations in order to generate allosteric constructs. To a large extent aptamerbased biosensors (both in vitro and in vivo) can be classified into two major categories: (i) those in which the aptamer binding influences the hybridization state of other nucleic acids (for in vitro examples see [6,7]; for in vivo examples, see [8]), and (ii) those in which aptamer binding influences the catalysis of a ribozyme (for in vitro examples, see [9][10][11]; for in vivo examples, see [12][13][14][15]. These allosteric ribozymes derived from aptamers are also known as aptazymes. While there are numerous empirical examples of aptazymes operating as biosensors and regulatory elements, quantitative analyses of aptazyme performance and the development of design principles for aptazymes have seldom been attempted and are largely incomplete [10,16]. Recently, Beisel and Smolke developed a similar model for riboswitch function [16]. However, only qualitative trends were reported. For example, while it was concluded that ''a design that is biased toward forming the disrupted-aptamer conformation will generally increase the dynamic range …(but) require higher ligand concentrations to modulate protein level,'' the more useful quantitative relationship between dynamic range of activity and ligand sensitivity that should enable rational design was not described. Similarly, the impact of fundamental kinetic parameters such as the ribozyme cleavage rate constant and mRNA degradation rate constant on the behavior of riboswitches was not analyzed. Additionally, those numerical solutions that were given were based on arbitrary parameters. For all of these reasons it is unclear what parameters need to be measured for the quantitative prediction of riboswitch function. It is also unclear how and to what extent the parameters can be optimized for improved function.
To establish a better quantitative understanding of aptazymebased biosensors and riboswitches, we analyze a two-state model for aptazyme function and illustrate: (i) the quantitative relationship between the dynamic range of activity and ligand sensitivity; (ii) the variables that limit aptazyme function; (iii) the minimal set of readily measurable parameters that are necessary and sufficient to quantitatively predict aptazyme function; and (iv) strategies to design optimal aptazyme-based biosensors for both in vitro and in vivo applications. In addition, we apply this model to published data for a previously engineered riboswitch system [14] and show that this system is severely limited both by slow ribozyme cleavage relative to mRNA degradation and likely by the intracellular concentration of theophylline.

Schemes for the design of aptazymes
The ability to predict the secondary structure of functional RNA molecules has made it possible to rationally design allosteric ribozymes. Aptamer secondary structures are superimposed upon or swapped with portions of ribozyme secondary structures ( Figure 1A), and interactions between the two domains are often controlled by junction sequences (so-called communication modules). One commonly used strategy to design ligand-activated aptazymes can be described as 'binding assisted stem-formation' ( Figure 1B) in which a weak but functionally important stem that is shared by the aptamer and the ribozyme is stabilized by ligandbinding [12,13]. Other design strategies include 'slip structures' ( Figure 1C; [9]) and 'strand replacement' ( Figure 1D; [14,15]). In these latter strategies the ligand-induced stabilization of the aptamer helix causes a conformational change in the secondary structure of the ribozyme that either promotes or inhibits catalysis. Taken together, all of these strategies assume a twostate model for the aptazyme in which one of the states is stabilized by ligand-binding.
To garner better insights into how to design aptazymes, we will attempt to model the interrelationships between aptazyme conformational change, ligand-binding, and catalysis. In this way we can separate intrinsic variables (including the aptamer:ligand affinity and the ribozyme catalytic rate constant) from extrinsic or 'engineerable' variables (including the equilibrium constant between the two conformers). While the catalytic rates of the less active conformer and the more active conformer are also extrinsic variables, they should almost always be minimized (to zero if possible) and maximized (to the rate of the ribozyme sans aptamer if possible), respectively. For simplicity, we develop our analyses with self-cleaving aptazymes, but the model should be generalizable to aptazymes with other catalytic activities.

A model for ligand-activated and ligand-inhibited aptazymes as in vitro biosensors
The general model for ligand-modulated ribozymes is similar to that for allosteric protein enzymes ( Figure 2A). In this model, the aptazyme can assume two interchangeable conformations A and B with internal equilibrium constant K int (see Text S3 for a summary of terms), each of which has particular (but different) ligand-binding affinities defined by association constants K a(A) and K a(B) , respectively, and particular (but different) cleavage activities defined by cleavage rate constants k Cle(A) and k Cle(B) , respectively. Since in most cases it is the local structure of the catalytic core (as opposed to the ligand-binding site) that determines the catalytic activity of the aptazyme, it is assumed that the aptazyme-ligand complexes AL and BL have the same cleavage rate constants as the unbound aptazymes A and B, respectively. Furthermore, we only consider the situation where all four species (A, B, AL and BL) are in equilibrium at the start of the reaction. When the conformer that possesses higher ligand-binding affinity also has higher catalytic activity the aptazyme is called ligand-activated; when the conformer that possesses higher binding affinity has lower catalytic activity the aptazyme is called ligand-inhibited. In general we assign conformation B to have the higher ligand-binding affinity (K a(B) .K a(A) ). Thus, ligand can be thought to thermodynamically shift the A conformer towards B. Since most two-state models for allosterism assume that the ligand primarily influences the population of catalytically inactive and active conformations, we assume that the less catalytically active conformer has zero activity and the more catalytically active conformer has the same cleavage rate constant as the ribozyme sans aptamer (denoted as k Cle ). Formally, k Cle(A)~0 and k Cle(B)~kCle for ligand-activated aptazymes

Author Summary
Aptamers are nucleic acids that bind their cognate ligands (ranging from metal ions to small molecules to proteins) specifically and tightly. Through rational design and/or directed evolution, aptamers can be engineered into allosteric nucleic acids whose conformations can be regulated by their ligands. Aptamer beacons, aptazymes, and riboswitches all undergo ligand-dependent conformational changes, and have been adapted to signal the concentration of their ligands. However, there is currently no model that can be used to predict how the energetics of conformational change affects signaling, either in vitro or in vivo. We have developed a model that identifies what parameters can be optimized to best yield signals. By focusing on these parameters, it should be possible to more readily design or select more effective conformationswitching nucleic acid biosensors. and k Cle(A)~kCle and k Cle(B)~0 for ligand-inhibited aptazymes: Another simplifying assumption is that the complex BL is much more thermodynamically stable than AL, and thus we can ignore the existence of AL and reduce the model to the path outlined in green in Figure 2A. This reduced model assumes that the A conformer must spontaneously refold into the B conformer in order to bind the ligand and thus excludes ligand-induced refolding of the aptazyme. This reduction is valid when two conditions are met: (i) the energy barrier between A and B is not much higher than that between AL and BL, so that aptazyme refolding does not rely on the ligand as a catalyst; and (ii) when the aptazyme is bound to the ligand the aptazyme almost exclusively assumes the BL conformation. We will use this reduced model in the following analyses.
The in vitro performance of a self-cleaving aptazyme is usually evaluated by plotting the first-order apparent cleavage rate constant (k app ; the initial cleavage rate divided by the total concentration of aptazyme) against the total ligand concentration ([L tot ]). As a starting point of our model, we show how k app , which is in fact contributed to by all three aptazyme conformations, is determined by the variables shown in Figure 2A.
Assuming that ligand-binding is much faster than aptazyme cleavage ([L]k on(B) + k off(B) &k Cle(B) ) the initial cleavage rate constant should directly reflect the initial fraction of each of the three If ligand-binding is slow relative to cleavage, the apparent rate constant would reflect the rate of binding (the rate limiting step) instead of cleavage. Based on the assigned definitions for parameters (see Text S1 for derivation) the fraction of A can be calculated to be: and the total fractions of B and BL are: where L Ã is relative ligand concentration, defined as the ligand concentration divided by the dissociation constant (K d or ) of the aptamer domain. The introduction of relative ligand concentration means that K d is only a scaling factor for ligand concentration. In other words, two aptazymes with the same K int but different K d values would be indistinguishable in terms of their performance with respect to relative ligand concentrations.
In the absence of ligand, f B+BL and f A equal to K int 1zK int and 1 1zK int , respectively. Thus the ratios K int 1zK int and 1 1zK int are the fraction of cleavage-competent conformers in the absence of ligand for ligand-activated aptazymes and ligand-inhibited aptazymes, respectively. We term these ratios 'cleavage tendency' and denote them as v. Formally: v~K int 1zK int for ligand-activated aptazymes and v~1 1zK int for ligand-inhibited aptazymes: It should be noted that the relationship between v and K int is dependent on the type of the aptazyme (ligand-activated or ligandinhibited). When the aptazyme type is specified, v can be used interchangeably with K int . Since in many cases the equations are in simpler form when v is used instead of K int , we will primarily use v in the following derivations and analyses. From equations (1,3) and earlier assumptions, the relationship between k app and the relative ligand concentration L Ã are: for ligand-activated aptazymes, and: for ligand-inhibited aptazymes.
Design principles for ligand-activated aptazymes as in vitro biosensors The k app -vs-L Ã curve is an increasing hyperbola for ligandactivated aptazymes. The relationship between the parameters that describe the hyperbola (highest value, lowest value, and halfvalue concentration) and the model parameters (v and k Cle ) can be determined by rewriting equation (4) as: where k app(min) and k app(max) are the minimal and maximum apparent cleavage rate constants. These rate constants are reached in the absence of ligand and at a saturating concentration of ligand, respectively. EC Ã 50 is the relative ligand concentration at which the k app is half-way between k app(min) and k app(max) . As a result: According to the definition of relative concentration, EC Ã 50 is dimensionless and scales relative to the K d of the aptamer domain, the absolute EC 50 (with unit of a concentration) can be calculated with the equation: It is noteworthy that EC 50 is often regarded as the 'apparent K d of the aptazyme' and can be confused with K d . In fact, the K d is an intrinsic variable reflecting the affinity between the aptamer and the ligand, while EC 50 is design-dependent. From equation (8) it can be seen that EC 50 is always greater than K d and is inversely correlated with v, since the ligand binding-competent conformation B is only a fraction of the total aptazyme population and a smaller v means this conformation is proportionately disfavored.
In addition to EC 50 , another important parameter for describing the performance of a ligand-activated aptazyme is the fold-activation of the cleavage rate constant when ligand concentration increases from 0 to infinite. We denote this foldactivation as g R, Theoretical which is defined as: Comparing equations (6) and (7) it is obvious that for ligand-activated aptazymes, which means that the maximum fold-activation is solely determined by the cleavage tendency of the aptazymes. In order to engineer aptazyme that have a higher g R, Theoretical , one must minimize v, i.e. the cleavage-competent conformation should be disfavored in the absence of ligand. For example to achieve a .10 2 -fold activation in the presence of ligan v should also be no greater than 10 22 , which in turn means that the free energy of conformation A should be disfavored by at least 2.8 kcal/mole (at 37uC) relative to conformation B. However, a low value of v would also increase the concentration of ligand that was required to fully activate the aptazyme. This can be seen by comparing equations (8) and (9), yielding: In other words, high sensitivity (low EC 50 ) and a large dynamic range of k app (high g R, Theoretical ) cannot be obtained simultaneously ( Figure 2B). Conversely, if an aptazyme displays a mediocre EC 50 and also has a large fold-activation it can be inferred that the aptamer domain may actually have a very high affinity for its ligand. For example, a lysozyme-dependent L1-ligase previously selected by Robertson and Ellington [11] exhibits an EC 50 of 1.5 mM but has a 3100-fold activation in the presence of saturating concentration of ligand (which means g R, Theoretical $3100). According to equation (10), the aptamer domain of this aptazyme may have a K d as low as 500 pM.
To reach the full theoretical dynamic range of k app , the ligand concentration should vary from 0 to infinite, which is of course impossible. The upper limit of the realistic dynamic range of k app for a ligand-activated aptazyme is determined by the k app at the highest possible concentration of ligand. Therefore, when designing aptazymes it is important to consider the fold-activation of the cleavage rate constant when ligand concentration increases from 0 to its highest possible concentration. We denote this foldactivation as g R, Realistic and formally define it as: where the L Ã max is the highest possible relative ligand concentration.
Since decreasing cleavage tendency is a double-edged sword in that it increases g R, Theoretical but at the same time requires higher ligand concentration to achieve half activation, it is important to find the cleavage tendency that gives optimal aptazyme performance (the highest g R, Realistic ). To find the optimal cleavage tendency, it is useful to determine the explicit expression of g R, Realistic as a function of v, which is: Interestingly, from this equation it is clear that for any L Ã max .0, g R, Realistic increases monotonically as the cleavage tendency v decreases, as shown in ( Figure 2C, left panel). In other words, it is always beneficial to have a lower cleavage tendency when the goal is to design the aptazyme to maximize g R, Realistic .
Practically, the only negative effect of engineering small cleavage tendencies in aptazymes is that the absolute value of k app L Ã~LÃ max À Á is small, and thus the rate of cleavage and signal generated by the aptazyme may be small. Therefore, as a practical guideline for designing ligand-activated aptazymes as in vitro biosensors the cleavage tendency should be minimized as long as the value k app L Ã~LÃ max À Á still falls within a range that is readily detected by a given assay.

Design principles for ligand-inhibited aptazymes as in vitro biosensors
The k app -vs-L Ã curve for a ligand-inhibited aptazyme is a decreasing hyperbola, whose descriptor can be solved by rearranging equation (5) to the form: yielding: Here the definition of the fold-inhibition over the theoretical dynamic range of k app (g R, Theoretical ) is problematic since the theoretical lower limit of k app is 0 and therefore g R, Theoretical for a ligand-inhibited aptazyme would be infinite. The value g R, Realistic (now defined as k app(0) k app L Ã~LÃ max À Á ) will be dependent on the design of the aptazyme (i.e., the choice of the cleavage tendency v) and on the highest available concentration of ligand (L Ã max ). Because the inhibited aptazyme is hyperbolically controlled by the ligand (see Figure 2D), the lower realistic limit of k app will be very hard to reach, and the range of k app values for ligand-inhibited aptazymes will be heavily dependent on the ratio of L Ã max to EC Ã 50 . A low EC Ã 50 will be crucial if the highest possible concentration of ligand is limited or if the intrinsic affinity of the aptamer domain is low.
According to equation (14), a lower EC Ã 50 should be engineered by decreasing v. However, by comparing equations (13) and (14) we find: which in turn implies that lowering EC Ã 50 will decrease the upper bound on possible k app values ( Figure 2D). Once again there is a compromise between ligand sensitivity and the dynamic range of activity.
Again, to find the cleavage tendency that yields the highest g R, Realistic for a given L Ã max , the expression of g R, Realistic as a function of v should be considered. This expression is: Interestingly, as cleavage tendency v increases from 0 to 1, g R, Realistic decreases linearly from 1+L Ã max to 1 ( Figure 2C, right panel). Consequently, when designing ligand-inhibited aptazymes as in vitro biosensors, it is also always beneficial to choose a low cleavage tendency as long as k app(max) is still readily detectable.
In summary, for both ligand-activated and ligand-inhibited aptazymes there are trade-offs between ligand sensitivity and the dynamic range of activity, reflected by equations (10) and (15), respectively. However, when attempting to maximize g R, Realistic it is always a good strategy to choose a low cleavage tendency, as shown by equations (11) and (16) and Figure 2C.

Aptazymes as in vivo riboswitches
Aptazymes can be inserted into mRNAs in order to regulate their stabilities and translation efficiencies, thereby functioning similar to natural riboswitches in vivo. In such applications, aptazyme regulation will of necessity be further modulated by the dynamic processes surrounding RNA metabolism, including transcription, processing, transportation, translation and degradation. In addition, the most readily observed signals will be steady state mRNA or protein concentrations, instead of cleavage rate constants.
The most straightforward strategy for adapting aptazymes to gene regulation is to engineer a drug-responsive cleavase (such as a hammerhead aptazyme) to target a particular mRNA. However, despite decades of effort, gene regulation based on trans-cleaving ribozymes has proven largely unsuccessful. Gene regulation via ligand-responsive ribozyme was paradoxically first demonstrated in a natural system, where a novel ribozyme located at the 59 UTR of the glmS gene of B. subtilis was found to self-cleave primarily in the presence of GlcN6P [17]. This cleavage has been shown to destabilize glmS mRNA and thus to down-regulate glmS expression [18]. Interestingly, biochemical study revealed that glmS ribozyme is not an allosteric ribozyme per se, since GlcN6P does not allosterically regulate glmS ribozyme but rather serves as a cofactor which directly contributes to catalysis [19].
More recently, the engineering of artificial riboswitches based on cis-cleaving aptazymes has achieved some success. By connecting the anti-theophylline or anti-tetracycline aptamers to the tobacco ringspot virus (TRSV) HHRz via rationally designed or selected communication modules, Win and Smolke engineered aptazymes that, when inserted to the 39 UTR of the GFP gene, could regulate GFP expression in yeast in response to theophylline or tetracycline concentration [14]. The reported dynamic range of GFP expression level was 20,25-fold ( Figure 2 of [14]). However, closer inspection of the raw data provided in the supplementary material ( Figure S13 of [14]) showed that the dynamic range of GFP expression level was actually much lower. Among all the aptazyme constructs that were designed and tested, most displayed only ,1.5-fold regulation and the best ones displayed ,2.5-fold regulation. The discrepancy between the interpretation and the data was due to redefinition of the word 'fold' by the authors. Although the word 'fold' is generally used to express the ratio of two quantities, Win and Smolke used 'fold' as a unit of absolute quantity of GFP expression [14]. For example, the GFP expression level from an unengineered plasmid was defined as '50 fold.' Therefore, when the GFP expression level from an engineered plasmid changed from '20 fold' in the absence of theophylline to '43 fold' in the presence of theophylline, a dynamic range of '(43220 = ) 23 fold' could be claimed. Most researchers would instead estimate the dynamic range to be (43/20 = ) 2.2-fold. Win and Smolke have also reported that multiple aptazymes inserted into the 39 UTR could act as logic gates for gene expression, but the raw data necessary to evaluate these claims were not immediately available [15].
These designs were of necessity eukaryote-specific, since the 39 polyA:59 cap interaction is crucial for efficient protein translation. A prokaryote-specific system has been developed by Wieland et al. in which the ribosome-binding site (RBS) of a reporter gene was embedded in stem I of the Schistosomal HHRz, such that the selfcleavage of the HHRz liberated the RBS for translation initiation [12,13]. Through rational design and genetic screening, a theophylline-responsive aptazyme that exhibited 10-fold regulation of the expression of the reporter gene was generated. The fold-regulation achieved by these authors (1.2-to 10-fold) are far smaller than those that have been routinely demonstrated in vitro (10 2 -,10 4 -fold ).
To explain this discrepancy, we will explore a simple kinetic model. In this model, the eukaryotic-specific system, where an aptazyme is placed within the 39 UTR of a mRNA, will be used. That said, it should be noted that self-cleaving HHRzs placed within the 59 UTR can abet even stronger inhibition of gene expression [20], but such a model would be inherently more challenging because it would have to take into account the continuous scanning by the pre-initiation complex.

Modeling inhibition of gene expression by a constitutively active ribozyme
We first model how gene expression can be inhibited by a constitutively active, self-cleaving ribozyme ( Figures 3A and 3B). In these models, we assume that the steady-state concentration of a protein is proportional to the steady state concentration of its intact mRNA. In contrast, mRNA with a cleaved 39 UTR is assumed to have a negligible translation efficiency or is rapidly degraded [21].
In the absence of ribozyme cleavage ( Figure 3A) the steady state concentration of mRNA ([R] ss ) is v Txn k Deg . When a constitutively active self-cleaving ribozyme is inserted to the 39 UTR of the mRNA ( Figure 3B), the steady state concentration of intact mRNA should depend on its cleavage rate, as well as on the transcription and degradation rates, specifically: If we define the relative steady-state intact mRNA concentration without ribozyme as 1, then the relative steady-state intact mRNA concentration of an mRNA that harbors a ribozyme is: where D~k Cle k Deg is the ratio of cleavage rate constant to the spontaneous degradation rate constant. The extent to which gene expression can be inhibited by an inserted ribozyme is directly determined by this ratio D, which implies that the rate of spontaneous degradation of mRNA also directly influences how much inhibition a given ribozyme can potentially achieve [22].

Modeling regulation of gene expression by ligandactivated aptazymes
As before, we assume that the inactive conformer in a twostate model is completely inactive, and that the active conformer has the same cleavage rate constant as the ribozyme sans aptamer. The kinetic model for gene regulation via ligand-activated self-cleavage is shown in Figure 3C. For simplicity only the 39 UTR is shown. In this model, mRNA is transcribed from the 'gene' (G) with a zero-order rate constant of v Txn . The nascent transcript (I) can fold into either aptazyme conformer [cleavage-incompetent conformer (A) or cleavagecompetent conformer (B)] with folding and unfolding rate constants k FoldA , k FoldB and k UnA , k UnB , respectively. The B (but not A) conformer can also bind the ligand L to form aptazyme:ligand complex BL which has the same catalytic activity as B (k Cle ). The second-order association rate constant and first-order dissociation rate constant are denoted as k On and k Off , respectively.
Under this model (see Text S2 for derivation), the relationship between steady-state relative concentration of intact mRNA (including I, A, B and BL) and the concentration of total ligand L ([L tot ]) is expressed in the following equation: This definition of relative concentration L Ã is similar to our earlier definition of relative ligand concentration, except that in this case K d is replaced by: k Off zk Deg zk Cle k On which we term the apparent dissociation constant and denote as and will have a similar value to K d when the dissociation rate constant of the ligand:aptamer complex (k Off values typically 10 23 to 10 1 s 21 ) is much higher than the cleavage rate constant of the ribozyme (k Cle values typically 10 22 to 1 s 21 ). However, it may also have a larger value than K d when k Off is comparable to or lower than k Cle . Again, K d 0 is the scaling factor for ligand concentration.
Since the degradation rate constant of mRNA in eukaryotic cells is much slower (by up to 10 orders of magnitude; [23]) than structural transition, ligand dissociation, and ribozyme cleavage rates, a and b should have values similar to the equilibrium constants for the reactions I«A ( k FoldA k UnA ) and I«B ( k FoldB k UnB ).
Notably, b can be treated as a constant although it is actually a function of ligand concentration. When b is treated as a constant, b a is similar to K int in Figure 2 and consequently b azb is equivalent to the cleavage tendency v.
Moreover, since the folded state is typically of lower energy (and thus more occupied) than the intermediate (I) or unfolded state, a is usually much greater than 1. Given these two conditions, the equation (19) can be written as: It is interesting that equation (23) can be simply obtained by replacing k Cle in (17) with k app in (4). This suggests that the equation for the function of aptazymes in vitro (4) can be used for aptazymes in vivo, with the only significant error coming when k Cle is on the same order as or larger than k Off , which would in turn lead to a significant difference between K d 0 and K d .

Design principles for ligand-activated aptazymes as in vivo riboswitches
Characteristics of the transfer function. The aptazyme regulation of steady-state mRNA concentration can be thought of as 'cascaded' hyperbolic control in which the apparent cleavage activity of the ribozyme (k app ) is hyperbolically controlled by relative ligand concentration (L Ã ) and [R] Rel is in turn hyperbolically controlled by the cleavage activity of the ribozyme. Mathematically it can be proven that regulatory elements that exhibit hyperbolic responsivity also exhibit hyperbolic responsivity when coupled in series. In general, if: and z~a 2 zb 2 y c 2 zy then: which means z is hyperbolically controlled by x.
By applying this conclusion to equation (23) it can be seen that the [R] Rel -vs-L Ã curve is a decreasing hyperbola ( Figure 4A), whose descriptor can be obtained by rearranging (23) Limits on the theoretical dynamic range. The aim of the design process is to optimize both the dynamic range of gene expression (as shown by the range of [R] Rel ) and sensitivity to effector (as shown by EC Ã 50 ). In order to have a large dynamic range of activity and modulation at low effector concentrations, aptamers, ribozymes, and mRNAs must be chosen that have optimal values of K d , k Cle , and k Deg , respectively. In addition, aptazyme cleavage tendency can be engineered to improve the dynamic range of activity and responsivity to effector.
For optimization of the dynamic range of activity, it is useful to think how closely the performance of the aptazyme can approach either completely cleaving or completely protecting a mRNA. The difference between complete cleavage of the mRNA and the theoretical minimum steady-state mRNA level that can be obtained in the presence of the aptazyme will be called the 'Floor Gap' (Figure 4A). The difference between complete protection and the theoretical maximum steady-state mRNA level in the presence of the aptazyme will be called the 'Roof Gap' ( Figure 4A). The aptamer, ribozyme, mRNA, and aptazyme variables must be chosen so to have as narrow a 'Roof Gap' and 'Floor Gap' as possible, while still maintaining high ligand sensitivity (low EC Ã 50 ). Going by equation (26), it is clear that for a ligand-activated aptazyme the 'Floor Gap' is solely dependent on the intrinsic variable D ( k Cle k Deg ). The 'Floor Gap' can only be narrowed by choosing or engineering faster ribozymes and/or more stable mRNAs.
In contrast, the 'Roof Gap' and EC Ã 50 are dependent upon the cleavage tendency of the aptazyme. They also have additional limitations. According to (27), EC Ã 50 is always greater than 1 since v is always smaller than 1. Thus EC 50 is always greater than K d 0 . Moreover, by comparing equations (25) and (27)  An additional criterion that can be used to evaluate the system is the fold-inhibition that occurs over the theoretical dynamic range of gene expression, denoted as g R, Theoretical : The 'Roof Gap' can be narrowed by engineering a very small cleavage tendency, i.e., by heavily disfavoring the cleavagecompetent conformer (albeit at the cost of a rising EC Ã 50 ). Under these circumstances, the primary determinant of g R, Theoretical will be the 'Floor Gap', or [R] Rel(min) . For instance, when D can be made to be as high as 1000, a riboswitch with a theoretical ,1000fold inhibition (assuming no constraints on ligand concentration) can be engineered by designing the cleavage tendency to be ca. 1 10000 . Given these parameters EC 50 would be around 10 times K d 9. Practically, for stringent regulation (.10-fold), we believe that D should be at least 10. This condition (high D, low cleavage tendency) satisfies equation (28), and is equivalent to saying that a high [R] Rel(max) requires a high EC Ã 50 . However, (28) also shows that when EC Ã 50 is greater than ,5, further increases of EC Ã 50 produce only marginal improvements in [R] Rel(max) .
This analysis suggests that the principles that apply in vivo are drastically different from those that apply to in vitro biosensors, primarily because the observed signals are different from one another (in one case, a direct readout of catalysis, in the other, a readout 'buffered' by transcription and degradation). From equation (9) we can see that in the in vitro case where k app is essentially the observed signal a g R, Theoretical of 1000 would therefore require an EC 50 of 1,000 times K d . In contrast, for the in vivo case, when observing [R] Rel the same g R, Theoretical can be obtained with higher ligand sensitivity (i.e., with an EC 50 of ca. only 10 times K d 9, as detailed above).
Limits on the realistic dynamic range. Although the compromise between ligand sensitivity and the theoretical dynamic range of activity in vivo is not as severe as was the case for the ligand-activated aptazyme in vitro, the [R] Rel -vs-L Ã curve (in contrast to k app -vs-L Ã curve) is a decreasing hyperbola, and its lower limit is difficult to reach (see Figure 4A). Therefore when D is sufficiently large, the realistic dynamic range usually depends primarily on the maximum L Ã . For the example above where the theoretical dynamic range of [R] Rel is 1,000-fold, even a 500-fold reduction of [R] Rel requires the intracellular ligand concentration to be at least 10,0006K d 0 , e.g. for an aptamer with a K d of 100nM, the intracellular ligand concentration must be 1mM! The theoretical and realistic dynamic range of [R] Rel as functions of cleavage tendency can be seen in the 'regulatory landscape' (Figures 5). In this Figure, the relationship between three variables (cleavage tendency, [R] Rel , and ligand concentration) are plotted in twodimensions. In order to achieve the third dimension, ligand concentration is colored. We also examine the relationships between these variables at two different values of D, 10 and 100.
In these plots the upper limit of achievable ligand concentration was arbitrarily chosen to be 100 K d 0 . The theoretical dynamic range of [R] Rel is encompassed within the colored (including black) region. The black areas represent those regions that are inaccessible due to difficult-to-achieve relative ligand concentrations. The EC Ã 50 value is shown as a dashed line. Based on the analyses above and an examination of Figure 5, we can qualitatively conclude that the primary variables that limit the performance of a ligand-activated aptazyme as a generegulatory element are D and L Ã max . Therefore, in optimizing riboswitches based on ligand-activated aptazymes one must: (i) attempt to achieve the tightest ligand-binding possible; (ii) use or engineer a faster ribozyme and/or a more stable mRNA; (iii) appropriately disfavor the cleavage-competent conformation; and (iv) choose a ligand with high cell-permeability and low cytotoxicity.
More quantitatively, the optimal cleavage tendency v can be determined when the limiting factors D and L Ã max are both known. By defining g R, Realistic as the fold-inhibition yielded by the aptazyme over the realistic dynamic range of [R] Rel (or formally: ½R Rel L Ã~LÃ max À Á ), the relationship between g R, Realistic and v can be analytically obtained: As shown in Figure 6A, for any given D and L Ã max there are always v 'sweet spots' where g R, Realistic is maximized. Around these 'sweet spots' g R, Realistic is highly sensitive to v, especially when D and L Ã max are high. The position of the v 'sweet spot' (optimal v) can be analytically obtained by solving dg R, Realistic dv~0 . However, this analytical result does not elucidate mechanistic understanding and is thus not shown.

Modeling regulation of gene expression by ligandinhibited aptazymes
The model for a ligand-inhibited self-cleaving ribozyme is diagramed in Figure 3D. The primary difference from the model for a ligand-activated aptazyme ( Figure 3C) is that now only the conformer A, instead of both B and BL, can undergo selfcleavage. Given the parameters in Figure 3C, the relationship between relative steady-state concentration of intact mRNA ([R] Rel ) and ligand concentration ([L tot ]) is (see Text S2 for derivation): where: b~k foldB k UnB zk Deg zk Deg : L Ã ð32Þ In this case the apparent dissociation constant K d 0 ( k Off zk Deg k On ) is closer in value to K d since k Cle does not appear in the definition of K d 0 . As before, a and b are similar to the equilibrium constants for the reactions I«A ( k FoldA k UnA ) and I«B ( k FoldB k UnB ), respectively, and b can be treated as a constant. Given that a azb~v , when a is much greater than 1 then equation (30) can be re-written as: Since the inhibition of aptazyme cleavage would result in a increase of gene expression, the [R] Rel -vs-L Ã curve is an increasing hyperbola, whose descriptor can be obtained by re-writing (34) to: where: and From these results it can be seen that the 'Roof Gap' ( Figure 4B) for a ligand-inhibited aptazyme is always 0, since the mRNA can theoretically be completely protected when the concentration of the ligand approaches infinite. In contrast, the width of the 'Floor Gap' is dependent on D and the cleavage tendency v. As before, the theoretical and realistic dynamic ranges of gene expression are graphically represented as a regulatory landscape (Figures 5E-5F). Analytically, by defining: it can be shown that: and Once again, for each given D and L Ã max there is an optimal v to maximize g R, Realistic ( Figure 6B). Interestingly, though, for ligand-inhibited aptazymes a much wider range of cleavage tendencies give satisfactory g R, Realistic values ( Figures 6B,6D).
Analytical import of the model A major advance in our modeling compared to previous work ( [16]) is that we provide practical guidelines for what experiments should be carried out to develop a quantitative understanding and prediction of riboswitch function. Based on our analysis, the performance of an aptazyme-based riboswitch can be quantitatively predicted when four parameters are known: (i) the gene expression level of an unengineered mRNA; (ii) the ratio of the ribozyme cleavage rate constant to the mRNA degradation rate constant (D); (iii) cleavage tendency of the aptazyme (v); and (iv) the maximum available relative concentration of ligand (L Ã max ). Among these four parameters, the gene expression level of an unengineered mRNA can be trivially measured. Using equation (18), D can be obtained by measuring the gene expression level of an mRNA harboring a ribozyme sans aptamer at its 39 UTR (or elsewhere). Once D is determined, the cleavage tendency can be predicted based on RNA folding energetics or by measuring the gene expression level of an aptazyme-harboring mRNA in the absence of ligand, according to equations (25) and (35).
The only parameter that cannot be directly measured is L Ã max . However, L Ã max is ligand-specific, aptamer-specific, and organismspecific, but not design-specific. Therefore if the g R, Realistic for one aptazyme is measured, L Ã max can be calculated and used to predict the performance of other aptazymes which contain the same aptamer and are used in the same organism. To calculate L Ã max from g R, Realistic one need only solve equations (29) and (39), yielding: for ligand-activated aptazymes and for ligand-inhibited aptazymes.
With such a theoretical framework we can attempt not only to promulgate engineering principles, but also to analyze previously designed aptazyme-based riboswitches. As we discussed above, Win and Smolke generated a series of theophylline-responsive hammerhead ribozymes by grafting the anti-theophylline aptamer onto loop I or loop II of the TRSV ribozyme via various communication domains [14]. When these different constructs were placed in the 39 UTR of a reporter gene (GFP) modest ,2fold effects on gene regulation were observed. One rationale for the disappointing results was that introduction of aptamer domains into loop I and loop II disrupted a known, critical tertiary interaction [24]. Although the original TRSV ribozyme inserted into the 39 UTR can inhibit the expression of GFP expression to 2% of the unengineered mRNA level, when loop II was extended the inhibition was only to ,10%. If the steady-state GFP signal reflects the steady-state concentration of intact mRNA, the D value for the engineered aptazymes was thus likely to be ,10. Therefore, the maximum activation and inhibition could never exceed 10-fold, as shown by Figures 6A and 6B (top panels). The constructs were inherently restricted by their very design.
Beyond limitations on catalysis, we also suspect that there were limitations on either the allosteric binding sites or the available intracellular ligand concentration. Using the data from Figure  S13 of Win and Smolke [14] and equations (25) and (35), the cleavage tendencies of each aptazyme were calculated ( Table 1). L Ã max was also calculated from each aptazyme construct using equations (40) and (41) ( Table 1). Although many L Ã max values fall into a narrow range, they were not consistent. Possible explanations for this inconsistency include: (i) the existence of 'non-productive' aptazyme conformations not considered in the model (e.g., a non-binding and non-cleaving conformation of the ligand-inhibited aptazyme); and (ii) the possibility that the basic functionality of either the aptamer or the ribozyme were significantly altered in the aptazyme designs.
To further our analysis, we assume that the aptazymes showing the largest L Ã max (,12) did not operate under the caveats stated above. If so, the maximum available cellular theophylline concentration was only about 12 times the K d 0 of the antitheophylline aptamer. The anti-theophylline aptamer has a reported K d ,1mM [25]. Assuming the aptamer retains its affinity for theophylline in the cellular environment, the calculated L Ã max indicates that the intracellular concentration would be on the order of 12 mM, even though the extracellular concentration of theophylline was 5mM. This discrepancy is consistent with an early finding that the intracellular concentration of theophylline in E.coli is 10 3 -fold lower than the concentration in media [26], and with the previous performance of an engineered antiswitch in yeast [8].
The comparison between the model and the experimental data from these studies can be visualized in the regulatory landscapes shown in Figure 7A and 7B, where the calculated cleavage tendencies and the relative gene expression values are shown both in the absence of theophylline (circles) and in the presence of 5 mM theophylline (triangles). For most constructs, there was quantitative agreement between the model and experimental data with acceptable variance. It should be noted that if we had used the original, published estimates for the fold-change due to the aptazyme there would have been virtually no agreement between model and experiment.
With aptazymes that had an intrinsically limited D (,10) and a small upper limit of L * (,12), it was ultimately to be expected that the maximum fold-change that might be available through optimization of the communication module was only ,3.5-fold ( Figures 6A and 6B, upper panels) for both ligand-activated and ligand-inhibited aptazymes.
In order to actually obtain better aptazyme and riboswitch functionality both a larger D and a higher upper limit of L * must be engineered. Our model predicts that by using a 10-fold more stable mRNA the maximum fold-change can be increased to ,7fold ( Figure 6A and 6B, lower panels; keeping the upper limit of L * constant). For this more stable mRNA when L * is also increased to 50 (by using a tighter binding aptamer:ligand pair and/or a ligand that is better able to penetrate the cell), ,17-fold regulation can be achieved ( Figure 6A and 6B, lower panels).
In summary, the dynamic range of gene expression in the current aptazyme-based riboswitch system is severely limited by the cleavage rate of the ribozyme relative to spontaneous mRNA degradation rate and the achievable intracellular ligand concentration relative to the in vivo K d of the aptamer. Reasonable improvements of these factors should lead to a wider dynamic range of gene expression.

Challenges and future directions
Although throughout the above analyses we assume that the cleavage tendency can be freely tuned, this is based on the assumption that for a given sequence design the aptazyme conformations and their relative energetics can be reliably predicted. This assumption is questionable. For example, we have recently designed a series of biosensors based on the anti-thrombin aptamer, and demonstrated that biosensor properties did not align with the stabilities based on secondary structural features alone, but were fit much better by measured stabilities [27]. Similarly, attempts to computationally design hammerhead aptazymes based only on secondary structural hypotheses (the 'slip structure' model; [9]) yielded aptazymes that were much less activated [28].
Such discrepancies are likely to be even greater when intracellular energetics need to be predicted. For example, for the aptazymes designed by Win and Smolke [14], the cleavage tendencies calculated from experimental data ( Table 1) largely disagree with the predicted cleavage tendencies calculated from the thermodynamics data (taken from Table S1 of [14]), as shown in Figure 7C. In principle, designed aptazymes should be characterized in vitro to better understand whether and how they fit either in silico data or the in vivo data. Similarly, a recent attempt at model-driven design of allosteric shRNAs also yielded only qualitative agreement with modeling based on secondary structures [29].
To better ensure coherence between model and reality, many assumptions and predictions made in our model of aptazymebased biosensors and riboswitches need to be tested experimentally. First of all, it is critical to test to what extent the two-state structural and energetic model is acceptable. In a recent elegant study on the kinetics of a previously engineered theophyllineactivated hammerhead ribozyme [9], de Silva and Walter observed four conformations relevant to activation using singlemolecule fluorescent resonance energy transfer (FRET) [30]. Moreover, upon the addition of theophylline the conformational change of the aptamer domain was observed to be much faster than that of the ribozyme core. Based on these results the authors suggested a model for ligand-induced conformational change in which the aptamer domain is capable of binding ligand even in the cleavage-incompetent conformation of the aptazyme. Consequently, the ligand binding of the aptamer domain primes the conformational change of the communication domain and the ribozyme domain (induced fit). Whether this mechanism proves to be general will strongly impact how the kinetics of effector modulation are modeled, and may alter the equilibrium arguments we make herein, depending on how the different energy states are populated.
In addition, parameters relevant to the in vivo environment need to be characterized in greater detail in order to understand aptazyme function. For example, translation efficiency and the half-life of cleaved mRNAs should be carefully determined since these factors, although ignored in the current model, would contribute to the background expression level when a ribozyme or aptazyme is cleaving at full speed [21]. A more fundamental and largely unknown issue is how the energetics and kinetics of RNA folding are influenced by the biochemical properties (ionic strength, viscosity, the presence of RNA chaperons and helicases) in cellular environments. While predictive models are incomplete in the absence of such information, it is nonetheless worthwhile to formulate them so that the functionality of aptazymes can be more routinely evaluated as these additional variables are acquired.

Methods
The derivations of the fundamental equations (equations (2), (3), (19) and (30)) that describe how energetic parameters dictate the performance of aptazymes in vitro and in vivo are provided in the Text S1 and S2. All figures were produced with MatLab using the equations described in the text.
The Supplemental Information and Figure 13 from Win and Smolke [14] were used to derive data for our analyses. The 'designed cleavage tendency' presented in our Figure 7C was calculated using the equations: where the values of DG active and DG inactive were taken from the Supplemental Information Table 2 of reference [14].

Supporting Information
Text S1 Derivation of equations  [14]. With this definition, the GFP expression level for unengineered mRNA is 50 a.u.. Source of data: SI Figure 13 of Win and Smolke [14].

Author Contributions
Outlined the aims and methods of the study, built the mathematical model, performed the analyses, and drafted the manuscript: XC. Suggested some of the data analyses and comparisons and critically revised the manuscript: ADE.