Energy Parameters and Novel Algorithms for an Extended Nearest Neighbor Energy Model of RNA

We describe the first algorithm and software, RNAenn, to compute the partition function and minimum free energy secondary structure for RNA with respect to an extended nearest neighbor energy model. Our next-nearest-neighbor triplet energy model appears to lead to somewhat more cooperative folding than does the nearest neighbor energy model, as judged by melting curves computed with RNAenn and with two popular software implementations for the nearest-neighbor energy model. A web server is available at http://bioinformatics.bc.edu/clotelab/RNAenn/.


Introduction
Thermodynamics-based ab initio RNA secondary structure algorithms are used to detect microRNAs [1], targets of microRNAs [2], non-coding RNA genes [3], temperaturedependent riboregulators [4], selenoproteins [5], ribosomal frameshift locations [6], RNA-protein binding sites [7], etc. The importance and ubiquity of RNA thermodynamics-based algorithms cannot be overemphasized -there are even applications in RNA design for novel cancer therapies and in synthetic biology. Indeed, in [8] Vashishta et al. used the RNA minimum free energy (MFE) structure prediction algorithm mfold [9] to design seven anti-pCD ribozymes, four of which were cloned, stably transfected in the highly metastatic human breast cancer cell line, MDA-MB-231, and shown to have a therapeutic potential by knocking down the expression of pCD. (Procathepsin D (pCD) is correlated with highly invasive malignancies, such as breast cancer. Ribozymes, first discovered by the Nobel laureats, T. Cech and S. Altman, are RNA enzymes that can cleave a molecule or catalyze a reaction.) Following pioneering work of the Tinoco Lab and Freier et al. [10], a number of increasingly sophisticated nearest neighbor models have been defined: INN [11,12], INN-HB, also called Turner99 [13], Turner2004 [14,15], as well as models that incorporate knowledge-based parameters [16,17]. These free energy parameters of the nearest neighbor (NN) model form the foundation for essentially all current thermodynamics-based RNA algorithms: minimum free energy (MFE) secondary structure [9,18], Boltzmann partition function [19], maximum expected accuracy secondary structure [20], MFE secondary structure with pseudoknots [21], sampling suboptimal structures [22], RNA sequencestructure alignments [23], etc.
Benchmarking studies have shown that, on average, the minimum free energy structure includes 73% of base pairs in Xray structures when domains of fewer than 700 nucleotides (nt) are folded [24]; i.e. prediction sensitivity of the MFE structure is 73%, although accuracy drops as sequence length increases. There is increasing evidence that by improving the free energy parameters, structure prediction accuracy can be improved. Andronescu et al. [16] used combinatorial optimization to determine optimal weights a,b for which energy parameters are determined by aweighted contribution from Turner's free energies together with bweighted contribution from knowledge-based potentials, the latter obtained from the negative logarithm of frequencies in existent structure databases. Free energy parameters in the Turner model are determined by a least-squares fit of UV absorption data based on the assumption that change in heat capacity, DC P , is zero. This assumption is erroneous, as pointed out by Mikulecky and Feig [25], who observed that the hammerhead ribozyme does not fold in 2-state transition, but rather has 3 states: cold denatured, folded and hot denatured. In [17] M. Bon improved MFE structure prediction by defining new parameters for the nearest neighbor model that account for linear dependence of change DC P of heat capacity on sequence length and by incorporating knowledgebased potentials from a hand-curated selection of Sprinzl's transfer RNA database [26].

Subsection 1.1: Motivation from protein helix-coil transition
Consider a coarse-grain classification of amino acids, where a polypeptide chain is given by an n-mer, or length n sequence a 1 , . . . ,a n of amino acids, where each residue a i is either in an H (a-helix) or C (coil) conformation. Assume that the energy of an ahelical residue is E(H)~E 0 v0, while that of a coil residue is E(C)~0. A protein with many residues in an a-helical conformation at room temperature, such as hemoglobin, will unfold into a random coil at a higher temperature, where all previous H residues have been transformed into C residues. In particular, if a 1 , . . . ,a n is an a-helix, then at low temperature, all residues are H, while at high temperature all residues are C. The partition function Z of a 1 , . . . ,a n is defined by Z~P s exp({E(s)=RT), where the sum is taken over all 2 n many sequences s of H's and C's. Using the (temperature-dependent) partition function, we can compute the expected number SHT of a-helical residues for the nmer a 1 , . . . ,a n at absolute temperature T, defined by where s~exp({E 0 =RT) -see [27]. Subsequently, it is possible to plot the expected helical fraction SHT n as a function of temperature.
Non-cooperative energy models show an approximately linear relation, where the expected helical fraction slowly decreases as temperature increases. In contrast, the plot of expected helical fraction versus temperature for cooperative energy models displays a sigmoidal shape, where there is an abrupt helix-coil transition from high to low values for the helical fraction that occurs at a critical temperature T M . Polymer theory provides several mathematical models to explain the temperature-dependent helix-coil transition for proteins. The simplest polymer model for the helix-coil transition of an ahelix is the non-cooperative model, where the probability that the each residue is H is independent of the conformation of every other residue. The cooperative, nearest-neighbor model for helixcoil transition, introduced by Zimm and Bragg [28], includes nucleation free energy dw0 that is applied for each a-helical segment of contiguous H residues. Finally, the Ising model was introduced by E. Ising in 1925 [29] to explain ferromagnetism, but has subsequently been used to model protein temperaturedependent helix-coil transitions -see, for instance [30]. Progressing from the independent model to the Zimm-Bragg model to the Ising model, each model is increasingly cooperative, thus providing a better fit to the experimental data. See Dill and Bromberg [27] for a more detailed discussion.
In the Nussinov energy model [31] for RNA secondary structure, the free energy of a secondary structure S is defined to be {1 times the number DSD of base pairs of S; i.e. in the Nussinov model, each base pair contributes an energy of {1, and there is no energy term for entropic considerations. The Turner energy model [13,32] for RNA secondary structure contains negative free energies for base stacks, which depend on the nucleotides involved, such as the base stacking free energy of . Though we do not determine Figure 1. Graph of the expected number of base pairs as a function of temperature for signal recognition particle with Rfam [56] accession number X12643. Temperature in degrees Celsius is given on the x-axis, while the expected number of base pairs Sbase pairs=nT, normalized by sequence length n, is given on the y-axis. Note the linear dependence on temperature for the non-cooperative Nussinov energy model, in contrast to the sigmoidal dependence on temperature for the cooperative Turner energy model. Data and figure taken from our paper [57]. doi:10.1371/journal.pone.0085412.g001 the analogue of the Ising model for RNA secondary structure formation, we do introduce an extended nearest-neighbor model, also called triplet or next-nearest-neighbor model, which displays somewhat more cooperativity, as displayed in the sharpness of the transition from folded to unfolded state in a figure shown later in the paper.

Subsection 1.2: Triplet model
As previously mentioned, the nearest-neighbor energy model [13,32] Table 1 of [12] determined the theoretical number of independent hybridized sequences that must be considered in UV absorbance experiments, in order to obtain triplet stacking free energies by least-squares fitting of data. In [33] Gray et al. experimentally determined in vivo inhibition parameters for next-nearest-neighbor triplets in the case of antisense DNA -RNA hybridization to inhibit protein expression. In [34], Najafabadi et al. applied a neural network to predict the thermodynamic parameters for the next-nearestneighbor triplet model, using existent UV absorbance data from the thermodynamic database for nucleic acids, NTDB version 2.0 [35].
Though at present there are no experimentally determined free energies for triplet stacking, Binder et al. [36] did show a strong correlation between microarray fluorescence intensities and DNA-RNA base stacking free energies of Sugimoto et al. [37]. More precisely, Binder et al. showed that linear combinations of tripleaveraged probe sensitivities provide nearest-neighbor sensitivity terms, that rank in similar order as the base stacking free energy parameters for DNA-RNA in solution [37]. It is our hope that future improvements in RNAseq, microarray or other technologies will ultimately furnish experimentally determined triplet and even k-tuple stacking free energies. New triplet free energies could immediately be incorporated into our algorithms, and it is tedious, but clear how one can modify our algorithms to handle k-tuple free energies.

Subsection 1.3: Plan of the paper
In this paper, we describe the first algorithms to compute the partition function and minimum free energy structure for singlestranded RNA, with respect to the full next-nearest-neighbor triplet energy model for RNA. In the Introduction, we gave the motivation for this work, coming from the Zimm-Bragg and Ising models in biopolymer theory. The plan for the remainder of the paper is as follows. In the Results section, Section 2.1 gives the notation and Sensitivity is the ratio of number of correctly predicted base pairs divided by the number of base pairs in the native structure; positive predictive value is the ratio of the number of correctly predicted base pairs divided by the number of base pairs in the predicted structure. Since RNAenn currently does not include energy contributions for dangles (single stranded, stacked nucleotides), RNAfold was used without dangles (version 1.8.5 with -d 0 6 flag). To our knowledge, there has not been a careful benchmarking of structure prediction accuracy between the Turner 1999 energy model and the newer Turner 2004 energy model, though it is interesting to note that RNAenn has better structure prediction when using Turner 1999 for base stacking. Overall, it is clear that RNAfold outperforms RNAenn (Turner99), although a few cases, such as ec and rnap2 RNAenn have better sensitivity. Nevertheless, we expect much better performance in the future when our triplet and base stacking energy terms have been refined by using knowledge-base potentials. The database of RNA structures in this benchmarking set comes from a data collection of D.H. Mathews (personal communication), which derives from published databases [26,54], etc. See [55] for a citation of original data sources. doi:10.1371/journal.pone.0085412.t001 definitions needed for the sequel, while Section 2.2 presents the extended nearest neighbor model and method used to obtain energy parameters. In the Discussion, we give secondary structure benchmarking results for the nearest neighbor (NN) and extended nearest neighbor (ENN) energy models. Additionally, the cooperativity of folding is compared with both energy models. In the Methods section, Section 3.1 [resp. Section 3.2] presents recursions for the partition function [resp. minimum free energy structure] computation. In addition to the software RNAnn and RNAenn developed for this paper, we use Vienna RNA Package RNAfold [18], RNAstructure [38], and mfold [9]. As illustration for the cooperativity of folding, we compare melting curves for two small nucleolar RNAs (snoRNA), with respect to the NN and ENN energy models; additional melting curves are available on the web server http://bioinformatics.bc.edu/clotelab/RNAenn/. These results suggest that the the extended nearest-neighbor energy model may lead to more cooperative folding than does the nearestneighbor model, which was our motivation to study the ENN energy model.
The goal of this paper is to describe the non-trivial RNAenn algorithms, which are implemented in C/C++. Our work points toward a future potential improvement in RNA secondary structure prediction, either by incorporating triplet knowledgebased potentials or experimentally inferred extended nearestneighbor free energy parameters.

Results: Extended nearest neighbor model algorithms
Assume that a 1 , . . . ,a n is a given RNA sequence. In this section, we describe pseudocode for the partition function and minimum free energy computation for an extended nearest neighbor model. Although our software, RNAenn, does depend on the exact values of the extended nearest-neighbor energy parameters, the description of the algorithms does not.

Subsection 2.1: Notation and definitions
Let a~a 1 , . . . ,a n be an arbitrary RNA sequence, and let a½i,j denote the subsequence a i , . . . ,a j . A secondary structure S for a given RNA sequence a~a 1 , . . . ,a n is a set of base pairs (i,j), 1ƒivjƒn, such that (1) a i ,a j forms a Watson Crick AU, UA, GC, CG or wobble GU, UG pair; (2) each base is paired to at most one other base, i.e. (i,j),(i,k)[S implies that j~k, and (i,j),(k,j)[S implies that i~k; (3) there are no pseudoknots in S, where a pseudoknot consists of base pairs (i,j),(k,') where ivkvjv'; (4) each hairpin loop has at least h unpaired bases; i.e. (i,j)[S implies that j{i §hz1.
In software such as mfold [9], Unafold [39], RNAfold [40], and RNAstructure [38], the parameter h, denoting the minimum number of unpaired bases in a hairpin loop, is taken to be equal to 3, due to steric constraints of RNA molecules.
The nearest-neighbor and extended nearest-neighbor triplet models are additive energy models that entail free energy values for loops, as explained in [41]. A hairpin in a secondary structure S is defined by the base pair (i,j), where iz1, . . . ,j{1 are unpaired. A left bulge in S is defined by the two base pairs (i,j),(k,j{1)[S, where iz1vk and iz1, . . . ,k{1 are unpaired. A right bulge in S is defined by the two base pairs (i,j),(iz1,k)[S, where kvj{1 and kz1, . . . ,j{1 are unpaired. An internal loop in S is defined by the two base pairs (i,j),(k,')[S, where iz1vk and 'vj{1 and iz1, . . . ,k{1 and 'z1, . . . ,j{1 are unpaired. Finally, a k-way junction, or multiloop with k{1 components, is defined by the closing base pair (i,j) and k{1 inner base pairs Figure 2 for an illustration.
Given the RNA nucleotide sequence a 1 , . . . ,a n , we use the notation H to denote the free energy of a hairpin, E(S) to denote the free energy of a stacked base pair, E(I) to denote the free energy of an internal loop, E(B) to denote the free energy of a bulge, while the free energy E(M) for a multiloop containing N b base pairs and N u unpaired bases is given by the affine approximation azbN b zcN u . The free energy E(M1) of a multiloop having exactly one component is then given by azbzcN u .
For RNA sequence a 1 , . . . ,a n , for all 1ƒiƒjƒn, the partition function Z i,j is defined by P S e {E(S)=RT , where the sum is taken over all secondary structures S of a½i,j, E(S) is the free energy of secondary structure S, R is the universal gas constant with value R~0:001987 kcal/mol 21 K 21 , and T is absolute temperature. In the Zuker [9,18,38] and McCaskill [19] algorithms, E(S) is the Turner nearest neighbor energy model; in contrast, when discussing the extended nearest-neighbor energy model, we use E(S) to denote the triplet energy model.
Given an RNA sequence a 1 , . . . ,a n , in order to compute the partition function Z 1,n [resp. minimum free energy E 1,n ] for a 1 , . . . ,a n , we need inductively to determine the partition function Z i,j [resp. minimum free energy E i,j ] for all smaller subsequences a i , . . . ,a j . In so doing, we need to know which structures involve a triple stack (i,j),(iz1,j{1),(iz2,j{2), which structures involve only a stacked pair (i,j),(iz1,j{1), and which structures involve a base pair (i,j) which closes a loop region. This is accomplished by terms ZBB i,j [resp. EBB i,j ]. Moreover, the Turner energy model stipulates that a base pair (i,j), which closes a left bulge of size 1, as in (i,j),(iz2,j{1), or a right bulge of size 1, as in (i,j),(iz1,j{2), is considered to stack on the subsequent base pair. This consideration requires the introduction of special terms ZBBL i,j ,   N EM i,j : minimum free energy over all secondary structures of a½i,j, subject to the constraint that a½i,j is part of a multiloop and has at least one component.
N EM1 i,j : minimum free energy over all secondary structures of a½i,j, subject to the constraint that a½i,j is part of a multiloop and has at exactly one component. Moreover, it is required that i base-pair in the interval ½i,j; i.e. (i,r) is a base pair, for some ivrƒj.
Details for the recursions necessary to compute the ENN minimum free energy secondary structure and ENN partition function are given in the Methods section.

Subsection 2.2: Extended nearest-neighbor energy model ENN-13
Here we describe details for the extended nearest-neighbor energy model parameters, which we denote by ENN-13, since our code RNAenn was completed in 2013.
Though some related experimental work has been done, especially by D.M. Gray and co-workers [11,12,33], there are no available experimentally determined parameters for triplet stacking. Rather than using the triplet stacking free energy parameters INN-48 [34], which are incomplete since GU-wobble pairs were not included, we instead infer RNA triplet stacking free energies by a novel use of Brown's algorithm [42], which computes the maximum entropy joint probability distribution that is consistent with given user-specified marginal probabilities. Though Brown's algorithm has been used by C. Burge to predict intron-exon splice sites in the human gene finder, GenScan [43], this appears to be the first use of Brown's algorithm to infer free energy parameters.
Brown's algorithm for maximum entropy joint distribution. In [42], D.T. Brown described an efficient algorithm to compute the maximum entropy joint probability distribution given certain marginal probabilities, where we recall that the entropy of a joint probability distribution p : V n ?½0,1 is defined by H(p)~{ P x1,...,xn[V p(x 1 , . . . ,x n ) : ln p(x 1 , . . . ,x n ). Although the algorithm was correct, there was an error in Brown's proof of correctness, which was subsequently repaired by Ireland and Kullback [44], who additionally showed that the maximum entropy distribution is not the maximum likelihood distribution.
Suppose that p(x 1 , . . . ,x n ) is a given joint probability distribution on V n , where V is the alphabet fA,C,G,Ug of RNA nucleotides. Recall that a marginal probability distribution p ai 1 ,...,ai m : V n{m ?½0,1 is defined by the projection Given an integer n §2, a value Ew0, and a set of arbitrary marginal probabilities p ai 1 ,...,ai m the idea is to initialize p to the uniform distribution, then repeatedly update p so that it has the correct currently considered marginal. Conversion between free energies and probabilities. To compute triplet stacking free energies, base stacking free energies from Turner 1999 [or alternatively Turner 2004] energy model are converted to marginal probabilities in the following manner. Given a triplet stack where the outermost base pair occurs at (i,j), let a denote the outermost base pair (i,j) with nucleotides X 1 ,Y 1 , let b denote the middle base pair (iz1,j{1) with nucleotides X 2 ,Y 2 , and let c denote the innermost base pair (iz2,j{2) with nucleotides X 3 ,Y 3 . It is a well-known principle, first proved by Jaynes [45] and subsequently exploited in protein threading algorithms [46,47], that a representative database of biomolecular sequences and structures has the property that motif occurrences are In words, the left/middle/right marginal probability is defined by the quotient of the sum over all six base pairs in the left/ middle/right position, while fixing the remaining two base pairs, divided by the partition function. We then apply Brown's algorithm to compute the joint probability distribution P(a,b,d) for all base pairs a,b,d, and thus obtain triplet stacking free energies E(a,b,c)~{RT ln(Q : P(a,b,c)): An additional potential advantage of the extended nearest neighbor energy model is that the MFE structure is perhaps less likely to have isolated base pairs, than when using base stacking free energies. In particular, Bompfunewerer et al. [48] described an O(n 3 ) algorithm to compute the MFE structure and partition function over all canonical secondary structures; i.e. those having no isolated base pairs, where an isolated base pair (i,j) has no adjacent base pair (iz1,j{1) or (i{1,jz1). Bompfunewerer et al. stated that preliminary studies indicated that canonical MFE structure prediction is both faster and more accurate. In [49] we provided theoretical reasons for the computational speed-up, by using complex analysis to prove that the asymptotic number of canonical secondary structures is 2:1614 : n {3=2: 1:96798 n , compared to the much larger number 1:104366 : n {3=2: 2:618034 n of all secondary structures, a result obtained in [50] by a different method.
Apart from the triplet stacking energy, ENN-13 contains free energies for base stacks (used only at stem ends), hairpins, bulges, internal loops and multiloops from the Turner NN model -here, the user may choose between the Turner 1999 parameters and the Turner 2004 parameters, the former taken from Vienna RNA Package 1.8.5 and the latter taken from the Nearest Neighbor  [58]. (Left) Gene off structure, determined by in-line probing -see [59] for X-ray structure of aptamer, which is consistent with the secondary structure. (Center) Minimum free energy (MFE) structure, determined by RNAnn, our implementation of the nearest-neighbor energy model. This structure is identical to the MFE structures computed by Vienna RNA Package RNAfold [18], RNAstructure [38], and mfold [9]. (Right) Minimum free energy (MFE) structure, determined by RNAenn, our implementation of the extended nearest-neighbor energy model. The only difference with the nearest-neighbor MFE structure lies in two missing GU base pairs (116,134), (117,133) indicated by a circle. doi:10.1371/journal.pone.0085412.g004 Figure 5. Melting curves for two small nucleolar RNAs (snoRNA) from family RF00158 from Rfam version 9.0 [56]. For each RNA sequence, over a range of temperatures, temperature-dependent base pair probabilities were computed using four different software packages: RNAenn, RNAnn, version 1.8.5 of RNAfold [40] and RNAstructure [38]. The software RNAenn (RNA extended nearest-neighbor) is our implementation of the algorithms described in this paper, while the software RNAnn (RNA nearest-neighbor) is our implementation of the following algorithms: Zuker's minimum free energy structure algorithm [51], McCaskill's partition function algorithm [19], and the Ding-Lawrence sampling algorithm [22].
Data Base (NNDB) http://rna.urmc.rochester.edu/NNDB/ [15]. For readers interested in the exact nature of the NN energy parameters, we recommend the excellent overview by Zuker et al. [58].

Discussion
There are some deviations between the MFE structure computed for the ENN model, compared with the nearestneighbor (NN) model. In particular, Figure 4 shows the secondary structure for the XPT riboswitch from Bacillus subtilis, obtained by experimental in-line probing (left panel), minimum free energy structure computation for the NN model (middle panel) and minimum free energy structure computation for the ENN model (right panel). The MFE structure for the NN model was identical, using four different software packages: mfold [9], RNAfold [40], RNAstructure [38] and our own program RNAnn for the nearestneighbor model. Our software RNAenn for the ENN model differs from the NN minimum free energy structure, only by missing two GU-wobble base pairs at positions (116,134), (117,133). Adjacent wobble pairs are energetically weak, so we do not view this as a failure of our software, but rather the need for additional scrutiny of the ENN energy parameters. Specifically, in the future, we intend to include a dependence on the heat capacity DC P as proposed by M. Bon [17], and knowledge-based potentials [16,17]. By such energy re-parametrization, we expect to improve the sensitivity values reported in Table 1.
Our next-nearest-neighbor triplet energy model appears to lead to somewhat more cooperative folding than does the nearest neighbor energy model, as indicated by sharper sigmoidal transition in the melting curves obtained by RNAenn, compared to melting curves obtained by RNAfold and RNAstructure -see Figure 5. Here, melting curves were computed in the following manner. For each RNA sequence, over a range of temperatures, temperature-dependent base pair probabilities were computed. At each temperature T, for each algorithm, the expected number SBPT of base pairs was computed by SBPT~P 1ƒivjƒn p i,j . For Each algorithm was run without dangle or coaxial free energies. At each temperature T, for each algorithm, the expected number SBPT of base pairs was computed as SBPT~P 1ƒivjƒn p i,j ; for each algorithm, the collection of such (T,SBPT) points generates a melting profile obtained by that algorithm. (Left) Melting curves for the 72 nt small nucleolar RNA (snoRNA) from Ornithorhynchus anatinus (platypus) with GenBank accession code AAPN01359272.1/4977-5048 and sequence given by AGCACAAAUG AUGAGCCUAA AGGGACUUAA UACUGAAACC UGAUGUAACU AAAUAAUAUA UGCUGAUCGU GC (Right) Melting curves for the 69 nt small nucleolar RNA (snoRNA) from Otolemur garnetti (small-eared galago) with GenBank accession code AQR01179445.1/1047-1115 and sequence given by GGCACAAAUG AUGAAUGACA AGGGACUUAA UACUGAAACC UGAUGUUACA UUACAAUGUG CUGAUGUGC. doi:10.1371/journal.pone.0085412.g005 Figure 6. Feynman diagram to pictorially describe recursions described in this proposal for partition function with respect to extended nearest neighbor model. For simplicity, this diagram depicts ZBB, but not ZBBL,ZBBR, which correspond to a special treatment for particular left/right bulges of size 1, that are treated as stacked base pairs. doi:10.1371/journal.pone.0085412.g006 each algorithm, the collection of all points with (x,y) coordinates given by (T,SBPT) generates a melting profile.

Methods
The top level recursion in the computation of the partition function [resp. minimum free energy structure] is identical to that of McCaskill's algorithm [19] [resp. Zuker's algorithm [51]]. The technical difficulty lies in a kind of ''2-look-ahead'' strategy, to determine if a base pair (i,j) not only stacks onto the adjacent base pair (iz1,j{1), but the latter also stacks onto the base pair (iz2,j{2). This leads to technical issues, including a special treatment for bulges of size 1, since these are considered to stack on the following base pair.

Subsection 3.1: Partition function algorithm
This section presents the recursions to compute the partition function in the extended nearest-neighbor energy model. Figure 6 depicts the recursions as a Feynman diagram. (For simplicity, the Feynman diagram in Figure 6 depicts ZBB, but not ZBBL,ZBBR, which correspond to a special treatment for particular left/right bulges of size 1, that are treated similarly to stacked base pairs.) The unconstrained partition function Z i,j for a i , . . . ,a j is defined below. Note that the recursions for Z i,j (I) entail a maximum internal loop of size 30, which follows the Vienna convention to reduce run time of the algorithm to O(n 3 ); however, our implementation actually uses the more complicated treatment of Lyngsø et al. [52], which ensures a cubic run time while not arbitrarily bounding the maximum size of internal loops. A similar remark applies to the treatment of internal loops and bulges of size 1 in Section 3.2.
We now in turn describe the partition functions ZM1 i,j for a multiloop having a single component, ZM i,j for a multiloop having one or more components, ZB i,j where (i,j) pair together, and for ZBB i,j where (i,j) and (iz1,j{1) pair together.

RT
: ZB i,k else 8 > < > : i f iƒj and j{iƒh   Assume that a 1 , . . . ,a n is a given RNA sequence. Throughout this section, we let E i,j denote the minimum free energy of a i , . . . ,a j , which is computed and stored in arrays by a dynamic programming algorithm corresponding to the following recursions. Once E 1,n is computed, then the minimum free energy structure can be computed by tracebacks. The following recursions are obtained from those in the previous section, by systematically replacing sum by minimum, product by sum and Boltzmann factor by energy.

Conclusion
In this paper, we have introduced a new energy model ENN for RNA secondary structure prediction and implemented it in a tool called RNAenn along with new energy parameters for triplet stacking inferred using Brown's algorithm. RNAenn is implemented in C/C++, without any function calls or dependence on other programs, such as mfold [9], RNAfold [18], and RNAstructure [38]. Recursions from the partition function have been cross-checked by setting free energy parameters to zero, in which case the program returns the number of secondary structures, which can be determined by independent simpler methods.
It is known from experimental work of Silverman and Cech on Tetrahymena group I intron P4-P6 domain [53] that RNA folds cooperatively. The melting curves in Figure 5 demonstrate that our ENN model leads to somewhat more cooperative folding than does the nearest neighbor energy model, in the same manner that the melting curves of Figure 1 demonstrate that the nearest neighbor energy model leads to more cooperative folding than the simple Nussinov energy model. For this reason, we feel that RNAenn supports a mathematical model that better reflects the experimental data concerning cooperativity of RNA folding.
From the benchmarking comparison in Table 1, it is clear that triplet stacking free energy parameters need further refinement to produce better agreement with RNA secondary structures, as determined by comparative sequence alignment or X-ray structure. This situation is not unlike the situation with nearest neighbor software mfold, RNAfold, which over the years underwent a series of refinements, with the introduction of additional energy parameters (energy parameters for particular triloops, tetraloops, bulges of size one, etc.). At the present time, software such as Unafold, RNAfold, RNAstructure remain stateof-the-art for RNA secondary structure prediction. However, in future work, we plan to optimize the triplet stacking energy parameters, by using knowledge-base potential as in the work [16,17] for the nearest neighbor model.