Vfold: A Web Server for RNA Structure and Folding Thermodynamics Prediction

Background The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. Results The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. Conclusions The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at “http://rna.physics.missouri.edu”.


Introduction
The increasing discoveries of noncoding RNAs demand more than ever the information about RNA structures [1][2][3][4][5]. However, laborious, time-consuming X-ray crystallographic or NMR spectroscopic measurements alone cannot catch up the pace with the rapidly increasing number of biologically significant RNAs such as noncoding regulatory RNAs. As a result, RNA structural genomics cannot just rely on the experimental determination of the structures. This underscores the request for accurate computational models of RNA structure prediction. RNA structures can be described at the two-dimensional (2D) and three-dimensional (3D) levels, respectively. A 2D structure is defined by the base pairs contained in the structure. Helices and loops, as defined by the base pairs contained in the structure, can be diagrammatically depicted by an RNA 2D structure. The 2D structure of an RNA provides the structural constraints to the formation of the 3D structure [6][7][8][9], where helices and loops are assembled in the 3D space. RNA free energy landscape can have multiple free energy minima [10][11][12][13]. Therefore, an RNA can often adopt multiple stable and metastable structures.
Computational prediction for RNA 2D structures falls into two general categories [14]: sequence comparison [15][16][17][18] and free energy minimization [19][20][21][22][23][24][25]. Sequence comparison-based methods rely on base covariation and can usually only infer the information about the canonical base pairs. The inclusion of noncanonical base pairs can cause covariation analysis much more convoluted [26]. However, non-canonical base pairs such as mismatched base pairs in the loop regions, may be crucial for folding stability and 3D structure folding. For example, noncanonical base pairs can influence the loop and junction structures and thus play an critical role in determining helix orientations. The accuracy of computational prediction is usually better for methods that consider ''fold recognition'' [27]: structure is usually more conserved than sequence and the functional core regions are usually more conserved at all levels. Therefore, computational methods are highly useful and reliable for structures with known homologous folds or structures with sufficient auxiliary structural data. However, these methods depend on the availability of homologous sequences, which significantly limits their applicability.
Structure prediction algorithms based on free energy minimization search for the structure or suboptimal structures with the lowest free energy from an ensemble of possible structures. Most of the algorithms employ the same empirical thermodynamic parameters (the Turner parameters [28]) for the different secondary structural elements based on the nearest-neighbor model. However, unlike the entropy (free-energy) parameters for simple loops (hairpin, bulge, and internal loops), which have been determined from thermodynamic experiments [28], quantitative understanding of many other interactions remains very limited. Moreover, because of the possible conformational coupling between the loops, the loop entropies are not additive for tertiary motifs such as loop-loop kissing contacts [29,30]. For such cases, thermodynamic experiments alone are not sufficient to directly provide loop entropies and free energies due to the complexity of the problem.
Current RNA folding algorithms for 3D structures are generally limited to simple (short) structures. Further development of the models is hampered by several challenges including conformational sampling and evaluation of the energies for the tertiary contacts. Combined with discrete molecular dynamics (DMD) [31], coarse-grained approaches [31][32][33] can be used to predict structures as well as folding mechanisms with knowledge-based potentials derived from known structures. Structure assembly approaches [26,[34][35][36], based on the assumption that 3D fold can be recognized by the alignment of sequences and secondary structure patterns, have shown promising results in RNA 3D structure predictions. However, one of the common limitations to the structure assembly approaches is the degree of divergence of the fragment library [37,38].
The recently developed Vfold model is a statistical mechanicsbased RNA folding model [36,[39][40][41][42] that can predict RNA 2D and 3D structures as well as RNA folding thermodynamic stabilities from RNA sequence. In this report, we briefly describe the underlying algorithm and the practical usage of a web server for the Vfold model (http://rna.physics.missouri.edu). The server provides predictions for the structure and melting thermodynamics for user-provided RNA sequences. The results from the server, in combination with experimental data, may offer useful insights into RNA structure and function.

Methods
The Vfold model was first reported in 2005 for RNA secondary structure prediction [39]. Since then, the model has been extended to predict the structures and folding thermodynamics of H-type pseudoknots and RNA/RNA complexes [40][41][42]. Furthermore, Vfold was developed to predict 3D all-atom structures using a physics-based de novo method [36]. Below we describe several unique features of the Vfold model. The detailed underlining algorithms can be found in the published papers [36,[39][40][41][42] and in the Supporting Information (file Data S1) of this paper.

Features of the Vfold algorithm
One of the unique features of the Vfold model for 2D structure (base pairs) prediction is its ability to compute the RNA motifbased loop entropies. Using the virtual bonds to represent the backbone conformations, the model samples fluctuations of loops/ junction conformations in the 3D space through conformational enumeration [39] (see Figure S1 in Data S1 for details). By calculating the probability of loop formation, the model can give the conformational entropy parameters for the formation of the different types of loops such as pseudoknot loops. The model has the advantage of accounting for chain connectivity, exclude volume and the completeness of conformational ensemble. Studies by us and other groups show that an accurate entropy parameter improves the prediction of RNA secondary structures and thermodynamic stabilities [39][40][41][42][43].
Another notable feature of Vfold model is its ability to model intraloop mismatched base pairs for RNA loops (see Figure S2 in Data S1 for details). By enumerating all the possible (sequence-dependent) intra-loop mismatches, the Vfold model can partially account for the sequence-dependence of the loop free energy. Therefore, the Vfold-predicted loop free energy is not only loop size-dependent but also sequence-dependent. The model provides a unique tool for predicting many important information that cannot be obtained through traditional methods. For example, the model can calculate the dramatic decrease in loop entropy upon the formation of mismatched base pairs in a loop. The model can predict the populational distribution of the different loop conformations that contain the different intra-loop mismatches. The predicted mismatched base pairs provide constraints to otherwise flexible loop structures.
For a given 2D structure, the Vfold-based 3D structure prediction method [36] searches for the appropriate template for each loop/junction in the structure, and assembles the 3D template structures into a scaffold for further structure refinement. In comparison with other template-based (structure assembly) methods such as FARNA/FARFAR [34,35] and MC-Sym [26], which sample structures from small fragments of the known RNA structures, the Vfold-based method uses motif-based instead of fragment-based templates. The main advantage of the multi-scale approach used in the Vfold 3D modeling [36] is that the virtual bond tertiary structure as the initial state may already lie in the free energy basin, so the structure refinement can avoid large structural rearrangements for the effective prediction of the final native structure.

Energy parameters
The Vfold model provides pre-tabulated entropy parameters (available in the Vfold web server) for hairpin loops [39], internal/ bulge loops [39], H-type pseudoknots with/without inter-helix junction [40,41] and hairpin-hairpin kissing motifs [42]. For free energy-based RNA structure modeling, the predicted structures and thermodynamic stabilities could be sensitive to the choice of energy parameters. Therefore, the server provides predictions based on two different sets of the thermodynamic parameters for base stacks, including mismatched base stacks: (1) from the Turner parameters 04 version [28], and (2) from the MFOLD 2.3 version [20].

3D template library
To construct the template library, Vfold classifies all the known structures into different motifs, such as helices, hairpin loops, internal/bulge loops, pseudoknots, N-way junctions (N §3) (see Figure.1). The motif-based template library was built from 2621 PDB structures, including all the PDB entries released before January of 2014. It includes RNA-involved complexes except RNA/DNA hybrids. The redundant templates for those with root mean square deviation (RMSD) v1.5 A 0 for the same motif, same size and identical sequence are removed. The complete list of the non-redundant 3D template list can be found in the Vfold web server.

Results
The Vfold server contains three parts: (a) Vfold2D predicts the RNA 2D structure (pseudoknotted or non-pseudoknotted) from the sequence, (b) VfoldThermal predicts the melting curve  Vfold2D: Predicting RNA 2D structures from the sequence The input of Vfold2D is the sequence in plain text form (see the snapshot of Vfold2D web server in Fig. 2). The default temperature for Vfold2D is 37uC. Users have the option to change the temperature to other values. Users have the option to use the base stacking energy parameters either from Turner's parameters or from the MFOLD. Users also have the option to choose the type of structures: 1. Excluding pseudoknot: Only non-pseudoknotted secondary structures are included in the structure prediction; 2. Including pseudoknots with inter-helix junction length ƒ1 nt: All the possible non-pseudoknotted secondary structures and H-type pseudoknots with inter-helix junction of length ƒ1 nt are considered in the calculation. It may take a much longer computational time than the pseudoknot-free calculations.  The Vfold2D server generates three files: 1. Base pair probabilities (in txt format). 2. Probabilities for the formation of the possible helices (including the native and alternative helices) (in txt format). 3. Predicted 2D structures (in eps format) plotted by VARNA [44].
We recommend users to consider the possible alternative structures from the base pair probabilities and helix probabilities (the first two output files above). Fig. 2 shows an example of Vfold2D prediction for a 32-nt sequence [45]. With conformational sampling for the nonpseudoknotted structures, Vfold2D predicts the possible (including the alternative) helices from the base pair probabilities P ij based on the premise that base pairs (helices) in the same structure have the same level of probabilities of formation. The dominant 2D structure is identified from the base pairs of the largest probability. Fig. 2 shows an RNA that has two sets of helices. One set shown in magenta has the probability of 0.78. This is the most probable structure. Another set of helices in cyan with probability 0.22 gives an alternative structure. The predicted bistable structures agree with the NMR results [45].

VfoldThermal: predicting RNA melting curves
VfoldThermal predicts the heat capacity C(T) melting curves from the temperature-dependence of the partition function Q(T) for the conformational ensemble chosen by the user. The server provides the results in text format as well as in eps format plotted by Gnuplot. The input of VfoldThermal is the same as those for the Vfold2D, except for the temperature range in VfoldThermal (see the snapshot of VfoldThermal web server in Fig. 3).
For the example shown in Fig. 3, with the same input as for Vfold2D in Fig. 2, VfoldThermal calculates the partition function Q(T) for all the non-pseudoknotted structures for temperature range 0uC-100uC with the temperature step of 0.5uC. The predicted heat capacity (melting curve) shows two peaks around 60 and 90uC, respectively. The peaks correspond to the melting of the two helices in the predicted structures in Fig. 2, respectively.

Vfold3D: Predicting RNA 3D structure
The input data of Vfold3D are the RNA sequence and the 2D structure (base pairs) (see the snapshot of the Vfold3D web server in Fig. 4). The output of Vfold3D is a PDB file for the predicted all-atom 3D structure(s). Because the current version of Vfold3D is template-based, no 3D structure will be predicted if a proper template cannot be found.
Currently, due to the limited structural template database, the current version of Vfold3D can only predict the 3D structures with hairpin loops, internal/bulge loops, N-way (2,N,8) junctions and pseudoknots. For example, as listed in Figure.1, there is no templates available for the open motifs (single strand tails and tandem helices except for coaxially stacked helices). Therefore, it is recommended to remove the single strand tails before submitting jobs to Vfold3D. With the increasing number of the known RNA structures, the larger and more divergent pools of the known loop/ junction structures with the different types and different sizes would lead to better predictions from the Vfold3D.  For the RNA in Fig. 2, Vfold2D predicts two alternative 2D structures. As shown in Fig. 4, for the most probable 2D structure, Vfold3D predicts one 3D structure. For the alternative 2D structure, which consists of two hairpins connected by a singlestrand loop, Vfold3D yields no 3D structure because of the lack of the templates for the UUCG single-stranded open junction between the two hairpins.

Vfold output
Once a calculation is submitted, a notification page containing the job information (job name, e-mail address (optional) and the job status) is displayed. When the calculation is completed, the Vfold web server sends out an e-mail (if provided) notification with the predicted results attached. It is recommended to bookmark the job-specific notification page for later check of the job status and for downloading Vfold predicted results, since Vfold2D and VfoldThermal might take a long computational time (hours or even longer) depending on the sequence length. An online README file about the interpretation of the Vfold predictions is available on the Vfold web server.

Conclusion
The Vfold package is developed to predict RNA structures and folding thermodynamics. The web server will be updated continuously with the development of new Vfold-based algorithms for RNA folding. In the future development, we plan to add structure predictions for the formation of RNA-RNA complexes. We will also add the effect of the ion-dependent electrostatic free energies and the heat capacity effect, which can cause the temperature-dependence of the enthalpy and entropy parameters for the loop and base stack formations, to the melting curve calculations and structure predictions.

Supporting Information
Data S1 (PDF)