Helix Capping in RNA Structure

Helices are an essential element in defining the three-dimensional architecture of structured RNAs. While internal basepairs in a canonical helix are stacked on both sides, the basepairs at helix ends are stacked on only one side and exposed on the loop side, making them susceptible to fraying unless they are protected. Coaxial stacking, the direct stacking of two canonical helices end to end, has long been known to stabilize helix ends. Based on an analysis of helix-loop junctions in RNA crystal structures, we herein describe helix capping: the topological stacking of a basepair or an unpaired nucleotide from the loop side onto a helix end, which in turn protects that end. Beyond protecting helix ends against fraying, helix capping should confer greater stability on the resulting composite helices. Our analysis also reveals that this general motif is associated with the formation of tertiary interactions. Greater knowledge of the dynamics at helix-loop junctions in the secondary structure should enhance RNA secondary structure prediction through a richer set of energetic rules and help explain how a secondary structure folds into its three-dimensional structure. Together, these findings suggest that helix capping likely plays a fundamental role in driving RNA folding.


Introduction
RNA is an active participant in the chemistry of life. While mRNAs code for proteins, other RNAs, including structured RNAs, are responsible for many essential cellular processes, ranging from protein synthesis to gene expression and regulation [1,2,3,4]. Structured RNAs fold hierarchically from their sequence into their native three-dimensional tertiary structure [5,6,7,8]. Although the computational determination of RNA tertiary structure is still beyond our reach, bioinformatic comparative sequence analysis has accurately predicted the secondary structures of various structured RNAs [9]. These secondary structures are composed of a large number of very short canonical helices and loops, which are rearranged into the native tertiary structure, mostly with the help of metal ions such as Mg2+ and Na+ [10]. While RNA folding has been explored from different perspectives, the helix-loop junctions in the secondary structure could have a significant influence on the prediction of higher-order RNA structure and long-range tertiary interactions, because knowledge of these junctions could improve the energetic description of helices.
Together with basepairing interactions, base stacking contributes significantly to the stability of DNA and RNA helices [11,12,13,14,15,16,17,18,19]. While internal basepairs are stacked on both sides, the basepairs at the ends of RNA secondary structure helices are stacked only on their internal side and exposed on the loop side, making them susceptible to fraying, as reflected in the exchange of their imino protons with solvent [20,21,22]. Short RNA helices can therefore unfold as fraying propagates from the helix ends toward the interior. How, then, do short canonical helices avoid unfolding before they are assembled into the three-dimensional structure? The ends of short canonical helices in structured RNAs are frequently flanked by tetraloops [23], lonepair triloops [24], G:A and A:A basepairs [25], or other canonical helices [26]. Consistently, previous melting studies have shown that canonical RNA helices are greatly stabilized by tetraloops or various mismatches at their ends [13,17,27,28,29,30,31,32]. In particular, UUCG and GAAA tetraloops are known to nucleate the formation of unusually stable hairpin structures and to serve as a reverse transcription termination signal in bacteriophage T4 mRNA or as rho-independent transcription terminators in prokaryotic mRNAs [27,33]. These examples suggest that other recurrent structural elements or motifs may protect and stabilize the ends of short canonical RNA helices against fraying, reminiscent of α-helix capping in proteins [34,35,36].
Nonetheless, no systematic analysis of the helix-loop junctions in large, naturally occurring structured RNAs has been reported that addresses how helix ends are protected from fraying. Based on a detailed and comprehensive analysis of the helix-loop junctions in the high-resolution crystal structures of Thermus thermophilus 16S rRNA (T16S) and Haloarcula marismortui 23S rRNA (H23S) [37,38], we herein explore helix capping motifs: single basepairs or unpaired nucleotides capable of protecting the ends of canonical RNA helices (see Materials and Methods for definitions).

Materials and Methods
A canonical RNA helix is defined as an antiparallel A-form RNA duplex with at least two consecutive basepairs, each forming a canonical (standard Watson-Crick or wobble) conformation regardless of its basepair group [39]. The RNA helices in the crystal structures were visually examined to determine how helix ends are potentially protected from the loop side. Whereas the coaxial positioning of two canonical helices is called coaxial stacking, the topological stacking of a helix end with a capping motif (a basepair or an unpaired nucleotide from the loop) is termed helix capping if the vertical distance from the helix end to the capping motif is similar to that between two consecutive internal basepairs in a canonical helix (~3.0 Å). Various RNAs were analyzed, including the 16S and 23S rRNAs from the Thermus thermophilus 30S (T16S) and Haloarcula marismortui 50S (H23S) crystal structures [37,38].
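The helix definition above can be illustrated with a minimal Python sketch that groups stacked basepairs (i, j), (i+1, j-1), ... into canonical helices of at least two basepairs. This is not the procedure used in this study, which relied on visual inspection of the crystal structures. The input format of (i, j, bases) tuples is a hypothetical choice, and the canonical-conformation test is simplified to base identity (Watson-Crick plus G:U/U:G wobble), whereas the definition above is based on basepair conformation.

```python
from typing import List, Tuple

# Hypothetical input: (i, j, bases) with 0-based nucleotide indices i < j and
# bases as two letters, e.g. "GC" for G at position i paired with C at j.
Pair = Tuple[int, int, str]

# Simplification: "canonical" judged by identity only (Watson-Crick + wobble).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def find_canonical_helices(pairs: List[Pair], min_len: int = 2) -> List[List[Pair]]:
    """Group stacked canonical pairs (i, j), (i+1, j-1), ... into helices."""
    canon = {(i, j): b.upper() for i, j, b in pairs if b.upper() in CANONICAL}
    helices = []
    for (i, j), _ in sorted(canon.items()):
        # Only start a helix at a pair that does not continue one from (i-1, j+1).
        if (i - 1, j + 1) in canon:
            continue
        helix = []
        k = 0
        while (i + k, j - k) in canon:
            helix.append((i + k, j - k, canon[(i + k, j - k)]))
            k += 1
        if len(helix) >= min_len:
            helices.append(helix)
    return helices


# Toy pairs: a three-basepair stem closed by a G:A pair, plus two isolated pairs.
pairs = [(0, 11, "GC"), (1, 10, "CG"), (2, 9, "GU"), (3, 8, "GA"),
         (13, 20, "AU"), (15, 18, "GC")]
print(find_canonical_helices(pairs))
```

In this toy example, the G:A pair stacked on the three-basepair stem is excluded as non-canonical; in a real structure, such a pair would be a candidate C-cap rather than part of the helix.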

Results
Identification of short canonical RNA helices and their topological end-stacking
While, a priori, we expect longer helices to be enthalpically more stable than shorter ones, our analysis revealed that the vast majority of the 265 canonical RNA helices identified in T16S and H23S are very short, with a median length of 4 bp, compared with a complete helical turn of 11-12 bp for A-form RNA (Figure 1A). Surprisingly, analysis of the helix-loop junctions of these canonical helices revealed that all but 13 (97%) of the 515 resolved helix ends are topologically involved in end-stacking from the loop side (Figure 1B). Specifically, 166 ends are involved exclusively in coaxial stacking of two canonical helices to form a compound helix, while 336 are capped, 276 by basepairs and 60 by unpaired nucleotides, forming a composite helix or bridging two canonical helices that stack coaxially. Moreover, these helix capping motifs are frequently involved in long-range tertiary contacts (Figure 1C). Additional analysis showed that nearly all helix ends in other classes of structured RNAs are also involved in end-stacking (Table S1). Given that helix ends fray [20,21,22], this preponderance of end-stacking in structured RNAs reflects its significance not merely in protecting short canonical RNA helices against fraying, as indicated by earlier studies [12,13,14,15,16,17,18,19,27,28,29,30,31,32], but also, given the frequent involvement of capping motifs in tertiary contacts, in RNA folding.
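The end-stacking tallies reported above can be checked with a short Python sketch. The per-end labels ("coaxial", "capped_bp", "capped_nt", "none") are hypothetical names introduced for illustration; the counts are those given in the text.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_helix_ends(labels: Iterable[str]) -> Tuple[Counter, int, float]:
    """Tally per-end labels; return (counts, n_end_stacked, fraction_end_stacked).

    Hypothetical labels: 'coaxial' (coaxially stacked only), 'capped_bp'
    (capped by a basepair), 'capped_nt' (capped by an unpaired nucleotide),
    'none' (no end-stacking from the loop side).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    stacked = total - counts["none"]  # every other label involves end-stacking
    return counts, stacked, stacked / total


# Counts for the 515 resolved helix ends in T16S and H23S, taken from the text.
labels = ["coaxial"] * 166 + ["capped_bp"] * 276 + ["capped_nt"] * 60 + ["none"] * 13
counts, stacked, fraction = summarize_helix_ends(labels)
capped = counts["capped_bp"] + counts["capped_nt"]
print(f"{stacked} of {sum(counts.values())} ends stacked ({fraction:.1%}); {capped} capped")
# prints: 502 of 515 ends stacked (97.5%); 336 capped
```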

Topological classification of helix capping motifs
A detailed analysis of the 336 basepairs and unpaired nucleotides that cap helix ends, or helix capping motifs (Figure 2), revealed that helix capping occurs either contiguously (C-capping by basepairs and C′-capping by unpaired nucleotides) or discontiguously (D-capping by basepairs and D′-capping by unpaired nucleotides), depending, respectively, on the absence or presence of an intervening sequence of nucleotides (IVS) between a canonical helix end and its helix capping motif (Figure 1B). Whether the IVS lies 5′, 3′, or on both sides of a canonical helix further distinguishes D-capping into D1, D2, and D3. The IVS can be either short (1-3 nt) or long (~25 to 1000 nt); if short, the bases of the IVS are usually flipped out of the composite helix, making tertiary contacts implicated in RNA folding (see below).
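The contiguous/discontiguous distinction can be sketched in Python as a classification of a capping basepair by the IVS lengths between it and the helix end. The coordinate convention below (end and cap nucleotides numbered along a single chain, with the cap on the loop side) is an assumption for illustration. Because the text does not state which IVS arrangement corresponds to D1, D2, or D3, the sketch reports only the side(s) on which an IVS occurs, and it lumps any IVS longer than 3 nt into "long" even though lengths of 4-24 nt are not among the ranges reported above.

```python
from typing import Dict, Tuple


def classify_capping_basepair(
    end: Tuple[int, int], cap: Tuple[int, int]
) -> Tuple[str, Dict[str, int], Dict[str, str]]:
    """Classify a capping basepair as contiguous ('C') or discontiguous ('D').

    end = (e3, e5): terminal helix nucleotides whose 3' neighbor (e3 + 1) and
    5' neighbor (e5 - 1), respectively, lie on the loop side.
    cap = (c3, c5): capping nucleotides on the e3 and e5 sides, numbered along
    a single chain so that c3 > e3 and c5 < e5 (assumption).
    Returns (type, IVS lengths by side, IVS size class by side).
    """
    e3, e5 = end
    c3, c5 = cap
    ivs = {
        "3'": c3 - e3 - 1,  # nucleotides between e3 and c3
        "5'": e5 - c5 - 1,  # nucleotides between c5 and e5
    }
    if min(ivs.values()) < 0:
        raise ValueError("capping nucleotides must lie on the loop side of the helix end")
    present = {side: n for side, n in ivs.items() if n > 0}
    kind = "C" if not present else "D"
    # Short IVS: 1-3 nt (from the text); anything longer is treated as 'long' here.
    sizes = {side: ("short" if n <= 3 else "long") for side, n in present.items()}
    return kind, present, sizes


# Stem (0,11),(1,10),(2,9): its loop-side end is e3 = 2, e5 = 9.
print(classify_capping_basepair((2, 9), (3, 8)))  # ('C', {}, {})
print(classify_capping_basepair((2, 9), (5, 8)))  # ('D', {"3'": 2}, {"3'": 'short'})
```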
Overall, C-capping occurs more frequently than D-capping (163 vs. 113), and C′-capping occurs >2-fold more frequently than D′-capping (43 vs. 17) (Figure 1C). Interestingly, while any of the ten basepair groups [39] can serve as a capping basepair motif, all but 21 capping basepairs adopt non-canonical conformations with varying C1′-C1′ distances (dCC's); the exceptional 21 form canonical conformations (Table 1). In addition, any of the four nucleotides (A, C, G, and U) can serve as an unpaired capping nucleotide (Table 2). Nonetheless, helix capping motifs are largely biased toward a few basepair groups or unpaired nucleotides, depending on the type of helix capping.
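As a quick consistency check, the capping-type counts quoted above sum to the totals reported earlier (276 capping basepairs, 60 capping unpaired nucleotides, 336 capped ends), and the C′:D′ ratio works out as follows. The dictionary keys are labels chosen for this sketch.

```python
# Capping-type counts from the text (T16S and H23S).
capping = {"C": 163, "D": 113, "C'": 43, "D'": 17}

basepair_caps = capping["C"] + capping["D"]      # capping basepairs
nucleotide_caps = capping["C'"] + capping["D'"]  # capping unpaired nucleotides
ratio = capping["C'"] / capping["D'"]            # C'-capping relative to D'-capping

print(basepair_caps, nucleotide_caps, basepair_caps + nucleotide_caps, round(ratio, 2))
# prints: 276 60 336 2.53
```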

Helix capping versus helix stability
While both nucleotides of all helix capping basepairs, except for 13 C-caps, stack well on top of a helix end, predominantly in a non-canonical conformation, all the helix capping unpaired nucleotides stack right on top of the hydrogen-bonding interface of a helix end (Figure 3). A detailed analysis of basepair stacking in canonical RNA helices revealed that one base of a basepair stacks on top of its immediately 5′-flanking basepair, while the other base stacks only marginally on that basepair. This indicates that helix capping motifs overall stack better on a helix end than an internal basepair does within a canonical helix. The exceptional 13 C-caps (9 G:A's, 3 A:A's, and 1 C:A), all in the reversed sheared conformation [39], overall assume a hairpin-like loop of a single nucleotide, similar to that observed with helix capping unpaired nucleotide motifs: with the 3′-nt stacked directly on top of the hydrogen-bonding interface of a helix end, the 5′-nt is displaced into the minor groove (Figure 3A, upper right). Together, these observations strongly suggest that helix capping motifs stabilize short canonical helices by restricting the fraying entropy at helix ends.
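The vertical-distance criterion used to define helix capping (a rise similar to the ~3.0 Å between consecutive internal basepairs) can be sketched as a point-to-plane distance. Several assumptions are made here: the plane of the terminal basepair is taken from three hypothetical reference atoms rather than a least-squares fit, the capping motif is represented by the centroid of its base atoms, the 0.6 Å tolerance is an arbitrary illustrative value, and lateral overlap (whether the cap actually sits over the helix end) is not tested.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def vertical_distance(end_atoms: Sequence[Vec], cap_atoms: Sequence[Vec]) -> float:
    """Distance (Å) from the plane of the terminal basepair to the capping motif.

    end_atoms: three non-collinear atoms of the terminal basepair defining its plane.
    cap_atoms: base atoms of the capping motif; their centroid is used.
    """
    p0, p1, p2 = end_atoms
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    length = math.sqrt(_dot(normal, normal))
    if length == 0:
        raise ValueError("end_atoms must not be collinear")
    unit = (normal[0] / length, normal[1] / length, normal[2] / length)
    n = len(cap_atoms)
    centroid = tuple(sum(atom[k] for atom in cap_atoms) / n for k in range(3))
    return abs(_dot(_sub(centroid, p0), unit))


def is_capping_distance(end_atoms, cap_atoms, rise: float = 3.0, tolerance: float = 0.6) -> bool:
    """True if the vertical distance lies within a (hypothetical) tolerance of the rise."""
    return abs(vertical_distance(end_atoms, cap_atoms) - rise) <= tolerance


# Toy coordinates: terminal basepair in the z = 0 plane, capping bases 3.1 Å above it.
end = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
cap = [(1.0, 1.0, 3.1), (2.0, 2.0, 3.1)]
print(round(vertical_distance(end, cap), 2), is_capping_distance(end, cap))  # 3.1 True
```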
Of the 336 helix capping motifs, a total of 252 (75%) are part of either a larger, previously described RNA structural motif [23,24,40,41,42,43,44,45] or a mimic of one, some of which mediate coaxial stacking between two flanking canonical helices (Figures 1B and 3B). Given that canonical RNA helices are dramatically stabilized by UNCG and GNRA tetraloops [27,28,29,30], these larger motifs are likely to provide additional stabilization to a composite helix that is already stabilized by the helix capping motif itself.

Tertiary contacts formed around helix capping motifs and their role in RNA folding
Helix capping basepair motifs and their associated IVS frequently participate in tertiary contacts, contributing to the folding of the RNA secondary structure into its three-dimensional structure. Overall, while less than half (75; 46%) of the 163 C-caps form tertiary contacts, the vast majority (95; 84%) of the 113 D-caps and their associated IVS participate in tertiary interactions (Figure 1C). In particular, 30 of the 95 D-caps involved in tertiary contacts are themselves long-range tertiary basepairs, each bringing two remote regions of the secondary structure into contact and thereby initiating the transition from secondary to tertiary structure. Surprisingly, tertiary contacts formed by helix capping basepair motifs occur far more frequently through the 5′-nt than through the 3′-nt (95 vs. 22) (Figure 4A). Further analysis revealed that the 5′-nt A of G:A C-caps is the primary site for long-range tertiary contacts in all but one of the GNRA tetraloops found in T16S and H23S: with the 3′-nt G of the G:A C-cap stacked right on top of the basepairing interface of a helix end, the 5′-nt A is slightly displaced toward the minor groove and forms a single hydrogen bond from its N7 to the G NH2, leaving its N1 and N3 available for tertiary contacts (Figure 3B, top right). More surprisingly, the tertiary contacts made by the IVS associated with D-caps occur almost exclusively through the 5′-IVS (Figure 4B). Furthermore, when two unpaired nucleotides are simultaneously available immediately 3′ and 5′ to a helix end, C′-capping by the nucleotide 3′ to the helix end is favored about 7-fold over capping by the one 5′ to the helix end (27 vs. 4) (Table 1 and Figure 4C), consistent with previous melting studies demonstrating that a 3′-dangling nucleotide stabilizes a canonical helix far more than a 5′-dangling nucleotide does [12,13,14,15,16,19].
Altogether, these results suggest that, while helix capping basepair motifs stabilize helix ends against fraying, their 5′-nt and the associated IVS are intrinsically flexible (entropic), making many of the long-range tertiary contacts largely responsible for hierarchically driving RNA folding.

Dependence of helix capping on basepair polarities at helix ends
An analysis of the 336 capped helix ends revealed that, while helix capping favors the 3′-end only marginally over the 5′-end, the overall frequency order for the capped helix ends is C:G > G:C > U:G > U:A > A:U > G:U, with two-thirds accounted for by the most frequent C:G and G:C ends (Table S2). This suggested a correlation between the identity of the helix-ending basepair and helix capping frequency, prompting us to further examine how helix capping frequency depends on the basepair polarities of the two terminal basepairs at helix ends in T16S and H23S.
This additional analysis revealed that, while helix ends overall favor Y:R (297; 58%) over R:Y (171; 33%), more than twice as many Y:R ends as R:Y ends are capped (208 vs. 93) (Table 3). When the basepair polarities of the last two terminal basepairs are combined, helix ends with the Y:R|Y:R polarity are capped most frequently (75%), those with the Y:R|R:Y polarity least frequently (42%), and the remaining two classes in between (66%), strikingly consistent with the NMR melting temperatures of the self-complementary tetramers 5′-GGCC-3′ (54.0 °C) > 5′-GCGC-3′ (49.9 °C) > 5′-CCGG-3′ (47.8 °C) > 5′-CGCG-3′ (36.9 °C) [46]. This suggests that helix capping strongly favors short canonical helices that are energetically more stable, stabilizing the growing number of short canonical RNA helices formed early in RNA folding.
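The polarity-dependent tally can be sketched in Python as follows. Two assumptions are worth flagging. First, each basepair is written in the X:Y order used in the text (for example "CG" for C:G), but the text does not specify which strand's base is listed first at a helix end. Second, this sketch classifies purely by base identity, so U:G counts as Y:R and G:U as R:Y; whether wobble ends were included in the Y:R and R:Y classes above is not stated. The toy data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def pair_polarity(pair: str) -> str:
    """Polarity of a basepair written as two letters, e.g. 'CG' for C:G.

    Returns 'Y:R' (pyrimidine:purine), 'R:Y' (purine:pyrimidine), or 'other'.
    """
    first, second = pair[0].upper(), pair[1].upper()
    if first in PYRIMIDINES and second in PURINES:
        return "Y:R"
    if first in PURINES and second in PYRIMIDINES:
        return "R:Y"
    return "other"


def end_polarity(terminal: str, penultimate: str) -> str:
    """Combined polarity of the last two basepairs, e.g. 'Y:R|R:Y'."""
    return f"{pair_polarity(terminal)}|{pair_polarity(penultimate)}"


def capping_frequency(ends: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Fraction of capped ends per combined polarity class.

    ends: (terminal pair, penultimate pair, is_capped) for each helix end.
    """
    totals: Dict[str, int] = {}
    capped: Dict[str, int] = {}
    for terminal, penultimate, is_capped in ends:
        key = end_polarity(terminal, penultimate)
        totals[key] = totals.get(key, 0) + 1
        capped[key] = capped.get(key, 0) + int(is_capped)
    return {key: capped[key] / totals[key] for key in totals}


# Hypothetical toy data, not the T16S/H23S helix ends.
toy = [("CG", "CG", True), ("CG", "CG", False), ("GC", "CG", True)]
print(capping_frequency(toy))  # {'Y:R|Y:R': 0.5, 'R:Y|Y:R': 1.0}
```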

Conformational diversity of helix capping basepair motifs
While helix capping basepair motifs can belong to any of the 10 basepair groups in different conformations [39], they are strongly biased toward a few basepair groups and conformations, depending on the type of helix capping (Table 1). C-caps are most frequently G:A, followed by C:A, and predominantly adopt the sheared conformation. D1-caps are biased toward U:A, C:G, and G:A, which predominantly form the reversed Hoogsteen, Watson-Crick, and sheared conformations, respectively. Both D2- and D3-caps are most commonly C:G and U:A, frequently in the Watson-Crick conformation. In particular, most of the non-canonical conformations adopted by helix capping basepair motifs have a significantly shorter dCC than the 10.6 Å of a canonical basepair in A-form RNA, which topologically allows them to protect helix ends effectively against fraying. An additional analysis revealed that 84 (30%) of the 276 capping basepairs are involved in RNA-protein interactions (unpublished data). Nonetheless, only 10 of them could potentially change their conformation in the presence of protein, suggesting that the conformational diversity of capping basepairs is not substantially biased by bound protein.
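The dCC comparison above amounts to measuring the C1′-C1′ distance of a basepair and comparing it with the 10.6 Å canonical value quoted in the text. A minimal Python sketch, assuming C1′ coordinates have already been extracted from a structure file (the toy coordinates here are made up):

```python
import math
from typing import Sequence

# C1'-C1' distance of a canonical basepair in A-form RNA, as quoted in the text.
CANONICAL_DCC = 10.6  # Å


def dcc(c1_first: Sequence[float], c1_second: Sequence[float]) -> float:
    """C1'-C1' distance (Å) between the two nucleotides of a basepair."""
    return math.dist(c1_first, c1_second)


def dcc_shift(c1_first: Sequence[float], c1_second: Sequence[float]) -> float:
    """dCC minus the canonical 10.6 Å; negative values mean a narrower pair."""
    return dcc(c1_first, c1_second) - CANONICAL_DCC


# Toy coordinates (not from a crystal structure).
print(dcc((0.0, 0.0, 0.0), (6.0, 8.0, 0.0)))  # 10.0
print(round(dcc_shift((0.0, 0.0, 0.0), (6.0, 8.0, 0.0)), 2))  # -0.6
```

`math.dist` requires Python 3.8 or later.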
A few helix capping basepairs, including C:A and U:A, form several different conformations despite identical or very similar sequence and structural contexts, indicating that they are susceptible to structural perturbation from the entropic loop side and may undergo dynamic conformational changes as RNA folds into its native tertiary structure. An analysis of the archaeal H. marismortui and bacterial E. coli 23S rRNA crystal structures [38,47] revealed five homologous helix capping basepairs whose conformations differ completely between the two crystal structures (Table 4). In particular, two of them, H23S-0873:0876 and H23S-1164:1192, share exactly the same sequence and structural context in the two phylogenetically distant organisms, strongly supporting the idea that capping basepairs undergo dynamic conformational changes without affecting the overall RNA structure and function.

Discussion
Our ability to predict RNA secondary and tertiary structure depends largely on a detailed understanding of many different structural motifs and of the organizing principles that explain how they assemble into the complex but highly ordered three-dimensional tertiary structure. Given that the vast majority (96%) of helix ends in structured RNAs are either capped (65%) or coaxially stacked (31%) from the loop side (Table S1), both helix capping and coaxial stacking play roles in defining RNA structure and driving RNA folding. In particular, helix capping not only locks and stabilizes the fraying ends of the many short canonical helices formed early in RNA folding, but also facilitates the formation of many long-range tertiary contacts that, in cooperation with coaxial stacking, are essential for defining the complex three-dimensional architecture of structured RNAs. Moreover, helix capping in RNA favors intrinsically more stable helix ends, working cooperatively with the sequence polarity of the last two terminal basepairs to drive helix formation during RNA folding. Thus, deriving the stabilizing energies of all the identified helix capping motifs and applying them to the development of an RNA folding algorithm should greatly enhance our ability to predict RNA secondary and tertiary structure.
Such data for mismatches (C-caps) and dangling nucleotides (C′-caps) have been derived calorimetrically [14,15,16,17,18,19] and employed in the energy-based mfold RNA folding program [48]. Nonetheless, calorimetric data are not yet available for all the identified helix capping motifs, especially those implicated in folding the secondary structure into the tertiary structure. Because of the complexity of the experimental design, it is presently challenging to obtain stabilizing energies for D- and D′-caps experimentally. An alternative is to compute how frequently these motifs are enriched across a set of homologous RNA sequences from a wide range of organisms and to use these frequencies as a proxy for experimental energies. In addition, determining and implementing polarity-dependent nearest-neighbor energies for the last two terminal basepairs at helix ends could further improve the accuracy of RNA structure prediction from sequence.
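One common way to turn observed frequencies into an energy-like proxy is an inverse-Boltzmann (statistical potential) conversion, ΔG ≈ -RT ln(f_obs / f_exp). The text does not specify a conversion method, so this Python sketch is a swapped-in technique, not the authors' procedure. The background expectation, the pseudocount, and the 37 °C temperature are all assumptions, and the counts in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def pseudo_energy(
    observed: int,
    total: int,
    expected_fraction: float,
    temperature_k: float = 310.15,
    pseudocount: float = 1.0,
) -> float:
    """Inverse-Boltzmann pseudo-free energy, -RT ln(f_obs / f_exp), in kcal/mol.

    observed: homologous sequences in which the capping motif is present.
    total: homologous sequences examined.
    expected_fraction: assumed background expectation for the motif.
    A pseudocount keeps the logarithm finite for unobserved motifs.
    Negative values mean the motif is enriched relative to expectation.
    """
    if not 0.0 < expected_fraction < 1.0:
        raise ValueError("expected_fraction must be between 0 and 1")
    f_obs = (observed + pseudocount) / (total + 2 * pseudocount)
    return -R_KCAL * temperature_k * math.log(f_obs / expected_fraction)


# Hypothetical counts: a motif found in 90 of 98 homologous sequences vs. a 50% background.
print(round(pseudo_energy(90, 98, 0.5), 2))
```

Such pseudo-energies are only as good as the background model and the sequence alignment; they would need calibration against the calorimetric values available for C- and C′-caps before being used alongside them.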