Secondary Structures of rRNAs from All Three Domains of Life

Accurate secondary structures are important for understanding ribosomes, which are extremely large and highly complex. Using 3D structures of ribosomes as input, we have revised and corrected traditional secondary (2°) structures of rRNAs. We identify helices by specific geometric and molecular interaction criteria, not by co-variation. The structural approach allows us to incorporate non-canonical base pairs on parity with Watson-Crick base pairs. The resulting rRNA 2° structures are up-to-date and consistent with three-dimensional structures, and are information-rich. These 2° structures are relatively simple to understand and are amenable to reproduction and modification by end-users. The 2° structures made available here broadly sample the phylogenetic tree and are mapped with a variety of data related to molecular interactions and geometry, phylogeny and evolution. We have generated 2° structures for both large subunit (LSU) 23S/28S and small subunit (SSU) 16S/18S rRNAs of Escherichia coli, Thermus thermophilus, Haloarcula marismortui (LSU rRNA only), Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We provide high-resolution editable versions of the 2° structures in several file formats. For the SSU rRNA, the 2° structures use an intuitive representation of the central pseudoknot where base triples are presented as pairs of base pairs. Both LSU and SSU secondary maps are available (http://apollo.chemistry.gatech.edu/RibosomeGallery). Mapping of data onto 2° structures was performed on the RiboVision server (http://apollo.chemistry.gatech.edu/RiboVision).


Introduction
RNA secondary (2°) structures, with symbolic representations of base pairs, double-helices, loops, bulges, and single-strands, provide frameworks for understanding three-dimensional (3D) structure, folding and function of RNA, and for organizing, distilling, and illustrating a wide variety of information. Accurate and accessible 2° structures are particularly important for understanding ribosomes, which are extremely large and highly complex three-dimensional objects.
Co-variation approaches, using a rich sequence database as primary input, are powerful and widely applicable for determining rRNA 2° structures in the absence of 3D information. Co-variation methods produce very few false-positive base pairs [1]. However, 2° structures determined by co-variation have inherent limitations. Co-variation does not reliably reveal non-canonical base pairs, especially purine-purine base pairs. For example, Helix 26a of LSU rRNAs was not detected by co-variation methods and was not included in traditional 2° structures [1,2]. The rRNA comprising Helix 26a is represented by an extended single-strand in co-variation 2° structures. The omission of Helix 26a is significant because it is universally conserved and thermodynamically stable [3,4], and is a core component that helps define domain architecture [5].
Here we focus on accurate re-determination of 2° structures, primarily of SSU rRNAs. We modify the traditional E. coli SSU 2° structure to incorporate non-canonical base pairs. In addition, we include all base pairing interactions of the central pseudoknot. And finally, for several eukaryotic species, we provide complete 2° structures of both subunits, including expansion segments. Co-variation approaches are especially problematic for highly idiosyncratic RNA sequence regions such as expansion segments, because appropriate sets of alignable sequences may not be available or readily identifiable.
We have constructed 2° structures that minimize artificial fragmentation of rRNA. For historical reasons, 2° structures, especially those of larger rRNAs, are represented as fragments placed around the conserved core. Optimal 2° structures should as far as possible portray the true continuity of an rRNA strand. In practice, representation of rRNA as continuous strands can require re-organizing the traditional scheme of the common core and may not be desirable in all instances. The major differences between the co-variation and 3D-based 2° structures are highlighted in Figure S1.
The small but growing number of ribosomal 3D structures allows 2° structure determination by geometric analysis. Information from 3D structures can be used to determine accurate 2° structures, including non-canonical base pairs and expansion segments. Thus, we have used geometric analysis of 3D structures of ribosomes to re-determine rRNA 2° structures. The resulting 3D-based 2° structures, unlike co-variation 2° structures, contain all base pairs and helices observed in 3D structures.
We make available a series of 2° structures that broadly sample the phylogenetic tree, are up-to-date, and as far as possible, accurately represent strand continuity. We have incorporated non-canonical base pairs. We have mapped the 2° structures with a variety of data related to molecular interactions and geometry, phylogeny and evolution. We have partitioned the rRNA into helices and domains. These information-rich 2° structures are amenable to reproduction and modification by end-users. We provide high-resolution editable versions of the 2° structures in several file formats. The images are legible when printed on a single sheet of standard-sized paper. Both LSU and SSU secondary maps are available (http://apollo.chemistry.gatech.edu/RibosomeGallery). Mapping of data onto 2° structures was performed on the RiboVision server (http://apollo.chemistry.gatech.edu/RiboVision) [10].

Methods
Atomic coordinates were obtained from the PDB. Base-pairing and base-stacking interactions were obtained from the library of RNA interactions (FR3D) [11] and confirmed by inspection and in-house code. The co-variation E. coli secondary structures of LSU and SSU rRNAs were downloaded from http://rna.ucsc.edu/rnacenter/ribosome_images.html, adjusted and extended with the program XRNA (http://rna.ucsc.edu/rnacenter/xrna/xrna.html), finalized with Adobe Illustrator, and written out as SVG and PNG files. Secondary structures of all other species presented here were built from the E. coli template. We use historical representations as far as possible, except where conflicts arise with correct helical assignments or strand continuity.

Results and Discussion
rRNA 2° structures can be determined by a variety of methods, including co-variation [7,15,16], thermodynamic predictions [17], and geometric analysis of molecular interactions within 3D structures [5]. We have re-derived a series of rRNA 2° structures from 3D structures, with the goal of improving clarity, accuracy, and utility. The primary disadvantage of the structural approach remains the small number of ribosomes with well-determined 3D structures. However, the number of ribosomes with available 3D structures is ever increasing [6,8,18]. The current number of available 3D structures makes the geometric approach viable for systematic determination of rRNA 2° structures.
Helices are the defining elements of RNA 2° structure [19,20]. We identify helices by specific geometric and molecular interaction criteria [5]. In folded RNAs, a base is in one of two discrete states: paired or non-paired [21,22]. A paired base is involved in 2° interactions, tertiary interactions, or both. Following Levitt [23], we define helices as base-paired nucleotides bounded by non-paired nucleotides. With 3D information, one can incorporate stacking information, and so we define helices as base pairs in the form of a continuous base-paired stack that is faithful to strand connectivity. A helix can contain bulges or other defects as long as they do not break the helical stack. Secondary interactions are base pairing interactions within helical regions, while tertiary interactions are pairing interactions other than those within helical regions. Each nucleotide belongs uniquely to no more than one helix. Non-canonical base pairs are not differentiated from canonical base pairs. Non-canonical base pairs that are internal to or that extend secondary helices are defined as secondary interactions.
Figure 2. [...] and (j,p) (yellow or green), where i < j < p < q. The blue helix is non-nested within the other helices, with base pairs (i′,q′) (red) and (j′,p′) (blue), where i′ < j′ < q′ < p′. The red, green and yellow helices are commonly considered to be 2° structural helices. The blue helix is non-nested and is considered to be a tertiary helix. doi:10.1371/journal.pone.0088222.g002
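The helix definition above can be illustrated with a minimal Python sketch. It is not the in-house code used in this work: it assumes base pairs are already available as (i, j) index tuples, and it uses adjacency of sequence positions as a stand-in for the stacking information that is taken from 3D structures. The function name `group_helices`, the `max_bulge` parameter, and the toy indices are hypothetical.

```python
def group_helices(pairs, max_bulge=0):
    """Group base pairs into helices of consecutive, stacked pairs.

    pairs     -- iterable of (i, j) nucleotide positions (any order)
    max_bulge -- unpaired nucleotides tolerated on either strand between
                 two successive pairs of the same helix (assumption: a
                 proxy for 'defects that do not break the helical stack')
    """
    step = max_bulge + 1
    helices = []
    # Normalize each pair to (5' index, 3' index) and walk 5' to 3'.
    for i, j in sorted(tuple(sorted(p)) for p in pairs):
        for helix in helices:
            prev_i, prev_j = helix[-1]
            # Extend a helix when the new pair sits directly inside
            # the helix's last pair on both strands.
            if 0 < i - prev_i <= step and 0 < prev_j - j <= step:
                helix.append((i, j))
                break
        else:
            # No helix could be extended: start a new one.
            helices.append([(i, j)])
    return helices


# Toy example (hypothetical indices): one three-pair stem plus a
# two-pair helix formed with a downstream strand.
toy_pairs = [(1, 20), (2, 19), (3, 18), (10, 30), (11, 29)]
toy_helices = group_helices(toy_pairs)
```

Because each pair is appended to exactly one helix, every nucleotide in the result belongs to no more than one helix, in keeping with the definition in the text.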
The basic helical definition of secondary structure [19] has been extended to differentiate helices that are nested from those that are non-nested [24–26], as illustrated in Figure 2. A structure is nested if it contains pairs (i,q) and (j,p) where i < j < p < q are locations in the primary structure. Helices between expansion elements observed in some eukaryotes (as in the 18S rRNAs of S. cerevisiae, D. melanogaster, and H. sapiens) are among the longest non-nested helices. Non-nested helices (kissing loops and pseudoknots) are commonly categorized as tertiary interactions [27,28].
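The nested versus non-nested distinction reduces to an ordering test on pair indices, sketched below in Python. This is an illustration under stated assumptions rather than the procedure used to build the 2° structures: base pairs are given as (i, j) index tuples, and the function names and toy indices are hypothetical.

```python
from itertools import combinations


def crosses(pair_a, pair_b):
    """Return True when two base pairs interleave along the strand.

    With the pairs ordered so that (i, q) opens first and (j, p) second,
    they are nested when i < j < p < q and non-nested (crossing) when
    i < j < q < p. Disjoint pairs (q < j) are neither.
    """
    (i, q), (j, p) = sorted((tuple(sorted(pair_a)), tuple(sorted(pair_b))))
    return i < j < q < p


def non_nested_pairs(pairs):
    """Return every base pair that crosses at least one other pair."""
    flagged = set()
    for a, b in combinations(pairs, 2):
        if crosses(a, b):
            flagged.add(tuple(sorted(a)))
            flagged.add(tuple(sorted(b)))
    return flagged


# Toy example (hypothetical indices): a nested stem, and a helix that
# pairs the stem's loop with a downstream strand.
stem = [(1, 20), (2, 19)]
crossing = [(10, 30), (11, 29)]
flagged = non_nested_pairs(stem + crossing)
```

Crossing is symmetric, so the test flags both helices of an interleaved set. Which of the two is drawn as the secondary helix and which as the tertiary helix is a convention that has to be chosen separately.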
In our structure-based 2° structures, we followed the nested/non-nested definition of secondary and tertiary helices. Our approach extends and clarifies the definition of rRNA 2° structure to explicitly include all pairing interactions that confer thermodynamic stability to the folded RNA. The structural approach allows us to incorporate non-canonical base pairs on parity with Watson-Crick base pairs rather than by post hoc adjustment or symbolic notation.
For the central pseudoknot of the 16S rRNA [29], we treat helix 2 as a secondary element, even though it is non-nested, following the original Woese representation [15]. The central pseudoknot is conserved over all phylogeny [30] and is a key feature of the SSU that links all four domains. Central pseudoknot assembly appears to be a crucial, irreversible step of SSU maturation [31]. The co-variation 2° structure of the central pseudoknot is incomplete. We modified the traditional 2° structure of the central pseudoknot to include all base-pairing interactions revealed by 3D structures. The central pseudoknot contains conserved triplets of bases U12-G22-A912 and U13-U20-A914. In our revised 2° structure, these base triples are presented as pairs of base pairs (Figure 3). The advantage of this representation is that one can easily infer that it is a pseudoknot and can directly discern all the pairing interactions of the pseudoknot. The representation used here was formulated by Brakier-Gingras and coworkers [32] and by Gregory and Dahlberg [33] using information from 3D crystal structures. Westhof and Lescoute correctly represent the central pseudoknot in their information-rich wiring diagrams [34]. Gutell recently revised the historical 2° structure of the 16S rRNA to adjust the central pseudoknot and incorporate many of the non-canonical base pairs [35]. Unlike other pseudoknots in the rRNA, this representation can be integrated into the historical 2° scheme without major rearrangement. The 3D-based 2° structure of the 16S rRNA of E. coli with all canonical secondary and tertiary Watson-Crick interactions is shown in Figure S2.

Conclusion
We have generated structure-based 2° structures for 23S/28S and 16S/18S rRNAs of E. coli, T. thermophilus, S. cerevisiae, H. marismortui (LSU only), D. melanogaster, and H. sapiens. We have mapped the 2° structures with a variety of data related to helices, domains, molecular interactions, phylogeny, and evolution. We provide high-resolution editable versions of all of these 2° structures (http://apollo.chemistry.gatech.edu/RibosomeGallery).

Figure S1 Schematic 2° structures, based on 3D structures, of rRNAs of a) S. cerevisiae LSU, and b) S. cerevisiae SSU. Major differences between these 2° structures and co-variation based 2° structures are highlighted in red: i) Helix 26a is shown as a helix instead of a single-stranded loop; ii) the central pseudoknot is corrected to include all non-canonical base pairs; iii) rRNA is represented as far as possible as continuous strands; and iv) the secondary structure of all eukaryotic expansion segments is shown explicitly. The domain colors in the LSU are: Domain 0, orange; I, purple; II, blue; III, magenta; IV, yellow; V, pink; VI, green; 5.8S, brown; 5S, light green. The domain colors in the SSU are: 5′, blue; C, brown; 3′M, pink; and 3′m, green. (TIF)

Figure S2 The 2° structure of the 16S rRNA of E. coli. Nucleotides connected by lines in the 2° structure here are canonical Watson-Crick base pairs in the 3D structure of the ribosome. The domain colors in the SSU are: 5′, blue; C, brown; 3′M, pink; and 3′m, green. (TIF)