Figure 1.
(A) A schematic representation of the tandem units in the ribosomal DNA locus with some units containing an R2 insertion. The rRNA transcription unit with external transcribed spacer (ETS), 18S, 5.8S and 28S genes (blue boxes), internal transcribed spacers (ITS, gray boxes), and R2 element (white box) is diagrammed. The single open reading frame (ORF) of R2 is indicated. (B) 28S gene sequence flanking the R2 insertion site with the conserved site of the initial nick indicated as position +1. The 5' junction sequences of the R2 elements obtained from seven arthropods are shown below the 28S sequence. Nucleotides to the left of the vertical line are upstream 28S gene sequences (uppercase). Nucleotides to the right of the vertical line are either R2 (lowercase), have identity to the 28S gene (uppercase, differences underlined), or non-templated additions (italics). One horseshoe crab junction contained an additional 106 nt that are not shown. Overlined sequences correspond to duplications of the 28S gene from upstream (solid) or downstream (dashed) of the target site.
Figure 2.
RNA comprising the 28S/R2 5' junction from many arthropods can function as a self-cleaving ribozyme.
(A) Diagram of a generic 28S gene (blue box)/R2 5' end (white box) junction. Arrows labeled a through y represent the in vitro generated RNAs tested for self-cleavage from the eight species listed on the left. Ds, D. simulans. RNAs (except y) begin at position -95 relative to the R2 insertion site and end at the noted nucleotide within the R2 element. RNA k has a 'G' to 'A' change in the active site of the ribozyme; RNA y has a generated 'G' to 'C' change to improve the P1 stem (see text). The short vertical lines demarcate the extent of the predicted ribozymes. To the far right of each arrow is the average fraction of RNA cleaved in at least two independent co-transcription/cleavage assays. (B) A 5% denaturing acrylamide gel showing the cleavage products for select RNAs in the co-transcription/self-cleavage assays. A longer exposure of the lower portion of the gel is shown to better visualize the short upstream cleavage product. The uncleaved RNA (solid red circles) and cleavage products (open red circles) are indicated. Unmarked bands are alternative, stable RNA structures. Lanes are labeled with the corresponding letter from panel (A). Lane M, RNA length markers with sizes indicated. (C) An 8% denaturing acrylamide gel showing the upstream cleavage products at single nucleotide resolution. RNAs are the same as in (B) except that the silkmoth RNA was from S. cynthia. DNA ladder corresponds to combined G, A, T sequencing reactions with nucleotide position from the -40 primer in the M13 vector shown to the right.
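The average fraction cleaved reported beside each arrow in panel (A) can be illustrated with a minimal sketch. It assumes the fraction in a single assay is the molar amount of cleaved RNA divided by the total (cleaved plus uncleaved), and that band signals have already been converted to molar amounts; the legend does not state how the bands were quantified, so the function names and the input format below are hypothetical.

```python
from statistics import mean


def fraction_cleaved(uncleaved, cleaved):
    """Fraction of RNA cleaved in one assay.

    Both arguments are hypothetical molar amounts (band signal already
    corrected for label content); this normalization is an assumption.
    """
    total = uncleaved + cleaved
    if total <= 0:
        raise ValueError("no RNA signal in this lane")
    return cleaved / total


def average_fraction_cleaved(assays):
    """Average the fraction cleaved over independent assays.

    assays: list of (uncleaved, cleaved) pairs, one pair per assay.
    The legend reports averages of at least two assays, so fewer is rejected.
    """
    if len(assays) < 2:
        raise ValueError("need at least two independent assays")
    return mean(fraction_cleaved(u, c) for u, c in assays)
```

For example, two assays with (uncleaved, cleaved) amounts of (25, 75) and (75, 25) give per-assay fractions of 0.75 and 0.25.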
Figure 3.
28S/R2 RNA sequences from divergent arthropods fold into structures similar to the ribozyme from Drosophila.
(A) Diagram of the R2 ribozyme from D. simulans: P, base-paired region; L, loop at end of a P region; J, nucleotides joining base-paired regions [11]. (B) through (H) Predicted secondary structures of the 28S/R2 junctions from seven arthropods. Only the number of R2 nucleotides in the J1/2 loop and the number of nucleotides in the L4 loop if it is large are shown. The presence of 28S gene sequences within the ribozyme is indicated with blue shading. Arrows indicate the observed or predicted R2 self-cleavage sites relative to the 3' R2 insertion site. A 'G' to 'A' nucleotide substitution at the boxed position in the N. vitripennis structure was observed to affect self-cleavage (Figure 2). Partial structures for earwig and horseshoe crab and an alternative P1 for silkmoth were previously predicted [12].
Figure 4.
Comparison of the R2 ribozyme from 5 silkmoth species.
(A) Secondary structure of the putative ribozyme encoded by the R2 element of S. cynthia. Blue shading indicates nucleotides from the 28S gene. Indicated next to the S. cynthia sequence are differences in the R2 sequences from 4 silkmoth species (nucleotides in orange boxes). The entire sequence of the largely unconserved L4-J4/2 region is presented for each species. Two species contained nucleotide differences in the otherwise conserved L3 (residue 1 is a 'U' in S. pyri, residue 2 is an 'A' in C. hercules). The J1/2 loop length varied from 216 nt in S. pyri to 498 nt in C. hercules. Species abbreviations: mori, B. mori; herc, C. hercules; prom, C. promethea; pyri, S. pyri. (B) Diagrams of the 28S/R2 RNAs from S. cynthia tested for self-cleavage as described in Figure 2. The vertical dashed line indicates the predicted upstream cleavage site at -28. Negative numbers indicate position in the 28S gene relative to the R2 insertion site; positive numbers indicate position in the S. cynthia R2. The x in RNA 7 represents a 'U' substitution for the putative catalytic 'C'. Templates 8-11 have varying amounts of the J1/2 loop removed. (C) 5% denaturing acrylamide gels showing the products from the co-transcription/self-cleavage assays. Lane numbers correspond to the RNAs in panel (B). The uncleaved RNA (solid red circle) and self-cleavage products (open red circles) are indicated. Lane M, RNA length markers with sizes indicated. For RNAs 8-12, a longer exposure of the lower portion of the gel is shown to better visualize the 21 nt upstream cleavage products. The fraction of the synthesized RNA undergoing self-cleavage (fc) is shown at the bottom of each gel.
Figure 5.
Variation in the 28S/R2 5' junction reflects cleavage site position.
R2 sequences, 28S sequences, and non-templated sequences are as described in Figure 1. Single nucleotides in the flour beetle and tick 28S genes that differ from the consensus arthropod 28S gene sequence are underlined. All sequences are derived from the trace reads of genomic sequencing projects. The number of trace reads containing the same junction sequence is indicated to the far right. Nucleotides that could form the 5' end of the predicted P1 stem of the ribozyme (Supporting Information, Figure S2) are boxed in blue if from the 28S gene or in gray if from R2. Two regions can potentially anneal because of 28S duplications associated with some junctions. Sequences not presented in the figure: (1) AGTAACTATGACTCTCTTTGAGTAACTATGACTCTCTTTGAGTAACTATGACTCTCTTT; (2) TGAACTCTCTATGGTGGTCGCCTTCTCGTATG; (3) GTAACTATGACTCTCTCTTT; (4) cyclin A-like, putative template jump [18]; (5) GGGAGTAACTATGACTCTCTT; (6) TGACTCTCTTATTTATGACTCTCCTATGACTCTCTTATT; (7) GAGACCAACTTA; (8) GGCGGGAGTAACTCTGACTCTCTTTT; (9) CTATGACTAACTATGACTCTCTTT; (10) GCTAACTCTGACTCTCTTAGCTGACTCTCTTTT; (11) TGACTCTCGGCGGGAGTAACTATGACTCTCTT; (12) CTACTGTATGACTCTCTTACTACTGTATGACT; (13) TCTTGTCTCTTGTCTCTTG.
Figure 6.
R2 element phylogeny and its correlation with cleavage location and ribozyme sequence.
(A) Best tree based on the neighbor-joining method and rooted using the retrotransposons Baggins1 and L1Tc. Most elements are listed in Figure 2 and Supporting Information, Figure S2. Additional fruit fly elements include: D. willistoni (wil.), D. ananassae (ana.), D. yakuba (yak.), and D. pseudoobscura (pse.). Bootstrap values above 70% are shown. Numbers left of the elements are predicted or demonstrated (boxed elements) self-cleavage locations relative to the R2 insertion site. The 5' junction cleavage site agrees with previous predictions for sea squirt A, tick, and R4; however, the tadpole 5' end is shifted 1 nucleotide and 28S cleavage sites are suggested for zebra finch, gnat, and both termites [12]. The 5' end of the sea squirt D element could not be folded and may be incomplete. To the right of each element are sequence differences in the ribozyme active site relative to the R2 consensus sequence (top line). The underlined 'U' in this consensus is a 'C' in the HDV antigenomic ribozyme. Black dots indicate nucleotides that disrupt P3 or P1.1 stem pairing. (B) Consensus structure of the R2 ribozymes based on 26 elements (additional fruit fly elements were not included) in panel (A). Length differences and mismatches in the P1, P2, and P3 stems are described in the text. Nucleotides present in at least 75% of the R2 ribozymes are indicated. Invariant nucleotides are shaded orange. The sow bug ribozyme, which does not show activity in vitro, has 'C' to 'A' and 'G' to 'A' differences at the conserved sites in L3. Y = U or C; R = A or G; K = G or U.
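The 75% rule and the degenerate symbols in panel (B) can be sketched as follows. This is an illustrative reading, not the procedure used to draw the figure: it assumes a gap-free alignment of equal-length RNA sequences, and it assumes that Y, R, or K is assigned when the two bases together reach the threshold and no single base does. The function names and the 'N' placeholder for unconserved positions are hypothetical.

```python
from collections import Counter

# Two-base degenerate symbols defined in the legend (Y, R, K only).
DEGENERATE = {frozenset("UC"): "Y", frozenset("AG"): "R", frozenset("GU"): "K"}


def consensus_column(column, threshold=0.75):
    """Return (symbol, invariant) for one alignment column.

    column: string with one nucleotide per element (hypothetical input).
    """
    counts = Counter(column)
    total = len(column)
    base, n = counts.most_common(1)[0]
    if n == total:
        return base, True            # invariant position
    if n / total >= threshold:
        return base, False           # single base meets the threshold
    for pair, code in DEGENERATE.items():
        if sum(counts[b] for b in pair) / total >= threshold:
            return code, False       # assumed rule for Y, R, K
    return "N", False                # placeholder: no consensus at this position


def consensus(aligned, threshold=0.75):
    """Consensus string for a list of equal-length aligned sequences."""
    return "".join(
        consensus_column("".join(col), threshold)[0] for col in zip(*aligned)
    )
```

Under these assumptions, a column split evenly between U and C is reported as Y, while a column with four different bases in equal proportion gets the placeholder.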
Figure 7.
Models for the priming of second strand DNA synthesis in an R2 retrotransposition reaction.
The initial steps of R2 integration are well characterized [41] and believed to be the same for all species. The R2 protein (not shown) recognizes the 3' UTR of the R2 RNA, binds to the 28S DNA target site, and cleaves the bottom strand. This cleavage site is used to prime first strand DNA synthesis, a process referred to as target-primed reverse transcription, TPRT (cDNA, red line). Additional non-templated nucleotides were shown to be added to the cDNA as the polymerase runs off an RNA template [42]. Priming of second strand DNA synthesis is proposed to differ between species based on whether the R2 RNA underwent self-cleavage at the R2 5' junction or upstream in the 28S gene. In the former, the R2 reverse transcriptase is able to use regions of microhomology of the cDNA with the DNA target upstream of the insertion site to initiate second strand DNA synthesis. This priming can involve the extra nucleotides added to the cDNA strand and can give rise to different length deletions of 28S sequences (lower left). In those animals where RNA self-cleavage is in the upstream 28S sequences, a heteroduplex between the cDNA and the target DNA is predicted to stabilize the integration intermediate, resulting in a higher frequency of precise 5' junctions (lower right).
Figure 8.
Model for how target site duplications can shift the R2 cleavage site.
Diagrammed is a typical 28S/R2 5' junction with 28S sequences in blue. The 28S upstream ribozyme self-cleavage site is indicated with a vertical arrow. Indicated with triangles are the upstream (P1) and downstream (P1') segments of the P1 stem of the ribozyme. Occasionally, the product of an R2 integration event is the duplication of 28S sequences at the new junction (a). A mutation in the internal P1 sequences in this junction will result in a higher likelihood of self-cleavage at the upstream site than at the downstream site, black and gray symbols respectively (b). Additional mutations in the duplication will eventually result in loss of sequence identity to the 28S gene (c). The nucleotide substitution in (b) can on occasion be compensated by a nucleotide substitution in P1', and self-cleavage will now more likely occur at the internal P1 (d). As further compensatory mutations accumulate, the duplication will lose identity to the 28S gene, and cleavage will have shifted to the 5' end of the R2 element (e). Both scenarios give rise to a 5' extension of the R2 element.