Fig 1.
RNA structure-based comparative genome analysis.
Sequence alignments were created through SHAPE-dependent pairwise comparisons, which were combined into multiple sequence alignments [24]. Windowed linear regression analysis of SHAPE data was then used to define regions in which correlated SHAPE reactivities imply structural conservation. For these regions, consensus secondary structures were modeled using SHAPE- and sequence-dependent folding [50]. Consensus secondary structures were obtained for both the HIV-1/SIVcpz alignment and the three-genome alignment. Base pairs that did not conflict between the two consensus structures and that had a pairing probability greater than 95% were used to constrain a final model for HIV-1 using SHAPE-directed folding [29].
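The constraint-selection step in this workflow can be illustrated with a minimal sketch. It assumes each consensus structure is available as a mapping from a base pair (i, j) to its pairing probability, and it interprets "conflict" as a nucleotide being paired to a different partner in the other consensus. The function names, the data layout, and this definition of conflict are illustrative assumptions, not the procedure used to generate the figure.

```python
def partner_map(pairs):
    """Map each paired nucleotide to its partner (hypothetical helper)."""
    partners = {}
    for i, j in pairs:
        partners[i] = j
        partners[j] = i
    return partners


def select_constraints(consensus_a, consensus_b, min_prob=0.95):
    """Return base pairs above min_prob that the other consensus does not contradict.

    Assumption: a pair (i, j) is contradicted when the other consensus
    pairs i or j with a different partner. A pair absent from the other
    consensus is not treated as a contradiction.
    """
    kept = set()
    for own, other in ((consensus_a, consensus_b), (consensus_b, consensus_a)):
        other_partners = partner_map(other)
        for (i, j), prob in own.items():
            if prob <= min_prob:
                continue  # pairing probability must exceed the cutoff
            conflict = (other_partners.get(i, j) != j
                        or other_partners.get(j, i) != i)
            if not conflict:
                kept.add((min(i, j), max(i, j)))
    return sorted(kept)


# Toy input (invented values, for illustration only).
two_genome = {(1, 20): 0.99, (2, 19): 0.97, (3, 18): 0.90, (5, 15): 0.99}
three_genome = {(1, 20): 0.98, (5, 16): 0.99, (30, 40): 0.96}

constraints = select_constraints(two_genome, three_genome)
# constraints == [(1, 20), (2, 19), (30, 40)]
```

In this toy case, (3, 18) is dropped for low probability, and (5, 15) and (5, 16) are both dropped because nucleotide 5 has different partners in the two consensus structures.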
Fig 2.
SHAPE-structure dependent alignment.
(A) SHAPE-directed alignment over one 200-nt window in the RRE. Sequences are numbered relative to the HIV-1 RNA genome, with the transcription start site as +1. (B, C) Windowed linear regression statistics as a function of the HIV-1 (NL4-3) sequence, computed over 200-nt windows. Correlations of SHAPE values across the three-genome sequence alignment were evaluated by F-test (results shown as p-values; black); pairwise comparisons with HIV-1 were evaluated by t-test (results shown as p-values). The entire HIV-1 alignment is shown; RNA landmarks are given at the bottom of the figure.
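As a rough illustration of the windowed statistics in panels B and C, the sketch below slides a fixed-width window along two aligned SHAPE profiles and computes, for each window, the Pearson correlation and the corresponding t statistic for a simple linear regression. The gap encoding (`None`), the step size, the minimum number of points, and the function names are assumptions made for the example. Converting t to a p-value requires the t distribution with n - 2 degrees of freedom, which is omitted here, as is the three-genome F-test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns None if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def windowed_regression(ref, other, window=200, step=1, min_points=3):
    """Per-window (start, r, t) for two aligned reactivity profiles.

    Positions where either profile is None (alignment gap or missing
    data) are skipped. Windows with too few usable points, or with a
    constant profile, report r and t as None.
    """
    results = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = [(a, b)
                 for a, b in zip(ref[start:start + window],
                                 other[start:start + window])
                 if a is not None and b is not None]
        r = None
        if len(pairs) >= min_points:
            x, y = zip(*pairs)
            r = pearson_r(x, y)
        if r is None:
            results.append((start, None, None))
            continue
        n = len(pairs)
        denom = 1.0 - r * r
        if denom <= 0:
            t = math.copysign(math.inf, r)  # perfect correlation
        else:
            t = r * math.sqrt((n - 2) / denom)
        results.append((start, r, t))
    return results


# Toy profiles (invented values); a real analysis would use 200-nt windows.
hiv1 = [0.1, 0.5, 0.9, 0.2, 0.8, 0.4]
sivcpz = [2 * v + 0.1 for v in hiv1]
stats = windowed_regression(hiv1, sivcpz, window=4, step=1)
```

Each entry in `stats` holds the window start index, the correlation coefficient, and the t statistic, so low p-value regions could be identified by thresholding after conversion with a t-distribution routine of the reader's choice.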
Fig 3.
Secondary structure models for six structurally conserved elements in the 5' half of HIV-related RNA genomes.
Nucleotides are colored by HIV-1 SHAPE reactivities. Secondary structures are shown for the final constrained HIV-1 secondary structure model. Base pairs are colored by level of structural consensus; black base pairs appear only in constrained HIV-1 predictions. Positions of predicted elements are shown on the HIV-1 (NL4-3) genome; annotations indicate statistical dependence, known RNA elements, major splice sites, and protein reading frames.
Fig 4.
Secondary structure models for five structurally conserved elements in the 3' half of HIV-related RNA genomes.
Secondary structures are shown for the final constrained HIV-1 secondary structure model. Other figure annotations are described in the Fig 3 legend.
Fig 5.
Secondary structure models for RNA elements with prior well-established functions.
Secondary structures correspond to the final, fully automated constrained HIV-1 prediction. Other figure annotations are described in the Fig 3 legend.
Fig 6.
Novel conserved, likely functional, elements in the final HIV-1 structure model.
(A) Structural elements located near protein-protein junctions. Protein domain junctions are labeled. (B) Conserved structural elements with the potential to form long (>20 bp) helical stacks. (C) Conserved structure at the A1 splice site, previously described by Pollom et al. [11] and recapitulated in this work.
Fig 7.
Consensus structures for the cPPT and PPT elements.
(A) cPPT and PPT sequences and SHAPE reactivities. The six sequences correspond to two elements from each of three genomes. Regions with consistently low or high SHAPE reactivities are highlighted by shading. (B) Structure models for cPPT- and PPT-containing elements in the final HIV-1 model.