Figure 1.
RNA from human accelerated region HAR1F, a region of the human genome that differs from highly conserved regions of our closest primate relatives and is active in the developing human brain between the 7th and 18th gestational weeks [38].
Secondary structure representation in conventional form (left), as a circular Feynman diagram (center) and as a linear Feynman diagram (right). Sequence and consensus secondary structure taken from Rfam [42]; graphics produced with jViz software [43].
Figure 2.
Long range pseudoknot PKB239 in the 5′ untranslated region (UTR) of human immunodeficiency virus HIV-1.
Secondary structure with pseudoknots displayed in conventional form (left) and as a circular Feynman diagram (right). Sequence and structure of PKB239 taken from Pseudobase [44]; graphics produced with jViz software [43].
Figure 3.
The Turner energy model is an additive loop model, whereby the free energy of an RNA secondary structure is defined to be the sum of loop free energies in a unique decomposition of the structure into loops.
In this figure, the free energy of the depicted structure is the sum of the free energies of its labeled loops. The Turner rules include free energy parameters for different types of loops, illustrated here for hairpins, stacked base pairs, bulges, internal loops, multiloops and external loops. The Turner parameters are derived from a series of UV absorption (optical melting) experiments described in a number of papers, including references [12], [45]–[49]. For a complete list of references, see http://rna.urmc.rochester.edu/NNDB/ref.html. Images created using the software VARNA [50].
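The additive loop model described in this caption can be illustrated with a minimal sketch. It decomposes a dot-bracket structure into loops, classifies each loop by type, and sums one energy term per loop. The per-type energies in `PLACEHOLDER_ENERGY` are invented for illustration and are not Turner parameters, which also depend on sequence and loop size; all function names are hypothetical.

```python
from collections import Counter

# Invented per-loop energies (kcal/mol) for illustration only; NOT Turner values.
PLACEHOLDER_ENERGY = {"hairpin": 4.0, "stack": -2.0, "bulge": 3.0,
                      "internal": 2.0, "multiloop": 3.5, "external": 0.0}

def parse_pairs(db):
    """Map each opening position i to its closing position j in a dot-bracket string."""
    stack, pairs = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs[stack.pop()] = pos
    return pairs

def inner_pairs(pairs, i, j):
    """Base pairs directly enclosed by the closing pair (i, j)."""
    found, k = [], i + 1
    while k < j:
        if k in pairs:
            found.append((k, pairs[k]))
            k = pairs[k] + 1      # skip over the enclosed substructure
        else:
            k += 1
    return found

def classify(i, j, inner):
    """Name the loop closed by (i, j) from its directly enclosed base pairs."""
    if not inner:
        return "hairpin"
    if len(inner) >= 2:
        return "multiloop"
    (k, l), = inner
    left, right = k - i - 1, j - l - 1   # unpaired bases on each side
    if left == 0 and right == 0:
        return "stack"
    if left == 0 or right == 0:
        return "bulge"
    return "internal"

def loop_types(db):
    """Count loops by type; every structure has exactly one external loop."""
    pairs = parse_pairs(db)
    types = ["external"]
    for i, j in pairs.items():
        types.append(classify(i, j, inner_pairs(pairs, i, j)))
    return Counter(types)

def total_energy(db):
    """Additive model: the structure's energy is the sum of its loop energies."""
    return sum(PLACEHOLDER_ENERGY[t] * c for t, c in loop_types(db).items())
```

Because each base pair closes exactly one loop, the decomposition is unique, which is what makes the energy a well-defined sum.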
Figure 4.
Feynman diagram of original recursions from McCaskill's algorithm [33] to compute the partition function.
(Notation in this figure deviates slightly from that in the text.)
Figure 5.
This figure depicts the logarithm (base 10) of the number of locally optimal [resp. all] secondary structures for random RNA.
Sequence length n is given on the x-axis, while the logarithm of the number of locally optimal structures (lower curve) [resp. all structures (top curve)] is given on the y-axis. Error bars are displayed. For various lengths n, random RNA sequences of length n were generated by a 0th order Markov process with probability 0.25 for each nucleotide A,C,G,U. For each value of n, the average (exact) number of locally optimal [resp. all] secondary structures was computed. Least-squares (linear) fits to the logarithm of the average number of structures were computed separately for all secondary structures and for locally optimal structures of length n random RNA, each with its coefficient of determination. (The coefficient of determination, R², is the square of the Pearson correlation coefficient of the least-squares (linear) fit of the logarithm of the average number of structures.) It follows from the fits that the total number of structures is approximately equal to the square of the number of local optima.
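As a rough illustration of the fitting procedure described in this caption, the sketch below fits a straight line to the base-10 logarithm of the structure counts and reports R² as the squared Pearson correlation. The function name `log_linear_fit` is hypothetical, and any data passed to it here would be synthetic; the fitted values reported in the paper are not reproduced.

```python
import math

def log_linear_fit(lengths, counts):
    """Fit log10(count) = intercept + slope * length by ordinary least squares.

    Returns (intercept, slope, r_squared), where r_squared is the square of
    the Pearson correlation between length and log10(count).
    """
    ys = [math.log10(c) for c in counts]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared
```

A linear fit on the log scale corresponds to exponential growth of the count in the sequence length, with base 10 raised to the slope.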
Figure 6.
Plot of the ratio, with error bars, of the restricted Boltzmann partition function to the total Boltzmann partition function, as a function of RNA length, for the same random RNA generated as described in Figure 5.
This ratio represents the proportion of structures, weighted by their Boltzmann factor, that are locally optimal. The ratio was fitted numerically as a function of length, with goodness of fit measured by the coefficient of determination (see [54]).
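The ratio plotted here can be sketched for a toy ensemble in which every structure is listed explicitly. This is only an illustration: the paper computes both partition functions by dynamic programming rather than by enumeration, and the function name and input format below are assumptions.

```python
import math

# Gas constant (kcal/(mol*K)) times absolute temperature at 37 degrees Celsius.
RT = 0.0019872 * 310.15

def boltzmann_ratio(structures):
    """Ratio of the restricted to the total partition function.

    `structures` is an iterable of (energy_kcal_per_mol, is_locally_optimal)
    pairs, a hypothetical explicit listing of a small ensemble.
    """
    z_total = 0.0
    z_restricted = 0.0
    for energy, locally_optimal in structures:
        weight = math.exp(-energy / RT)   # Boltzmann factor of this structure
        z_total += weight
        if locally_optimal:
            z_restricted += weight
    return z_restricted / z_total
```

For example, an ensemble of two structures with equal energy, only one of which is locally optimal, gives a ratio of 0.5.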
Table 1.
Using a 0th order Markov chain with probability 0.25 for each nucleotide A,C,G,U, 50 random RNA sequences were generated for each length n, from 20 to 200 in steps of 20.
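A 0th order Markov chain with equal nucleotide probabilities amounts to drawing each position independently and uniformly from A, C, G, U. The following sketch generates a data set of the shape described; the function names and the seed are illustrative assumptions, not the authors' code.

```python
import random

def random_rna(length, rng):
    """Draw each nucleotide independently with probability 0.25."""
    return "".join(rng.choice("ACGU") for _ in range(length))

def make_dataset(seed=0, per_length=50):
    """Return {length: [sequences]} for lengths 20, 40, ..., 200."""
    rng = random.Random(seed)   # fixed seed so the data set is reproducible
    return {n: [random_rna(n, rng) for _ in range(per_length)]
            for n in range(20, 201, 20)}
```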
Table 2.
Structural diversity comparison between ensemble of locally optimal structures and Boltzmann ensemble of all structures.
Figure 7.
Graph showing sensitivity and positive predictive value for variants of the MEA method, when benchmarked with consensus structures from all seed alignments of Rfam 10.0 database [42].
For various values of the weight parameter, the sensitivity and PPV were computed for methods MEA, MEA LO and MEA MIN. Sensitivity of a secondary structure prediction for a given RNA sequence is defined as the number of correctly predicted base pairs divided by the number of base pairs in the native consensus structure, while PPV is defined as the number of correctly predicted base pairs divided by the number of base pairs in the predicted secondary structure. Sensitivity and PPV are computed by Rfam family, then averaged over all families of seed alignments in Rfam 10.0. (We performed a similar analysis where averages were taken over all sequences in Rfam, without first computing a family average. Results are similar; data not shown.) In [34], [35], the maximum expected accuracy (MEA) structure is computed by applying a variant of the Nussinov-Jacobson algorithm [5] to the base pairing probabilities computed by McCaskill's algorithm [33]. The weight parameter scales the base pairing probabilities in the score of a structure, which follows [34], [35]. In the MEA LO variant of the MEA procedure, we instead use base pairing frequencies obtained by sampling locally optimal structures. In the MEA MIN variant, we take the base pairing probability to be the minimum of the McCaskill base pairing probability and the base pairing frequency sampled from locally optimal structures, and we take the probability that a position is unpaired to be the minimum of the corresponding probabilities that the position is unpaired in the low energy ensemble (using RNAfold -p) and in the locally optimal ensemble (using RNAlocopt). Sensitivity and PPV for the minimum free energy (MFE) structure, as computed by RNAfold from the Vienna RNA package [58], are similar to the values for MEA. The single point below each of the three curves corresponds to MFE sensitivity and PPV. The method MEA MIN gives a consistent performance improvement over the other methods.
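The definitions of sensitivity and PPV given in this caption translate directly into a short function. The sketch below assumes that structures are represented as collections of (i, j) base pairs; the function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def sensitivity_ppv(predicted, native):
    """Return (sensitivity, PPV) for a predicted structure.

    Both arguments are iterables of (i, j) base pairs. Sensitivity divides
    the correctly predicted pairs by the native pairs; PPV divides them by
    the predicted pairs.
    """
    predicted, native = set(predicted), set(native)
    correct = len(predicted & native)
    sensitivity = correct / len(native) if native else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

For instance, a prediction that recovers two of four native base pairs and adds one spurious pair has sensitivity 2/4 and PPV 2/3.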
Figure 8.
Graph showing sensitivity (black, increasing curves) and positive predictive value (PPV, red, decreasing curves) as a function of the MEA weight parameter (explained in text and in Figure 7) for methods MEA, MEA LO, and MEA MIN, as benchmarked with consensus structures from all seed alignments of Rfam 10.0 database [42].
Values of the weight parameter are given on the x-axis, while values of sensitivity and PPV are given on the y-axis. Sensitivity and PPV are computed by Rfam family, then averaged over all families of seed alignments in Rfam 10.0. (We performed a similar analysis where averages were taken over all sequences in Rfam, without first computing a family average. Results are similar; data not shown.) The MEA MIN method yields a consistent improvement over the other MEA methods, as well as over minimum free energy (MFE) structure predictions, benchmarked by using RNAfold from the Vienna RNA package [58]. The best sensitivity and the best PPV are given by method MEA MIN, the next best by MEA LO, and the last by method MEA. Two horizontal lines indicate the sensitivity (top line) and PPV (bottom line) for the minimum free energy structure, as computed by RNAfold from the Vienna RNA package.
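The two averaging schemes mentioned in this caption differ in how much weight large families receive. The sketch below contrasts them on a hypothetical mapping from family name to per-sequence scores; the function names and data layout are assumptions.

```python
from statistics import mean

def family_average(scores_by_family):
    """Average within each family first, then average the family means."""
    return mean(mean(scores) for scores in scores_by_family.values())

def sequence_average(scores_by_family):
    """Average over all sequences directly, ignoring family membership."""
    return mean(s for scores in scores_by_family.values() for s in scores)
```

Averaging by family gives every family equal weight, so a family with many seed sequences does not dominate the benchmark.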
Figure 9.
Example structure in recursion. In the left structure, we do not yet know the two loops bordered by the base pair under consideration.
Therefore we do not yet know whether removing this base pair will lower the free energy. In the right structure, one step further in the recursion, we now know which two loops border the base pair. Images created using the software VARNA [50].
Figure 10.
Example structure in recursion.
Computing the energy change effected by removing the base pair under consideration requires keeping track of a second base pair. Images created using the software VARNA [50].
Figure 11.
The six ways that a single base pair can be added to or removed from a structure and possibly reduce the overall energy.
Images created using the software VARNA [50].
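The notion of local optimality behind this figure can be sketched by brute force: a structure is treated as locally optimal if neither adding nor removing a single base pair lowers its energy. The code below is an illustration under strong assumptions. It uses a toy energy of -1 per base pair as a stand-in for the Turner model, a minimum hairpin loop size of 3, and hypothetical function names; it is not the paper's recursion.

```python
ALLOWED = {frozenset("AU"), frozenset("GC"), frozenset("GU")}

def can_pair(a, b):
    """Watson-Crick and GU wobble pairs only."""
    return frozenset((a, b)) in ALLOWED

def neighbors(seq, pairs, min_loop=3):
    """Yield every structure that differs from `pairs` by one base pair."""
    pairs = frozenset(pairs)
    for p in pairs:                      # removals
        yield pairs - {p}
    paired = {k for p in pairs for k in p}
    for i in range(len(seq)):            # additions
        for j in range(i + min_loop + 1, len(seq)):
            if i in paired or j in paired or not can_pair(seq[i], seq[j]):
                continue
            # Reject pairs that would cross an existing pair (pseudoknots).
            crosses = any(k < i < l < j or i < k < j < l for k, l in pairs)
            if not crosses:
                yield pairs | {(i, j)}

def toy_energy(seq, pairs):
    """Toy stand-in for a real energy model: -1 per base pair."""
    return -len(pairs)

def is_locally_optimal(seq, pairs, energy=toy_energy):
    """True if no single base pair addition or removal lowers the energy."""
    e0 = energy(seq, pairs)
    return all(energy(seq, s) >= e0 for s in neighbors(seq, pairs))
```

With the sequence GGGAAACCC, the single-pair structure {(0, 6)} is locally optimal under this toy energy even though the three-pair stem {(0, 8), (1, 7), (2, 6)} has lower energy, a small example of a kinetic trap.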
Figure 12.
Image of a base pair that could not possibly lower the energy by creating a multiloop, since it creates two bordering multiloops.
Images created using the software VARNA [50].
Figure 13.
Example of the formation of a multiloop with two tails of specified lengths.
Images created using the software VARNA [50].
Figure 14.
Example of gluing together two pieces of a multiloop.
Note that if each piece is locally optimal, then the composite, obtained by gluing the pieces together, is as well. Images created using the software VARNA [50].