Hierarchical analysis of RNA secondary structures with pseudoknots based on sections

doi:10.1371/journal.pcbi.1013904

Fig 1.

Graphical representations of RNA structures and section-based pseudoknot classifications.

Yellow lines: secondary structure base pairs; red lines: pseudoknot base pairs; blue regions: sections (unpaired nucleotides); black regions: non-sections. a) RNA structure in stem-loop and linear representations. b) Stem-loop (left) and linear (right) representations of the same structure. Blue arrows indicate sections (contiguous unpaired regions). The black arrow indicates a non-section where nucleotides participate in secondary structure pairing. c) Topologically distinct pseudoknot structures of both 2-clusters and 3-clusters.
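The notion of a section as a contiguous unpaired region can be illustrated with a short sketch. It assumes the secondary structure is given in dot-bracket notation, where `.` marks an unpaired nucleotide; the function name `find_sections` and the example string are hypothetical and are not taken from the paper.

```python
def find_sections(dot_bracket: str) -> list[tuple[int, int]]:
    """Return (start, end) indices (0-based, inclusive) of maximal runs of '.'.

    Assumption: every character other than '.' denotes a paired nucleotide,
    so each maximal run of '.' is one section.
    """
    sections = []
    start = None
    for i, ch in enumerate(dot_bracket):
        if ch == ".":
            if start is None:
                start = i  # a new unpaired run begins here
        elif start is not None:
            sections.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:
        # the structure ends inside an unpaired run
        sections.append((start, len(dot_bracket) - 1))
    return sections


# Hypothetical toy structure with two stems.
example = "..((...))..(((....)))."
sections = find_sections(example)
section_lengths = [end - start + 1 for start, end in sections]
```

Everything outside the returned intervals corresponds to a non-section in the sense of panel b.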

Fig 2.

Energy model calculation examples demonstrating local versus global structural considerations.

a) Pseudoknot base pair structure. The yellow bands represent sets of base pairs belonging to a secondary structure, while the red band represents a set of consecutive pseudoknots. The bottom right shows the detailed base pair structure of the pseudoknot. b) Pseudoknot connecting sections across multi-loops. The newly formed loop that includes the pseudoknot contains all sections from the connected multi-loops, requiring global rather than local energy calculations.

Fig 3.

Implementation flowchart of the RNA pseudoknot structure prediction algorithm.

The flowchart illustrates the hierarchical approach of our method, starting with input parsing, followed by section identification from the secondary structure. Energy calculations determine the minimum free energy (MFE) between section pairs. Structure prediction then branches into 2-cluster and 3-cluster analysis paths, with the latter using weighted energy minimization to balance former and latter section pair contributions. The output includes predicted structures, base pair assignments, and accuracy metrics.
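The weighted energy minimization step in the 3-cluster branch could look roughly like the sketch below. The exact functional form is an assumption: here the weighted energy is taken to be the former section pair energy plus a weight `w` times the latter section pair energy. The names `weighted_energy`, `pick_best`, and the candidate values are hypothetical illustrations, not the authors' implementation.

```python
def weighted_energy(e_former: float, e_latter: float, w: float) -> float:
    """Assumed form: former contribution plus w times the latter contribution.

    Energies are in kcal/mol; more negative means more stable.
    """
    return e_former + w * e_latter


def pick_best(candidates: list[dict], w: float) -> dict:
    """Return the candidate with the lowest weighted energy."""
    return min(
        candidates,
        key=lambda c: weighted_energy(c["e_former"], c["e_latter"], w),
    )


# Hypothetical candidates for one 3-cluster (values invented for illustration).
candidates = [
    {"name": "A", "e_former": -6.0, "e_latter": -1.0},
    {"name": "B", "e_former": -3.0, "e_latter": -5.0},
]

best_high_w = pick_best(candidates, w=0.8)  # latter pair counts almost fully
best_low_w = pick_best(candidates, w=0.2)   # latter pair is strongly down-weighted
```

With a larger `w` the latter pair's stability can tip the choice, while a small `w` lets the former pair dominate.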

Table 1.

Structural characteristics of RNA sequences analyzed from the RNAstrand database.

Section length refers to the number of unpaired nucleotides within each contiguous unpaired region.

Fig 4.

Distribution of MFE gain for section pairs in RNA databases.

a) Histogram showing exponential decay of MFE gain frequency for all section pairs in tmRNA and RNase P RNA sequences from the RNAstrand database. b) Simple probabilistic model explaining the observed exponential decay, assuming independent base pairing probability and ∼1.0 kcal/mol energy contribution per base pair.
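One way to see why such a model yields exponential decay: if each additional base pair forms independently with probability p and contributes about 1.0 kcal/mol, a run of k pairs has relative frequency p^k and an MFE gain of about k kcal/mol, so the frequency falls off as exp(-λE) with λ = ln(1/p) per kcal/mol. The sketch below encodes this reading of the caption. The value p = 0.25 and the function names are illustrative assumptions, not parameters reported in the paper.

```python
import math


def decay_rate(p_pair: float, energy_per_pair: float = 1.0) -> float:
    """Decay constant (per kcal/mol) implied by independent pairing."""
    return math.log(1.0 / p_pair) / energy_per_pair


def relative_frequency(mfe_gain: float, p_pair: float,
                       energy_per_pair: float = 1.0) -> float:
    """Relative frequency of a section pair with the given MFE gain.

    Assumption: the gain comes from k = gain / energy_per_pair base pairs,
    each formed independently with probability p_pair.
    """
    k = mfe_gain / energy_per_pair
    return p_pair ** k


# Illustrative value only: 0.25 is not taken from the paper.
p = 0.25
rate = decay_rate(p)
freqs = [relative_frequency(e, p) for e in (1.0, 2.0, 3.0, 4.0)]
```

Each extra kcal/mol of gain multiplies the expected frequency by the same factor p, which is the signature of exponential decay on a histogram.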

Fig 5.

Analysis of pseudoknot formation likelihood based on energetic predictions.

a) Connecting probability as a function of MFE gain for tmRNA and RNase P RNA sequences, defined as the ratio of actually connected section pairs to all section pairs with a given MFE gain. b) Representative example of a non-connecting section pair with substantial MFE gain (80 kcal/mol), illustrating how conformational entropy losses can outweigh local energy gains in long sections (∼104 and ∼48 nucleotides).
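The connecting probability defined in panel a can be computed as a simple per-bin ratio. The sketch below assumes each section pair is recorded as an MFE gain in kcal/mol together with a flag saying whether the pair is connected in the real structure. The bin width, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict


def connecting_probability(pairs, bin_width: float = 1.0) -> dict[int, float]:
    """Map each MFE-gain bin index to (connected pairs) / (all pairs) in that bin.

    pairs: iterable of (mfe_gain, is_connected) tuples.
    Bin index b covers gains in [b * bin_width, (b + 1) * bin_width).
    """
    total = defaultdict(int)
    connected = defaultdict(int)
    for gain, is_connected in pairs:
        b = int(gain // bin_width)
        total[b] += 1
        if is_connected:
            connected[b] += 1
    return {b: connected[b] / total[b] for b in total}


# Invented sample data for illustration only.
sample_pairs = [
    (0.5, False),
    (0.7, False),
    (1.2, True),
    (1.9, False),
    (5.0, True),
]
prob_by_bin = connecting_probability(sample_pairs, bin_width=1.0)
```

Bins that contain no section pairs are simply absent from the result, which avoids dividing by zero.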

Table 2.

Results of the analyses based on section pairs.

Table 3.

Comparison of the MFE and the free energy contribution of the real structure for connected section pairs in 3-clusters.

Fig 6.

Optimization of weighting parameter for 3-cluster pseudoknot structure prediction.

Prediction accuracy metrics (sensitivity and positive predictive value) as a function of the weight parameter w applied to the free energy contribution of latter section pairs in the weighted energy minimization approach. Optimal performance occurs at w = 0.8, indicating preferential weighting of former section pair interactions while retaining a contribution from latter pairs in type-1 3-cluster configurations.
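The two accuracy metrics follow their standard definitions: sensitivity is the fraction of reference base pairs that are predicted, and positive predictive value (PPV) is the fraction of predicted base pairs that are in the reference. A minimal sketch, assuming base pairs are represented as index tuples; the function name and the example pairs are hypothetical.

```python
def accuracy_metrics(predicted, reference) -> tuple[float, float]:
    """Return (sensitivity, ppv) for two collections of base pairs.

    Each base pair is an (i, j) tuple; order within a pair is ignored.
    sensitivity = TP / (TP + FN), ppv = TP / (TP + FP).
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# Invented example: two of three predicted pairs are correct,
# and two of four reference pairs are recovered.
reference_pairs = [(1, 10), (2, 9), (3, 8), (20, 30)]
predicted_pairs = [(10, 1), (2, 9), (4, 7)]
sens, ppv = accuracy_metrics(predicted_pairs, reference_pairs)
```

Scanning a grid of w values and computing both metrics for each is one way a curve like the one in this figure could be produced.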
