Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches

Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by $k$ base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time $O(n^4)$ and quadratic space $O(n^2)$, is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/.

In the main paper, the recursions given in Theorem 1 and related material in the section Methods are for the simple Nussinov energy model for RNA secondary structure, in which base pairs are assigned a stabilizing energy of −1. This was done to simplify the argument on first reading. Nevertheless, it is known that the Turner energy function, which considers stabilizing energy contributions due to base stacking and destabilizing energy contributions due to loop entropy, is much more accurate for structure prediction. For that reason, our software, FFTbor, is implemented for the Turner energy model, using the recursions below.
To compute $Z(x) = Z_{1,n}(x)$, we use recursion (1), in which $E_d$ is the energy contribution due to dangling ends (energy contributions from single bases stacking on adjacent base pairs) and closing AU base pairs (since a non-GC base pair closing a stem has a destabilizing effect). The sum is taken over all possible base pairs $(k, j)$ with $i \le k < j$.
We compute $ZB(x)$ using recursion (2), where $EH(i, j)$ is the energy of the hairpin loop with closing base pair $(i, j)$, and $EI(i, j, k, l)$ is the energy of the stack, bulge or interior loop with closing base pair $(i, j)$ and interior base pair $(k, l)$. The first term in the recursion handles the case where $(i, j)$ is the only base pair in $[i, j]$, i.e. $(i, j)$ closes a hairpin loop. The second term handles the case where there is an interior loop (or a bulge or a stack) closed by $(i, j)$ and $(k, l)$. The third term covers all structures where $(i, j)$ closes a multi-loop. To reduce the complexity of the algorithm, the interior and bulge loop size can be limited to a maximum size of $L$ by requiring that $l > j - L$ in the above recursion.
The final recursion (3) computes $ZM(x)$. Note that since $ZM_{i,j}(x)$ computes the partition function contribution under the assumption that $[i, j]$ is part of a multi-loop, there will be either exactly one stem-loop structure in this region (the $ZB(x)$ term) or more than one (the term combining $ZM(x)$ and $ZB(x)$). Justification of recursions (1), (2), and (3) follows by induction, as in the proof of Theorem 1.
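The way such recursions track base pair distance in the exponent of $x$ is perhaps easiest to see in the simple Nussinov model, where each base pair contributes energy −1. The following Python sketch is an illustration under stated assumptions, not FFTbor's implementation and not a transcription of Theorem 1: it evaluates a Nussinov-style polynomial $Z(x)$ at a single value of $x$ and cross-checks the result by exhaustive enumeration. The names `evaluate_Z`, `all_structures`, `brute_force_Z`, `THETA`, `RT` and `PAIRS` are hypothetical, and the minimum hairpin size of 3 and the value RT = 0.6 kcal/mol are assumed for illustration.

```python
import math

THETA = 3   # assumed minimum number of unpaired bases in a hairpin loop
RT = 0.6    # assumed value of RT in kcal/mol (roughly 37 degrees C)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def evaluate_Z(seq, s0, x):
    """Evaluate Z(x) = sum over structures S of exp(|S|/RT) * x^d_BP(S, s0).

    seq is an RNA string, s0 a set of 0-based pairs (i, j) with i < j.
    """
    n = len(seq)
    boltz = math.exp(1.0 / RT)  # Boltzmann factor of one base pair (energy -1)

    def inside(i, j):
        # Number of base pairs of s0 lying entirely within [i, j].
        return sum(1 for (a, b) in s0 if i <= a and b <= j)

    Z = {}

    def get(i, j):
        # Intervals too short to hold a base pair (or empty) contribute 1.
        return Z.get((i, j), 1)

    for span in range(THETA + 1, n):
        for i in range(0, n - span):
            j = i + span
            # Case 1: j is unpaired; pay for any pair of s0 that ends at j.
            d0 = inside(i, j) - inside(i, j - 1)
            total = get(i, j - 1) * x ** d0
            # Case 2: j pairs with k; pay for every pair of s0 in [i, j]
            # that is neither (k, j) nor contained in [i, k-1] or [k+1, j-1].
            for k in range(i, j - THETA):
                if (seq[k], seq[j]) in PAIRS:
                    d1 = (inside(i, j) - inside(i, k - 1) - inside(k + 1, j - 1)
                          + 1 - 2 * ((k, j) in s0))
                    total += get(i, k - 1) * get(k + 1, j - 1) * boltz * x ** d1
            Z[(i, j)] = total
    return get(0, n - 1)

def all_structures(seq, i, j):
    """Yield every secondary structure on [i, j] as a frozenset of pairs."""
    if j - i <= THETA:
        yield frozenset()
        return
    yield from all_structures(seq, i, j - 1)      # j unpaired
    for k in range(i, j - THETA):                 # j paired with k
        if (seq[k], seq[j]) in PAIRS:
            for left in all_structures(seq, i, k - 1):
                for inner in all_structures(seq, k + 1, j - 1):
                    yield left | inner | {(k, j)}

def brute_force_Z(seq, s0, x):
    """Same quantity as evaluate_Z, by explicit enumeration (tiny inputs only)."""
    boltz = math.exp(1.0 / RT)
    return sum(boltz ** len(s) * x ** len(s ^ s0)
               for s in all_structures(seq, 0, len(seq) - 1))

seq = "GGGAAAUCCC"
s0 = {(0, 9), (1, 8), (2, 7)}
value = evaluate_Z(seq, s0, 0.5)
check = brute_force_Z(seq, s0, 0.5)
```

In this sketch, evaluating at $x = 1$ gives the ordinary partition function, and evaluating at $x = 0$ isolates the Boltzmann factor of $S_0$ itself, since $S_0$ is the only structure at distance 0.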

Scaling
Since the use of scaling may not be well known in the context of RNA secondary structure, we describe how the recursions of FFTbor can be scaled by any given constant. Let $c > 1$ be a real scaling constant. Given an RNA sequence $a = a_1, \ldots, a_n$ and initial structure $S_0$ of $a$, let $Q^k_{1,n} = Z^k_{1,n} / c^n$ denote the scaled sum of Boltzmann factors of all secondary structures $S$ whose base pair distance from $S_0$ is exactly $k$. Noting that the maximum base pair distance between any two structures of $a$ is at most $n$, we define the polynomial $Q(x) = \sum_{k=0}^{n} c_k x^k$, whose coefficients are $c_k = Q^k_{1,n}$. If we evaluate the polynomial $Q(x)$ at $n + 1$ distinct values, $Q(x_1) = y_1, \ldots, Q(x_{n+1}) = y_{n+1}$, then the Lagrange interpolation formula guarantees that $Q(x) = \sum_{k=1}^{n+1} y_k \cdot P_k(x)$, where $P_k(x) = \prod_{i \ne k} \frac{x - x_i}{x_k - x_i}$.
Let $Q(x)$ denote $Q_{1,n}(x)$, where $Q_{i,j}(x)$ is defined by induction on $j - i$. In its recursion, $E_d$ is the energy contribution due to dangling ends (energy contributions from single bases stacking on adjacent base pairs) and closing AU base pairs (since a non-GC base pair closing a stem has a destabilizing effect). The sum is taken over all possible base pairs $(k, j)$ with $i \le k < j$.
In the recursion for $QB_{i,j}(x)$, $EH(i, j)$ is the energy of the hairpin loop with closing base pair $(i, j)$, $EI(i, j, k, l)$ is the energy of the stack, bulge or interior loop with closing base pair $(i, j)$ and interior base pair $(k, l)$, $d_3 = d_{BP}(S_{[i,j]}, S_{[k,l]} \cup \{(i, j)\})$, and $d_4 = d_{BP}(S_{[i,j]}, S_{[i+1,k-1]} \cup S_{[k,l]} \cup \{(i, j)\})$. The first term in the recursion handles the case where $(i, j)$ is the only base pair in $[i, j]$, i.e. $(i, j)$ closes a hairpin loop. The second term handles the case where there is an interior loop (or a bulge or a stack) closed by $(i, j)$ and $(k, l)$. The third term covers all structures where $(i, j)$ closes a multi-loop. To reduce the complexity of the algorithm, the interior and bulge loop size can be limited to a maximum size of $L$ by requiring that $l > j - L$ in the above recursion.
Let $QM(x)$ denote $QM_{1,n}(x)$, defined as follows. For $1 \le i \le j \le i + \theta$, define $QM_{i,j}(x) = 0$, while for $i + \theta + 1 \le j \le n$, $QM_{i,j}(x)$ is defined by a recursion in which $d_5 = d_{BP}(S_{[i,j]}, S_{[k,j]})$ and $d_6 = d_{BP}(S_{[i,j]}, S_{[i,k-1]} \cup S_{[k,j]})$. Note that since $QM_{i,j}(x)$ computes the partition function contribution under the assumption that $[i, j]$ is part of a multi-loop, there will be either exactly one stem-loop structure in this region (the $QB(x)$ term) or more than one (the term combining $QM(x)$ and $QB(x)$).
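The interpolation step described in this section can be illustrated with a small Python sketch. When the evaluation points are chosen to be complex roots of unity, the coefficients of $Q(x)$ can be recovered from the sampled values by a discrete Fourier transform instead of an explicit Lagrange formula. The code below is a minimal sketch under stated assumptions, not FFTbor's implementation: the function names `fft` and `coefficients_from_roots_of_unity` are hypothetical, the coefficient values are toy data, the polynomial is evaluated directly rather than through the recursions, and scaling is assumed to mean dividing each $Z^k$ by $c^n$.

```python
import cmath

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT (length must be a power of two).

    Returns X[k] = sum_m values[m] * exp(-2*pi*i*m*k / N).
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def coefficients_from_roots_of_unity(evaluate, degree_bound):
    """Recover c_0..c_degree_bound of a real polynomial from point evaluations."""
    size = 1
    while size < degree_bound + 1:   # pad to a power of two
        size *= 2
    # Sample at the size-th roots of unity w^m, with w = exp(2*pi*i / size).
    samples = [evaluate(cmath.exp(2j * cmath.pi * m / size)) for m in range(size)]
    # y_m = sum_k c_k w^(m*k) implies c_k = (1/size) * sum_m y_m w^(-m*k),
    # which is exactly fft(samples)[k] / size.
    spectrum = fft(samples)
    return [(v / size).real for v in spectrum[:degree_bound + 1]]

# Toy data (hypothetical): unscaled sums Z^k for k = 0..3, with n = 3.
z = [3.0, 0.0, 5.5, 1.5]
n, c = 3, 2.0
q = [zk / c ** n for zk in z]        # assumed scaling: Q^k = Z^k / c^n

def Q(x):
    # Stand-in for evaluating the scaled polynomial via the recursions.
    return sum(qk * x ** k for k, qk in enumerate(q))

recovered = coefficients_from_roots_of_unity(Q, n)
total = sum(recovered)
probs = [r / total for r in recovered]   # equals Z^k / Z, since c^n cancels
```

Because every coefficient is divided by the same factor $c^n$, the ratios $Q^k_{1,n} / Q_{1,n}(1)$ coincide with the unscaled probabilities $Z^k_{1,n} / Z_{1,n}(1)$, so scaling changes the magnitude of intermediate values without changing the reported probabilities.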