The authors have declared that no competing interests exist.
Conceived and designed the experiments: MB JB JC. Performed the experiments: MB JB JC. Analyzed the data: MB JB JC. Contributed reagents/materials/analysis tools: MB JB JC. Wrote the paper: MB JB JC AD DB.
Cellular processes involve large numbers of RNA molecules. The functions of these RNA molecules and their binding to molecular machines are highly dependent on their 3D structures. One of the key challenges in RNA structure prediction and modeling is predicting the spatial arrangement of the various structural elements of RNA. As RNA folding is generally hierarchical, methods involving coarse-grained models hold great promise for this purpose. We present here a novel coarse-grained method for sampling, based on game theory and knowledge-based potentials. This strategy, GARN (Game Algorithm for RNa sampling), is often much faster than previously described techniques and generates large sets of solutions closely resembling the native structure. GARN is thus a suitable starting point for the molecular modeling of large RNAs, particularly those with experimental constraints. GARN is available from:
RNA molecules are involved in diverse biological processes in the cell. An understanding of the way in which RNA molecules adopt a 3D structure provides considerable insight into the functional roles of these molecules. Methods for designing sequences so as to obtain specific functions are within reach [
These methods generate interesting samples for the analysis of experimental data [
A few thousand RNA structures are now available from the Protein Data Bank (PDB), and these data have improved our understanding of RNA structures. They have increased the quality of energy potentials for RNA. This is true not only for the traditional force fields used in molecular dynamics simulation, but also for knowledge-based (KB) potentials. Initially developed for protein structure prediction [
The use of various levels of molecule representation is a major feature of most of these effective techniques [
We present here a new strategy for sampling RNA 3D structures, combining a coarse-grained graph-based representation, KB potentials, and game-theory algorithms. Each SSE is represented as one or a few nodes on a graph. The nodes are linked by covalent bond connections, and non-bonded interactions are represented with various types of KB potentials. Game theory is used to make the system evolve and to provide putative conformations: the nodes are the players of
In this context, finding RNA conformations satisfying structural and potential constraints is seen as a local optimization problem in which each SSE or
GARN combines a coarse-grained 3D representation, a knowledge-based (KB) scoring scheme, and efficient search techniques. The idea is that a stable solution, referred to as a Nash equilibrium in game theory, could be used to represent a stable 3D structure for RNA.
In game theory: (i) the strategy set of an action is called
One of the key results of Game Theory is the Nash theorem [
One way to obtain a Nash equilibrium is to use simple algorithms in which each player selects a best response strategy, evaluated relative to the decisions taken by the other players. Probabilistic versions can be efficient, but major improvements have been achieved through studies of the dynamics of these algorithms [
Another approach is the
As with the multi-armed bandit problem, we use regret minimization to solve our RNA folding problem. Basically, when searching for minima with game theory, the type of search performed is similar to that in force-field experiments. We used a reference set of RNA structures to construct a KB scoring framework for use in a lattice setting in which the players (RNA SSEs) evolve. We show that game theory strategies are efficient for sampling various conformations, particularly for large molecules with complex substructures, such as three-way junctions.
A schematic description of our approach is provided in
The 3D space is then modeled as a triangular lattice, as previously described [
Full atomic models are not reconstructed from GARN, and only coarse-grained models are compared. All-atom structures can be obtained and refined with other software, e.g., C2A [
The sections below describe the methodology applied at each step and the evaluation scheme.
Various 3D datasets can be used to establish statistical measurements for RNA structures [
This
For the evaluation of our approach and its comparison with other programs, we used two distinct datasets: the
We assessed the performance of our method and compared it with the results of other studies on the
The RNA molecule is represented by a graph similar to the representations previously used for other methods of RNA structure prediction [
Each helix consisting of fewer than five base pairs is modeled using one player, taken to be the geometric center of all the heavy atoms considered. Longer helices are represented with as many players as required to keep a maximum of five base pairs per player and to account for possible long-range flexibility.
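The helix rule above can be illustrated with a short sketch. The function names are hypothetical, and the even distribution of base pairs over the players is an assumption made for illustration; the text only states the upper bound of five base pairs per player.

```python
import math
from typing import List

MAX_BP_PER_PLAYER = 5  # upper bound stated in the text


def helix_player_count(n_base_pairs: int) -> int:
    """Smallest number of players such that none covers more than five base pairs."""
    if n_base_pairs < 1:
        raise ValueError("a helix needs at least one base pair")
    return math.ceil(n_base_pairs / MAX_BP_PER_PLAYER)


def split_helix(n_base_pairs: int) -> List[int]:
    """Base pairs per player, spread as evenly as possible (illustrative assumption)."""
    k = helix_player_count(n_base_pairs)
    base, extra = divmod(n_base_pairs, k)
    # The first `extra` players take one additional base pair each.
    return [base + 1] * extra + [base] * (k - extra)
```

For example, under these assumptions a helix of 11 base pairs would be split over three players.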
For terminal loops (one-way junctions), bulges and two-way junctions [
Three-way junctions are modeled with two players. These players are defined such that one accounts for the helical stacking [
Nodes (players) are built from the secondary structure representation in which base-paired nucleotides are shown in blue and free nucleotides are shown in yellow. Blue nodes correspond to helices and yellow nodes to junctions. Each element, except the three-way junction, is represented by one node (shown as a sphere), taken to be the geometric center of the heavy atoms. The three-way junction contains two nodes, accounting for helical stacking and branching, respectively.
Each node (player) of the RNA model is set to lie on a 3D triangular regular lattice.
As explained by [
We optimized the size of the lattice, by calculating all the distances between adjacent players in the graph for the
Scoring parameters were also computed on the
Unsurprisingly, the low-count regions (small distances) were difficult to deal with in the score evaluation. A Dirichlet Process Mixture was used to ease the process for Gaussian function evaluations [
The game model contains: (i) players, corresponding to the nodes of the RNA graph, (ii) a set of possible strategies for each player, i.e., the spatial directions in which the next player can move, and (iii) player preferences, corresponding to the probability of each player choosing a strategy as a function of the previous moves of the other players, i.e., based on a score. From these settings, different game plays can be tested, to evaluate the best combinations allowing the system to evolve.
Each player has a set of 12 strategies, corresponding to the set of directions in the triangular lattice in which it is possible to move (See
Players in small helices (fewer than 6 base pairs) can either stay in the same direction, or move at a maximum angle of 60° from that orientation.
Players in large helices (more than 5 base pairs) and in small two-way junctions (one unpaired side being smaller than two nucleotides) can move in all 60° angle directions from their initial orientation. Large helices may also be
All possible moves in the lattice are allowed for one-way and two-way junction players.
Although all moves in the lattice are allowed for three-way junction players, once the first player, representing the stacking, has chosen a strategy (for the position of the second player), the second has to choose a position from the possible positions in the lattice.
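The move rules above can be sketched in code. This sketch assumes that the 3D triangular lattice with 12 directions corresponds to the nearest-neighbor vectors of a face-centered cubic lattice; that identification, and the function names, are assumptions made for illustration rather than details taken from GARN itself.

```python
import itertools
import math


def fcc_directions():
    """The 12 nearest-neighbor vectors (+-1, +-1, 0) and permutations (assumed lattice)."""
    dirs = set()
    for i, j in itertools.combinations(range(3), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            v = [0, 0, 0]
            v[i], v[j] = si, sj
            dirs.add(tuple(v))
    return sorted(dirs)


def angle_deg(u, v):
    """Angle between two lattice vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def allowed_moves(previous, max_angle=60.0, tol=1e-6):
    """Directions deviating by at most `max_angle` from the previous orientation."""
    return [d for d in fcc_directions() if angle_deg(previous, d) <= max_angle + tol]
```

On this assumed lattice, a player restricted to 60° keeps five options (straight on, plus four bent directions), whereas an unrestricted junction player keeps all 12.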
The relative ordering of the players is determined as follows: each node of the graph is numbered according to a depth-first search starting from the first junction (from the 5′-3′ paired ends) with the largest degree of branching (see
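The ordering rule described above can be sketched as a depth-first traversal. The graph encoding (an adjacency dictionary whose neighbor lists follow 5′-3′ order), the node names and the function name are hypothetical; for brevity, each junction is a single node here, whereas GARN models a three-way junction with two players.

```python
def player_order(adjacency, junctions):
    """Order players by depth-first search from the first junction of largest degree.

    adjacency: dict mapping each node to its neighbors, listed in 5'-3' order.
    junctions: junction nodes, listed in 5'-3' order.
    """
    # max() returns the first maximal element, i.e. the first largest junction.
    start = max(junctions, key=lambda node: len(adjacency[node]))
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so that the 5'-most neighbor is visited first.
        stack.extend(n for n in reversed(adjacency[node]) if n not in seen)
    return order


# Toy example: a three-way junction J3 joining helices H0, H1, H2;
# H1 and H2 end in terminal loops L1 and L2.
toy = {
    "H0": ["J3"],
    "J3": ["H0", "H1", "H2"],
    "H1": ["J3", "L1"],
    "L1": ["H1"],
    "H2": ["J3", "L2"],
    "L2": ["H2"],
}
toy_order = player_order(toy, ["J3", "L1", "L2"])
```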
As described above, the game uses a finite set of strategies for each player. It is also sequential, as players make their moves one after the other and the scheme is repeated several times, the previous configurations being known at each step (the game thus includes several turns for each player).
We use a regret minimization scheme: each player moves by choosing the most favorable and sufficiently likely environment, based on previous observations. We tried two different algorithms inspired by multi-armed bandits: the Upper Confidence Bound algorithm (UCB) [
In the UCB algorithm, each player chooses the strategy maximizing the sum of two scoring terms: (i) an
An action is allowed if it belongs to the set available to the specific type of player. An action is forbidden if: (i) two players occupy the same position on the lattice or (ii) two edges of the graph intersect. The set of possible actions available to a player is thus updated at each game turn, with the elimination of forbidden configurations.
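The selection and filtering steps described above can be sketched as follows. The exploration term uses the textbook UCB1 form, which is an assumption: the exact terms used by GARN are not reproduced here. Only the site-occupancy check is shown; the edge-intersection test is omitted, and all names are hypothetical.

```python
import math


def allowed_strategies(position, directions, occupied):
    """Indices of moves whose target lattice site is not already occupied."""
    result = []
    for k, d in enumerate(directions):
        target = tuple(p + q for p, q in zip(position, d))
        if target not in occupied:
            result.append(k)
    return result


def ucb_choice(mean_score, counts, turn, allowed, c=math.sqrt(2)):
    """Allowed strategy maximizing mean score plus an exploration bonus (UCB1 form)."""
    best, best_value = None, -math.inf
    for s in allowed:
        if counts[s] == 0:
            return s  # strategies never tried are explored first
        value = mean_score[s] + c * math.sqrt(math.log(turn) / counts[s])
        if value > best_value:
            best, best_value = s, value
    return best
```

Recomputing the allowed set at every turn mirrors the update of possible actions described in the text, so a strategy forbidden in one turn can become available again once the other players have moved.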
Three different gameplays were tested, for sequential games, with each player acting in turn. These gameplays differ in terms of what the players do at each turn (see
(I) Each game consists of several turns obeying a common set of rules. First, one or all the players choose a strategy (a direction on the grid) according to the relative probabilities of their choosing each strategy. The score is then updated by one or all the players, based on their distances from the other players, and the probabilities are then updated. Depending on the type of game, three schemes are possible (II): (a) in the AA game, all players apply a strategy and all players then calculate their scores, in two successive steps, (b) in the OA game, all players calculate their score each time a single player plays (the usual total number of turns
Once the game is set, with its players and strategies, a score is needed to define the welfare of a player in a specific conformation. Taking into account the coarse-grained nature of the model, we used KB-defined scoring functions to provide a pseudo-potential for SSEs. The scoring function is computed so as to mimic a KB energy: measurements of the distance
Four different types of scoring functions were tested (see
Lennard-Jones:
Modified Lennard-Jones: the positive repulsive part of the Lennard-Jones potential is flattened out to 0, so as to minimize local effects.
Gauss:
1/
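The first two functional forms listed above can be sketched as follows, using the textbook 12-6 Lennard-Jones expression. The parameters `epsilon` and `sigma` are hypothetical placeholders, not GARN's fitted values, and the functions are written as energy-like quantities (lower is more favorable), which is an assumption about the sign convention.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 Lennard-Jones value at distance r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


def modified_lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Same curve, with the positive (repulsive) part flattened to 0."""
    return min(lennard_jones(r, epsilon, sigma), 0.0)
```

With the repulsive wall removed, two players at very short distance are no longer penalized, which is how a flattened form would permit tightly packed conformations.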
For each molecule, a graph representation must first be created. The secondary structure is obtained from RNA FRABASE [
Starting from a random conformation, different game settings were tested on the whole
The sampling results for the dataset are not given here, but
We evaluated the performance of this approach relative to other strategies, using the output PDB files, which we converted to our coarse-grained graph representation and used to calculate the RMSD. The inputs for these methods are the sequence and the same secondary structure used for GARN (when required by the method). We compared our results with those for four well-known software suites: (i) iFOLDRNA [
The previous exhaustive parameter and gameplay tests indicated that some settings were more appropriate for certain molecules, depending on their structural characteristics.
We extracted a rule of thumb for choosing the gameplay from the
We thus extracted three categories of molecules represented in the
The native structure graph is superimposed on the X-ray structure in the top panel. This superimposition indicates that the native structure graph (in blue and yellow) represents the native X-ray structure well. From top to bottom, the graphs most closely matching the native graphs are shown in gray (the darker the gray, the closer the match), superimposed on the native structure graph. These graphs show a good range of samples that could be used for reconstruction: the global shape of the molecule is recovered and the geometry of the junction is of interest.
The top left panel shows the structure and its associated GARN graph. Panel (1) shows the best three-way junction obtained with GARN (in pink) superimposed on the native structure graph (in white). Panels (2) and (3) show the second-best three-way junctions (in pink) superimposed on the native structure graph (in white).
Each setting has an impact on the sampling. In the AA gameplay, each player waits for all the other players to change their positions before updating its probabilities. The game thus provides access to very different conformations between turns. Sampling is efficient as the difference between two consecutive steps is large. The OA gameplay also allows different conformations between consecutive steps, but as strategy is updated for only one player at a time, the sampling is less broad. In the OO gameplay, in which probabilities are updated for only one player, the other players have very little influence and sampling at each turn is purely local. Overall, these three gameplays cover different levels of aggressiveness for the sampling that can be fine-tuned for each molecule or experiment, depending on what needs to be achieved.
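The difference between the three gameplays can be sketched as three scheduling patterns for one pass over the players. The callbacks `move` and `update` are hypothetical stand-ins for choosing a strategy and for recomputing a player's score and probabilities, and grouping the single-player steps of OA and OO into one pass is a simplification made for illustration.

```python
def run_round(mode, players, move, update):
    """One pass over all players under the AA, OA or OO schedule."""
    if mode == "AA":
        # All players move, then all players update.
        for p in players:
            move(p)
        for p in players:
            update(p)
    elif mode == "OA":
        # One player moves, then all players update.
        for p in players:
            move(p)
            for q in players:
                update(q)
    elif mode == "OO":
        # One player moves, then only that player updates.
        for p in players:
            move(p)
            update(p)
    else:
        raise ValueError("unknown gameplay: " + repr(mode))


def trace(mode, players):
    """Record the order of moves and updates, to compare the schedules."""
    log = []
    run_round(mode, players,
              lambda p: log.append(("move", p)),
              lambda p: log.append(("update", p)))
    return log
```

Tracing two players shows why AA changes the conformation most between updates: both moves happen before any score is recomputed.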
The performance of the algorithm used in the game depends principally on the type of scoring. The UCB algorithm worked better when the sampling and scoring did not involve a wide and rugged conformational space. The UCB algorithm is also known not to be very robust with noisy data [
The scoring schemes have various effects, mostly linked to the number of modes and the treatment of the repulsive part of the scheme. Both the Lennard-Jones and Modified Lennard-Jones schemes force players into a reasonable conformation, at least visually from a packing perspective, with only one mode in their definition distribution. However, the repulsive part of the Lennard-Jones scoring function sometimes prevents the formation of tightly packed conformations of potential interest from a biological perspective. A wider range of conformations is available with the Gauss score, which allows for different modes related to these conformations.
When a helix is frozen, i.e., when the helix cannot bend (See
We assessed the influence of the starting conformation, by using random initializations. We observed no marked impact of starting conformation on the conformations generated. In the AA game, after the first round, all players choose a conformation different from the starting conformation. In the OA and OO games, the starting conformation has an influence only on the first round, because all players choose a different conformation.
There is currently no way to demonstrate that a Nash equilibrium has been reached, but our modeling is based on potential games [
The regret is defined as:
The analogy with gradient descent is intuitively comprehensible, but it is hard to obtain formal proof [
For each molecule, we used the default gameplay of our procedure, as described above.
Results from GARN runs with default parameters for full structures were compared with those for iFoldRNA, MCSym, FARNA and NAST. Not all servers can handle the ten structures of the
PDB ID  Nucleotides  Players  RMSD (Å)  GARN   iFoldRNA  MCSym  FARNA  NAST
1MZP    55           8        min       4.32   8.23      NA     5.61   12.97
                              max       10.58  12.68     NA     17.80  26.38
1E8O    49           8        min       6.82   12.62     6.75   7.78   12.34
                              max       15.40  17.71     15.55  21.02  20.29
4FE5    67           14       min       7.14   15.04     NA     7.80   17.00
                              max       14.53  20.12     NA     20.98  34.17
4QJH    74           15       min       7.87   11.34     NA     8.99   23.44
                              max       14.93  21.24     NA     18.67  26.57
4TS0    89           21       min       10.42  NA        NA     9.47   19.84
                              max       23.13  NA        NA     23.19  22.79
1LNG    97           16       min       7.85   11.92     10.52  12.67  36.49
                              max       17.07  35.53     29.58  30.19  59.66
4WFL    107          18       min       8.82   18.08     NA     11.33  43.41
                              max       16.22  25.75     NA     26.00  47.04
4QK8    124          20       min       12.25  18.66     NA     13.23  54.43
                              max       22.78  28.49     NA     22.19  59.44
1MFQ    127          24       min       9.68   20.42     16.07  16.13  38.91
                              max       20.64  34.08     30.97  41.27  44.17
4GXY    172          32       min       14.27  NA        NA     15.84  69.04
                              max       31.04  NA        NA     26.17  74.83
Another advantage of GARN is that it does not require any fragment library to be available for SSEs of a specific sequence. Other strategies are also limited by the size of the molecules, due to computation costs. GARN can accommodate large structures, and the calculations for any given sample are very fast (30 min for a sample of 50 molecules for 4FE5, on a standard workstation or laptop). Comparisons of running times and of the number of conformations generated can be found in
The best model generated by GARN (in pink) and the equivalent coarse-grained models obtained with other techniques are superimposed on the native structure graph (in black). The GARN technique does not enforce packing, but frequently provides the solution closest to the native structure.
The scoring used did not select the best candidate as the first solution, or even as one of the first five solutions, but the Energy vs. RMSD curve displays interesting behavior.
Despite not having been specifically designed to handle three-way junctions and not including a detailed inventory of their conformations as input, our procedure proved useful for obtaining samples close to native samples.
The RNAJAG results are taken from [
1E8O  3.75  9.07  1.65  11.52  4.12  11.96  5.57  9.82  6.45  10.60  – 
4FE5  2.79  4.77  –  –  1.23  5.87  1.95  12.35  4.80  11.12  – 
4QJH  4.15  7.02  –  –  2.41  12.40  9.24  12.38  7.35  17.16  – 
1LNG  7.19  7.19  3.76  6.85  3.02  10.31  6.05  36.56  3.79  9.67  9.04 
4WFL 1  6.79  6.79  –  –  3.24  8.34  14.01  17.24  8.98  17.57  – 
4WFL 2  5.65  8.83  –  –  4.64  12.25  11.26  14.40  7.46  14.11  – 
4QK8 1  5.12  7.87  –  –  2.94  8.64  5.16  10.68  6.21  13.25  – 
4QK8 2  3.95  8.29  –  –  3.82  10.13  8.62  12.20  5.35  14.75  – 
1MFQ  4.70  5.71  2.87  6.06  5.31  7.5  9.26  19.71  18.16  27.13  5.26 
4GXY 3way  9.76  9.76  –  –  2.92  6.55  4.69  12.93  –  –  – 
4GXY 4way  9.57  13.31  –  –  6.50  11.87  16.55  39.84  –  –  – 
We also compared our method with RNAJAG, for the nine molecules reported by [
The sampling strategy implemented in GARN is based on the hypothesis that SSEs can be modeled spatially as players trying to maximize their own welfare. This hypothesis, although very coarse-grained, is consistent with hierarchical folding models based on advanced energy modeling techniques for RNA [
The choice of the scoring functions between SSEs appeared to be essential, particularly for the repulsive part of the scheme (close contacts). When a molecule contains a relatively large number of helical players, i.e., long helices, the repulsive part of the scheme is extremely important and close contacts must be avoided. Elongated conformations are preferred. However, the presence of a large number of junction elements calls for putatively packed structures best represented by a weak or non-existent repulsive part of the scoring system. This trade-off between packing and sampling must, artificially, be handled separately when dealing with simple sampling strategies for RNA molecules. It may also constitute a bias of the knowledge-based setup. This is a limitation of our model, in which decreasing the impact of the repulsive part of the model compensates for the rigidity imposed on the helical SSEs. However, it would not be expected to have a major effect on large-scale molecule sampling.
Three-way junctions slightly modify the configuration of the associated player, but they have a strong impact on the whole conformation. In our game model, this means that when one of these players changes its strategy, the score values may change for all the other players, from one turn to the next. The EXP3 algorithm was designed to be efficient in this type of environment and it performed better than the other algorithms tested. A Boltzmann-based strategy therefore appears to be appropriate for the modeling of more complex conformations, for which a suitable energy basin must be found in a potentially rugged landscape. By contrast, simple structures not containing three-way or higher-order junctions gave better results with the simple UCB algorithm, for which a slight change in the strategy of one player changed the scores of the other players. For simple structures, the various 3D conformations obtained were similar, and subtle changes could improve the best results. In this context, the UCB algorithm performs the equivalent of a local optimization in which the energy (scoring) space has to be smooth.
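The Boltzmann-like behavior of EXP3 mentioned above can be illustrated with the textbook form of the algorithm. This is a sketch, assuming rewards rescaled to [0, 1] and a fixed exploration rate `gamma`; the variant and parameters used in GARN may differ, and the class name is hypothetical.

```python
import math
import random


class Exp3:
    """Textbook EXP3: exponential weights over strategies, mixed with uniform exploration."""

    def __init__(self, n_strategies, gamma=0.1, rng=None):
        self.k = n_strategies
        self.gamma = gamma
        self.weights = [1.0] * n_strategies
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def choose(self):
        """Sample a strategy index according to the current probabilities."""
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, strategy, reward):
        """Reweight the played strategy; `reward` is assumed to lie in [0, 1]."""
        p = self.probabilities()[strategy]
        # Importance-weighted reward estimate: only the played strategy is observed.
        self.weights[strategy] *= math.exp(self.gamma * (reward / p) / self.k)
```

Because the update makes no assumption that rewards are stationary, this kind of scheme tolerates scores that shift when another player, such as a three-way junction player, changes its strategy.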
The combination of methods inspired by game theory with knowledge-based models used in this study provided a good framework for sampling RNA molecules with a coarse-grained representation. The flexibility of the strategy described here resulted in a better performance than for existing techniques, for the sampling of medium-sized RNA molecules. The procedure is quick and easy and does not require large amounts of external information. Ideally, the different techniques could be combined for hierarchical or efficient local sampling, making it possible to build very large assemblies [
GARN has three parts: (i) the parameter setup for adjusting the game settings, (ii) the game process in GARN and (iii) comparisons with other published techniques.
(PDF)
Simple modeling of a four-way junction. Five players are used to model the four-way junction. They are located as if the junction consisted of two three-way junctions and a linker.
(PDF)
Distances between all nonadjacent nodes for the
(PDF)
Distances between adjacent nodes for all types of nodes, for the
(PDF)
Distances between helix and two-way junction players in the
(PDF)
The current player, C, has to choose the direction of the next player, N, according to the direction of the previous player, P. If player C is frozen, this player can only choose the strategy corresponding to the black line; if not frozen, this player can choose to follow any of the blue lines.
(PDF)
Mapping of frozen and non-frozen conformations: example of helix 0 and helix 1 of 1MFQ. The coarse-grained model (white) of the native structure is shown in the left panel. In the non-frozen mode (in pink), the lattice mapping is close to the native structure and allows for bending. In the frozen mode, in which the helices remain rigid (in blue), the lattice mapping forces the helix to remain straight.
(PDF)
Players from the largest junction play first, and the other players are numbered according to a depth-first search, starting from the first largest junction according to 5′-3′ ordering.
(PDF)
The native structure graph is superimposed onto the X-ray structure. This superimposition shows that the native structure graph (in blue and yellow) represents the native X-ray structure well in each case. The graphs closest to those for the native structure are shown in gray (the darker the gray, the closer to the native structure), superimposed on the native structure. These graphs correspond to a good range of samples potentially useful for reconstruction: the global shape of the molecule is recovered and the junction has an interesting geometry.
(PDF)
Evolution of the regret for a three-way junction player during the 4FE5 simulation. The top panel shows all strategies and the bottom panel shows four strategies. After 4000 steps, the amplitude of regret reaches a stationary value.
(PDF)
The best model generated by GARN (in pink) and the equivalent coarse-grained models obtained with other techniques (when available) are superimposed on the native structure graph (in black). The GARN technique does not enforce packing, but often provides the closest solution.
(PDF)
Energy is calculated as the default GARN score multiplied by −1 and normalized.
(PDF)
Energy vs. RMSD curves for the
(PDF)
The
(PDF)
Comparison of the RMSD ranges obtained for different methods with the
(PDF)
The
(PDF)
The molecule contains 49 nucleotides and 8 players. Six different parameter sets per game type are shown. Values shown in blue highlight combinations providing conformations with RMSD values below 8Å. Elements highlighted in yellow correspond to the default GARN options for the molecule.
(PDF)
The molecule contains 55 nucleotides and 8 players. Six different parameter sets per game type are shown. Values shown in blue highlight combinations providing conformations with RMSD values below 5Å. Elements highlighted in yellow correspond to the default GARN options for the molecule.
(PDF)
The molecule contains 67 nucleotides and 13 players. Six different parameter sets per game type are shown. Values shown in blue highlight combinations providing conformations with RMSD values below 8Å. Elements highlighted in yellow correspond to the default GARN options for the molecule.
(PDF)
The molecule contains 127 nucleotides and 24 players. Six different parameter sets per game type are shown. Values shown in blue highlight combinations providing conformations with RMSD values below 11Å. Elements highlighted in yellow correspond to the default GARN options for the molecule.
(PDF)
The molecule contains 172 nucleotides and 32 players. Four different parameter sets per game type are shown. As the molecule is relatively large, due to computing time constraints, extensive results are reported only for the AA game. Values shown in blue highlight combinations providing conformations with RMSD values below 15Å. Elements highlighted in yellow correspond to the default GARN options for the molecule.
(PDF)
The default gameplay scheme is based on the existence of high-order junctions, and on the ratio of the number of players in helices to the number of players in junctions.
(PDF)
Elements in blue indicate the results for the GARN default settings.
(PDF)
* indicates that the computation was performed locally on an Intel Xeon E5607 2.27 GHz CPU. NAST computation was also performed with OpenCL on an NVIDIA Quadro 5000 GPU. Other computations were performed on dedicated servers. NAST appears to be much faster than the other methods, but it calculates only secondary structure interactions: the lack of tertiary information as input allows only extended structures as a result. The computation time for RNAJAG was not available from [
(PDF)
GARN RMSD values are calculated only for the coarse-grained representation of the three-way junction. RNAJAG RMSD values were obtained from [
(PDF)
We thank Alexis Lamiable and Adelene Sim for fruitful discussions. M.B. received funding from the Digiteo project JAPARIn3D. This work was partially supported by the Agence Nationale de la Recherche (ANR2011BSV6011/NGDNSD).