EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame

doi:10.1371/journal.pcbi.1007059

Fig 1.

Eterna and EternaBrain.

(A-C) Puzzle-solving interface presented to human players of Eterna including the state of the puzzle (whether it is solved or not) in the top left corner (red/green outline), the puzzle itself (in the middle), and the toolbar (bottom) with which the players can mutate the RNA sequence to make it fold into the desired state; yellow, blue, red, and green symbols represent A, U, G, and C nucleotides. (A) The desired target structure for the RNA molecule, as indicated by the bullseye in the bottom left (orange highlight). (B) Nature mode, as indicated by the leaf in the bottom left (orange highlight), gives the predicted minimum free energy structure for the current sequence. Since the bases in the top right should be paired with each other (orange circle), this puzzle is not yet folding correctly; this status is shown by the red indicator in the top left corner. (C) The solved puzzle. The nature-mode structure matches the target structure, and the indicator in the top left corner turns green, meaning the puzzle has been solved. (D) (left) Wide distribution of contributed Eterna solutions across different players. For preparing the eternamoves-select data set, we selected any player who had solved more than 3000 distinct puzzles, which left us with 72 players. (right) In EternaBrain, we tested whether information on players’ moves could be used to train a convolutional neural network. (E) For solving new puzzles, the final EternaBrain-SAP framework first uses the EternaBrain convolutional neural net model to predict sequence changes (‘moves’) for new RNA puzzles. In a second stage, the Single Action Playout (SAP), six additional hand-coded strategies are applied to complete the solution.

More »

Expand

Table 1.

Input features used for training and testing EternaBrain convolutional neural network.

More »

Expand

Table 2.

EternaBrain CNN accuracies on eternamoves-select with different splits of training and test sets.

More »

Expand

Table 3.

EternaBrain CNN accuracies on eternamoves-select, grouped by length of puzzle and paired/unpaired status of nucleotide at which move was applied.

More »

Expand

Fig 2.

The 6 strategies included in the SAP.

(A) The original state of the puzzle before SAP. This represents a puzzle initiated with an arbitrary sequence of nucleotides; panel displays the target structure, where mismatched nucleotides (C-A) are highlighted. (B) The first step of the SAP is to correct mismatched pairs. Here, the cytosine nucleotides are switched to uracil to pair with adenine. (C) Changing end pairs to G-C. Changing base pairs that are at the edges of stems and flank loops to G-C pairs lowers the free energy of the molecule. (D) G-internal loop boost. The first nucleotide in an internal loop on either side is switched to a guanine. (E) U-G-U-G super boost. In an internal loop with 2 unpaired bases on either side, the 2 bases are changed to uracil and guanine, in that order, on either side. (F) G-hairpin boost. The first nucleotide in each strand of a hairpin loop is changed to a guanine. (G) Reorienting base pairs. Target base pairs that are not predicted to be folded correctly are ‘flipped’ to lower the energy of the structure. Here, alternating the A-U pairs lowers the energy of the stack. The 5’ end of each puzzle is at the top left, with the puzzle drawn counter-clockwise from that point.

More »

Expand

Fig 3.

EternaBrain performance.

(A) Performance of EternaBrain and 6 previously published algorithms on Eterna100 benchmark. EternaBrain solves 61/100, followed by MODENA (54/100), INFO-RNA (50/100), NUPACK (48/100), DSS-Opt (47/100), RNAinverse (28/100), and RNA-SSD (27/100). (B) Performance of Alternative Model Constructions. The CNN alone could solve only 20/100, and the SAP alone could solve 50/100. Removing various input features passed into the CNN resulted in drops in performance, confirming the importance of these features.

More »

Expand

Table 4.

EternaBrain-SAP performance on Eterna100 upon five additional playouts on the 61 puzzles it solved in its first run.

More »

Expand

Fig 4.

Example EternaBrain-SAP solutions to Eterna100 puzzles.

(A) U solution highlights the fact that the EternaBrain CNN alone can solve puzzles with short stems. (B) Chicken Tracks solution: EternaBrain-SAP can solve puzzles with three stems intersecting in one internal loop. (C) Thunderbolt solution demonstrates that EternaBrain-SAP can solve large puzzles (400 nucleotides long) and solve loops and stems in combination. (D) Shortie 4 solution shows EternaBrain-SAP can solve puzzles with multiple short stems (2 nucleotides long). (E) Shortie 6 is quite similar to Shortie 4, but with the same motif (short stems) repeated. The other algorithms mentioned could not solve Shortie 6 because of the repeated motifs. (F) Hard Y—target structure (left) vs nature-mode (right) structure. EternaBrain-SAP could not solve Hard Y because it required use of a little-used strategy to solve a motif called a zigzag. Since the strategy is not often used by players, the EternaBrain CNN did not learn the strategy and the strategy was not included in the SAP. In each panel, the 5’ end of each puzzle is at the top left, with the puzzle drawn counter-clockwise from that point.

More »

Expand