Table 1.
The two benchmark API datasets used to build two different classification models.
Fig 1.
Overview of our study for sampling candidate aptamer sequences using the API classifiers and MCTS.
(A) The process of choosing the best random forest classifier trained on the API classification benchmark dataset. (B) Our iterative forward sampling algorithm for obtaining candidate aptamer sequences that bind to a given target protein. The sampling step is repeated N times, where N is the user-specified aptamer sequence length. At each iteration, the algorithm takes as input the previously selected bases, the target protein sequence, and a score function, which is the best model from (A) chosen before the iterations begin.
Fig 2.
Details underlying our iterative forward sampling algorithm using MCTS.
The example (the third of the N sampling iterations) illustrates the internal process of our sampling algorithm. (A) Selection stage of the MCTS: the search starts from the bases selected in the previous sampling iterations, descends the tree recursively according to the UCT score, and stops on reaching an unexplored position. (B) Expansion stage: a new child node, chosen at random from the eight possible child nodes, is added to the node reached in (A). (C) Simulation stage: the algorithm performs a random walk until the tree depth reaches N and follows the path from the root node to the leaf node. (D) The previously selected bases (1, 2, 3) and the bases on the path (4, 5, 6, 7) are assembled into a candidate aptamer sequence. The sequence and its interaction score are added to the set of candidate aptamers. (E) MCTS updates the parameters of the tree using the score calculated in (D). (F) MCTS repeats steps (A) to (E) M times. (G) The optimal base is selected from the child nodes of the root node for the next sampling iteration.
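Steps (A) to (G) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the eight child nodes are assumed to be the four bases added to either end of the sequence, the "optimal" child in (G) is assumed to be the most visited one, the exploration constant 1.4 is arbitrary, and `toy_score` is a hypothetical stand-in for the API classifier. All function and variable names are ours.

```python
import math
import random

# Assumption: eight actions = four bases, each added to the front or the back.
ACTIONS = [(base, side) for base in "ACGU" for side in ("front", "back")]


def apply_action(seq, action):
    base, side = action
    return base + seq if side == "front" else seq + base


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.total = 0.0

    def uct(self, c=1.4):
        # Mean score plus an exploration bonus (c is an arbitrary choice here).
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.total / self.visits + bonus


def mcts_step(root_seq, length, score_fn, n_playouts, rng, candidates):
    """One sampling iteration: run M playouts, return the chosen child sequence."""
    root = Node(root_seq)
    for _ in range(n_playouts):  # (F) repeat (A) to (E) M times
        node = root
        # (A) Selection: follow the best UCT child while nodes are fully expanded.
        while len(node.seq) < length and len(node.children) == len(ACTIONS):
            node = max(node.children.values(), key=Node.uct)
        # (B) Expansion: add one untried child chosen at random.
        if len(node.seq) < length:
            untried = [a for a in ACTIONS if a not in node.children]
            action = rng.choice(untried)
            child = Node(apply_action(node.seq, action), parent=node)
            node.children[action] = child
            node = child
        # (C) Simulation: random walk until the sequence has the target length.
        seq = node.seq
        while len(seq) < length:
            seq = apply_action(seq, rng.choice(ACTIONS))
        # (D) Score the full-length sequence and keep it as a candidate.
        score = score_fn(seq)
        candidates[seq] = score
        # (E) Backpropagation: update visit counts and score sums up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    # (G) Assumption: pick the most visited child of the root.
    return max(root.children.values(), key=lambda ch: ch.visits).seq


def sample_aptamer(length, score_fn, n_playouts=200, seed=0):
    """Outer loop: fix one base per sampling iteration until length N is reached."""
    rng = random.Random(seed)
    candidates = {}
    seq = ""
    while len(seq) < length:
        seq = mcts_step(seq, length, score_fn, n_playouts, rng, candidates)
    return seq, candidates


def toy_score(seq):
    # Hypothetical placeholder for the classifier score: GC fraction in [0, 1].
    return sum(b in "GC" for b in seq) / len(seq)


best, pool = sample_aptamer(length=7, score_fn=toy_score, n_playouts=200, seed=0)
```

Here `best` is the sequence built base by base across the seven sampling iterations, and `pool` maps every full-length sequence scored in step (D) to its score, which corresponds to the set of candidate aptamers described in the caption.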
Fig 3.
Evaluation of aptamer sequence generation in terms of binding affinity with six target proteins using ZDOCK docking simulation.
(A) ZDOCK scores of aptamers using protein structures from the PDB. (B) ZDOCK scores using structures built via the Swiss-Model server. For protein 1ERK, there are two known aptamers, C3 and C3.59, with 90 and 59 bases, respectively. We compared the docking scores of aptamer sequences generated by our model (green bars), those of Lee and Han [21] (blue bars), and those of the known aptamers (gray bars) listed in Table 2. The Apta-MCTS candidates yielded higher docking scores than both the results reported by Lee and Han [21] and the known aptamers in five cases: 3V79_1, 5VOE (chains H and L), 2RH1, 1ERK (C3), and 1ERK (C3.59). Additional details are available in S2 Table for (A) and S3 Table for (B).
Table 2.
Target proteins and aptamers obtained from the PDB, to which our model was applied.
Fig 4.
Comparison of binding positions between the known aptamer 5VOE (chain A), shown in red in (B-F), and our candidate aptamers, shown in green in (G-L), for the target protein 5VOE, shown in gray in (A-L). The aptamer structures were predicted by SimRNA and RNAComposer and rendered using NGL viewer after the ZDOCK docking simulation was applied. (A) Target protein 5VOE; the viewing angle is kept fixed in all other panels. (B) Crystallized pose of aptamer 5VOE:A. (C-F) Docked poses of aptamer 5VOE:A with RMSDs of 5.27 Å, 34.31 Å, 34.5 Å, and 42.33 Å, respectively, relative to the crystallized pose in (B). Our candidates (G-L) bind at positions similar to the upper binding sites in (B-F). In particular, the binding positions in (J-L) are quite similar to those in (D-F).
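For reference, the RMSD values quoted for panels (C-F) presumably follow the standard definition. As a sketch, assuming n matched atoms with coordinates x_i in the docked pose and y_i in the crystallized pose (the symbols are ours; the caption does not state which atoms were compared or whether the poses were superimposed first):

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
```

Under this definition, a small value such as 5.27 Å indicates a docked pose close to the crystallized one, while values above 30 Å indicate a pose placed elsewhere on the protein.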
Fig 5.
Binding affinity of aptamer samples with the same length as the known aptamers, measured by the ZDOCK docking simulation score.
(A) Comparison of docking scores between Apta-MCTS and the method of Lee and Han [21]. Points on the diagonal dashed line are cases in which the two methods obtained equal docking scores. Green dots above the diagonal line are cases in which Apta-MCTS generated aptamers with higher docking scores than the method of Lee and Han [21]. (B) Comparison with known aptamers. Apta-MCTS showed the highest docking scores in this comparison.
Fig 6.
Docking scores for aptamers of various lengths with 32 target proteins in the test dataset.
In general, our model generated aptamers with higher docking scores than the known aptamers in the test dataset. Gray bars show the docking scores of our candidate aptamers, and white bars show those of the known aptamers. For most proteins, aptamers with 70 and 90 bases (the third and fourth bars for each protein) showed the highest docking scores. Detailed results are available in S5 Table.
Table 3.
Performance evaluation of various input encoding methods using the dataset of Li et al. [18].
Table 4.
Performance evaluation of various input encoding methods using the dataset of Lee and Han [21].