Binding free energy decomposition and multiple unbinding paths of buried ligands in a PreQ1 riboswitch

Riboswitches are naturally occurring RNA elements that control bacterial gene expression by binding to specific small molecules. They serve as important models for RNA-small molecule recognition and have also become a novel class of targets for developing antibiotics. Here, we carried out conventional and enhanced-sampling molecular dynamics (MD) simulations, totaling 153.5 μs, to characterize the determinants of binding free energies and unbinding paths for the cognate and synthetic ligands of a PreQ1 riboswitch. Binding free energy analysis showed that two triplets of nucleotides, U6-C15-A29 and G5-G11-C16, contribute the most to the binding of the cognate ligands, by hydrogen bonding and by base stacking, respectively. Mg2+ ions are essential in stabilizing the binding pocket. For the synthetic ligands, the hydrogen-bonding contributions of the U6-C15-A29 triplet are significantly compromised, and the bound state resembles the apo state in several respects, including the disengagement of the C15-A14-A13 and A32-G33 base stacks. The bulkier synthetic ligands lead to significantly loosening of the binding pocket, including extrusion of the C15 nucleobase and a widening of the C15-C30 groove. Enhanced-sampling simulations further revealed that the cognate and synthetic ligands unbind in almost opposite directions. Our work offers new insight for designing riboswitch ligands.

Introduction used to investigate ligand-induced conformational changes of the aptamer domains from the guanine, adenine, and S-adenosylmethionine sensing riboswitches [27][28][29][30]. The small size of the PreQ 1 riboswitch aptamer makes it attractive for MD simulations [31][32][33]. Questions regarding binding affinity and selectivity can be addressed by binding free energy calculations, such as by the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method [34]. A detailed description of the ligand binding and unbinding paths provide additional insight. As ligand entrance to and exit from a buried site, as found in the PreQ 1 riboswitch, occur in timescales usually beyond the capability of conventional MD simulations [35], special techniques are required to speed up the process, such as steer MD [36][37][38] and metadynamics [39,40]. Metadynamics works by adding an external, history-dependent bias potential that acts on a selected number of collective variables.
The folding of RNA requires cations to counter the electrostatic repulsion between backbone phosphates [41,42]. Mg 2+ , due to its small radius and double charge, can not only directly  1 in an oblique view to highlight the base stacking with G11 above and with G5 and C16 below. Bottom right: top view highlighting the inplane hydrogen bonding with U6, C15, and A29. This structure was prepared using coordinates from the PDB entry 6E1W [23], with missing nucleotides copied from PDB entry 3Q50 [24]. (B) Chemical structures of PreQ 1 and PreQ 0 . (C) Chemical structures of three synthetic ligands, L 1 to L 3 .
Here we report MD simulation results on the energetics and dynamics of the Tte PreQ 1 riboswitch aptamer in complex with the cognate ligands PreQ 0 and PreQ 1 and the synthetic ligands L 1 , L 2 , and L 3 . Mg 2+ are found to be essential in stabilizing the binding pocket for the cognate ligands. By comparing and contrasting these two groups of ligands, we learn how the chemical (e.g., number of hydrogen bond donors and acceptors) and physical (e.g., molecular size) features of ligands affect binding affinity and ligand exit paths. In particular, the reduction in the number of hydrogen bond donors and acceptors from five in the cognate group ( Fig 1B) to one in the synthetic group (Fig 1C) leads to a dramatic loss in hydrogen bonding with nucleobases. The larger sizes of the synthetic group also lead to significant loosening of the binding pocket, including extrusion of the C15 nucleobase and a widening of the C15-C30 groove. Correspondingly, whereas the preferred exit of the cognate ligands is through the front door between G5 and G11, the preferred exit of the synthetic ligands is the back door between C15 and C30.

Results
We carried out a total of 153.5 μs MD simulations for the Tte PreQ 1 riboswitch aptamer bound with the cognate ligand Q 1 or Q 0 , or with the synthetic ligand L 1 , L 2 , or L 3 , or in the apo form (Table 1). Simulations were done with and without Mg 2+ in order to uncover the effects of this important ion. In addition to conventional MD (cMD), metadynamics simulations were also run to investigate the unbinding paths of ligands. The main results are presented from the simulations with Mg 2+ ; only when comparison is made we show Mg 2+ -free results. Most of the simulations were run using the AMBER ff99bsc0+χ OL3 force field [54][55][56] for RNA and the Li et al. [57] (Li13) parameters for Mg 2+ . To validate the robustness of the findings, additional simulations were run using the CUFIX force field for RNA [58] along with the Li13 parameters for Mg 2+ and the ff99bsc0+χ OL3 force field along with the Allner et al. [59] (Allner12) parameters for Mg 2+ .

Free energy decomposition reveals how physicochemical features of ligands affect binding affinity
To explain the significant difference in binding affinity between the cognate and synthetic groups of ligands, we calculated the binding free energies by applying the MM-PBSA method to the second half of eight replicate cMD simulations of each RNA-ligand complex. MM-PBSA has been successfully used on many RNA-ligand [29,30] and protein-ligand [60,61] systems. Although this method can have large uncertainties in the absolute free energy calculated for a given ligand, the relative difference in binding free energy between ligands calculated from comparative MD simulations can be very informative [62]. The binding free energy consists of five terms: where ΔE ele represents the average electrostatic interaction energy between the RNA and the ligand in gas phase; ΔE vdW is the counterpart for van der Waals interactions; ΔG pol and ΔG nonpol account for the solvent environment of the RNA-ligand complex; ΔS is the change in entropy upon binding; and T is the absolute temperature. These components and the total binding free energies for the cognate and synthetic ligands, calculated from the simulations in the presence of Mg 2+ , are listed in Table 2. MM-PBSA predicted binding free energies of~-18 kcal/mol for the cognate group and close to~1 kcal/mol for the synthetic group. Though these results exaggerate the actual difference in ΔG bind (see row with heading "ΔG exp " in Table 2), they do correctly predict the cognate group as the stronger binders. Comparing the two groups of ligands, the polar components (ΔE ele +ΔG pol ) are much more favorable (by~30 kcal/mol) to the cognate group, offset only partially (by~10 kcal/mol) by the nonpolar components (ΔE vdW +ΔG nonpol ) that favors the synthetic group. These contrasts can be easily attributed to the greater number of hydrogen bond donors and acceptors in the cognate group and the bulkier sizes of the synthetic group. To gain insight into how RNA-ligand interactions lead to the difference in affinity between the cognate and synthetic groups, we decomposed ΔG bind into contributions of the 33 nucleotides of the Tte aptamer (Fig 2A). The correlation coefficients of these individual contributions for any two ligands are nearly 1 within both the cognate and synthetic groups, but reduces to~0.6 between the groups (Fig 2A, inset table). This correlation analysis clearly indicates that the two groups of ligands are distinct. The only nucleotides that contribute more than 1.5 kcal/mol in at least one complex are the six that form either base stacking (G5, G11, C16) or in-plane hydrogen bonding (U6, C15, A29) with the ligands (Fig 1A). The base-stacking nucleotides contribute nearly the same to the binding free energies of the two groups of ligands, but the hydrogen-bonding nucleotides differ by 1.8, 5.7, and 2.4 kcal/mol, respectively, Top: illustration of when a nucleobase is ("in") or is not ("out") in a stacking position with the ligand. A rectangle is drawn around the ligand rings atoms, with a minimum of 0.5 Å separation (shown by red lines). A cytosine is in an "in" position, with vertical distance drawn as a solid line, as the projection of its center is inside the rectangle; a guanine is in an "out" position, with vertical distance drawn as a dashed line, as the projection of its center is outside the rectangle. Bottom: in-fractions of three nucleobases and their average vertical distances from the ligand rings. in their contributions to the two groups. This result identifies in-plane hydrogen bonding as the dominant factor for the difference in affinity between the cognate and synthetic groups.
The MM-PBSA results calculated from the simulations in the absence of Mg 2+ are presented in S1 Table and S1A Fig. The qualitative differences between the cognate and synthetic ligands described above are also valid in the Mg 2+ -free simulations. The only major change is that the binding free energies of the cognate group become less favorable by~10 kcal/mol, which come entirely from the polar components. The stabilization of cognate ligand binding by Mg 2+ is explained below.

Binding poses of the five ligands differ in overt and subtle ways
Next we present structural differences around the binding pocket among the aptamer-ligand complexes in the cMD simulations. To characterize base stacking, we calculated the fraction ("in-fraction") of snapshots where a nucleobase falls within a rectangle around the ring atoms of the ligand, and among these snapshots, the vertical distance between the center of the nucleobase and the rectangle (Fig 2B, top). For the cognate ligands, the in-fractions of G5, G11, and C16 all are close to 100%, but for the synthetic ligands, the in-fractions of G5 decrease to between 25% to 60% (Fig 2B, bottom). Among the in-fractions, the distances between the nucleobases and the ligand rings are about 3.5 Å, the van der Waals contact distance for carbon atoms. A subtle but consistent difference within the cognate group is that all the three nucleobases have slightly higher in-fractions and shorter distances, thus implicating a tighter binding pocket, for Q 0 than for Q 1 . The tighter binding pocket for Q 0 is consistent with the more favorable van der Waals interaction energy and the higher entropic cost listed in Table 2. In the absence of Mg 2+ , the in-fractions of G5 decrease to below 50% for the cognate group (S1B Fig). The effect of Mg 2+ on base stacking is more subtle for the synthetic group, with the in-fractions of G11 and C16 decreasing to between 71% to 85%.
The three nucleobases, U6, C15, A29, form six in-plane hydrogen bonds with the cognate ligands in the crystal structures ( Fig 1A) [23][24][25][26]. These hydrogen bonds are well maintained in the cMD simulations with Mg 2+ (Table 3). Fig 2C and 2D shows the typical poses of Q 1 and Q 0 , respectively, in the binding pocket. In particular, the C15 nucleobase remains parallel to the ligand rings and forms three stable hydrogen bonds with Q 1 and Q 0 , pairing N4, N3, and O2 of C15 with O1, N1, and N5 of Q 1 and Q 0 (S1 Movie). About one third of the time, O2 of C15 and N1 of Q 1 and Q 0 form an additional hydrogen bond. In contrast, the rings of the synthetic ligands have a single hydrogen bond donor or acceptor (compared with five for the cognate ligands) and forms a single hydrogen (Table 3). For L 1 and L 2 , the O2 acceptor pairs with the A29 N6 donor, and the C15 nucleobase extrudes into an orthogonal orientation, to accommodate the synthetic ligands' bulkier size (Fig 2E and 2F). In the crystal structures of the aptamer bound with the synthetic ligands [23], L 3 is positioned similarly to L 1 and L 2 and also hydrogen bonds with A29. However, unlike L 1 and L 2 , L 3 is a donor, not acceptor, and correspondingly the partner changes from N6 to N1 of A29. As a result of this change in hydrogen bonding partner, the rings of L 3 move closer to C30 (Fig 2G). In the cMD simulations, hydrogen bonding with A29 is found in only 27.6% of the snapshots (Table 3). Instead, in 55.2% of the snapshots, L 3 moves laterally to hydrogen bond with another acceptor, i.e., O4 of U6, allowing the C15 nucleobase to keep its parallel orientation ( Fig 2H). We label the A29-hydrogen bonded minor and the U6-hydrogen bonded major poses as L 3m and L 3M , respectively. The L 3m and L 3M poses readily interconvert, with multiple transitions observed in each simulation (S2 Fig and S2 Movie). Q 1 differs from Q 0 by the substitution of a methylamine for a cyano ( Fig 1B). In 40.5% of the snapshots, this methylamine hydrogen bonds with O6 or N7 of G5; in another 32.6% of the snapshots the hydrogen bond partner switches to N7 of G11 (Table 3). Thus the methylamine group of Q 1 , by changing the C3-C5-C6-N3 torsion angle (S1 Movie), alternates its hydrogenbonding partner between G5 and G11. In the recent crystal structure [Protein Data Bank (PDB) entry 6VUI] [25], the methylamine group hydrogen bonds with G5, though the authors did consider but dismissed G11 as an alternative partner. L 1 also has a methylamine, and it too can hydrogen bond with O6 or N7 of G5 (Table 3). However, the methylamine in L 1 is more separated from nearest ring than in Q 1 and this greater separation prevents hydrogen bonding with G11. Indeed, when the L 1 methylamine hydrogen bonds with G5, the rings are removed from G5 and this explains why the in-fraction of G5 for L 1 is only about one half of that for L 2 ( Fig 2B).
Mg 2+ significantly affects the hydrogen bonding of the cognate ligands with C15 (Table 3; Figs 2C and 2D and S1C and S1D). Whereas C15 and Q 1 / Q 0 form three stable hydrogen bonds in the simulations with Mg 2+ , only one stable hydrogen bond is formed in the absence Mg 2+ , between C15 O2 and Q 1 / Q 0 N1 (or N5). This happens as the C15 nucleobase moves and tilts away from the ligand rings (see below). Since the synthetic ligands do not hydrogen bond with C15, their in-plane hydrogen bonding is not affected by Mg 2+ (Table 3; Figs 2E-2H and S1E-S1H).

How does Mg 2+ stabilize aptamer binding of cognate ligands?
All our MD simulations were carried out before the release of the recent crystal structures of the Tte aptamer in apo form and bound with Q 1 (PDB entries 6VUH and 6VUI) [25], in which three and two Mn 2+ ions, respectively, were located in the major groove lining the ligand binding pocket. These crystal metal sites thus allowed us to test the MD simulations, with seven Mg 2+ ions initially placed using MCTBI [52,53]  We label these sites as M1, M2 (M2'), M3, and M4. We also tested initial Mg 2+ ion placement in the apo form using the Leap module in AMBER18 [63], which finds ion binding sites on a grid according to the Coulomb potential of the RNA. The Mg 2+ densities in these simulations are similar to those started with MCTBI placement (S4 Fig), demonstrating that the Mg 2+ ion sites found in the cMD simulations are robust and insensitive to the precise initial placement. Each Mg 2+ ion coordinates with six polar groups (Fig 3B, top panel). Top: coordination of Mg 2+ by six water molecules and the latter's hydrogen bonding with the G3 and G4 nucleobases and A14 phosphate. Bottom: two distances, d [4][5][6][7][8][9][10][11][12][13] (between G4 O6 and A13 OP1) and d [14][15][16] (between A14 P and C16 N4), introduced to characterize the effects of the Mg 2+ ion. (C) Probability densities of d 4-13 and d [14][15][16] , in cMD simulations with and without Mg 2+ ions. (D) Effect of Mg 2+ ions on the separation of the L2 loop from the S1 helix in the Q 1 -bound form. Two representative structures are superimposed, with the aptamer in the presence of Mg 2+ shown in the same multi-color scheme as in Fig 1A and  Of particular importance is a deep site, M2, in the complexes with the cognate ligands, where Mg 2+ bridges between G4 and G3 on the S1 helix and A13 on the L2 loop ( Fig 3B). To present a full picture of this bridging effect in the MD simulations, we monitored two distances between S1 and L2: d 4-13 between A4 O6 and A13 OP1; and d [14][15][16] between A14 P and C16 N4 (Fig 3B, bottom panel). The probability densities of these two distances both peak around 8.5 Å (Fig 3C). In the simulations without Mg 2+ , the peaks shift to larger distances by 2 to 4 Å, indicating a greater separation of L2 from S1. The increased separation is also evident when representative structures from the simulations of the Q 1 -bound aptamer with and without Mg 2+ are superimposed (Fig 3D, top panel). The further separation of L2 from S1 creates new room for C15 (Fig 3D, bottom panel), which as noted above moves and tilts away from the cognate ligand rings in the absence of Mg 2+ (compare Figs 2C and S1C). Very similar differences are also observed in the simulations of the Q 0 -bound aptamer with and without Mg 2+ (Figs 3C and S3E; also compare Figs 2D and S1D). Therefore Mg 2+ is essential in maintaining C15 in a position to form stable hydrogen bonds with the cognate ligands.

Simulations with alternative force fields confirm the differential stabilization of aptamer complexes with cognate and synthetic ligands
The simulations reported above were run using ff99bsc0+χ OL3 for RNA and Li13 for Mg 2+ . To check the robustness of the resulting findings regarding the energetic and structural differences between the aptamer complexes bound with cognate and synthetic ligands, we ran additional simulations using alternative force fields for RNA and Mg 2+ . The alternative RNA force field was CUFIX [58], which was based on ff99bsc0 but with modifications for Lennard-Jones parameters of selected atom types, including phosphate oxygen atoms. The alternative parameter set for Mg 2+ was Allner12 [59], which was optimized to fit experimental data for residence times of coordinated water molecules. For both the Q 1 -and L 1 -bound aptamers, we ran simulations using CUFIX paired with Li13 and using ff99bsc0+χ OL3 paired with Allner12.
In S2 Table, we compare the binding free energies of Q 1 and L 1 from the ff99bsc0+χ OL3 / Li13 original simulations with those from the CUFIX/Li13 and ff99bsc0+χ OL3 /Allner12 simulations. The large difference in binding free energy between Q 1 and L 1 from the original simulations are reproduced by the additional simulations. Moreover, upon decomposition, the contributions of individual nucleotides correlate extremely well among the three force field combinations, with correlation coefficients at 0.98 to 1.00.
The probabilities of Q 1 and L 1 hydrogen bonding with nucleotides in the original simulations are also well reproduced in the additional simulations (S3 Table). The probabilities of two hydrogen bonds, between the O1 atom of Q 1 and the N4 atom of nucleotide C15 and between the N1 atom of Q 1 and the N3 atom of this nucleotide, have modest reductions (from 90% to~70%) in the additional simulations, compensated by slightly higher probabilities of three other hydrogen bonds. Lastly, the Mg 2+ densities in the additional simulations of the Q 1and L 1 -bound aptamers are very similar to those in the original simulations (S5 Fig). Again, Mg 2+ densities in the Q 1 -bound form have a single peak around the M2 site as found in the Q 1 -bound crystal structure (PDB entry 6VUI), but two peaks, around M2 and M2', as found in the apo crystal (PDB entry 6VUH). Below we return to results obtained by simulations using ff99bsc0+χ OL3 /Li13.

The Aptamer bound with synthetic ligands resembles the apo state
Connolly et al. [23] recognized two key differences between the Q 1 -and L 1 -bound structures and concluded that the latter structure is similar to the apo state. The first, already described above, is the extrusion of the C15 nucleobase into an orthogonal orientation (Fig 2C and 2E).
The second is a farther separation of the A32 and G33 nucleobases from the ligand rings in the L 1 -bound structure. However, the latter crystal structure was determined with the A13 and A14 nucleobases removed, and thus represented an incomplete picture. Our MD simulations of the complete aptamer now provide strong additional support for the conclusion that the structures bound with the synthetic ligands are similar to the apo state.
First, the L2 loop adopts distinct conformations in the apo and Q 1 -bound structures (PDB entries 6VUH and 3Q50 [24,25]; Fig 4A, left panel). In the apo structure, L2 is closer toward the ligand binding pocket to allow the insertion of the A14 nucleobase into the pocket. L2 in our simulations of the L 1 -bound aptamer adopts a similar "close" conformation, although the A14 nucleobase is extruded to accommodate the presence of the ligand (Fig 4A, right panel). Second, as noted above, Mg 2+ densities in our simulations of the Q 1 -bound aptamer overlap with Mn 2+ ions found in the Q 1 -bound structure (PDB entry 6VUI) [25] (Figs 3A; 4B, left panel; and S5, top row). In contrast, Mg 2+ densities in the L 1 -bound aptamer overlap with Mn 2+ ions found in the apo structure (PDB entry 6VUH) [25] (Figs S3D; 4B, right panel; and S5, bottom row). Finally and most importantly, in our simulations of the Q 1 -bound aptamer, three nucleotides from the L2 loop, C15, A14, and A13, maintain continuous base stack with two nucleotides, A32 and G33 (the Shine-Dalgarno sequence), in the S2 helix (Fig 4C, left  panel). The continuous base stack is crucial for inhibiting gene expression by sequestrating the Shine-Dalgarno sequence from recognition by the ribosome [25]. However, in the simulations of the L 1 -bound aptamer, the L2 nucleotides and the S2 nucleotides form two separate stacks (Fig 4C, right panel). Both A14 and A13 take the orthogonal orientation of C15 to form a base stack that is disengaged from the A32-G33 stack. In the apo crystal structure, both C15 and A13 have the orthogonal orientation (Fig 4A, left panel).
To gain a global sense on the difference in conformational space sampled by the Q 1 -and L 1 -bound aptamers, in Fig 4D we present their density maps over two functionally important collective variables. One of these variables is the distance of the A32 and G33 nucleobases from the binding pocket ( Fig 4B); the other is the average angle between the A32 and G33 nucleobases and the A13, A14, and C15 nucleobases (Fig 4C). For the Q 1 -bound aptamer, the highest density occurs at a distance of 13.5 Å and an angle of 27.5˚. For the L 1 -bound aptamer, the peak density moves to a larger distance of 14.5 Å and a much larger angle of 80.5˚. These differences are already illustrated in Fig 4B and 4C. However, the density maps cover relatively broad regions, with other local minima.

The synthetic ligands lead to loosening at the back of the binding pocket
We now examine the total volume explored by the atoms of each ligand in the cMD simulations, by calculating the density contour of the ligand (Fig 5A-5C). In line with the foregoing observation that the binding pocket is tight for Q 0 (Fig 2B, bottom), this cognate ligand shows a very compact density contour (Fig 5A, green). The density contour of Q 1 (Fig 5A, red) is slightly expanded around the rings, and there is also extra density for the methylamine "head" group.
The density contours of L 1 and L 2 are further expanded, at both the back (facing the C15-C30 groove) and the front (facing the G5-G11 groove) (Fig 5B). At the back, the expansion, due to the additional ring, would clash with the C15 nucleobase and leads to its extrusion into an orthogonal orientation. At the front, the expansion is largely due to the longer head group (compare Fig 1B and 1C). This front expansion, both longitudinally and laterally, is especially prominent for L 2 , which has two extra terminal methyls in the head group. The aforementioned possibility of the L 1 methylamine hydrogen bonding with G5 leads to a retraction of the rings toward the back, accounting for the greater back expansion of L 1 relative to L 2 . As described above, L 3 readily switches between two poses, L 3m (hydrogen bonding with A29) and L 3M (hydrogen bonding with U6) (S2 Fig and S2 Movie). The density contour of L 3m (Fig 5C, red) moves slightly farther toward the back than that of L 1 , due to the change in hydrogen bonding partner from N6 to N1 of A29 (Fig 2E and 2G). Meanwhile, the density contour of L 3M (Fig 5C, green) moves toward the front, passing that of L 2 .
The deeper penetration into the back of the binding pocket by L 1 , L 2 , and L 3m relative to the cognate ligands are illustrated by snapshots shown in Figs 5D-5F and S6A. In one of the eight simulations of the L 3 -bound aptamer (S2 Fig, MD4), a major portion of the ligand rings is even transiently positioned outside the back door between C15 and C30 (Fig 5F). In two simulations without Mg 2+ , L 3 escaped altogether through the back door.

Cognate and synthetic ligands unbind through opposite pathways
The fact that Q 1 and Q 0 are exposed only at the front suggests that the cognate ligands bind and unbind through the front door. On the other hand, the exposure of the synthetic ligands at both the front and the back and the (partial) escape of L 3 through the back door suggest that these ligands may enter and exit through both doors. To investigate the unbinding and rebinding pathways, we carried out 15 well-tempered metadynamics [40] simulations for each ligand. In these simulations, biases were introduced to flatten the potential of mean force along a collective variable, here defined as the distance, r, from the center of the ligand to the center of the binding pocket (lined by the six nucleotides G5, U6, G11, C15, C16, and C30). The biases were gradually reduced during the 500 ns simulations. The trajectories of the ligands are illustrated in Figs 6A and S7A.
To determine whether unbinding or rebinding occurred through the front door (between G5 and G11) or back door (between C15 and C30) (Fig 6B), we introduced a plane passing through the C1' atoms of U6 and A10 and the C4 atom of C15, which bisects the binding pocket (Figs 6A and S7A). We defined its normal vector, pointing from the back to the front of the binding pocket, as the z axis. Along each ligand trajectory, we monitored both r and its z component (Figs 6C and S7B). The first increase of r to 9 Å was labeled as an unbinding event, whereas the next decrease of r to 1 Å was labeled as a rebinding event; and this label was repeated till the end of the trajectory. Depending on whether z was positive or negative when the unbinding or rebinding event occurred, the passage was through the front door or back door. Due to the large biases at the beginning of each simulation, the ligand very rapidly left the binding pocket. We started counting only from the subsequent rebinding event.
The total numbers of unbinding and rebinding events for each ligand are listed in Table 4. For the cognate ligands, unbinding shows a significant preference for the front door. Q 1 has 14 events through the front door but only 4 events through the back door (S3 Movie); for Q 0 , the counts are 11 versus 6. In contrast, the overwhelming preference for synthetic ligand unbinding is through the back door, with a total of 63 events (S4 Movie). The total number of frontdoor unbinding events is only 9 for these ligands.
The opposite preferences of the cognate and synthetic ligands also carry over to rebinding, though the preferences are somewhat blunted for both types of ligands. For cognate ligand rebinding, there is only a modest preference for the front door (11 events each for Q 1 and Q 0 , compared with 9 and 7 events, respectively, for the back). For the synthetic ligand rebinding, the total number of events through the back door is 61, while the counterpart through the front door is 17. The preference for the back door is still significant but not as overwhelming as found for unbinding.

Discussion
We have investigated the molecular determinants underlying the binding of the PreQ 1 riboswitch aptamer to cognate and synthetic ligands by combining conventional MD simulations, free energy decomposition, and metadynamics simulations. The analyses on structural, energetic, and dynamic properties have advanced our understanding on both the overt differences between the cognate and synthetic groups of ligands and subtle differences within each group  PLOS COMPUTATIONAL BIOLOGY of ligands. In particular, the reduction of hydrogen bond donors and acceptors is the main reason for the decreased binding affinities of the synthetic ligands, while the increase in rings resulting in the opening of a back door to the binding pocket. Our work has demonstrated the power of molecular dynamics simulations in complementing structure determination and binding assays to provide crucial missing links. For example, the L2 loop is essential both for stabilizing ligands and for communicating ligand binding to the Shine-Dalgarno sequence for downstream signaling. Yet this loop is highly dynamic and nucleobases in the loop were cleaved [23] or subject to distortion by crystal contacts [25] in structure determination. Our MD simulations have now shown how this loop responds to the synthetic ligands (by extruding to form a base stack orthogonal to that formed when bound with the cognate ligands; Fig 4C) or to Mg 2+ (which ties L2 to S1 to keep C15 in a position to form stable hydrogen bonds with the cognate ligands; Fig 3D). Moreover, our MD simulations have found that the methylamine head group of Q 1 samples different torsion angles to alternate its hydrogen-bonding partner between G5 and G11, and L 3 samples different poses to alternate its hydrogen-bonding partner between U6 and A29. Most interestingly, our MD simulations have revealed that ligands can enter and exit the binding pocket through multiple pathways and the cognate and synthetic ligands have opposite preferences. We hope that both our specific lessons on the PreQ 1 riboswitch and our approach will provide guidance for designing riboswitch ligands in the future.
Our MD simulations, totaling 153.5 μs, can be considered extensive. Some of the simulations were added specifically to validate the robustness of the findings. For example, we compared two methods for initial placement of Mg 2+ ions, and found very similar Mg 2+ densities in the subsequent simulations. In addition, we compared energetic and structural properties calculated from three different force field combinations for RNA and Mg 2+ , and found the results to agree well with each other. Our simulations are also directly validated by experimental data-the Mg 2+ densities in the simulations, started from earlier crystal structures without metal ions, match well with those found in recent crystal structures.
A number of experiments can be designed to further test the predictions of our MD simulations. First, we have found that the number of hydrogen bond donors and acceptors is a key determinant for RNA binding affinity, and in large part explains the weaker affinities of the synthetic ligands. This conclusion can be tested by synthesizing analogs of L 1 , L 2 , and L 3 with C-C or C-H bonds replaced by polar covalent bonds. Second, we have predicted that the Q 1 methylamine can form alternative hydrogen bonds, with either the G5 nucleobase or the G11 nucleobase. Moreover, the entire L 3 ligand can have alternative poses, with hydrogen bonding to either the U6 nucleobase or the A29 nucleobase. These predictions can be tested by NMR experiments. Lastly, the opposite unbinding pathways of the cognate and synthetic ligands revealed by our simulations can also be tested experimentally. One potential method is to use small molecules called minor groove binders, to selectively block the front exit or the back exit. If our simulation results are correct, a minor groove binder that blocks the front exit will significantly impede the unbinding of the cognate ligands but have little effect on the unbinding of the synthetic ligands, while the opposite is expected for a minor groove binder that blocks the back exit. A second potential method is to introduce nucleobase analog FRET-pairs [64], which might be able to capture transient opening of the front or back exit.

Preparation of molecular systems
Initial structures of the PreQ 1 riboswitch aptamer bound with Q 1 , L 1 , L 2 , and L 3 were taken from PDB entries 6E1W, 6E1S, 6E1U, and 6E1V, respectively [23]. The missing nucleotides were transplanted from an earlier Q 1 -bound structure in PDB entry 3Q50 [24]. The Q 0 -bound complex was generated from the Q 1 -bound complex by substituting the ligands. The apo form was generated by stripping Q 1 from the Q 1 -bound complex. Missing hydrogen atoms of the aptamer were added by using the Leap module in AMBER18 [63]. The structures of the ligands were optimized using the Gaussian 16 program [65] at the HF/6-31G � level. Note that our MD simulations were completed before the release of PDB entries 6VUH and 6VUI containing the apo and Q 1 -bound structures, respectively, in which Mn 2+ ions were resolved.
Each structure was then solvated in a truncated octahedron periodic box of TIP3P [66] water molecules with a 12 Å buffer. Systems were prepared both without and with Mg 2+ . The primary method for adding Mg 2+ was MCTBI [52,53], which identified seven sites. Alternatively, we placed 16 Mg 2+ ions in the solvent by using the Leap module. Additional Na + ions were added to neutralize the charges of systems with and without Mg 2+ ions.
Force-field parameters of the ligands were from the restrained electrostatic potential charges and the general Amber force field [67]. The parameters for Na + ions were from Joung and Cheatham [68]. In most of the simulations, the force field for RNA, denoted as ff99bsc0+χ OL3 , was an improved version of AMBER ff99 [54], with correction for α/γ dihedrals (bsc0) [55] and correction for χ dihedrals (χ OL3 ) [56]. The parameters for Mg 2+ ions were from Li et al. [57] (Li13). Some simulations were also repeated using two other force field combinations for RNA and Mg 2+ . One paired the CUFIX force field for RNA [58] with Li13 for Mg 2+ ; the other paired the ff99bsc0+χ OL3 force field for RNA with Mg 2+ parameters from Aller et al. [59] (Allner12).

Conventional MD simulations
All cMD simulations were carried out by running the AMBER18 package [63]. Each system was minimized by 2500 steps of steepest descent and 2500 steps of conjugate gradient. The system was then heated from 100 K to 300 K over 50 ps and maintained at 300 K for 50 ps under constant volume. Subsequently the simulation was at constant temperature and pressure for 50 ps to adjust the solvent density. Up to this point, harmonic restraints at a force constant of 5 kcal/mol�Å 2 were imposed on all solute atoms except those on the L2 loop and nucleotides 16 and 33. The last step of equilibration was simulation at constant temperature and pressure for 1 ns, without any restraint. The temperature (300 K) was regulated by the Langevin thermostat [69] and pressure (1 atm) was regulated by the Berendsen barostat [70]. Covalent bonds involving hydrogen atoms were treated with the SHAKE algorithm [71] to allow for a time step of 2 fs. The particle mesh Ewald method [72] was applied to treat long-range electrostatic interactions, with the nonbonded cutoff at 12 Å. Eight replicate simulations of all the systems were carried out using ff99bsc0+χ OL3 /Li13 for 1 μs at constant temperature and pressure. Additionally, four replicate simulations of the Q 1 -and L 1 -bound forms were carried out using both the CUFIX/Li13 and the ff99bsc0+χ OL3 /Allner12 force fields. Snapshots were saved every 10 ps for later analysis.

Binding free energy calculations
Binding free energies and the decomposition into contributions of individual nucleotides were obtained by the MM-PBSA method [34]. For each snapshot of the simulations, the binding free energy was calculated according to Eq (1), where "Δ" means the difference between the complex and the separated RNA and ligand. ΔE ele was from the Coulomb interactions between the RNA and ligand partial charges, and ΔE vdW was from the van der Waals interactions between RNA and ligand atoms. Both ΔE ele and ΔE vdW were calculated without applying a cutoff. ΔG pol was calculated by a finite-difference solution of the Poisson-Boltzmann equation at 0 salt concentration. A cubic grid with 0.4 Å spacing was employed, and 5000 linear iterations were performed. The solute and solvent dielectric constants were 1 and 80, respectively. The dielectric boundary between solute and solvent was the molecular surface defined by a 1.4 Å probe radii. ΔG nonpol was estimated from the solvent-accessible surface area (SASA, also using a 1.4 Å probe radii) as γSASA+β [73], where the surface tension constant γ and the correction constant β were 0.00542 kcal/mol�Å 2 and 0.92 kcal/mol, respectively. The atomic radii for both ΔG pol and ΔG nonpol calculations were taken from the PARSE parameter set [74].
Entropies were obtained using the nmode module of AMBER18 [63]. Prior to the normal mode calculation, each snapshot was energy-minimized using conjugated gradient in vacuum without cutoff for nonbonded interactions. A distance-dependent dielectric constant of 4r was used to mimic solvent screening. The minimization was stopped either when the root-meansquare of the elements of the gradient vector reached 10 −4 kcal/mol�Å or the number of cycles reached 50000. After performing normal mode analysis, the vibrational entropy was obtained by adding up the contributions from all the nonzero-frequency normal modes, each treated as a quantum harmonic oscillator. The translational and rotational entropies were then added to yield the total entropy. This entropy calculation was done once for the RNA-ligand complex, once for the RNA (with the ligand stripped), and once for the ligand (with the RNA stripped). Finally ΔS was obtained as the difference in entropy between the complex and the separated RNA and ligand.
From each replicate simulation, 2500 snapshots were extracted from the second 500 ns, resulting in a total of, e.g., 20,000 snapshots among 8 replicates, for the MM-PBSA calculations on each ligand. Decomposition into contributions of individual nucleotides was done only for the enthalpic components (ΔE ele , ΔE vdW , ΔG pol and ΔG nonpol ).

Other analyses
Vertical distance in Figs 2B and z in 6B were calculated using tcl scripts in VMD [75]. All other distances and hydrogen bond formation were determined by the CPPTRAJ program [76]. Hydrogen bond criteria were donor-acceptor distance < 3.5 Å and donor-H-acceptor angle > 120˚. Densities of ligands and of Mg 2+ were calculated from saved snapshots of the cMD simulations by using the water-hist program in the LOOS package [77].

Well-tempered metadynamics simulations
Well-tempered metadynamics simulations were as described [40]. In this method, Gaussian functions were applied to fill up wells in the potential of mean force for a collective variable. We chose the distance r from the center of the ligand to the center of the binding pocket (defined by the six nucleotide G5, U6, G11, C15, C16, and C30) as the collective variable. The Gaussian widths was set to 0.5 Å, and the Gaussian hills height was initially set at 1.2 kcal/mol and was gradually decreased with a bias factor of 15 over the course of the simulation. A soft harmonic restraining potential was also applied on the center of the ligand to keep the ligand close to the RNA. Fifteen metadynamics simulations using ff99bsc0+χ OL3 /Li13 were run for 500 ns each, utilizing the PLUMED v2.3 [78] plugin to the GROMACS 5.1.2 package [79].
Supporting information S1  In each panel, a horizontal dashed line is drawn at 3.5 Å. The horizontal bar at the bottom is colored red or blue, according to whether the U6 or the A29 distance is < 3.5 Å. The blue sections are raised slightly to better distinguish from the red sections. The bottom right panel shows an enlarged view of the blue and red sections of the horizontal bar. The MD4 simulation is special as the ligand partially slipped through the back door around 500 ns, pushing A29 out of the binding pocket; the ligand rings then flipped and retracked, leading to large distances from A29. Accordingly the upper bound of the ordinate is increased from 11 Å to 20 Å.