Predicting the mechanism and rate of H-NS binding to AT-rich DNA

Bacteria contain several nucleoid-associated proteins that organize their genomic DNA into the nucleoid by bending, wrapping or bridging DNA. The Histone-like Nucleoid Structuring protein H-NS found in many Gram-negative bacteria is a DNA bridging protein and can structure DNA by binding to two separate DNA duplexes or to adjacent sites on the same duplex, depending on external conditions. Several nucleotide sequences have been identified to which H-NS binds with high affinity, indicating H-NS prefers AT-rich DNA. To date, highly detailed structural information of the H-NS DNA complex remains elusive. Molecular simulation can complement experiments by modelling structures and their time evolution in atomistic detail. In this paper we report an exploration of the different binding modes of H-NS to a high affinity nucleotide sequence and an estimate of the associated rate constant. By means of molecular dynamics simulations, we identified three types of binding for H-NS to AT-rich DNA. To further sample the transitions between these binding modes, we performed Replica Exchange Transition Interface Sampling, providing predictions of the mechanism and rate constant of H-NS binding to DNA. H-NS interacts with the DNA through a conserved QGR motif, aided by a conserved arginine at position 93. The QGR motif interacts first with phosphate groups, followed by the formation of hydrogen bonds between acceptors in the DNA minor groove and the sidechains of either Q112 or R114. After R114 inserts into the minor groove, the rest of the QGR motif follows. Full insertion of the QGR motif in the minor groove is stable over several tens of nanoseconds, and involves hydrogen bonds between the bases and both backbone and sidechains of the QGR motif. The rate constant for the process of H-NS binding to AT-rich DNA resulting in full insertion of the QGR motif is in the order of 106 M−1s−1, which is rate limiting compared to the non-specific association of H-NS to the DNA backbone at a rate of 108 M−1s−1.

Introduction Bacterial chromosomal DNA is organized within the nucleoid, which is distinctly different from the cytoplasm. The organization of the nucleoid in bacteria involves a group of DNA binding proteins known as architectural proteins. The Histone-like Nucleoid Structuring protein (H-NS) is a architectural protein occurring in Gram-negative enterobacteria and plays a key role in the genome organization. H-NS can structure DNA by binding to two separate DNA duplexes or to adjacent sites on the same duplex, depending on external conditions [1][2][3][4][5][6]. In addition, H-NS is a global regulator of transcription as it binds to promoter regions [7][8][9]. As H-NS has a preference to bind to foreign genetic material, it functions as a xenogeneic silencer. Activation of foreign H-NS-silenced genes occurring in response to lethal environmental conditions links H-NS to bacterial stress resistance and virulence [10][11][12][13].
H-NS is a relatively small protein composed of 137 amino acid residues that comprises two domains: the oligomerization domain and the DNA binding domain. The first 83 residues represent the oligomerization domain and is composed of four helices [14]. This domain contains two sites for homodimerization and can multimerize into higher order structures [15]. At low concentrations, H-NS primarily exists as a dimer [1]. Residues 89-137 make up the DNAbinding domain, which consists of an antiparallel β-sheet, an α helix and a 3 10 helix [15,16]. NMR experiments on the full-length H-NS protein indicate that the oligomerization domain and DNA binding domain function independently, suggesting that a flexible linker connects the two [17]. This region, residues 65-93, contains the most divergent amino acid sequence of H-NS-related proteins and is composed of amino acids that typically occur in linkers [17,18]. However, removing positively charged residues from the linker abrogates DNA binding by H-NS [19].
The loop (residues 112-114) between one β strand and the α helix of the DNA-binding domain contains a conserved three amino acid sequence: QGR [4,17]. NMR experiments indicate that this motif interacts with the minor groove of DNA [16] in a manner similar to other H-NS related proteins, such as Ler and Lsr2 [16,20,21]. Gel electrophoresis experiments revealed that H-NS binds to curved DNA [22,23]. Several experiments showed that H-NS prefers to bind to conserved nucleotide sequences that are rich in AT and tend to be curved [24][25][26][27][28]. Recent studies using either protein binding microarrays or chromatin immunoprecipitation reveal that H-NS has a high affinity for AT-rich sequences with short A-tracts interrupted by TpA steps [3,10,24,28]. This is in agreement with previously reported high-affinity H-NS binding sites [25]. Varying the relative location of high affinity H-NS binding sites on plasmids with respect to each other results in different topologies of the resulting plasmid and H-NS complexes [29].
To date, highly detailed structural information on the H-NS-DNA complex is still lacking. Molecular simulation can complement experiments by providing information in atomistic detail. A coarse grained approach highlighted the relevance of protein flexibility in forming DNA bridges, yet lacks full atomistic insights [30,31], that are necessary in order to consider eventual structural modifications of complexes [32,33]. All-atom molecular dynamics (MD) simulations follow a molecular system in time, and can provide the required resolution in both space and time to characterize the binding of H-NS to DNA. Recently, an MD exploration of the conformational space of an H-NS dimer showed flexible regions in the connectors between the dimerization domains and the DNA-binding domain [6]. The region linking the DNA-binding domain to the N-terminal region of H-NS is shown to be involved in DNA binding [19].
However, the relatively large system size in combination with the slow interaction dynamics in the order of microseconds to seconds, only allows for qualitative predictions, as most of the simulation time is spent with the system being in a stable state. Quantitative predictions, such as the rate constant associated with the binding of H-NS to DNA, require many transitions from one stable state to another. Assuming a binding rate constant in the order of 10 6 M −1 s −1 , observing a single transition would require at least 1 μs of molecular dynamics simulations. A reasonably accurate estimation of the rate constant, therefore, would require simulation times in the order of seconds, which is currently impossible. Focusing on the transitions regions by avoiding long waiting times in stable states enables a directed end efficient sampling of the transitions [34]. Path sampling techniques achieve this focusing on the transitions by performing an importance sampling procedure in trajectory space [34,35]. New relevant paths are generated by running relatively short MD trajectories within the transition region, providing a speed up of several orders of magnitude [35,36]. To compute the reaction rate, we use replica exchange transition interface sampling (RETIS) [35]. The method requires the definition of multiple interfaces λ i along the estimated reaction coordinate. For each interface, reactive (from state A to state B) and non-reactive trajectories (from state A back to state A) are collected, from which the probability to reach the next interface can be computed. The product of these probabilities gives the reaction rate [37,38].
In the first part of this paper, we use all-atom molecular dynamics to characterize the different ways in which the DNA-binding domain of H-NS can bind to a high affinity nucleotide sequence AATATATT based on known H-NS binding sites [3,10,24,28], containing two AT steps. In the second part of the paper, we provide mechanistic insights and a prediction of the rate of the binding of H-NS to a high affinity DNA sequence by means of RETIS simulations.

MD setup
We performed Molecular Dynamics (MD) simulations of the following systems in explicit water: H-NS; dsDNA with nucleotide sequence GCAATATATTGC; and H-NS with GCAA-TATATTGC. The solution NMR structure of the DNA-binding domain of Salmonella typhimurium H-NS-like protein Bv3F (residues 91-139, PDB code 2L93) [16] served as a starting structure for H-NS. As the N-terminal end of this domain is connected to a linker in the full length protein, we placed an acetyl cap on the N-terminus to neutralize its charge. An initial structure for the dsDNA was obtained from the make-na sever developed by Stroud, J. (2006), which models any nucleotide sequence as an ideal B-DNA structure. We chose as a high affinity sequence AATATATT based on known H-NS binding sites [3,10,24,28], Containing two AT steps. This sequence is capped with GC base pairs at both ends to lower the probability of base opening at the DNA ends.
The coordinates of the dsDNA and the C-terminal domain of H-NS were combined resulting in an initial minimum separation distance between H-NS and DNA of 2.0 nm, and are therefore not directly in contact with each other at the start of the MD simulations. Fig 1A  shows a snapshot of the initial configuration. Preparation of the system consisted of placing the structures in a periodic dodecahedron box, with the box boundaries at least 1 nm from the system, followed by the addition of water molecules. To mimic experimental conditions [16] and neutralize the system, we added 50 mM NaCl by replacing water molecules with ions. Interactions between atoms are described by the AMBER03 force field [39] in combination with the TIP3P water model [40]. We selected this particular force field as it contains topologies for both amino acids and nucelotides and performs reasonably well, as long as no major conformational changes are involved [41]. For non-bonded interactions, both van der Waals and electrostatic, we used a cut-off at 0.8 nm. Long range electrostatic interactions were handled by the Particle Mesh Ewald method [42,43] with a grid spacing of 0.12 nm. To remove unfavorable interactions we performed energy minimization using steepest descents. By applying position restraints on the heavy atoms of the protein and DNA with a force constant in each direction of 1000 kJ/mol nm 2 and performing 0.1ps of MD at a temperature of 300K and a pressure of 1 bar, we relaxed the water and ions around the initial structures. After preparation, we performed twenty 50 ns MD runs for the H-NS-DNA system, varying initial conditions by assigning new random starting velocities drawn from the Maxwell-Boltzmann distribution at 300K. All simulations were performed with the GROMACS software suite, versions 4.5.3 and 4.5.4 [44,45] at the Dutch National Supercomputer, Scinet [46], and a locally maintained cluster, with the leap-frog integration scheme and a time step of 2 fs, using LINCS [47] to constrain bonds in the protein and Predicting the mechanism and rate of H-NS binding to AT-rich DNA SETTLE [48] to constrain water bonds. All simulations were performed in the isothermalisobaric ensemble at a pressure of 1 bar, using the v-rescale thermostat [49] and the isotropic Parrinello-Rahman barostat [50,51].

Analysis
During the MD simulations, the frames were stored every 20 ps. Analysis consisted of visual inspection using VMD [52] and the calculation of various geometric parameters, including the probability of finding either the protein or the DNA within 0.6 nm of a residue of the DNA or protein respectively, and the probability of finding either Na + or Cl − ions within 0.6 nm of a residue in the protein or DNA. This is achieved by first computing the minimum distance between a given residue and the protein/DNA or ion, and then computing the probability histogram of that distance for all simulations. For the ion probability conformations in either the backbone bound or the fully inserted states were included. In addition we calculated the root mean square deviation (RMSD) of both H-NS and the DNA, with respect to equilibrated starting structures, including all atoms in the calculation. Also, we computed distances and number of hydrogen bonds between hydrogen bond donors in the protein and hydrogen bond acceptors in the DNA. A hydrogen bond is counted when donor and acceptor are within a distance of 0.35 nm and the angle between acceptor, donor and hydrogen is less than 30˚. Snapshots are visualized using pymol, developed by Schrödinger, L.L.C. (2010) [53].
A promising quantitative descriptor to follow the interaction between DNA and H-NS has been found in mapping the counting the number of contacts c ij between hydrogen bond acceptors in the minor groove of DNA, labeled i, and hydrogen bond donors in the QGR motif of H-NS, labeled j. Fig 1B shows a schematic representation of these hydrogen bond donors and acceptors. For each pair ij contacts are counted with the expression: where r ij is the distance between atom i and atom j, located in the DNA and H-NS, respectively. The parameters r 0 = 0.4 nm, d 0 = 0.25 nm, nn = 2, mm = 4 have been chosen such to count contacts at hydrogen bond distance (< 0.35 nm) as 1 and contacts at 0.7 nm as 0.5. This provides a smooth and descriptive function able to discriminate between the different binding modes. Summing the contacts for all pairs results in the contact map parameter c QGR−minor : where N DNA and N H−NS are the number of interaction sites in the DNA and in H-NS. In this contact map, hydrogen bond donors in Q112, G113 and R114 have been included. In addition, we calculated the number of contacts between hydrogen bond donors in R93 (atoms N, NZ, NH1 and NH2) and hydrogen bond acceptors in the minor groove of the AT bases c R93−minor .
To discriminate between different binding modes, the contact map is also computed by considering each hydrogen bond donor in the QGR motif separately with respect to the acceptors in the minor groove of the DNA: with j indicating the atoms Q112-N, Q112-NE2, G113-N, R114-N, R114-NZ, R114-NH1, R114-NH2 in the QGR motif.

RETIS setup
The computation of the transition rate between two states, labeled A and B, requires the definition of these states in terms of a set of order parameters. By performing a series of MD simulations, it is possible to characterize the mechanism of the transition between these states and compute its rate. By defining interfaces λ i along an order parameter that sufficiently describes the progression of the transition, the transition rate r AB becomes: where f AB is the flux of MD trajectories passing by interface λ 0 and P i|i+1 are the local probabilities to pass interface λ i+1 given the crossing of the interface λ i , called the crossing probability.
In essence, the order parameter space is divided in subsections (ensembles) by arbitrarily positioned interfaces. The first ensemble provides the flux, which is a quantification of the frequency of the system escaping stable state A. The other ensembles provide the local probability to cross the next interface, given that the previous one has been reached. In regions with large free energy differences, the likelihood to reach the next interface becomes smaller, and therefore a higher number of interfaces is preferable. Further details on the approach, including the mathematical description to compute the flux and the local probabilities, can be found in Refs. [34,36,38,54]. From the MD simulations, we obtained numerical definitions for the stable states, based on the number of contacts between the QGR motif and the minor groove side of the AT basepairs c GQR−minor . The BB state is the non-specific interaction between H-NS and the DNA backbone and corresponds to values of 0 < c GQR−minor < 10. The FI state corresponds the full insertion of the Q112, G113 and R114 segment of H-NS, corresponds to values of c QGR−minor > 30. An intermediate meta-stable state, state PI, has been also detected for values of c QGR−minor around 20, which corresponds to an insertion of either Q112 or R114 into the minor groove. According to the RETIS procedure and following the developers' guidelines [36,38,54], interfaces have been positioned along the order parameter. If during the preparatory procedures steep gradients were observed, more interfaces were added. In other words, if the probability to cross the next interface from a particular interface becomes very low (less than 20% as a rule of thumb), an additional interface should be added. Setting up a RETIS simulation is a trade-off between keeping the number of interfaces as low as possible, to reduce computational costs, and maintaining sufficient crossing. It is worthwhile to point out that an optimal interface number and positioning would improve the efficiency of the sampling, but it would not affect the final results. In cases where the potential energy surfaces is as complex as in the present study, it is computationally exceedingly expensive to achieve the optimal set up. In the binding of H-NS to DNA, a metastable state PI has been identified which imposed a division of the sampling of the overall transition in two regions: BB ! PI and PI ! FI. In the BB ! PI transition, the interfaces has been located at values for c QGR−minor as follows: l BBÀ PI Several cycles of RETIS simulations allowed for an educated guess of the interface positions.
A series of snapshots have been randomly selected from a regular MD trajectory connecting the BB and FI states, covering the range of value of c QGR−minor between BB and PI and PI and FI. These trajectory fragments constituted the initial points to start the RETIS simulations. The RETIS simulations have been performed via the PyRETIS library [37].
RETIS considers three Monte Carlo based moves. Two way shooting, time reversal and swapping. A two-way shooting move consists of starting two MD simulations from a randomly selected snapshot on the previously accepted trajectory. The shooting points, to which random velocities have been assigned respecting the Boltzmann distribution of kinetic energies for the given temperature (aimless shooting), are the seeds from which the new trajectories are propagated along positive and negative, times to generate a full new trajectory. Two consecutive trajectories have, in this scheme, only one point in common. In time reversal moves the trajectory is reversed in time by reversing the sequence of snapshots and by reversing the velocities of each atom in each snapshot. In the swapping moves, two trajectories that both satisfy the properties of both relative ensemble are swapped. A description of the moves and the criteria to select their relative frequency is provided in Refs. [38,54].
Initially, to decorrelate the paths with the initials snapshots, a total of about 400 two-way aimless shootings moves have been performed in 4 independent parallel simulations. Thereafter, the production runs consisted in in about 500 RETIS cycles for the rate estimation of the transitions BB ! PI and PI ! FI. A cycle corresponds to a RETIS move for each ensemble. For 12 ensembles, a total of 6000 trajectories have been considered of which 1500 generated by shooting moves, 1500 by time-reversal moves and the remaining by swapping moves [36,38]. The resulting acceptance path ratio has been equal to 0.31, while the average path length for BBtoPI and PItoFI has been around 30000 time steps corresponding to 60 ps. The aggregate simulation time is around 200 ns. For each simulation, the order parameter and the other descriptors have been stored every 0.1 ps.
To visualize the underling mechanisms, two-dimensional histograms of the path ensembles of the transitions BB ! PI and PI ! FI have been produced, projected onto pairs of selected descriptors. From all the generated paths, the relative frequency of visiting a 2D region has been computed in a grid of 500 by 500 bins which subdivided the interval between the minimum and maximum value of each respective descriptor.

Different modes of binding
Several studies have indicated that H-NS has a preference for AT-rich DNA, [25,26,28,29]. Additional research has shown that H-NS binds DNA with a highly conserved motif consisting of three amino acids, QGR, in the DNA-binding domain [4,17]. To confirm these observations with molecular dynamics, we performed twenty 50 ns MD simulations of the DNA binding domain of H-NS and an AT-rich nucleotide sequence. For the latter, we selected AATATATT, capped with GC, resulting in sequence 5'-GCAATATATTGC-3', based on known H-NS binding sites [3,10,24,28] For the part of H-NS, we used the NMR structure for residues 91-139 (PDB code 2L93 [16]). The protein and the DNA were placed at a separation distance of 2 nm, see Fig 1A. Note that no hydrogen bonds are formed between the protein and the DNA at the start of the MD simulations. We have conducted MD simulations with a larger distance of separation between the protein and the DNA. These simulations resulted in many complexes with the protein attached to one of the DNA ends. Such conformations are not relevant from a physiological point of view. Visual inspection of the MD trajectories reveals that H-NS binds to the DNA in all trajectories. The protein has a strong preference for binding to the AT region in the DNA sequence, as evidenced by the high probabilities for finding protein within 0.6 nm of a nucleotide, see Fig 2A. The probabilities at the CG ends of the DNA arise from the protein binding to the DNA end. Fig 2B shows the probability of finding DNA within 0.6 nm of a protein residue. The highest probability occurs for R114, followed by G113, Q112 and R93. These residues are all conserved and when altered, abolish DNA binding [16,19].
For the hydrogen bond donors in the residues identified as interacting with DNA (R93, Q112, G113 and R114) we have plotted time traces of the minimum distance between the hydrogen bond donors in these residues and hydrogen bond acceptors in the minor groove of the AT basepairs  Fig, columns Q112NE2, R114NH1 and R114NH2, show that partial insertion can occur with the sidechain of either Q112 or R114 (indicated as PI-Q and PI-R respectively). The distance between Q112 and th DNA fluctuates more than the distance, between R114 and the DNA, suggesting that the former may be less stable. As these MD simulations have not sampled all states exhaustively, we refrain from making more quantitative statements based on this data.   When H-NS is in the BB state, the QGR motif interacts with either the phosphate and/or the deoxyribose groups, see Fig 3. R114 can form multiple salt bridges with the phosphate groups, whereas Q112 and G113 can form hydrogen bonds with the DNA backbone. In the BB binding mode, the QGR motif can adopt many different orientations with respect to the DNA backbone, some of which also involve R93. In the PI binding mode, one residue of the QGR motif interacts with one or more bases in the minor groove of the DNA via hydrogen bonds. Partial insertion can occur with either R114 or with Q112, see Fig 3. When R114 is inside the minor groove, Q112 interacts with either the phosphate backbone, R93 or solvent. R114 interacts with phosphate groups when Q112 is inside the minor groove. The protein typically covers one to four base pairs when partially inserted, depending on the orientation of residues in the QGR motif not involved in minor groove interactions. Finally, full insertion of the QGR motif occurs when both Q112 and R114 form hydrogen bonds with bases in the minor groove, see Fig 3. When fully inserted, the QGR motif is aligned with the phosphate backbone and covers three to four base pairs. G113 can also form hydrogen bonds to the bases, causing Q112 and R114 to extend such that the QGR motif covers five base pairs in the minor groove.
Finally, we investigated whether the location of ions on the DNA and protein change upon binding. To this end we calculated the probability of finding a Na + or Cl − ion close to the Predicting the mechanism and rate of H-NS binding to AT-rich DNA protein or DNA, p ion , see Fig 5. For the DNA we observe that less Na + ions are located close to the TATT bases in the FI binding mode compared to the BB state. For the protein, changes in p ion occur for both Na + and Cl − in going form BB to FI. The drop in finding ions close to R114 clearly relates to its insertion into the minor groove. Increases in p ion for sodium may indicate that in the FI state those residues have lost the transient interactions with R93 or R114, which are now replaced by Na + . The decrease of sodium interactions in the FI state for residues 130-139 may be a consequence of the DNA being closer to those residues.

Mechanism and rates of H-NS binding to DNA
Upon binding to AT-rich DNA, the QGR motif visits various stable states with a residence time of more than 10 ns. We performed RETIS simulations to characterize the mechanisms of the transitions between these states. Starting from an initial trajectory that samples a transition, RETIS collects trajectories along the order parameter space by monitoring MD simulations [37]. Once a simulation enters a stable state, it is stopped, limiting an excessive sampling of the stable state. RETIS calculations thus permits to compute the rate constant for H-NS binding to DNA. The MD simulations show that the number of contacts between the QGR motif and the minor groove side of the AT basepairs in the DNA c QGR−minor sufficiently indicates the progression of H-NS binding to DNA. Fig 6 reports the results of the weighted histogram analysis method applied to the local crossing probabilities as a function of the main order parameter c QGR−minor for the BB to PI to FI transitions. The profile of the combined crossing probabilities is smooth, indicating that c GQR−minor is a good choice as a reaction coordinate to describe the binding of H-NS to DNA and that no additional processes play a role. Table 1 reports the values of the flux, the crossing probabilities and the resulting rate constants, and relative errors. No paths that directly connect states BB and FI have been sampled. This suggests that full insertion of the QGR motif occurs via an intermediate state in which the Predicting the mechanism and rate of H-NS binding to AT-rich DNA motif is partially inserted. Since BB and PI transitions are adjacent in the considered order parameter space, an overall rate r BBtoPI can be computed by multiplying the crossing probabilities P BBtoPI and P PItoFI by the flux out of the BB state ϕ BB . The rates for the three transitions BB to PI, PI to FI and BB to FI are given in Table 1. The errors on the estimation of the rate for transition events in complex energy lanscapes is commonly accepted to be within an order of magnitude [55]. We therefore consider the 100% error achieved to be relatively modest and within the accuracy of the force field. Further computations would, therefore, add only a very limited value without providing new insight into the process. Predicting the mechanism and rate of H-NS binding to AT-rich DNA After non-specific association to the backbone, the insertion of H-NS into the minor groove occurs in the order of 10 6 M −1 s −1 . When considering the binding of H-NS to DNA as limited by diffusion, the forward rate of the process is in the order of 10 8 M −1 s −1 [56]. Including electrostatic consideration would make the non-specific association even faster [56]. We therefore conclude that the transition from BB to the fully inserted conformation is the rate-limiting step in the binding process of H-NS to DNA.
By projecting the trajectories collected during the RETIS simulations onto relevant geometrical parameters allows for insights into the mechanism in the form of a probability density plot. Fig 7 shows the RETIS trajectories as a probability density projected onto c QGR−minor and several number of contact counts between the minor groove of the AT basepairs and the individual hydrogen bond donors in the QGR motif. In all trajectories the number of contacts between the hydrogen bond donors in the Arg114 sidechain (atom NH1) and the DNA start to increase from 2 to 5 contacts before forming the PI state at c QGR−minor = 20 (Fig 7E), while the other hydrogen bond donors do not form increasingly more contacts This observation indicates that the sidechain of R114 is the first to enter the minor groove to form the PI state. When comparing the profiles focused on the protein backbone contacts, Fig 7(A)-7(C), the backbones of R114 and G113 insert before Q112. The sidechain of Q112 is the last to enter the minor groove, which happens after c QGR−minor = 23. We also computed the number of contacts between the hydrogen bond donors in R93 and the minor groove c R93−minor and plotted this as a probability density in Fig 7(F). The number of contacts in the overall transition from BB to FI averaged to around 3, indicating interactions between R93 and the minor groove. In the transition from BB to PI two density spots appear at c R93−minor = [2,3] and c R93−minor = 4, suggesting that R93 has multiple ways of interacting with the DNA. The calculation of c R93−minor involves counting the contacts between four atoms in R93 and the minor groove and can be qualitatively compared to c QGR−minor , which is dominated by the number of contacts between R114 and the minor groove. Compared as such, the values for c R93−minor are low, indicating that R93 does not insert in the minor groove, but rather interacts with the DNA backbone. Summarizing, binding of the H-NS DNA binding domain to the minor groove of AT basepairs occurs via the QGR motif, and also involves R93.
Our observations from both the MD and RETIS simulations suggest a sequential mechanism for the insertion of the QGR motif into the minor groove, initiated by the side chain of R114, as summarized in Fig 8. The time traces obtained from the MD simulations in S1 Fig all show that the FI state is formed via insertion of the sidechain of R114 followed by insertion of the sidechain of Q112. These observations are confirmed in the RETIS simulations. The MD simulations occasionally show the formation of a PI state with Q112 inserted. This interaction fluctuates more than the interaction between R114 and the minor groove. Furthermore, the time traces show that insertion of Q112 can be followed by a return to the BB state, which is also confirmed by the RETIS simulations. The flux indicates the frequency of escaping the initial state, the crossing probability is the probability of reaching the next interface when crossing interface i. The reaction rate is calculated according to Eq 4. All values are given with their relative errors, obtained by dividing the standard deviation by the square root of the sample size. https://doi.org/10.1371/journal.pcbi.1006845.t001 Predicting the mechanism and rate of H-NS binding to AT-rich DNA

Discussion
In this work, we show that the conserved QGR motif is the main initiator of DNA binding by H-NS, aided by another arginine, R93. In our simulations we did not observe any other interactions between H-NS and the DNA that lasted longer than 10ns. These findings agree with the experimental results by Gordon et al., who found that H-NS binding is mediated by a Q/RGR motif [16]. The role of R93 emphasizes the recently identified importance of charged residues in the linker region, as recently found by superresolution microscopy studies on the DNA binding properties of H-NS mutants [19]. H-NS can bind to DNA in various modes, which we grouped into three categories: bound to the backbone (BB), the QGR motif partially inserted into the minor groove (PI) and fully inserted into the minor groove (FI). Our results show that a fully inserted QGR motif has a lifetime of more than 50ns. Moreover, this complex is structurally in excellent agreement with the docked structure proposed by Gordon et al. [16]. Currently more accurate force fields are available for both proteins [57] and nucleic acids [58]. Despite using a less accurate force field, AMBER03 [39], we were able to reproduce experimental observations, and provide additional insights into the binding process. These results did not involve any major conformational changes in the protein nor in the DNA, and therefore, we believe that the AMBER03 force field proved to be sufficient.
The MD simulations indicate that the number of base pairs shielded by the QGR motif during complex formation depends on the orientation and conformation of the QGR motif when bound to DNA. Root mean square fluctuation (RMSF) calculations of the DNA-binding domain with respect to the NMR structure, using C-alpha atoms, indicate that the QGR motif is one of the more flexible parts of the domain (data not shown). The motif exhibits various conformations ranging from the two side chains (Q112 and R114) fully extending along the principal axis of the protein backbone or curving away from the backbone. This flexibility enables the QGR motif to cover three to five base pairs, while aligned along the phosphate backbone. In the FI state three to five base pairs are covered and one to four base pairs are shielded by H-NS in the PI binding mode. When bound to DNA, the H-NS DNA-binding domain prevents additional domains to bind directly adjacent to the first. This excluded volume effect shields up to three extra base pairs in addition to the base pairs directly occluded by the QGR motif. Cross-linking experimental studies have identified the size of the binding site of H-NS between 15-20 base pairs [59], which was further refined to 8 to 10 base pairs based on NMR experiments experiments combined with an bioinformatics approach [60]. These observations agree well with our predictions.
The models of a full-length H-NS dimer as constructed by van der Valk et al. [61] allows for an interpretation of our results in a larger context. These models were constructed by combining the structural information for the N-terminal oligomerization domain (PDB:3NR7 [14]) and the NMR structure of the DNA-binding domain (PDB:2L93 [16]) with the flexible linker modeled as a random coil [61]. The distance between two DNA-binding domains in these models depends on the distances between the C-terminal ends of the oligomerization domains, the conformation of the linker region and the conformation of the DNA-binding domain. Since the RMSD of the DNA binding domain indicates that hardly any structural changes occur upon binding (see Fig 4D, we measured the distance between the N-terminal end of the DNA binding domain (A91-C) and the center of the QGR motif (G113-CA) at 1.9 nm. The linker is 1.2 nm in length when in random coil conformation and 2.4 nm when fully stretched. Finally, the distance between the C-terminal ends of the oligomerization domains is 9 nm when dimerization occurs via the N-terminal dimerization site and 2.3 nm when dimerization occurs via the dimer-dimer interaction site. As a first estimate for the range of distances between two DNA-binding domains in a H-NS filament on DNA, we estimate that the distance between adjacent QGR motifs ranges from 8 to 17 nm. However, as more recent MD simulations indicate considerable flexibility for both the helix connecting the two dimerization domains in the oligomerization domain [6] and the region linking the DNAbinding domain to the rest of the protein [6,19], this interpretation is prelimiary, and warrants further modeling, but this is beyond the scope of this work.
A recent bioinformatics study revealed that arginine residues have a preference for narrow minor grooves [62]. This study suggested that arginine binding to narrow minor grooves is a mechanism widely used in protein-DNA recognition. Electrostatic potential calculations showed that narrow minor grooves have a strongly enhanced negative electrostatic potential, providing an explanation for the preference of arginine to bind to narrow minor grooves [62]. Glutamine was identified as the third most frequent narrow minor groove binder, following arginine and lysine. These results agree well with the prominent role of R114 in the H-NS DNA binding modes and suggest that the QGR motif is optimal for binding to narrow minor grooves.
As lysine was identified as a frequent minor groove binder, we investigated whether lysine residues in the H-NS DNA-binding domain contributed to DNA binding. The DNA-binding domain contains six lysine residues, of which four may interact with the DNA via salt bridge interactions to phosphate groups, see Fig 2 Interactions of lysines with the bases from either the minor or major groove of the DNA are not observed. Electrostatic potential calculations indicated that the energetic cost of desolvation is larger for lysine than for arginine [62]. We speculate that the lysines in the H-NS DNA binding domain may provide additional attractive interactions during the initial stage of H-NS moving closer to the DNA.
Different studies suggest that H-NS favors curved DNA over straight DNA [23,63,64]. Since twelve base pair DNA is too short to determine curvature, we cannot comment on the link between the various binding modes and the curvature of the nucleotide sequences. However, since the QGR motif only binds a short region of DNA, we believe it to be unlikely that individual DNA-binding domains discriminate between curved and straight DNA.

Conclusion
Molecular dynamics simulations revealed that the H-NS DNA-binding domain binds to DNA with its conserved QGR motif, aided by Arg93. The association of the protein domain to DNA does not result in deformation of either the protein or the DNA. We identified three binding modes, which can be distinguished via the number of contacts between the QGR motif and the DNA. The binding modes, in order of increasing number of contacts between the protein and the bases, are BB (backbone bound) PI (partially inserted) and FI (fully inserted). Replica Exchange Ttransition Interface Simulations enabled the calculation of the rate of the transition from the backbone bound state to the fully inserted state. These calculations indicate that the fully inserted state is always reached via the partially inserted intermediate, and that R114 initiates the binding. The rate of going from BB to FI is predicted to be in the order of 10 6 M −1 s −1 .