Simulations Meet Experiment to Reveal New Insights into DNA Intrinsic Mechanics

The accurate prediction of the structure and dynamics of DNA remains a major challenge in computational biology due to the dearth of precise experimental information on DNA free in solution and limitations in the DNA force-fields underpinning the simulations. A new generation of force-fields has been developed to better represent the sequence-dependent B-DNA intrinsic mechanics, in particular with respect to the BI ↔ BII backbone equilibrium, which is essential to understand the B-DNA properties. Here, the performance of MD simulations with the newly updated force-fields Parmbsc0εζOLI and CHARMM36 was tested against a large ensemble of recent NMR data collected on four DNA dodecamers involved in nucleosome positioning. We find impressive progress towards a coherent, realistic representation of B-DNA in solution, despite residual shortcomings. This improved representation allows new and deeper interpretation of the experimental observables, including regarding the behavior of facing phosphate groups in complementary dinucleotides, and their modulation by the sequence. It also provides the opportunity to extensively revisit and refine the coupling between backbone states and inter base pair parameters, which emerges as a common theme across all the complementary dinucleotides. In sum, the global agreement between simulations and experiment reveals new aspects of intrinsic DNA mechanics, a key component of DNA-protein recognition.


Introduction
GpG•CpC are characterized by BII percentages markedly higher than the average (21%); ApN•NpT (N: any base) and TpA•TpA can be globally considered as BI-rich; GpA•TpC is an intermediate case, with BII percentage only slightly lower than average. These intrinsic sequence-specific BII propensities in solution were summarized on a scale called TRX [28], by reference to the couplings with Twist, Roll and X-disp. TRX was recently validated further by a large set of δPs collected on new DNA oligomers [27].
This detailed information on DNA in solution offers a precious framework for testing and refining DNA force fields. Thus, the AMBER force fields Parm98 [29] and Parm99 [30] stabilized artefactual α/γ conformations that caused severe distortions in DNA [19,31]. Such undesirable α/γ transitions were corrected in the subsequent potential Parmbsc0 [12]. First observed by NMR [32,33] and then generalized and quantified in the TRX scale [34], the modulation of CpG BII propensity by the 3'-and 5'-neighbors was qualitatively retrieved by Parmbsc0 [35]. Also, the sensitivity of the BI $ BII equilibrium to the type of monovalent cation (K + , Na + ) was demonstrated by NMR [26]. Parmbsc0 simulations do not seem to reproduce this dependence, yet they suggest a mechanism that could explain how K + and Na + affect the backbone motions [35]. Concerning the CHARMM family of force-fields, the early thorough systematic calibration of the DNA backbone torsional energetics for CHARMM27 [17,36] prevented artefactual α/γ transitions and resulted in a force-field which treats B-DNA robustly [6,37,38]. Importantly, CHARMM27, like Parmbsc0, correctly represent the mechanical coupling between the backbone states and the helical parameters [9,38,39].
In response, two force-fields were recently conceived to improve the DNA backbone representation: Parmbsc0 εzOLI [13], derived from Parmbsc0 -, and CHARMM36 [11], built on CHARMM27. Parmbsc0 εzOLI and CHARMM36 were developed guided by DNA X-ray structures and a small set of BII percentages extracted from NMR. In initial tests with B-DNA, both force fields notably increased the sampling of the BII form compared to prior potentials [11,13]. Since twist and groove shape are coupled to the BI $ BII equilibrium, the structural outcome obtained with Parmbsc0 εzOLI significantly differs from that yielded by Parmbsc0 [13]. These initial tests are encouraging and call for a more systematic examination of the performance of these potentials, especially in the light of experimental data not used to train the force-fields.
The present work exploits a wealth of recent 31 P NMR chemical shifts on the DNA backbone motions, to thoroughly evaluate the performance of the Parmbsc0 εzOLI and CHARMM36 potentials. These data were collected on four DNA dodecamers [27], independent of those used to develop those force-fields. Together, the dodecamers cover a 39 bp segment in the 5' half of sequence 601, the best artificial sequence at forming nucleosome core particle [47], which is therefore important to understand how DNA is packaged. The TRX approach [28] combined with the analysis of the 31 P chemical shifts of the four dodecamers [27] provides evidence that the intrinsic structural characteristics of the free sequence 601 largely account for its strong affinity for the histone core. In addition to their biological relevance, the 72 dinucleotides (NpN) of the four dodecamers behave as expected from the TRX scale regarding the effect of sequence on the BII propensities [27]. These dodecamers and the attending experimental information are therefore ideally suited to evaluate Parmbsc0 εzOLI and CHARMM36, with emphasis on the representation of the BI $ BII equilibrium and the coupled helicoidal parameters. Importantly, we show how the improvements brought by these new potentials lead to new insights into DNA structure and dynamics, which are essentially consistent across the two force-fields.
The first step was to compare the BII percentages inferred from δP measurements to those generated by MDs. The simulated fine modulation by the sequence is not yet fully satisfactory. For instance the simulated BII populations of some steps, as those of GpC with both force fields, tend to be underestimated. However, Parmbsc0 εzOLI and CHARMM36 represent the backbone behavior much more realistically than previous force fields. CHARMM36 in particular shows a very good ability to obtain BII-rich steps. This advance enabled to examine for the first time the conformational combinations corresponding to the states of facing phosphate groups, i.e. BI•BI, BI•BII, BII•BI and BII•BII. We find that the conformational states of the two facing phosphate groups of any complementary dinucleotide are not correlated in either Parmbsc0 εzOLI or CHARMM36 simulations. An important practical consequence is that the populations of the combinations of facing phosphates can be easily deduced from the overall individual BII populations inferred experimentally for every phosphate. This approach reveals that the four dodecamers contain a sizable number of steps where BII-containing states, BI•BII, BII•BI and BII•BII, dominate. Such quantification is critical for accurately describing the conformational landscape explored by the complementary dinucleotides, because the backbone combinations are tightly associated with helical parameters, as documented here. Overall, our results deepen our understanding of the intrinsic B-DNA mechanics, which is a key player in the indirect readout of DNA sequences by proteins.

Overview of the simulations
Each of the four dodecamers (Table 1) was simulated with the Parmbsc0 εzOLI [13] (P-MDs) or CHARMM36 [11] (C-MDs) force-field, resulting in a total of 8 MDs. The MDs of Oligo 1, 2 and 3 were 450ns each, while for Oligo 4 the trajectories were extended to one microsecond. Additional sampling was performed on Oligo 4 since its alternation of BI and BII-rich dinucleotides is especially relevant to test the convergence of backbone dynamics.
During the present simulations, the base pairs N 2 !N 11 •N 14 !N 23 were stable, with~99% of Watson-Crick pairing. The root mean square deviations (RMSDs) between a regular canonical B-DNA and the simulated snapshots fluctuated around 2.6±0.6 Å in P-MDs and 2.1±0.5 Å in C-MDs (S1 Fig). The slightly larger RMSDs for P-MDs versus C-MDs gave the first indication of subtle differences between the force-fields, but one should refrain from interpreting these differences in terms of relative validity of the two force-fields, since canonical B-DNA is a somewhat artificial construct. Then, we examined the five dihedral angles of the phosphodiester backbone, α, β, γ, ε and z. In both P-and C-MDs, α/β/γ conform to the canonical g-/ trans/g+ pattern observed in free DNA [14,15,[17][18][19]. The torsions ε and z, which undergo  (Fig 1). The convergence of the BII populations is of evident relevance, especially to compare simulated BII percentages to their experimental counterpart. Previous analyses of very long trajectories (up to~45 μs) with Parmbsc0 and CHARMM36 showed reasonable convergence of the fast motions (timescale < 100ns) on internal parts of DNA oligomers after only~50ns [5]. A similar conclusion was drawn from μs simulations with Parmbsc0, using as convergence criteria the average helical and backbone parameters [41]. These previous studies indicate that the timescale of the present MDs should be amply sufficient to investigate the backbone motions. Indeed, our results confirm this expectation, keeping in mind that constraints were applied to the terminal base-pairs to maintain their Watson-Crick base-pairing. A detailed justification of the protocol is given in Materials and Methods. Clearly, the BII population of some phosphates did not converge over the first 50 ns, which may be considered as a reasonable equilibration time. Thus, for each phosphate, convergence was monitored by plotting its BII percentage over increasing trajectory lengths, from 150ns upwards (S2 Fig). For each Oligo treated with Parmbsc0 εzOLI or CHARMM36, convergence of the BII populations was reached within the simulation times, including for phosphates with high BII populations (S2 Fig).
An additional test was performed with the MDs of Oligo 4 extended to 1 μs, by comparing the beginning -from 50 to 150ns-and the end -from 900 to 1000ns-of trajectories (S3 Fig). This analysis produced very similar BII percentages on each step of Oligo 4, with only slight differences (8% for the worst case) on some BII-rich steps. Thus, the first 100ns of production (from 50 to 150ns), while not sufficient to ensure a complete convergence, surprisingly offer a rather good estimation of the backbone behavior, supporting the expectation that the simulations are essentially converged for practical purposes with respect to the BI/BII balance on the 450ns time scale.
We recall that the MDs were performed restraining the Watson-Crick hydrogen bonds in the first and last base pairs. We chose this protocol since, with unrestrained MDs, convergence issues were observed in conjunction with fraying events involving larger than expected motions of the terminal regions, consonant with previous reports [5,13,48]. Since the structural signature of these long-lived fraying events in the unrestrained MDs is not supported by the NMR measurements (see "Restrained base pairing on the first and last base pairs" in Materials and Methods), Watson-Crick pairing restraints of the first and last base pairs were applied in the analyzed MDs. This remedied the convergence concerns otherwise observed in the unrestrained MDs (see the example of restrained and unrestrained 1 μs P-MDs of Oligo 4 in S4 Fig). Importantly, however, the BII percentages calculated on the central part (N 3 !N 10 •N 15 !N 22 ) of the dodecamers are quasi identical (correlation coefficient of 0.98) in both restrained and unrestrained MDs. This indicates that the behavior of the internal steps was reproducible in different conditions, another reassuring element concerning the convergence of the simulations presented here.
Overall, we observe that the DNA backbone dynamics is essentially converged with MDs of several hundredth ns. This convergence timeframe is realistic considering that the phosphate groups undergo rapid (nano-picosecond timescale) conformational exchange according to NMR [49,50]. Convergence in terms of sampling of the backbone states allows us to concentrate the following analysis on the influence of the force-fields.

Definition of the BI and BII states for MD analyses
In free DNA X-ray structures the distribution of the pseudo-angle (ε-z) is characterized by two major peaks centered around (ε-z) = -90°(BI, ε in trans and z in g-) and (ε-z) = 90°, (BII, ε in gand z in trans) (S5 Fig). Between these two maxima, a region covering (ε-z) values from -60 to +70°contains phosphate linkages with ε:trans typical of BI and z:trans typical of BII. Therefore this region may be considered ambiguous in terms of BI/BII categorization. The separation between BI and BII is commonly set at the minimum of the (ε-z) distribution, which is close to (ε-z) = 0 in the X-ray distribution (S5 Fig). BI and BII are thus usually characterized by negative and positive (ε-z) values, respectively.
The pattern observed in the X-ray structures for the (ε-z) distribution is globally preserved in P-MDs and C-MDs, while influenced by the force-fields (Fig 2). Thus, the operational definition of the BI and BII states in MDs must be carefully scrutinized, and possibly adapted. In addition to the (ε-z) histograms, the sugar populations in the south, east and north puckers (e.g. Oligo 4 sugars in S6 Fig) were considered, since this criterion is relevant to the definition of BI and BII. Indeed, crystallographic and NMR investigations established that BI is tolerant in terms of surrounding 5' and 3' sugar puckers, while BII is restricted to south puckers, especially with respect to 5' sugars [15,18,50].
The (ε-z) histogram of C-MDs has a minimum at (ε-z) = 30°, located at a tail of the ε/z: trans/trans region (Fig 2). 5' south sugars are observable in both BI and BII regions; 5' east sugars fall in (ε-z) from -110 to -50°, inside the conventional BI region; 5' north sugars are associated to a larger range of (ε-z) values, but are suppressed above (ε-z) = 30° (Fig 2). This sugar behavior and the minimum of (ε-z) at 30°offer an analogy with the X-ray observations, such that (ε-z) = 30°was deemed suitable to separate BI from BII with CHARMM36. Using instead the conventional cutoff (ε-z = 0) to separate BI from BII would not result in a dramatically different description, since the average BII population inferred with (ε-z) > 0°would only be 4% higher than that based on (ε-z) > 30°. However, in view of the distribution of 5' north sugars, the criterion (ε-z) > 30°was preferred to define BII with CHARMM36. Incidentally, the present C-MD data confirm the strong association between BII and 5' south sugars.
With Parmbsc0 εzOLI , no clear minimum is observed in the (ε-z) distribution between the BI and BII peaks, which are separated by a flat (ε-z) distribution (Fig 2). This intermediate region, centered around (ε-z) = 0, contains ε/z:trans/trans conformers (Fig 2), which represent 10% of the snapshots. In absence of a clear minimum in the (ε-z) distribution, histograms of ε and z were considered. This approach was adopted in previous Parmbsc0 trajectories [15,18,50,51], where the BII linkages were defined relative to the minimum of the distribution. Here, with the minimum of the z distribution at z = 230°, this approach would designate as BII the range above (ε-z) = -50°, a strongly negative value. Conversely, the transition from BI to BII would be at (ε-z) = 40°if chosen to be at the minimum of the ε histogram (ε = 240°). So, the ε and z histograms do not provide a coherent definition of BI and BII ranges for P-MDs here. In addition, the sugar dynamical regime is of little help since Parmbsc0 εzOLI generates only a few Distribution of backbone (ε-ζ) values and 5' sugar behavior in MD snapshots. Top panels: the frequencies (N) of (ε-ζ) values were extracted from P-MDs (left panel, blue line) and C-MDs (right panels, red line) snapshots. The BI and BII regions are indicated, in accordance with the analysis presented in the main text. The hashed regions correspond to the distribution of ε/ζ:trans/trans, the categorization of which is ambiguous with regard to BI or BII. Bottom panels: pseudorotation phase angle (P) of the 5' sugar puckers as a function of (ε-ζ) values (°), extracted from P-MDs (left panel) and C-MDs (right panel). The sugar ring conformations were in north (pseudorotation phase angle 0 to 50°), east (50 to 120°) or south (120 to 220°). The (ε-ζ) region assigned to BII does not contain 5' north sugars. The color gradient is indicative of the density, with the highest densities in yellow. The vertical lines indicate the (ε-ζ) values used here to separate BI from BII in P-(blue lines) and C-MDs (red lines). north sugars (Fig 2 and S6 Fig). In absence of any convincing more specific rationale to assign the ε/z:trans/trans snapshots to either BI or BII with Parmbsc0 εzOLI , we adopted the common BII definition, (ε-z) > 0°. Such a decision is somewhat arbitrary but the uncertainty it introduces is limited. Indeed, shifting the (ε-z) dividing value by 20°((ε-z) = -20°or +20°) only changed the BII population by +/-2%. In sum, BII percentages were extracted using (ε-z) > 0°f or P-MDs and (ε-z) > +30°for C-MDs throughout this work.
These considerations illustrate the difficulty in defining BI and BII unambiguously, in a manner which would be meaningful and transferable across different structural models. It also draws the attention to some differences between Parmbsc0 εzOLI and CHARMM36 regarding their representations of sugars and the (ε-z) distribution. Yet, the following sections show that a consistent overall picture emerges from CHARMM36 and Parmbsc0 εzOLI .

Overall BII populations from MD simulations compared to NMR
The four dodecamers studied here correspond to 72 dinucleotide steps, excluding the terminal steps. The corresponding 72 31 P chemical shifts (δPs) were measured and converted to BII percentages, BII% from NMR , using an empirical procedure based on a calibration involving a comparison of NMR and X-ray structural data [24] (see also Materials and Methods). In this procedure, δPs of pure BI and pure BII states are assumed to be sequence-independent, even if they could be modulated by the dinucleotide sequence, as suggested by a computational study [52]. However, previous studies showed that neglecting subtle sequence effect on δPs of pure BI and pure BII produced reasonable estimations [27,53], for instance with points where BI and BII are expected to be equally populated [27]. Another indication of the protocol reliability is the consistency between the average BII percentage either derived from the average δPs of the 72 steps considered here (19% of BII, from δP av = -4.20 ppm at 30°) or inferred from statistics of X-ray structures (20% of BII) [15].
A first test of the force fields is to compare the NMR-inferred and simulated BII populations, averaged on the 72 dinucleotides. The simulated overall average BII percentages are 11% in P-MDS and 18% in C-MDs. Thus, Parmbsc0 εzOLI somewhat underestimated the BII populations, as noted before [13]. The excellent agreement of the CHARMM36 value with experiment is an obvious improvement compared to CHARMM27, which severely underrepresented BII [9,11].
That the force fields, in particular CHARMM36, produce overall BII population commensurate with experimental data is very encouraging, considering that the treatment of the backbone by previous force-fields fell outside the experimental range. Since the dinucleotides have markedly different propensities to populate BII [15,27,28,51,53], reproducing the sequence effects is a more stringent test of the force-fields, examined in the following.

Sequence-dependent BII propensities from simulations versus NMR
A previous dataset of 323 measured δPs has established that the 16 dinucleotides (NpN steps for a single strand in a duplex context) composing B-DNA are associated with specific δP values [28]. Since δP translates into a BII propensity it implies that the BI/BII populations are primarily controlled by the dinucleotide sequence [28]. The additional 72 δPs considered here conform to this sequence pattern, validating the notion of dinucleotide-specific BII propensity [27]. Thus, the sequence-dependent BII populations derived from δPs provide a rare opportunity to test the sequence-dependent behavior of DNA force-fields in solution. One notes that adjustments made to Parmbsc0 εzOLI and CHARMM36 to increase the BII populations were not tailored depending on the base sequence, in contrast with, for instance, the CMAP approach [54]. In other words, the same backbone force-field parameters are applied to any sequence. Therefore, differences in the backbone behavior during simulations can only be ascribed to intrinsic sequence-dependent properties.
To examine whether Parmbsc0 εzOLI and CHARMM36 reproduce the effect of sequence on BII populations, the simulated BII percentages were compared to their experimental counterparts, considering the individual phosphates (BII% from MD versus BII% from NMR , given in S1 Table). BII% from MD and BII% from NMR are overall moderately correlated ( Table 2 and  The simulated BII percentages of half of the 72 steps (53% for both P-MDs and C-MDs) are within BII% from NMR ±10%, where the 10% interval corresponds to the tolerance allowed around the NMR-based BII percentages (see Materials and Methods). The comparison between BII% from NMR and BII% from MD is shown in Fig 3 for each non-terminal phosphate of the four dodecamers.
A more detailed analysis, in particular on the dinucleotides present in several occurrences in the dodecamers, shows that Parmbsc0 εzOLI and CHARMM36 correctly reproduce the low or moderate BII% from NMR (< 20%) of CpT, GpT, ApC, ApG, ApA and TpT (Fig 3 and Table 3 for the steps present in several occurrences in the dodecamers). However, this overall agreement suffers some shortcomings. The BII population of TpA tends to be either underestimated or overestimated with Parmbsc0 εzOLI and CHARMM36, respectively. GpC steps in BII are quasi systematically underestimated by both force fields, as well as one of the three CpA•TpG in Oligo 4. Parmbsc0 εzOLI generated too low BII percentages on CpC and GpG in Oligo 2 and most CpG (seven on a total of ten). In C-MDs, the representation of CpC, GpG and CpG is reasonable whereas inversions of BII% occurs at two complementary CpG•CpG steps. Indeed, the NMR gives asymmetrical BII percentages for C 2 pG 3 •C 22 pG 23 in Oligo 1, with 79% of BII for C 2 pG 3 and 42% for C 22 pG 23 . C-MD gave the reverse, with 30% and 75% of BII for C 2 pG 3 and C 22 pG 23 , respectively. A similar situation arises for C 10 pG 11 •C 14 pG 15 in Oligo 3.
Our results confirm that adjustments specifically aimed at enhancing access to the BII state produce convincing, positive effects, especially perceptible in CHARMM36. That the increase in simulated BII populations is not distributed uniformly along the sequences (Fig 3) is not trivial since, as noted above, the computational models were not parametrized to reproduce the BII% for specific dinucleotides, but were only adjusted to be generically more permissive to BII. Admittedly, discrepancies still exist between the experimental sequence effect on the BI$BII equilibrium and Parmbsc0 εzOLI or CHARMM36. However, an essential point is that where the vertical bar means logical "or"). The populations of these combinations are especially meaningful from the point of view of B-DNA mechanics, because inter base pair parameter values are associated to the conformational states of two facing phosphate linkages [15,16,24,40,[55][56][57][58], as also addressed below. The behavior of facing phosphate linkages cannot be deduced from δP measurements, which report time and ensemble-averaged BII percentages for individual phosphates. The present simulations offer the opportunity to inspect the dynamic behavior of phosphate linkages in complementary dinucleotides and to estimate possible correlation. Indeed, several steps in C-MDs, in particular CpG•CpG, CpC•GpG and TpA•TpA, adopt the three combinations, BI•BI, BI•BII|BII•BI or BII•BII (Table 4 and Fig 4). In P-MDs, BI•BI and BI•BII|BII•BI are also  (21) The BII percentages inferred from experimental δPs or extracted from the simulations with Parmbsc0 εζOLI   (Fig 5).
The populations of BI•BI, BI•BII|BII•BI and BII•BII can be addressed with simple elements from probability theory, summarized here before comparison to the MD data. P i (BII) is the probability that phosphate i is in BII, the complementary event has probability P i (BI) = 1 -P i (BII). The states of facing phosphate pairs i,j are characterized by pair probability distributions, P i,j (BI•BI), P i,j (BI•BII), P i,j (BII•BI) and P i,j (BII•BII). Because the facing phosphate groups are either BI or BII, the probabilities satisfy the relations: P i;j ðBIBIÞ þ P i;j ðBIBIIÞ ¼ P i ðBIÞ P i;j ðBIBIÞ þ P i;j ðBIIBIÞ ¼ P j ðBIÞ P i;j ðBIIBIÞ þ P i;j ðBIIBIIÞ ¼ P i ðBIIÞ P i;j ðBIBIIÞ þ P i;j ðBIIBIIÞ ¼ P j ðBIIÞ Summing the last two equations gives: The first term on the left in (1) is the probability of BI•BII|BII•BI, P i,j (BII•BI|BI•BII). Note that here P i,j (BII•BI|BI•BII) does not denote a conditional probability, but simply the probability of states BII•BI or BI•BII. So Eq 1 is equivalent to P i;j ðBIIBIjBIBIIÞ þ 2 P i;j ðBIIBIIÞ ¼ P i ðBIIÞ þ P j ðBIIÞ: Eq 2 is general, as it follows directly from the definitions and it does not rely on any assumption about the independence (correlation) of facing phosphate groups.
One now examines the case when the states of the two facing phosphate groups are independent of each other. Then, the pair probabilities factorize:   (2) and (3) and from the relations: P i (BI) = 1-P i (BII) and P j (BI) = 1-P j (BII).
Eqs 3 and 4 mean that, under the assumption of statistical independence of the two individual facing phosphates, the knowledge of the single phosphate probabilities P i (BII) and P j (BII) is sufficient to find the probabilities of BII•BI|BI•BII, BII•BII, and then also of BI•BI by using equation: P i;j ðBIBIÞ ¼ 1 À ½P i;j ðBIIBIjBIBIIÞ þ P i;j ðBIIBIIÞ: The next step was to test the possibility of uncorrelated facing phosphates against data collected from the MDs. Thus, P i,j (BII•BI|BI•BII), P i,j (BII•BII), P i (BII) and P j (BII) were evaluated as the proportions of these states in the MD trajectories; P i,j (BII•BI|BI•BII) was compared to [P i (BII) + P j (BII)−2P i (BII) P j (BII)] in P-MDs and C-MDs; P i,j (BII•BII) was compared to [P i (BII) P j (BII)] in C-MDs only, since they generate sizable BII•BII populations in contrast with P-MDs. The agreement between the compared quantities is clearly visible in Fig 6, with correlation coefficients of 0.99. That is, the distribution of BII steps between the BI•BII|BII•BI and BII•BII combinations in complementary dinucleotide matches Eqs 3 and 4 very well. Thus, the conformational states of the facing phosphates are statistically independent of each other.
In sum, the ability of both Parmbsc0 εzOLI and CHARMM36 to generate phosphates visiting BII enabled to gain new insights into their dynamics and populations. Thus, simultaneous transitions of two facing phosphate groups are very rare. The two force-fields unambiguously support the notion of statistical independence of the conformational states of individual, facing phosphates. This means that the populations of the three combinations of facing phosphates can be simply expressed from the BII propensities of individual phosphate groups, in particular from experimental data, as developed in the next section.

Quantifying the facing phosphate combinations from NMR
Considering that the notion of uncorrelated facing phosphate is convincing, the probabilities of states BI•BII|BII•BI and BII•BII were calculated using Eqs 3 and 4, respectively, and equating P i (BII) and P j (BII) to P i from NMR (BII) and P j from NMR (BII) (equivalent to BII% from NMR given in S1 Table). This has the advantage to use the experimental data directly (δP-derived BII percentages) to quantify the phosphate states, bypassing the limitations in the simulated estimates of the phosphate state populations. The resulting experimentally inferred BI•BII|BII•BI and BII•BII populations along the four dodecamers are shown in Fig 7 and the values are given in S2 Table. According to this approach, most CpG•CpG and GpC•GpC, as well as the only CpC•GpG, are characterized by high percentages (45% and more) of BI•BII|BII•BI (Fig 7). CpG•CpG in Oligos 1 and 3, CpC•GpG in Oligo2 and CpA•TpG in Oligo 4 are in addition more than 20% in BII•BII (Fig 7). Overall, BI•BI is not the most frequent state in 12 steps, out of a total of 36, in the four dodecamers.
As seen above (Fig 3), the individual BII percentages extracted from MDs differ from those inferred from NMR; accordingly, the corresponding respective populations of BI•BII|BII•BI and BII•BII are not identical. However, the match between C-MD and experimentally inferred data is reasonable (S8 Fig), with correlation coefficients of 0.62 for BI•BII|BII•BI and 0.57 for BII•BII. So, CHARMM36 appears to represent the sequence-dependent behavior of the pairs of facing phosphates better than that of individual phosphates ( Table 2). This improvement reflects in part compensatory effects between the two strands of CpG•CpG steps, in which the asymmetric individual BII percentages are inversed in C-MDs and NMR (see the above section "Sequence-dependent BII propensities from simulations versus NMR").
Overall, the realization that the states of facing phosphates are independent enables to derive their populations from δP-based BII percentages. Applying this approach reveals that all the complementary dinucleotides in the four dodecamers populate both BI•BI and BI•BII| BII•BI, some of them also display significant percentages of BII•BII (Fig 7 and S2 Table). This prevalence of BII-containing steps is of real importance for the DNA intrinsic mechanics, as examined next.
Conformational combinations of facing phosphate and inter base pair parameters BI•BI, BI•BII|BII•BI or BII•BII are associated to different values of slide, roll and twist in X-ray structures [15,16,39]. However, the requirement to select only very high resolution X-ray structures to ensure the accuracy of backbone dihedral angles [59] drastically limits the data for analysis. A previous study [28] underlined that BII conformers in such X-ray structures occur almost exclusively in CpG, CpA, TpG, GpG, and GpC; furthermore, the BII•BII combination was only observed in CpA•TpG. The improved representation of the DNA backbone with Parmbsc0 εzOLI and CHARMM36 offers the opportunity to broaden the analysis of the helicoidal parameters associated to the facing phosphate combinations for a larger variety of complementary dinucleotides than in X-ray datasets. Consistent results between both force fields would of course strengthen the conclusions.
The mean values of the six inter base pair parameters (shift, slide, rise, slide, roll and twist) were calculated for each conformational combination of the facing phosphate groups, after  (Table 5) contrary to invariant shift and tilt (S3 Table). As in a previous study using Parmbsc0 [40], rise variations are observed, but the change between BI•BI and BII•BII does not exceed 0.2 Å in both P-MDs and C-MDs (S3 Table).  Table. doi:10.1371/journal.pcbi.1004631.g007 It is striking that there is almost quantitative agreement on the changes of slide, roll and twist across BI•BI, BI•BII|BII•BI and BII•BII with both force-fields (Table 5). Yet, CHARMM36 does not increase the twist as much as Parmbsc0 εzOLI from BI•BI to BII•BII. Considering the 36 individual complementary dinucleotides of the four oligomers (Fig 8) confirms this concordance. Only isolated departures between Parmbsc0 εzOLI and CHARMM36 appear when the variation of the helical parameters is examined in individual complementary steps (examples in S9 Fig). In P-MDs, the rolls of TpA•TpA are systematically 5±2.5°larger than in C-MDs, for all backbone combinations; the twist of CpG•CpG and CpA•TpG in BII•BII is 6.5±1.5°higher in P-MDs than in C-MDs.
The MD results not only systematically documented the couplings, but they enabled comparison of the variability (standard deviations) of the helicoidal parameters for BI•BI, BI•BII| BII•BI and BII•BII. In both P-MDs and C-MDs, the slide and twist variabilities are greater in BI•BI compared to BI•BII|BII•BI or BII•BII (Fig 9). A similar, but attenuated, trend is observed for the roll in C-MDs (where the roll standard deviations are higher than in P-MDs). Thus, the simulations suggest that BII containing combinations are stiffer than BI•BI, at least for slide and twist. This, combined with the suppression of north sugars in 5' of the BII linkages, might entropically disfavor the BII conformers. However, other contributions will affect the net balance of the BI $ BII equilibrium. Indeed, the above quantitative analysis makes clear that BII is frequently populated, and is the dominant conformer at some base steps.
The concordant results from P-MDs and C-MDs considerably strengthen and extend the view of the couplings between the facing phosphate states and inter base pair parameters gleaned from X-ray structures. Here, MDs inform about the behavior of a large range of dinucleotides, comprising those that are moderately or even barely propitious to BII. They reveal a general mechanical property of free DNA. Thus, BII containing complementary dinucleotides are characterized by more positive slide, more negative roll and higher twist than those in BI. As most steps have access to the BI•BII|BII•BI states (see the preceding section), the DNA deformation cost upon protein binding could be less important than expected when BII-like features are required for recognition. This can be illustrated by the TTAAA sequence in Oligo 3. This segment is considered as one of the strongest anchoring points in the nucleosome assembly [60][61][62][63][64], by forming multiple interactions with histones H3 and H4. In the X-ray structures of nucleosome containing the sequence 601 (PDB entries 3ZL0, 3ZL1 [63] and 3MVD [61]), TTAAA•TTTAA displays rather variable but globally negative rolls (-7 ±6°). According to both NMR and MDs, in their free state, these steps are mainly in BI•BI, associated to rolls of 4.4±3°. However, they also explore the BII•BI|BI•BII states, with rolls of -2.5±1°. So, the free TTAAA sequence spontaneously visits conformations closer than expected to its bound counterpart.

Discussion
Assessing the extent to which MD simulations correctly represent B-DNA structural features in solution, their sequence dependency and populations remains an ongoing challenge and a necessary step to gain confidence in the role that DNA simulations may play in biophysics and structural biology. Part of the difficulty is to obtain experimental data in solution, suitable for comparison with simulations. Here, Parmbsc0 εzOLI [13] and CHARMM36 [11], specifically developed to improve the representation of the DNA backbone, were tested with respect to the sequence-specific BI and BII populations in four dodecamers, derived from 31 P chemical shifts (δPs) [24].
The results show that the Parmbsc0 εzOLI and CHARMM36 potentials produce substantial BII populations, closer to their experimentally inferred counterpart than those obtained with preceding force fields [5,9,40,41]. Many simulated BII propensities of the four dodecamers compare satisfactorily to experiment, a very encouraging achievement. This provides the foundation to understand the factors underpinning the differentiated BI $ BII equilibrium behavior across base steps. In particular, this context may be better adapted to investigate the quantum-mechanical origin of the phosphate chemical shifts [51].
However, the experimental sequence effect on BII propensities is still imperfectly reproduced by simulations, each force field displaying its own weaknesses. Parmbsc0 εzOLI , as reported by its developers [13], globally underestimates the BII propensities. The CHARMM36 biases include generating too high BII percentages on TpA or, conversely, suppressing the BII character of GpC and some CpA and TpG. The procedure translating experimental δP to BII% is not devoid of uncertainties [51], but they would not account for the most severe discrepancies. For instance, the simulated TpA being BII-richer than GpC is clearly inconsistent with both NMR and X-ray data [15,24,28]. There was no evidence that the residual discrepancies in the sequence effect on the BI and BII populations resulted from insufficient sampling. The BII percentages were found converged well before the half microsecond timescale under monitored MD length increase. Since the BI$BII exchange occurs in the pico to nanosecond time range in RMN experiments [21,49], which is short compared to current simulation times, one does not expect that the BI$BII equilibrium distributions would be significantly affected by increasing the sampling time. However, one cannot exclude the existence of hypothetical and currently unknown slower motions, on a longer time scale not probed by the present simulations, which might influence the BI $ BII equilibrium. Instead, progress is likely to require further refinements of the potentials. Considering the charged nature of the phosphate groups, it is possible that polarisable DNA force-fields will be required to reach a more satisfactory treatment of the sequence-dependent DNA properties [7,65].
Despite some limitations, the present MDs are very helpful to scrutinize the DNA backbone dynamics, especially the behavior of facing phosphate groups within complementary dinucleotides. In that respect, Parmbsc0 εzOLI or CHARMM36 yielded similar features despite strong differences in their conception and parametrizations. This convergence strengthens the results. First the simulations indicate that concomitant BI $ BII transitions on two facing phosphates are much rarer than transitions involving only one phosphate. Second, statistical analysis of the simulations established that the conformational states of the two individual phosphates within a complementary dinucleotide were independent of each other. As a consequence, the BI•BI, BI•BII|BII•BI and BII•BII populations can be assessed from the individual BII percentages inferred from δPs, using straightforward equations. Importantly, this approach reveals that there is a sizable number of steps where BII-containing states dominate. Thus, more than one fourth of the 36 complementary dinucleotides spend more time in BI•BII|BII•BI and BII•-BII than in BI•BI; all the complementary dinucleotides explore BI•BII|BII•BI in addition to BI•BI, with various populations of BI•BII|BII•BI; however, BII•BII is more restricted, apparently only significantly populated in a few types of BII-rich steps.
Since the behavior of facing phosphates was uncorrelated in all the 36 complementary steps studied here, one can reasonably infer that this is a general property of any B-DNA. Thus, according to the general and predictable sequence effect on experimental BII propensities [27,28], the BII-containing combinations (BI•BII|BII•BI and BII•BII) are expected to be largely represented or even statistically dominant in CpG•CpG, CpA•TpG, GpC•GpC and GpG•CpC. The steps less propitious to BII, GpA•TpC, ApN•NpT (N: any base) and TpA•TpA, favor BI•BI but they also present modest fractions of BI•BII|BII•BI.
Such findings are of fundamental importance because of the strong couplings between these fine-grained backbone states and the inter base pair parameters of slide, roll and twist, consistent with initial observations on X-ray structures [15,16]. Such couplings are not only confirmed here, but further characterized in solution for a broader range of steps, offering a unifying theme underpinning the intrinsic mechanics of B-DNA. Given the recurrent occurrence of BII-containing combinations, it follows that the accessible conformational landscape of most complementary dinucleotides extends into a region characterized by positive slide, negative roll and high twist ("BII profile").
This enhanced intrinsic malleability is relevant to the reading of DNA by proteins, since it increases the repertoire of states which may be critical to initiate selective recognition by facilitating local, structural DNA adjustments upon protein binding. The implication of BII-rich steps in indirect readout mechanisms, via their ability to modulate the DNA shape, has been previously highlighted [9,34,43,66]. In addition, the present work touched upon the counterintuitive example of the BI-rich (positive rolls) TTAAA segment in Oligo 3, which nevertheless also accesses negative rolls (BII•BI|BI•BII) in solution, reminiscent of the pattern of negative rolls observed in its nucleosome-bound form. So, the energetic penalty induced by the DNA deformation upon protein binding could be less than expected in many cases, especially when BII-like features are involved for the structural fit between the partners. Thus, the present characterization of free DNA is conceptually relevant to a deeper understanding of the selective recognition of DNA. The investigated force-fields Parmbsc0 εzOLI or CHARMM36 may also prove advantageous when simulating such events.

DNA sequences
Four oligodeoxyribonucleotides of 12 base pairs (bp) (sequences in Table 1) were studied by NMR and MD simulations. These sequences, placed end to end after discarding the terminal base pairs, recompose a continuous 39 bp segment corresponding to the 5' part of the non-palindromic sequence 601, selected from SELEX experiments for its very high-affinity for association with the histone octamers [67].

BII propensities from NMR
Sample preparation and NMR spectroscopy protocols were reported in a previous study [27]. All the NMR data are available in the Biological Magnetic Resonance Bank, entry 19222.
BII percentages (BII%) of the phosphate linkages along the four dodecamers were inferred from the phosphate chemical shifts (δPs, referenced to trimethyl phosphate) collected at 30°, using the equation BII(%) = 143 δP + 621 [24]. This equation is based on an empirical procedure that assumes the same δPs for purely BI or BII states of every dinucleotide, which is unlikely to be strictly correct [51]. Although previous studies showed that it is a reasonable approximation [27,53], we decided to allow a large tolerance of ±10% on the BII percentages inferred from the experimental δPs to take into account uncertainty on the translation procedure.
MDs with Parmbsc0 εζOLI and CHARMM36 force fields MD simulations were performed with the Parmbsc0 εzOLI force-field [13] using the AMBER 14 program [68], or the CHARMM36 force-field [11] with program NAMD [69]. Parmbsc0 εzOLI and CHARMM36 simulations were carried out following protocols as comparable as possible. Yet, with Parmbsc0 εzOLI and CHARMM36, we used the counterion parameters classically associated to the Amber [70] and CHARMM [71] force-fields, respectively.
Parmbsc0 εzOLI and CHARMM36 simulations were performed at constant temperature (300K) and pressure (1bar) using the Berendsen algorithm [72]. The integration time-step was 2fs and covalent bonds involving hydrogen were constrained using SHAKE [73]. The non-bonded pair-list was updated heuristically. Long-range electrostatic interactions were treated using the particle mesh Ewald (PME) approach [74]. Non-bonded interactions were treated with a 9Å direct space cut-off in AMBER and with a force-shift function from 10 to 12 Å [75] with CHARMM36. In AMBER, the centre-of-mass motion was removed every 10ps.
With both Parmbsc0 εzOLI and CHARMM36, each dodecamer in initial standard B-DNA conformation was neutralized with 22 Na + ions (minimal salt condition,~50 mM Na + ), in explicit TIP3P water molecules [76]; the primary boxes were truncated octahedrons with solvent extending 15Å around the DNA. The water molecules and counterions were energy-minimized and equilibrated at 100K around the constrained DNA for 100ps in the NVT ensemble; the entire system was then heated from 100 to 300K in 10ps by 5K increments with harmonic positional restraints of 5.0 kcal/mol/Å 2 on the DNA atoms. The molecular dynamics simulations were continued in NPT, without notable change in volume. The positional restraints were gradually removed over 250ps and followed by the production phase. During the simulations, distance restraints were applied between base atoms of the first and last base pairs of each dodecamers, to prevent their opening. No restraint was applied on any of the internal nucleotides. The application of restraints on the terminal base pairs is justified in the next section, which highlights the benefits of conducting DNA simulations work alongside experimental characterization. MD snapshots were saved every 1 ps.

Restrained base pairing on the first and last base pairs
During the simulations with Parmbsc0 εzOLI and CHARMM36, distance restraints were applied to maintain the Watson-Crick base-pairing in the first and last base pairs of each dodecamers, to prevent their opening. These restraints were applied on the terminal base pairs between base atoms involved in Watson-Crick hydrogen-bonding (Distance donor/acceptor = 2.9±0.2Å) via a parabolic potential with a force-constant of 10 kcal/mol/Å 2 . The application of these restraints was motivated by the behavior of terminal base-pairs in unrestrained simulations, which are not presented in details here. We only give a summary of the unrestrained terminal base-pairs simulations compared to relevant NMR data, to justify the application of restraints in the presented MDs.
In the unrestrained simulations with Parmbsc0 εzOLI , the first (N 1 :N 24 , N for any base type) and last (N 12 :N 13 ) base pairs were generally open. These terminal bases, once extruded, got involved in various structural patterns that persisted during several hundreds of nanoseconds. In the most prevalent conformations, these bases interact with the penultimate phosphate group, insert into the minor groove or mispair with an antepenultimate base. These conformations impact some χ angles, are associated with unusual backbone dihedrals in N 1 pN 2 , N 11 pN 12 , N 13 pN 14 or N 23 pN 24 , and break the stacking with the 3' or 5' neighbors (N 2 , N 11 , N 14 or N 23 ). With CHARMM36, the first two base pairs opened after a few nanoseconds and, as in Parmbsc0 εzOLI MDs, adopted multiple non-canonical structures. In these unrestrained MDs, base pair opening only occurred at the termini of the dodecamers and did not propagate further. Such behavior is not specific to our simulations since it was previously described for MDs with Parmbsc0 and Parmbsc0 OLI [5,13,48] or CHARMM36 [5], which used DNA sequences and simulation protocols different from the ones used here.
The re-orientation of the terminal bases towards the internal double stranded part of the DNA are not supported by the NMR data collected at 20 and 30°C on the four dodecamers. In one-dimensional 1 H spectra at 30°(303K), the imino proton resonances are lost in the terminal base pairs while they are clearly observable in all the internal base pairs (from N 2 :N 23 to N 11 : N 14 ). This excludes long-lived disruption of the Watson-Crick hydrogen bonds in the penultimate base pairs (MDs with CHARMM36) or mispairing between a terminal base and an antepenultimate base (MDs with Parmbsc0 εzOLI ). The glycosidic bonds of the terminal nucleotides, probed by the intranucleotide distances H1'-H6/8, adopt the anti conformation. Furthermore, the numerous sequential NOEs between the penultimate and antepenultimate residues (N 2 pN 3 , N 10 pN 11 , N 14 pN 15 or N 22 pN 23 ) do not support extensive break of their stacking or abnormal structural features. NMR measurements also give information about the terminal steps, N 1 pN 2 , N 11 pN 12 , N 13 pN 14 and N 23 pN 24 . The corresponding 31 P chemical shifts are in the range of the internal phosphates. Intense, well defined 31 P-1 H4' couplings testify that the 3' terminal phosphate groups conform to usual backbone conformation, since these couplings are observable only when α/β/γ are in g-/trans/g+ [21,77], the typical conformation of B-DNA. Finally, sequential NOE connectivities, clearly visible in all the terminal steps, imply that the fraying events do not generate large distance between open terminal bases and the penultimate residues.
In agreement with a detailed study of this topic [48], our NMR data indicate that current force fields do not yet provide a satisfactory description of the fraying of the terminal base pairs. The convergence issues induced by the behavior of the terminal regions in our unrestrained MDs are discussed in the Result section.
DNA structures were analyzed with Curves5 [78] and 3DNA [79]. Both programs produced almost identical helical parameter values. The inter base-pair parameters presented here for complementary dinucleotides NpN•NpN are those from Curves5. Only the 10 central basepairs of each dodecamer were analyzed.