Blind Predictions of DNA and RNA Tweezers Experiments with Force and Torque

Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's “spring-like” conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that ‘nucleosome-excluding’ poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; Z-DNA to be at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and non-negligible effects from base pair step correlations. We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology.


Introduction
Nucleic acids play central roles in biological processes including transcription, translation, catalysis and regulation of gene expression [1,2]. Double-stranded RNA and DNA (dsRNA and dsDNA) stretch and twist when interacting with proteins [3,4] and when forming compact structures such as nucleosomes [5] and packaged viruses [6,7]. Understanding such deformations is critical for a fundamental understanding of nucleic acids in their biological contexts and for efforts to rationally engineer nanostructures built from dsRNA and dsDNA helices. High precision experimental data are becoming increasingly available from measurements using optical and magnetic tweezers [8][9][10][11][12][13][14][15][16][17][18][19][20] that measure end-toend lengths and linking numbers of kilobase-length single molecules upon variation of solution condition, sequence, applied force and torque. In principle, these data offer rigorous challenges that can falsify or validate -and thereby advance -models of nucleic acid flexibility. However, such direct comparison of model predictions and experimental observables remains incomplete.
On one hand, fits to analytical equations based on worm-like chain (WLC) or elastic rod models are in common use for interpreting single-molecule manipulation data [14,[21][22][23][24], but they lack the power of predicting new experimental results and involve numerous approximations (see below). On the other hand, high-resolution approaches that integrate all-atom energy functions and crystallographic knowledge [25][26][27][28][29][30][31][32] offer the prospect of predictive calculations, but the computational costs to simulate kilobase-scale helices remain prohibitively large. Coarse-grained models, such as the base-pair level models (BPLMs) pioneered by Olson and colleagues [33] as well as models that use reduced representations for each base (rather than base-pair) [34][35][36], provide mesoscopic ''bridges'' between simple analytical models and atomic-level simulations. In this work, we focus on BPLMs as they have fewer degrees of freedom than single-base level models, enabling efficient calculations, and their parameterization can be more easily refined by the growing data of crystallographic structures [33,[37][38][39][40][41][42][43][44][45][46][47]. It is worth noting that BPLM is only expected to be applicable to duplexes at low-to-medium tension.
Despite continuing advances, BPLM simulation methods have not yet been used to make direct comparisons with single-molecule experiments. BPLM simulations have focused on helices up to hundreds of base-pairs, significantly smaller than the kilobase lengths probed in single-molecule experiments at which helix bending and twisting may play significant roles in the measured properties. In addition, BPLM calculations have been primarily developed for B-DNA duplexes; growing crystallographic knowledge for dsRNA helices has not been integrated into the BPLM framework. Finally, accurate methods for computing and constraining the twist, writhe, and link of discrete, open-ended helices have not been established until recently [53][54][55][56] and have not been integrated into BPLM modeling.
Here, we describe a blind prediction challenge, where developers of modeling algorithms (FCC, RD) predicted unreleased data on the mechanical properties of dsDNA and dsRNA helices measured by a team of experimenters (Lipfert et al., unpublished data). More specifically, the torsional properties and stretch modulus of dsRNA have not been previously reported (only the bending persistence length of dsRNA was measured previously [13]; the stretch modulus of dsRNA was published during the modeling [20]). This challenge motivated the development of a software package HelixMC, first presented in this work, to close the methodological gaps described above and thus enable simulations of force vs. extension, effective torsional persistence vs. force, link vs. force, and extension vs. link experiments. The goal of calculating actual experimental observables necessitated several systematic studies to check widespread but poorly tested modeling assumptions, including simulation-based validations of the Moroz-Nelson formula for torsional persistence length [21,22]. Most importantly, the rigorous comparison between blind predictions and data revealed how current BPLMs largely succeed in modeling stretching and bending but apparently miss physics necessary for understanding dsDNA and dsRNA torsional properties. Finally, HelixMC predictions for previously unmea-sured properties of two biological important variants, poly (A)/ poly (T) dsDNA and Z-DNA, delineate future experiments that will allow incisive evaluation and revision of current modeling approaches.

Brief overview of the simulation
Before presenting the results of the blind prediction, we present an overview of the simulation system and algorithm. Detailed descriptions are given in the Methods section. BPLMs [33,[37][38][39][40][41][42][43][44][45][46] abstract the entire duplex into multiple base-pairs stacking on top of each other. The coordinate transformation between two neighbor base-pairs (i.e. a base-pair step) is conventionally described with six standard step parameters (shift, slide, rise, tilt, roll, and twist). The internal interactions between neighbor basepairs can therefore be described using the distribution of these parameters drawn from the Protein Data Bank (PDB) in sixdimensional (6D) space. Typically, these 6D distributions are approximated with 6D multivariate Gaussians to allow continuous sampling of the conformation space. We also tested an alternative scheme which samples directly from existing parameters in the database, without assuming Gaussianity.
The duplexes, represented in BPLM, are then simulated with a Metropolis Monte Carlo (MC) method, with stretching forces and torsional constraints incorporated into the energy function. By default we simulated dsDNA/dsRNA of 3,000 base-pairs at room temperature (298K). At the end of each cycle of Monte Carlo updates, the helix extension and the linking number are recorded. For direct comparison to single molecular tweezers analysis, these data from simulations at different forces and torsional constraints are then used to compute global mechanical properties including bending persistence length, stretch modulus, torsional persistence length and link-extension coupling, by fitting to analytical equations based on the elastic rod model.

Setup of blind prediction challenges
Single-molecule tweezers experiments allow accurate measurements of the extension and the linking number of long molecules under externally applied stretching forces and torques. Typical experiments include force vs. extension, effective torsional persistence vs. force, link vs. force, and extension vs. link measurements. The published literature on dsDNA mechanical measurements is extensive (see e.g. [10,11,18,19]), but magnetic tweezers data directly probing the torsional properties of dsRNA had not been published at the time of this study (only the bending of dsRNA has been previously studied [13]). Instead, a comprehensive experimental portrait (Lipfert et al., unpublished data) had been acquired by one of us with colleagues but was not publicly released. This situation therefore permitted blind prediction tests of the BPLM approach. Our modeling challenges were to simulate the different experimental setups, to test the applicability of phenomenological formulae used for curve-fitting, and to make quantitative predictions with estimated errors for the following standard constants: bending persistence length A, stretch modulus S, torsional persistence length C, and link-extension coupling g.

Accurate recovery of helix bending
Drawing on extensive prior work [33,54,56], we were able to simulate dsDNA (for validation of the algorithm) and dsRNA (for blind prediction) under applied force using HelixMC. Fig. 1 gives example simulation frames with random sequences, with BPLMs parameterized on crystallographic data with diffraction resolutions better than 2.8 Å and without proteins. (Other BPLM variants are Author Summary DNA and RNA are fundamental molecules in the central dogma of molecular biology. Many biological behaviors of double-stranded DNA and RNA -including transcription/ translation by proteins and packaging into compact structures -depend on their ability to flex and twist. Single-molecule tweezers now provide accurate mechanical measurements of DNA and RNA helices under force and torque but have not been used to rigorously falsify and thereby advance computational models. Here we present the first such blind challenge, involving recent dsRNA tweezers data that were kept hidden from modelers and a new HelixMC toolkit that resolves challenges in simulating long double helices from basepair level models. The predictions gave excellent agreement with bending and stretching measurements of dsRNA but failed to recover twisting properties, pinpointing a critical area of future investigation. HelixMC also predicted that poly(A)/poly(T) and Z-DNA-biologically important variants whose elastic responses have not been studied with tweezers-will have distinct mechanical properties. These results open a route to iteratively falsifying and refining computational models of long nucleic acid helices, as is necessary for attaining a predictive understanding of their biological behaviors. described below.) For both dsDNA and dsRNA, higher stretching force leads to longer end-to-end extensions and smaller fluctuations orthogonal to the stretching direction, qualitatively consistent with theoretical predictions and experimental observations.
Measurements of the mean end-to-end extension as a function of force give quantitative data for how nucleic acid helices bend, and we first tested if HelixMC recovered the bending persistence length seen in experiments for dsDNA. The simulated data fit well to standard models used in interpreting tweezers experiments, including the extensible worm-like chain (WLC) model proposed by Bouchiat et al. [23] (Fig. 2A; A = 54.760.6 nm), the inextensible WLC model [23] (A = 5361 nm), and an alternative extensible WLC fitting model developed by Odijk [57] (A = 5561.0 nm); see Table S1 and Fig. S1. The agreement of all three fits to each other and to more direct estimates of A by averaging the base-pair step transforming matrix [33] (A = 53.061.0 nm) confirmed the robustness of A as a comparison metric between experimental and simulated data. To bracket systematic error, we further performed simulations using BPLMs with a high-resolution subset of crystallographic data (2.0 Å vs. 2.8 Å diffraction resolution cutoff), without using a Gaussian approximation for the BPLM distributions, and symmetrizing the base-pair step parameters; these variations gave less than 10% changes in A (Table 1 and S2). We did however find that inclusion of protein/DNA crystallographic structures, which include more distorted helical conformations, led to reduction of A by 30% to 39 nm. Given this level of systematic error, the agreement of the HelixMC calculation and the experimental value for dsDNA (A in the range of 44-49 nm at near-physiological salt concentrations [16,20,58,59]) was reasonable.
The agreement for dsDNA suggested that the prediction of the dsRNA bending would be similarly accurate. The HelixMC prediction for dsRNA was 66 nm, greater than the value for dsDNA, with a systematic error of ,30%, again based on an alternative BPLM parameterization including protein/RNA crystallographic models ( Table 2 and Fig. 2B). Experimental dsRNA tweezers measurements gave values of A = 5762 nm (Lipfert et al., unpublished data) and 5963 nm [20], greater than the value for dsDNA and in quantitative agreement with the HelixMC value.

Stretch modulus and 'springiness'
In addition to enabling fits of the bending persistence length A, force/extension curves give estimates of the stretch modulus S, particularly at high force where the helix is pulled straight without bends. For dsDNA simulations with several variations, the HelixMC calculations gave estimates of S = 2000 pN. As with the bending behavior, inclusion of protein/DNA structures produced lower stretch modulus values, corresponding to more flexibility (S = 1500 pN; Table 2). These calculations overestimated the experimentally measured value for dsDNA of S in the range of 900-1400 pN [20,60,61], slightly beyond our estimated error.
The HelixMC prediction for the stretch modulus of dsRNA was S = 980 pN, with a systematic error of 25%. This estimate was also supported by using an alternative model to fit the simulation stretch modulus (Table S3 and Fig. S1). Given the dsDNA results above, we expected this HelixMC value to overshoot the experimental measurement. Nevertheless, beyond this error in absolute values, we strongly expected that dsRNA would give a relative stretch modulus significantly lower than dsDNA. Unlike the nearly straight axis curve of dsDNA, the base-pair centers of dsRNA trace a 'spring-like' axis curve, twirling in circles of radius 8 Å . We developed a novel ''springiness'' hypothesis, that this ''spring-like'' property of dsRNA would render it more pliable to stretching, analogous to a spring's lower stretch modulus compared to a straight wire (Fig. 3). Indeed, the experimental measurements for the dsRNA stretch modulus was 3506100 pN (Lipfert et al., unpublished data), more than two-fold less than for dsDNA, in agreement with our prediction. An independent experimental dsRNA measurement released at the time of modeling gave a similar value lower than dsDNA (500-683 pN) [20]. Additional simulation-based tests of the 'springiness' hypothesis are described in Supplementary Results and Table  S4, S5.

Discrepancies in torsional persistence length
The development of magnetic tweezers with increasingly sophisticated geometries has enabled torsion-sensitive measurements of dsDNA [16,[62][63][64] and, most recently, measurements on dsRNA that were included in our blind challenge. Before describing the blind comparison, we present HelixMC simulations that were necessary to shed light on puzzling prior results on dsDNA torsional stiffness. Measurements based on topoisomer distributions of closed dsDNA circles, fluorescence polarization anisotropy of intercalated dyes, and x-ray scattering of tethered gold nanoparticles give lower values for torsional persistence length (C = 25-80 nm [47,[65][66][67][68]) than measurements from optical and magnetic tweezers experiments (C = 100-120 nm [12,16,17,21,59]) from several different laboratories and with different tweezers geometries. One potential resolution to these discrepancies is that the apparent torsional stiffness of dsDNA is enhanced beyond its intrinsic value due to tethering constraints that attenuate torsional fluctuations in single-molecule experiments [44]. However, testing this hypothesis has been complicated by a prior inability to integrate link (number of helix turns) in basepair-level simulations. Additional concerns have stemmed from the poor quality of fits to infer C from single molecule experiments with the analytical Moroz-Nelson formula [21,22], which assumes the Fuller writhe expression and negligible self-avoidance effects.
To address these problems, we reasoned that the direct simulations enabled by HelixMC would reveal any systematic overestimation of intrinsic torsional persistence length due to tethering constraints or to the inaccuracy of the Moroz-Nelson model. First, we simulated link fluctuations in dsDNA helices as a function of force, analogous to experiments in references [12,17], and computed the effective torsional persistence length C eff by dividing the contour length of the polymer by the variance of the link ( Table 2 and Fig. 4A-B). We first observed that the asymptotic value of C eff (29-40 nm) in our simulation was within error of the 'intrinsic' value computed from a normal mode analysis (37.5 nm [43]), suggesting that C is not overestimated due to the tethering setup in single molecule experiments. We also tested the effects of x-y constraints (perpendicular to the direction of pulling) that might dampen torsional fluctuations, although such constraints are negligible in magnetic tweezers setups (and would also be expected to have a suppressive effect on bending fluctuations). Applying a harmonic x-y restoring force with strength of 0.025 pN/nm gave no significant change in C eff (Fig.  S2), disfavoring tether constraints as an explanation for the high C anomaly. Second, to test the use of the Moroz-Nelson formula, we fit these simulation data to the Moroz-Nelson model, and found excellent agreement with the same C values as described above. The rarity of self-clashing conformations (Supplementary Results and Table S6) and validity of the Fuller writhe formula above 0.4 pN further supported the use of this analytical fit. As a final crosscheck, we also computed the torsional persistence length using the slope of torque vs. number of turns in independent linkconstrained simulations at 7 pN, analogous to an alternative experimental approach [16,59,62] (Fig. 4C-D, Supplementary Methods). This second simulation method gave torsional persistence length values that agreed well with the first method (within 1%, Table S7), confirming the robustness of the simulation method and Moroz-Nelson fits for inferring C in a way that matches experimental procedures.
Given the checks above, the discrepancy between the simulated dsDNA torsional persistence length C = 28.8 nm and the value in single molecule experiments C = 109 nm cannot be easily explained by systematic errors in the modeling. Furthermore, the deviation of experimental measurements from the Moroz-Nelson formula [16,17] does not appear to be due to inaccuracies in this phenomenological model, given the successful fits of the model to simulated data. The discrepancies in C value and fitting curve strongly indicate either missing physics in modeling dsDNA in both the BPLM and simpler elastic-rod frameworks or currently unknown systematic errors in the experiment (see below, Discussion). Given these issues, we expected that our blind prediction for the torsional persistence length of RNA (C = 53 nm) might be an underestimate of the value measured from magnetic tweezers experiment. Indeed the experimental value was two-fold higher, with C = 100 nm. However, as with the dsDNA measurements, the Moroz-Nelson formula fit these experimental measurements relatively poorly (Lipfert et al. unpublished data), suggesting that some basic assumption of the BPLM approach is violated (see Discussion below).

A stringent test from link-extension (''twist-stretch'') coupling
The first measurements of helix mean end-to-end distance versus mean linking number for dsDNA highlighted gaps in theories of DNA elasticity [14,15]. We thus expected that our final blind challenge, to predict analogous experiments for dsRNA, would provide a highly stringent test for HelixMC and the BPLM approach.
Before presenting the blind comparison, we describe simulationbased tests of assumptions made in the experimental inference of the link-extension coupling g (also described as twist-stretch coupling). In previous work, the coupling has been estimated from two different kinds of experiments: (1) stretching the polymer at different forces and observing how the linking number changes in the process [14,69], and (2) setting up a constant stretching force and observing the polymer's extension as increasing numbers of turns are introduced [14,15]. In both cases, bending fluctuations at low force (,15 pN) should, in principle, cause deviations from the linear relationships assumed to fit the experimental data (Supplementary Results, Fig. S3, S4). Nevertheless, linear relationships have been empirically observed for link and force (in experiment type 1) and of link and extension (in experiment type 2, but not in experiment type 1) for experiments on dsDNA. Furthermore, linear fits from these independent types of experiments gave consistent results (g = 290620 pN?nm and 270620 pN?nm, respectively); due to the convention in use, the negative sign corresponds to over-winding of the double helix upon extension ( Table 2 and Fig. 5). This empirical relation was indeed confirmed in our simulations. We discovered linear correspondences between dsDNA link and extension in both types of simulated experiments, despite non-linear relationships of the underlying variables. The simulated dsDNA data gave couplings of g = 2130 pN?nm and 2 150 pN?nm, respectively, for the two types of experiments, with  The values in parenthesis are the corresponding fitting errors. 1 Computed from the slope of link vs. force plots. 2 Computed from the slope of extension vs. link plots. 3 Data from Lipfert et al., unpublished data. 4 Data from Gore et al. Nature 2006 (ref. [14]). 5 See Table 1 for detailed description for each parameter set. 6 ''_frag'' stands for using fragment picking sampling method. doi:10.1371/journal.pcbi.1003756.t002 systematic errors of 630 pN?nm, based on alternative BPLM parameterizations ( Table 2). The dsDNA calculations were therefore in agreement with experimental values within the estimated errors.
For dsRNA, the HelixMC-predicted g value was 2120 pN?nm (from simulations of both types of experiments), with errors of 640 pN?nm based on alternative BPLM parameterizations ( Table 2 and Fig. 5). This predicted dsRNA value is the same,  within error, as the dsDNA simulations. Nevertheless, separation of the link into twist and writhe components in the simulation suggested a different physical picture of link-extension coupling to dsRNA than for dsDNA. The simulated writhe vs. force slope is negative for dsRNA but nearly zero for dsDNA. This effect can be again attributed to the ''springiness'' of dsRNA axis curve, which carries an intrinsic writhe. Stretching dsRNA unwinds this writhe, while stretching dsDNA has little impact on its already straight axis curve. This behavior would result in a positive link-extension coupling g value, opposite in sign to dsDNA. However in the HelixMC dsRNA simulations, the helix twist, the other component of link, rises with extension and overpowers the writhe decrease to produce a net negative link-extension slope, matching the sign of dsDNA simulations.
The dsRNA tweezers experiments gave a value of g = + 47614 pN?nm, different from the value given by blind prediction (2120 pN?nm). This discrepancy is well beyond the error associated with different BPLM parameterizations, providing strong evidence against the current BPLM framework for modeling the torsional flexibility of dsRNA. Since the linkextension slope for RNA is a result of cancellation between a positive twist-extension correlation and a negative writhe-extension correlation, the predicted slope is quite sensitive to changes of many of the parameters of the underlying Gaussian potential (Supplementary Results, Table S8, S9). Indeed, by modification of the parameters, we were able to recapitulate the experimentally measured link-extension coupling, as discussed extensively in the experimental paper associated with this work (Lipfert et al., unpublished data). However we note here that this reparameterization is not unique, because the number of parameters (15, for a 6D covariance matrix) is far greater than the number of experimental measurements (four, i.e. bending persistence, stretch modulus, torsional persistence and link-extension coupling).

Design of future tests
To understand the sequence-dependence of the mechanical properties being studied, and to propose future tests of the BPLM approach, we performed additional simulations of poly(A)/poly(T) and poly(G)/poly(C) for both DNA and RNA (which has U instead of T). Stretches of these homopolymer sequences play critical roles in accessibility of chromatin to RNA polymerase and transcription factors [70,71]. We also performed simulations on Zform DNA, which has been hypothesized to occur during DNA transcription to absorb torsional stress [72]. The results are listed in Table 2. For sequence-dependent simulations, we found that for poly(A)/poly(T) DNA, using the default dataset, all the measured mechanical properties increased by 1.5-to 3-fold compared to the random-sequence simulations. However if we used BPLM parameters from the 2.8_all dataset, which includes proteinbinding DNA structures, the poly(A)/poly(T) results were not significantly different from the random-sequence results. The difference of predicted stiffness can be explained by the different underlying base-pair step parameters (Supplementary Results,  Table S10). We also found smaller but measurable differences between other sequence-specified and random-sequence simulations, and between sequence-specified simulations performed with different base-pair step parameter sets. Further experimental comparisons between sequence-specific and random-sequence DNA/RNA will provide stringent tests of these predictions and to help discriminate which dataset (if any) is more accurate in modeling the sequence-dependence of the mechanical properties.
Simulations of Z-DNA gave dramatically higher bending and torsional persistence lengths (175 nm and 125 nm, respectively) compared to random B-DNA (55 nm and 29 nm, respectively). Again, this higher stiffness is encoded in the underlying step parameters (Supplementary Results, Table S10). Furthermore, the link-extension coupling is estimated to be near zero; this value arises from a complicated cancellation of twist and writhe, and is difficult to explain with simple arguments. Our simulation results agree with data obtained by Thomas and Bloomfield [73] indicating Z-DNA to be much stiffer than B-DNA, with a bending persistence length of 200 nm. However, previous studies on Z-DNA using light scattering, electron microscopy and fluorescence anisotropy have led to inconsistent results, with bending persistence length ranging from 21 to 200 nm and an extremely low torsional persistence length of 7 nm [73][74][75]. These studies did not agree on whether Z-DNA is stiffer then B-DNA. Additional single-molecule tweezers experiments on Z-DNA appear necessary to resolve these issues, and would provide stringent tests of the BPLM approach.

Discussion
We have presented a set of fundamental tests of how well base-pair level models predict the flexibility of double-stranded nucleic acids, motivated by a desire for improved rigor in this field and by recent single-molecule measurements of dsRNA helices that were blinded to the modelers. A new software package HelixMC that integrates rigorous treatment of twist, writhe, and link allowed direct simulations of dsDNA and dsRNA tweezers experiments with base-pair level models. By fitting the simulated observables with the same analytical models used in experimental measurements, we were able to make direct comparisons of simulation and theory for properties including the bending persistence length, stretch modulus, torsional persistence length and link-extension coupling. We obtained predictions that match some experimental observations, particularly in the ratios of dsRNA to dsDNA values for mechanical properties like bending persistence length. However, we observed quantitative discrepancies for torsional persistence length at high force and the incorrect sign of the link-extension coupling constant for dsRNA. An extensive set of simulations checked that assumptions such as the effects of tethering, the Moroz-Nelson model of torsional persistence length, the curation of the database used to parameterize the BPLM, and the fitted relation of force and link could not account for these discrepancies.
The discrepancies between the BPLM model and tweezers measurements could be due to at least five reasons. First, electrostatic repulsion may account for some discrepancies, but it is difficult to see how corrections needed to increase the torsional stiffness of simulations by three-fold would not also substantially increase the simulated bending stiffness beyond the current values, which agree well with experiments. Experiments with different ionic conditions (particularly highly screening conditions) would help bound these effects. A second possibility is that the base-pair step distributions observed in crystallized nucleic acids do not reflect the fluctuations of nucleic acids in solution [47]. In this case, however, neither a simple overall scaling nor the parsimonious adjustment of a few parameters suffices to bring simulated data into agreement with experiments. Large changes in multiple BPLM parameters are required, in different directions for dsDNA vs. dsRNA and beyond the systematic deviations seen in different curated crystallographic databases, especially to account for a sign change in dsRNA linkextension coupling while retaining the experimental value for dsDNA link-extension coupling (Supplementary Results and Table  S8, S9). A third explanation might involve thermal fluctuations involving bulges or non-Watson-Crick pairs, as have been resolved recently albeit with rare population [76]; the population of these alternative structures could be potentially enhanced during torsional stress. Due to the energetic cost of such fluctuations, we would predict that they would lead to a strong temperature dependence of torsional properties. Fourth, the conformation of each base-pair step may affect neighboring base-pair steps. Recent Au-SAXS scattering experiments and crystallographic analyses have suggested the importance of such correlations [47,77]. Preliminary tests with multi-base-pair fragments in HelixMC indicate that such correlations may have up to 2-fold effects on predicted tweezers-measured properties (Supplementary Results and Fig. S5, S6).
A final explanation for the discrepancy involves the applied tension in single molecule tweezers experiments. On one hand, the tweezers data at low force (,5 pN) are used to infer the bending persistence length A and low-force effective torsional persistence lengths C eff . These parameters are sensitive to both bending as well as intrinsic torsional persistence length via fluctuations captured by the Moroz-Nelson model. In this low force regime, BPLM gives predictions for both parameters with less-than-two-fold discrepancies, for both dsDNA and dsRNA. On the other hand, forces higher than 4 pN are required to suppress bending fluctuations and thereby to isolate stretch modulus S, intrinsic torsion persistence length C, and link-extension coupling g. For these values, the BPLM predictions do not agree with dsDNA or dsRNA measurements. Indeed, there is a more fundamental discrepancy: while the Moroz-Nelson model accounts for the predicted torsional persistence length vs. force from BPLM calculations over a wide range of model parameters, the experimental measurements of C eff at forces .2 pN cannot be fit by this analytical model. These high-force discrepancies could be rationalized by a model in which tensions greater than 1 pN favor structural states that are more pliant to stretching but torsionally stiffer than the ensemble of conformations seen in crystallized dsRNA and dsDNA. Nucleic acids in solution under constant tension or strong torque, as might be provided by solutionbased tweezers [78] or circularization, may enable bulk experimental methods like NMR or Au-SAXS to test this model. It is also possible that single-molecule tweezers experiments on alternative polymers such as poly(A)/poly(T) or Z-form DNA (simulated above) will agree well at all forces with BPLM predictions and thereby offer a baseline for comparison to the mixed sequence dsDNA and dsRNA cases. Alternatively if atomic-level computational methods could predict the structure of the putative weakly stretched state and design sequences or atomic modifications that favor it, the HelixMC toolkit should be able to integrate predictions for long helices that can then be precisely tested through future tweezers experiments.

System setup and summary of the algorithms
The BPLM framework has been described in detail in previous studies [33]. Briefly, each base pair in the nucleic acid is represented by a vector representing the base-pair center and by a coordinate frame representing the orientation of the base-pair [42]. The degrees of freedom of the system are the base-pair steps, defined by the transformation of coordinates from one base-pair to the next base-pair. Each step is described by six parameters (shift, slide, rise, tilt, roll and twist) [79]. The transformation of the step parameters to Cartesian coordinates follows the Calladine and El Hassan Scheme (the CEHS definition) [80], which is also the convention used in the 3DNA package [81,82]. The 'technical details' section of the 3DNA manual offers comprehensive examples of this scheme.
In HelixMC, the origin and the frame of the first base-pair is placed at the origin of the global coordinate system. That is, the base-pair center is placed at the coordinate origin; the normal vector of the base-pair is aligned with the z-axis; and the long-axis of the base-pair lies on y-axis. In terms of experimental setup, this placement is analogous to fixing one end of the nucleic acid to a surface (i.e. the xy-plane in our simulation), an approach routinely employed in magnetic and optical tweezers studies.
Once the origin and the frame of the first base-pair are set, the coordinates of the entire helix can be computed from the six basepair step parameters. In HelixMC, the conformation of helix is stored and updated in this space of the step parameters, instead of in the Cartesian space. This is similar to describing protein conformations with the internal torsion angles instead of using the Cartesian coordinates of the atoms. For each base-pair step, we assumed the six step parameters form a multivariate normal distribution, of which the parameters were derived by surveying the existing RNA crystal structures (see below). This assumption is equivalent to assuming that positions and orientations of adjacent base-pairs are constrained by a six-dimensional harmonic potential [33].
In this work, the BPLM system was simulated using the Monte Carlo (MC) algorithm. A typical MC run consists of tens of thousands of cycles. A sample, which includes the current extension and linking number of the helix, was extracted at the end of each cycle (i.e. number of cycles equals to number of samples in the simulation). For each cycle, the base-pair steps of the entire helix was updated sequentially starting from the first base-pair step. For each update, a proposed move was generated by modifying only the conformation of the target base-pair step, while keeping the conformation of the rest of the helix intact. Note that the term ''conformation'' here refers to the six step parameters of each base-pair step, not the Cartesian coordinates of the base-pairs. Because we assumed the step parameters follow a multivariate normal distribution, this proposed conformational move can be efficiently achieved by drawing a random sample from the distribution.
The standard Metropolis criterion [83] was then used to whether to accept the proposed MC move: Here DE equals the energy after the proposed move minus the energy of the initial conformation, T is the temperature and k B is the Boltzmann constant. Because the internal interactions between the base-pair steps are included in the multivariate Gaussian sampling, the DE in Eq. (1) only reflects the applied torque and force, as described next. For cases where external forces and torques are absent (free helix), the DE is always zero and the acceptance rate is 100%. For cases with external forces and torques, since each update is applied to one base-pair step only, the new proposed conformation is usually similar to the previous conformation. Therefore the acceptance rates are reasonable in the force and torque range used in this work (8% (40 pN) to 55% (1 pN) for dsDNA, Table S11).
We performed two types of simulations. In the first type of simulation, a stretching force along the z-direction was applied to the free end of the nucleic acid (the other end was fixed to the origin), and no torsional constraint was applied to the system. The energy of the system due to the applied force was

E~{zF ð2Þ
Here F is the applied stretching force, and z is the helix extension. This simulation was equivalent to the measurement of forceextension curves in typical single-molecule magnetic tweezers or constant-force optical tweezers experiments [8,13,62,[84][85][86].
In the second type of simulation, the nucleic acid was subjected to a fixed stretching force and was required to maintain a link (which is equivalent to the bead rotation) close to a target value through a harmonic potential. The energy of the system was: Here k rot is the stiffness of the torsional trap (200 pN?nm by default), Lk is the helix link, and Lk t is the target link of the trap. This type of simulation corresponded to torsion-trapped tweezers experiments [14][15][16][17].
In both types of simulations, we computed the base-pair center and the coordinate frame of the terminal base-pair as well as the overall link of the helix after each full-helix MC update. The number of base pairs in the simulated double helices was set to 3,000 (3 kbp) in this work unless stated otherwise.
At the beginning of the simulation, we initialized the helix by assuming that all base-pair steps have step parameters equal to their average values in the input parameter database. We then performed by default 120 cycles of full-helix MC updates to relax the helix under the specified stretching force (but no linkconstraint). For link-constrained simulations, we performed further relaxation steps analogous to the torsional trap experiments, which involve slowly rotating magnets of the torsional traps to bring the helix from zero-turn state to a highly twisted state. We first turned on the link constraint, but set initial target link equal to the current link of the helix. Then we performed the following cycles: 1. If the current target link was not within 20 degrees of the desired link (input by user), we changed the current target link by 20 degrees towards the desired link. Otherwise we set the target link to be the desired link and exited the loop. 2. We performed MC updates on the helix until the link of the helix was within 20 degrees of the current target link, then went back to step 1.
After this ''trap-ramping'' step, we further relaxed the helix under the specified force and link constraint for 50 cycles. These relaxation steps ensured that the state of the helix at the beginning of the simulation was random and representative of the specified force and link constraint, without memory of the initial conformation.
In the HelixMC package, all the parameters discussed above, including the number of base-pairs and the applied external forces and link constraint, can be modified by user inputs. The details of the setup of the HelixMC calculations reported in this work are given in Supplementary Methods. We set the number of samples collected during our simulations to ensure that the standard errors of the average extensions and links were below 0.2% (Table S12).

Computation of link: From BPLM to ribbon model
Computing torsional properties and modeling torque in HelixMC required the integration of mathematical formulae developed in a number of separate papers by different authors. To document our final approach, we describe these equations and their connections here in some detail.
The observed bead rotation in a single-molecule tweezers experiment is mathematically described by the link (also known as the linking number). The original definition for the link of circular dsDNA is based on a closed continuous ribbon model [87][88][89][90][91]. A ribbon is defined by two mathematical objects: an axis curve, which is a smooth non-self-intersecting closed curve following the axis of the polymer; and a set of ribbon vectors, which are unit normal vectors everywhere along the axis curve that are perpendicular to the axis curve and pointing to reference points on the polymer [91]. To compute the link, we followed previous work by Britton et al. [56] to convert the BPLM to a ribbon model (Fig. 6A). Here we defined the axis curve to be the line connecting the base-pair centers (black vectors, also known as the base-pair centerline), and the ribbon vectors to be the long-axis of the basepair (red vectors). This discretization scheme leads to a polygonal axis curve where multiple straight lines are joined by sharp bends (at the base-pair centers), and the ribbon vectors are defined only at each bend. While this discretization is simple and easy to manipulate numerically, it leads to two problems that forbid direct applications of the formulations for the closed continuous ribbon model to the BPLM. First, the discretization leads to an axis curve with discontinuous first derivatives at each bend. Therefore the tangent vectors at these bends are ill-defined, and the corresponding ribbon vector is in general not perpendicular to both the axis curve segments connected to the bend. This behavior invalidates the original assumption that the axis curve is smooth and the ribbon vectors are always perpendicular to the axis curve. Second, the BPLM we studied here is for open duplexes, different from the closed curve assumption in the conventional treatment.
By the Cȃlugȃreanu theorem (also known as the White's formula, or the Cȃlugȃreanu-White-Fuller theorem), link equals the sum of writhe and twist [87][88][89][90]. Intuitively, writhe represents the degree of coiling of the ribbon axis curve, and twist represents the amount of internal twist stored in the ribbon due to the local rotations of ribbon vectors. The sum of coiling and internal twist gives the overall bead rotation of the ribbon. In the following sections, we discuss separately how to compute the writhe and twist for such an open, polygonal ribbon.

Writhe calculation
Before discussing the writhe calculations for the BPLM, we first review the original definition of writhe, which described the coiling of the axis curve. The writhe of a smooth closed ribbon can be computed using the Gauss linking integral: Here r 1 and r 2 are the Cartesian coordinates of the axis curve, r 12 = r 1 2r 2 is a vector connecting points r 1 and r 2 , and we compute writhe (and, below, link and twist) in units of radians.
Note that writhe only depends on the axis curve of the ribbon. Fuller proposed a simplified version of this integral [91]: Here e z is a unit vector aligned with z-axis, and t is the tangent vector of the axis curve. The Fuller writhe simplifies the original double integral into a single integral but is only correct modulo 4p.
Here the expression ''a;b (mod n)'' means Mathematically speaking, a and b are said to be congruent modulo n.
The calculation of writhe of BPLM in this work is based on previous studies on polygonal open curves [53][54][55]. In the section below, we will derive the formulas for computing writhe in BPLM, mainly following the approach developed by Rossetto and Maggs [54].
Constructing a closed curve. To apply Eqs. (4) and (5) to the BPLM, first we need to convert the open axis curve into a closed curve. This is achieved through the following steps (Fig. 6B) [54]. First, we attached two extension segments l 1 and l 2 , to the lower and upper ends of the helix axis curve h. l 1 and l 2 are parallel to the z-axis, and are extended towards z = 2' and z = + '. Second, we connect l 1 and l 2 with a curve C, such that C, l 1 and l 2 lie on the same plane. We also let C be far apart from the original axis curve, such that the distance between any point on h and any point on C approaches infinity. In this way, we can apply the above equations to a closed curve L = h+l 1 +l 2 +C. For Eq. (4), we get j j 3 , also vanish, because C, l 1 and l 2 lie on the same plane, and therefore (dr 2 6dr 1 )? r 12 = 0. Therefore we have By the above construction, we extended the writhe definition for closed curves to open curves. In the next step, we will show how to evaluate Eq. (9) and (10) for a polygonal curve.
Evaluation of the Fuller writhe. We first compute the Fuller writhe (Eq. (10)) in our system. In this integral, the tangent vector t is a unit vector that starts as e z (tangent vector of l 1 ), moves along the helix axis curve h, and ends up as e z again (tangent vector of l 2 ). If we translated the starting points of all the vectors t to the origin, t traces out a closed curve on a unit sphere, starting and ending at the zenith. Fuller proved that the spherical area enclosed by this closed curve equals the value of integral [91]. Based on this geometrical analogy, previous studies [54,55] have shown that the spherical area enclosed by the tangent vectors of a polygonal line can be computed by breaking down the area into spherical triangles: Here V(e z , t i , t i+1 ) is the area (solid angle) of the spherical triangle with vertices defined by the three vectors, t i~r iz1 {r i r iz1 {r i j j is each tangent vector, and N is the total number of base-pairs in the model. Note that V is a signed area, with sgn(V) = sgn((e z 6t i ) ? t i+1 ). Here sgn is the sign function: The absolute value of the V equals the spherical excess of the triangle: Here A, B, C are the angles at the vertices a, b, c. These angles can be evaluated as A~arccos a|b a|b j j : a|c a|c j j ð14Þ B and C can be evaluated in the same way. Evaluation of the exact writhe. Now we evaluate the exact writhe formula (Eq. (9)). The first integral is a double integral involving only the helix axis curve h. This integral can be evaluated by noticing that it equals the spherical area swept by the unit vector r 12 r 12 j j .
Similar to the above computation of Fuller writhe, we can break down this area for a polygonal line into spherical quadrangles [53]:  Here V ij is the area of spherical quadrangle with vertices defined by the four unit vectors. Note that V ii = 0 and V i,i+1 = 0, therefore these terms are neglected from the summation. The quadrangle area can be computed similarly using its spherical excess: A~arccos a|b a|b j j : a|d a|d j j ,B~arccos b|c b|c j j And sgn(V ij ) = sgn((r j,j+1 6r i,i+1 ) ? r ij ).
To evaluate the second integral, we first translate our closed curve such that the lower end of h is at the origin. Because writhe is a geometrical property, it remains constant to such a translation. We call the new helix axis curve and extension segments h9, l 1 9 and l 2 9. In this new coordinate system, the segment l 1 9 overlaps with the 2z axis, which simplifies the calculations. We now evaluate the integral: Note here we let r 1 to be the variable of outer integral and r 2 to be the variable of inner integral. We know that dr 2 dz~e z and r 2 = (0, 0, z). We also let r 1 = (a, b, c), where a, b, c can be any real number. For the inner integral in Eq. (17), Note this final expression is analogous to the Fuller writhe integral (Eq. (10)), therefore we can evaluate this discretized integral using the same algorithm. We can apply the same strategy to evaluate the third integral: Here h0 is the translated helix axis curve, such that the upper end of h0 is at the origin.
In the above paragraphs we demonstrated how to compute the Fuller writhe and the exact writhe for a polygonal open curve. The Fuller writhe involves computing a single sum, so the computational complexity is of O(N) (N is the number of base-pairs). However Fuller writhe is only correct modulo 4p, and it has been shown to give inaccurate results in low force and high torque situations ( [54,92] and results herein). On the other hand, the evaluation of the exact writhe involves computing a double sum, and is of O(N 2 ). Therefore it is currently difficult to perform linkconstrained simulations for long helices using the exact writhe formula, where the link is evaluated in every update. In the section below, we will show that the Fuller formula can be used in linkconstrained simulations as it gives correct answers in the force and link range used here. For simulations with low stretching force, however, the exact formula is needed to obtain accurate answer. Both writhe computation formulas are implemented in HelixMC. We note here that a sweep line algorithm may reduce the computational complexity to approximately O(N log(N)) [93], but this algorithm is not yet implemented in HelixMC. Such accelerations will likely be necessary for HelixMC to model high link scenarios in which the Fuller formula breaks down due to, e.g., plectoneme formation.

Twist calculation
The twist for a smooth ribbon can be computed as Here t is the tangent vector of the axis curve, and l is the normalized ribbon vector. Unlike writhe, twist is a local identity, well defined on a curve segment of arbitrary length. Therefore twist is well defined for a smooth open curve. In addition, twist is additive. For our polygonal ribbon, the overall twist of the ribbon equals the sum of the twists of all the line segments. As an example, consider a straight line segment parallel to z-axis of length L (Fig. 6C). The ribbon vector starts as l 0 , varies smoothly and ends as l 1 . Using the fact that the tangent vector t = e z and the ribbon vectors are perpendicular to t, Eq. (21) can be evaluated as Here we used the property that l6dl is parallel to e z . Geometrically, this integral is twice the area on unit circle swept by l throughout the integration. Therefore the twist of a straight line segment is just the angle (in radians) between the vectors l 0 and l 1 . This result is consistent with the conventional definition of twist parameter in a base-pair step. However, applying the above result for straight line segments to our polygonal ribbon is nontrivial, because here the ribbon vectors are not necessarily perpendicular to the straight line segments. A naïve strategy would be to simply sum the twist parameters of all base-pair steps in the helix to obtain the overall twist, but this sum turns out to be inconsistent with the ribbon twist considered in the Cȃlugȃreanu theorem. It thus cannot be added with writhe to produce a link that corresponds to the actual experimental observable of, e.g., bead rotation in a magnetic tweezers experiment. As pointed out by Britton and colleagues [56], the ribbon twist of dsDNA ('twist' discussed below refers to the ribbon twist, unless stated otherwise) is different from the conventional definition of twist parameter for a base-pair step, necessitating a new procedure to calculate twist for base-pair steps.
The main challenge in computing twist for the discrete chains of the nucleic acid helix is that the ribbon vector at each base pair, l i , is not in general, normal to the continuous axis curve traced by base pair centers r i , as is assumed in the mathematical treatment of ribbons. Our strategy therefore is to first define at each base pair a 'reference' ribbon vector b i that obeys this mathematical convention, and to compute a reference twist. We will then compute additional twist contributions by l i using its angle with b i . Fig. 6A illustrates the polygonal ribbon model. The choice of where t i21 and t i are unit vectors pointing into and out of r i , guarantees normality of b i to the axis curve. Then we can compute the reference twist based on the above result for straight line segments: Here N is the total number base-pairs in the model, Tw 1 and Tw N21 is the twist contribution of the first and the last base-pair steps (N21 base-pair steps in total), where b's are not defined. b i is the signed angle between the reference ribbon vector b i and b i+1 . Note that because both b i and b i+1 are orthogonal to (Fig. 6A, inset). The use of alternative reference ribbon vectors to compute the twist can be justified with the following thought experiment. Imagine holding the two ends of a continuous ribbon, and then change the ribbon vectors by rotating the ribbon in the middle. As long as the two ends stay fixed, such changes of ribbon vectors do not affect the overall number of turns of the ribbon (i.e. the link). In addition, the writhe stays constant because it only depends on the axis curve, which is unmodified in this process. By the Cȃlugȃreanu theorem, we can conclude that the twist, which equals the link minus writhe, remains unchanged. Therefore in a continuous ribbon we may modify any ribbon vector except the two ends without affecting the overall twist. However for a discretized ribbon (as in our BPLM), such modifications of ribbon vectors may change the twist by 2np, where n is an integer (Fig. 6D). In general, we have the following modulo congruence relation between the true twist and reference twist (see Eq. (7) for definition of modulo congruence): To address the modulo 2p ambiguity, we must take into account whether the original ribbon vectors l i sweep out additional turns around the axis curve relative to the reference ribbon vectors b i .
Here we calculate the local twist of each base-pair step as: Here a i is a signed angle between l i and b i ; T i is folded into the range [2p, p) upon the modulo 2p operation. For the terminal base-pair steps, we first attach virtual segments to both ends, pointing towards 2z and +z respectively, to obtain the corresponding b i , then Eq. (25) can be employed to compute T 1 and T N21 (illustrated in Fig. 6D). The overall twist can then be calculated by summing all the T i : As an additional consistency check, Eq. (26) satisfies Eq. (24), as shown below.
The factors a 1 and a N correspond to the twist contribution from two ends of the helix; all internal factors cancel. Our twist definition is similar to the definition proposed by Britton and colleagues [56]. The main difference is in the definition of a angle. The previous proposal defined the tangent vector at base-pair i ast While this definition is mathematically correct and gives results equivalent to our definition, the expression can become ill-defined when there is a sharp bend in the ribbon. Fig. 6E illustrates such cases. In the first example, the original ribbon vector l 1 is parallel to the tangent vectort t 1 , therefore the projection gives a null vector d 1 , making a ill-defined. In the second example, we take l 1 9 as the ribbon vector, then the projection gives d 1 9 parallel to b 1 , leading to a zero a even though l 1 9 and b 1 are quite distinct from each other. In both cases, our definition just sets a equals to the angle between l 1 and b 1 and the angle between l 1 9 and b 1 , leading to a well-defined result. While this type of sharp bend does not occur in natural dsDNA, in our system the line segment of the last base-pair step and the added virtual segment can form such sharp bends, making the previous definition unsuitable for HelixMC. The definition in Eq. (25) has two additional convenient properties. First, if the axis curve of the dsDNA/dsRNA is perfectly straight and pointing towards +z, and the base-pairs are all parallel to the xy-plane, the calculated ribbon twist equals to the sum of the base-step twist parameters [56]. Second, in our system setup, if the normal vector of the last base-pair aligns along +z, the computed link corresponds exactly to the bead rotation observed in single molecule tweezers experiments.

Curation of the base-pair step parameters
The multivariate Gaussian distributions for sampling are constructed using the base-pair step parameters from crystallographic models in the PDB. To ensure the quality of the data in the default parameter sets, we used models derived from data with resolutions better or equal to 2.8 Å . Protein-binding DNA/RNA structures were excluded from the dataset since protein binding may affect the deformability of the nucleic acids. We also tested several other selection schemes to estimate the systematic error, including using higher resolution cutoff (2.0 Å ) or including protein-binding structures. We then used the 3DNA software [81,82] to extract the base-pair step parameters for the canonical Watson-Crick basepairs (i.e. not including G-U wobble base-pairs and other noncanonical base-pairs). Parameter sets with twist #5u (due to Z-DNA conformations), with rise $5.5 Å (due to ligand intercalation), or with any value more than four standard deviations away from the mean were discarded as outliers. For the dsDNA datasets, we noticed that there were two major clusters in the data, corresponding to the A-form and B-form dsDNA (except for the 'DNA_2.8_all' dataset, where the protein binding rendered A-DNA and B-DNA inseparable by clustering). We used the k-means algorithm to separate the two clusters and only used the B-DNA parameters in sampling (Table 1, Supplementary Methods and Fig. S7). For Z-DNA, the dataset is composed of two distinct base-pair step distribution for the GC steps and the CG steps. The population distribution for the Z-DNA dataset is shown in Fig. S8. Detailed statistics and population distribution of the curated dataset are shown in Table 1, S13 and Fig. 7. Fig. S9 shows the correlation plots between each base-pair step parameter for the default dataset.
For each type of base-pair step (16 in total, e.g. 59-AT-39/59-AT-39, 59-CA-39/59-TG-39, etc.), a multivariate Gaussian was fitted based on the corresponding six base-pair step parameters, enabling sequence-dependent simulations. We note here that one can also categorize the base-pair parameters into 10 independent sequence-specific categories using the symmetry of the base-pair steps [40,80]. This symmetrization is not the default option in HelixMC; symmetrization gives minor changes in the predicted mechanical properties (Supplementary Methods and Table S2).
For simulations with random sequence, in each update we randomly picked a distribution from the 16 types of step parameters and drew samples from it. In this sampling scheme, we effectively averaged the 16 types of parameters, so all base-pair steps follow the same parameter distribution [40]. The distribution can be further approximated with a single multivariate Gaussian (Supplementary Results). The approximation leads to a reduced number of parameters in the model, and therefore facilitates the understanding the effect of each parameter on the observed mechanical properties. However, we note that this sampling scheme may lead to unrealistic base-pair step combinations (for example, 59-AT-39/59-AT-39 followed by 59-GC-39/59-GC-39), therefore the sequence of the RNA is not always well defined in each simulation snapshot. To justify that our sampling scheme indeed gave reasonable estimates of the mechanical properties of a random RNA, we also performed simulations with a single randomly generated RNA sequence (Table S14). The obtained mechanical properties using a single random sequence agreed within simulation error to our default random sequence simulation.
In addition, we observed that some of the population distributions in our dataset did not appear Gaussian (Fig. 7). To test the validity of the Gaussian approximation, we also tested a different sampling scheme, by randomly picking parameter sets existing in the database without assuming Gaussianity, and obtained nearly undistinguishable results (see Results). All the curated parameter sets and sampling schemes used in this work are available and further documented in the HelixMC package.

Software availability
The bottleneck steps of HelixMC have been optimized in C (using Cython); a typical single-point HelixMC calculation for a DNA/RNA helix of experimental length (few kilo-base-pairs) takes minutes to hours on a standard desktop computer (Table S15). HelixMC is coded in Python in an object-oriented fashion that allows easy modification and extension, is free and open-source (http://github.com/fcchou/helixmc), and enables fast and accurate predictions with available computational power.

Validation of the link calculations
We numerically tested the validity of the link calculations above by comparing simulated link values to the bead rotations (Fig. 8A). Here the ''bead rotation'' is defined as the angle between the y-axis of the global coordinate and the projection of the ribbon vector of the last base-pair to the xy-plane. This is equivalent to attaching a virtual bead along the ribbon vector of the last base-pair and observing its rotation, analogous to recent single-molecule tweezers experiments [17]. For better comparison, in Fig. 8A we folded the computed link into the range of [2p, p), and found that the experimentally observed bead-rotation indeed corresponds to the link. The match between the link and the bead rotation was close but not exact (RMSD of 4.5u), because the normal vector of the last base-pair did not point exactly to +z during the simulation; this discrepancy induces negligible error in computed helix mechanical properties.
As discussed above, the Fuller formula is faster but gives exact writhes only in certain conditions. The formula breaks down if the helix path fluctuates so that segments point away from the applied force (towards 2z). Fig. 8B shows a plot of exact writhe vs. Fuller writhe in a simulation of 3 kbp dsDNA at 0.1 pN stretching force. It is apparent that in this setting the Fuller formula is only correct modulo 4p (two turns; spacing between parallel lines). To test under which conditions the Fuller formula was accurate, we computed its RMSD error to exact writhe across simulations. For force-extension simulations, Fuller writhe is effectively exact if the force is larger than 0.4 pN for dsDNA and 1 pN for dsRNA (Fig. 8C). For link-constrained simulations, the Fuller formula holds in the current simulated link-range, but breaks down when the target link exceeds 640 turns for DNA and 620 turns for dsRNA (corresponding to supercoiling densities of 0.022 and 0.012; Fig. 8D).

Fitting model for simulated and experimental data
As with experimental measurements, the simulated data were summarized through fits to the elastic rod model, which assume that the total energy of the helix without external force and torque can be expressed using the above parameters by an integral along the helix axis curve s: Here L is the helix contour length, k B is the Boltzmann constant and T is the temperature. The constants are bending persistence length A, B = S/k B T the stretching stiffness (where S is the stretch modulus), torsional persistence length C, and D = g/k B T is the unit-less link-extension coupling (here we use the convention in ref.  [14], where g has units of pN?nm). The three quantities b, z and h describe the deformations per unit length of a short rod segment. b is the bending deformation that measures how the tangent vector changes along the rod, z is the extensional deformation that measures the change in the length of the segment, and h is the torsional deformation that determines how the each segment is rotated around the rod axis with respect to adjacent segment. The analytical equations used to fit experimental measurements, derived from this model, are compiled in the Supplementary Methods.      and therefore not shown. 1 sin a~L eff L axis , where L axis is the length of the axis curve, L eff is the effective helix contour length and a is the super-helical pitch angle. L eff~( 1z F =S) L, where F is the applied stretching force, S is the stretch modulus, and L is the helix contour length. See Supplementary Methods for more information. 2 The first value is the average parameter, followed by the corresponding Z-score.

Supporting Information
(DOC)   Table 1). 2 Halving or doubling the variance of 'shift' parameter in the covariance matrix. 3 Reverse the sign of the shift-slide covariance in the covariance matrix.
(DOC) Table S10 Standard deviations of parameters for random DNA, poly(A)/poly(T) and Z-DNA. 1 Z-DNA has a minimum repetitive unit of two base-pairs, therefore it has two distinct step parameter set (GC and CG). (DOC)