Ab initio predictions for 3D structure and stability of single- and double-stranded DNAs in ion solutions

The three-dimensional (3D) structure and stability of DNA are essential to understand/control their biological functions and aid the development of novel materials. In this work, we present a coarse-grained (CG) model for DNA based on the RNA CG model proposed by us, to predict 3D structures and stability for both dsDNA and ssDNA from the sequence. Combined with a Monte Carlo simulated annealing algorithm and CG force fields involving the sequence-dependent base-pairing/stacking interactions and an implicit electrostatic potential, the present model successfully folds 20 dsDNAs (≤52nt) and 20 ssDNAs (≤74nt) into the corresponding native-like structures just from their sequences, with an overall mean RMSD of 3.4Å from the experimental structures. For DNAs with various lengths and sequences, the present model can make reliable predictions on stability, e.g., for 27 dsDNAs with/without bulge/internal loops and 24 ssDNAs including pseudoknot, the mean deviation of predicted melting temperatures from the corresponding experimental data is only ~2.0°C. Furthermore, the model also quantificationally predicts the effects of monovalent or divalent ions on the structure stability of ssDNAs/dsDNAs.

Introduction DNA can adopt many structures beyond the right-handed B-form double-helices, which takes it far beyond being the molecule that stores and transmits genetic information in biological systems [1,2]. Some non-B-form DNAs within the human genes, such as hairpins, triplexes, Z-DNA, quadruplexes, and i-motifs, have been proposed to participate in several biologically important processes (e.g., regulation and evolution), leading to mutations, chromosomal translocations, deletions and amplifications in cancer and other human diseases [1][2][3][4]. Furthermore, self-assembled functional DNA structures have proven to be excellent materials for designing and implementing a variety of nanoscale structures and devices, including interlocked, walkers, tweezers, shuttles, logic circuits, and origami, which have promising applications ranging from photonic devices to drug delivery [5][6][7][8]. Since short double-and singlestranded DNA (dsDNA and ssDNA) structures (e.g., duplex, hairpins, pseudoknots, and junctions) are essential to build blocks for the construction of non-B-form DNAs and various nano-architectures, advancement in the knowledge of structures and key properties (e.g., thermodynamics and mechanics) for these DNAs will be helpful to understand and ultimately control their biological functions and guide the production and development of novel materials [7][8][9].
Although several experimental methods such as cryo-electron microscopy, X-ray crystallography, NMR spectroscopy, and other single-molecule techniques (e.g., optical/magnetic tweezers and atomic force microscopy) can be used to determine three-dimensional (3D) structures or elastic properties of DNAs [10][11][12][13][14][15], there are still full of challenges (e.g., timeconsuming and expensive) to experimentally provide insight into DNA folding/hybridization. Thus, the field of computer simulation is rapidly evolving to provide finer details on key features of DNA biophysics compared to experimental approaches [16][17][18][19]. For example, allatom molecular dynamics (MD) simulations based on force fields, such as CHARMM and AMBER, have been widely used to investigate dynamics, flexibility, mechanics, or form transition of dsDNA helices at the microscopic level [20][21][22][23][24]. However, due to the innumerable degrees of freedom, the MD simulations are limited to small DNAs and to short times even with an advanced-sampling approach and parallel tempering scheme [16,[24][25][26].
On the other hand, the simple continuum DNA models (e.g., worm-like chain model), which treat the double helix as a continuous elastic rod with bending and torsional stiffness, are commonly used to well describe mechanical behavior or elastic bending of dsDNA on long length-scales [27][28][29][30]. Correspondingly, the nearest-neighbor model can predict secondary structures and melting profiles (e.g., free energy and melting temperature) for ssDNA and dsDNA through the combination of free energy minimization, partition function calculations, and stochastic sampling [9,31]. However, these simple models are unable to provide any 3D structure information on DNAs.
Therefore, many coarse-grained (CG) DNA models, which represent DNA using a reduced number of interaction sites while striving to keep the important details, have been developed in recent years to model 3D structures or thermodynamic and structural properties of DNAs [32][33][34][35][36][37][38][39]. For example, by mapping each nucleotide into six to seven CG beads, the Martini model combined with MD simulations opens the way to perform large-scale modeling of complex biomolecular systems involving DNA, such as DNA-DNA and DNA-protein interactions [40,41]. Very recently, a three-bead CG model, MADna, was developed to reproduce the conformational and elastic features of dsDNA, including the persistence length, stretching/torsion moduli, and twist−stretch couplings [42]. However, since the two models need constraints (e.g., predefined elastic network or bonded interactions for paired bases) to maintain a double helix, they cannot be used to study DNA hybridization, melting, and hairpin formation [40][41][42].
Moreover, some other Go-like models, including 3SPN, oxDNA, and TIS, have been proposed to fill the gap [43][44][45][46][47][48][49][50]. The 3SPN model, which reduces the complexity of a nucleotide to three interactions sites (i.e., phosphate, sugar, and base), can successfully capture DNA denaturation/renaturation and provide a reasonable description of other thermomechanical and structural properties for DNAs (e.g., persistence length, bubble formation, major and minor groove widths, and local curvature) by involving in base-stacking and base-pairing interactions [43][44][45]. The oxDNA model uses three collinear sites and a vector normal to the base site to construct the angle-dependent potentials including coplanar base-stacking and linear hydrogen bonding interactions, which are parametrized to accurately describe basic structural, mechanical, and thermodynamic properties of ss/dsDNA [46][47][48]. More significantly, with fine-tuned structural parameters, the model can also treat large DNA nanostructures, such as DNA origami and nanotweezers [48,49]. The TIS-DNA is another robust three-interaction-site CG model, and using a set of nucleotide-specific stacking parameters obtained from thermodynamic properties of dimers, the model can reproduce the sequence-dependent mechanical, as well as thermodynamic properties of DNAs, covering the force-extension behavior and persistence lengths of poly(dA)/poly(dT) chains, elasticity of dsDNA and melting temperatures of hairpins [50][51][52]. The use of Go-like interactions (e.g., non-bonded potentials to penalize deviations from a reference structure) can effectively restrict the range of conformations that may be sampled by the CG model, and simultaneously, it also limits the possibility of the model on structure prediction from the sequence.
Recently, the 3dRNA/DNA web server was further developed based on the 3dRNA to build three-dimensional (3D) structures of RNA and DNA from template segments with very high accuracy using sequence and secondary structure information [53][54][55]. Similarly, a pipeline presented by Jeddi and Saiz can also be used to predict DNA hairpins by integrating the existed 2D and 3D structural tools (e.g., Mfold, Assemble, and MD) [56]. However, the two structure prediction methods are dependent on secondary structures, while there is still a problem to exactly predict secondary structures of DNAs [31]. Fortunately, a minimal physics-based CG model of nucleic acids named NARES-2P was proposed to fold dsDNA from separate strands without any Go-like potentials and secondary structure information. Although the model was constructed using the bottom-up strategy, where each component of the energy function was fitted separately to the respective potential of mean force obtained from all-atom potentialenergy surfaces, it can reproduce many properties of double-helix B-DNA, such as duplex formation, melting temperatures, and mechanical stability [57][58][59]. Contrary to the oversimplification of NARES-2P (i.e., two sites per nucleotide), the HiRE-RNA is an empirical CG model for RNA and DNA, whose resolution is high enough (i.e., six or seven beads for each nucleotide) to preserve many important geometric details, such as base pairing and stacking. Without imposing preset pairings for the nucleotides, the HiRE-RNA can investigate both dynamical and thermodynamic aspects of dsDNA assemblies, as well as the effect of sequences on the melting curves of the duplexes [60,61]. Despite the advances, the parameters of the two models may need further validation for quantifying thermodynamic and 3D structure to accord with experiments, especially for ssDNA.
In addition, due to the polyanionic nature of DNAs, metal ions (e.g., Na + and Mg 2+ ) in solutions can play an essential role in DNA folding and dynamics [12][13][14][62][63][64][65]. Although several of the existing models such as 3SPN, oxDNA, TIS, and NARES-2P have taken the electrostatic interactions into account using the Debye-Huckel approximation or mean-field multipole-multipole potentials and reproduced some monovalent salt-dependent structural properties (e.g., persistence length, torsional stiffness or melting temperatures) of DNAs [45,48,50,58], quantitatively predicting the 3D structure and thermodynamic stability for DNA including both ssDNA and dsDNA in ion solutions (especially divalent ions) from the sequence is still an unresolved problem. Recently, we have proposed a three-bead CG model to simulate RNA folding from the sequence, and with an implicit electrostatic potential, the model can make reliable predictions on 3D structures and stability for RNA hairpins, pseudoknots, and kissing complexes in ion solutions [66][67][68][69][70]. However, due to the differences in geometry, base stacking strength, and flexibility between DNA and RNA, the present model cannot be directly used to simulate DNA folding.
In this work, we further developed an ab initio CG model of DNA to predict the 3D structure, stability, and salt effect for both dsDNA and ssDNA. First, the bonded and nonbonded potentials were parameterized based on the statistical analysis of known DNA 3D structures, as well as experimental thermodynamic parameters and melting data. Afterward, the model was validated through 3D structure and stability predictions for DNAs including double helices, hairpins, and pseudoknots with different lengths and sequences, as compared with the extensive experimental data. Furthermore, we also showed that the effects of monovalent and divalent ions on DNA structure stability predicted by the present model are in accordance with the corresponding experiments.

CG structure representation for DNAs
To be consistent with our previous RNA CG model [66], each nucleotide in DNA is also simplified into three beads: P, C, and N, to represent the phosphate group, sugar ring, and base plane, respectively. For simplicity, the three beads are placed at the positions of existing atoms (i.e., P, C4', and N1 for pyrimidine or N9 for purine) (Fig 1) and are treated as van der Waals spheres with the radii of 1.9Å, 1.7Å and 2.2 Å, respectively [66,70]. One unit negative charge (-e) is placed on the center of P bead to describe the highly charged nature of DNA.

Energy functions
The total energy U in the present DNA CG model is composed of the following eight components: The first three terms are bonded potentials for virtual bonds U b , bond angles U a, and dihedrals U d , respectively, which are used to mimic the connectivity and local geometry of DNA chains. The function forms of these terms are listed in S1 Text, which can also be found elsewhere [44,46,50,66]. The remaining terms of Eq 1 describe various pairwise, nonbonded interactions. The U exc represents the excluded volume interaction between the CG beads and it is modeled by a purely repulsive Lennard-Jones potential. The U bp in Eq 1 is an orientation-dependent base-pairing interaction for the possible canonical Watson-Crick base pairs (G-C and A-T). The formula of U bp is similar to the form of hydrogen-bonding interaction used in the TIS model [50], and the backbone dihedrals are replaced by two simpler distances between CG beads in pairing nucleotides to describe the orientation of hydrogen-bonding interactions; see Eq S6 in S1 Text. The sequence-dependence base-stacking interaction U bs between two nearest neighbour base pairs is given by where σ st is the optimum distance of two neighbour bases in the known DNA helix structures. G i,i+1,j-1,j in Eq 2 is the strength of base-stacking energy, and it can be calculated by G i,i+1,j−1,j = ΔH−T(ΔS−ΔS c ). Here, T is the absolute temperature in Kelvin, ΔH and ΔS are the DNA thermodynamic parameters derived from experiments [9,71], and ΔS c is the conformational entropy change that is naturally included in the Monte Carlo (MC) simulations, during the formation of one base pair stacking; see more details in Eq S7 in S1 Text as well as the previous works [66,70,72,73]. In addition, the coaxial-stacking interaction U cs between two discontinuous neighbour helices is also taken into account by the present model through calculating the stacking potential of the interfaced base-pairs, and the expression can be found in Eq S10 in S1 Text.
The last term U el in Eq 1 is used to calculate the electrostatic interactions between phosphates groups (e.g., i-th and j-th P beads), and it is given by where e is the elementary charge, r ij is the distance between i-and j-th P beads, and N P is the total number of P beads in a DNA. l D is Debye length, which defines the ionic screening at different solution ionic strengths. ε 0 and ε(T) are the permittivity of vacuum and an effective temperature-dependent dielectric constant, respectively [50,66,67]. Q is the reduced charge fraction derived based on Manning's counterion condensation theory and the tightly bound ion model [74][75][76]; see Eq S11 in S1 Text. Due to the inclusion of the U el , the present model can be used to study DNA folding in pure (e.g., Na + ) as well as mixed (e.g., Na + /Mg 2+ ) ion solutions.

Parametrization
The initial parameters of bonded potentials (i.e., U b , U a, and U d in Eq 1) were derived from the Boltzmann inversion of the corresponding CG atomistic distribution functions, obtained by the statistical analysis on experimental DNA structures in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/-home/home.do) (Fig A in S1 Text). First, 752 pure DNA structures (10nt-200nt) with resolution <3.5Å were collected from the PDB, and then, the DNAs with triplex, quadruple helices or unnatural structures were further removed. In addition, excluding the DNA structures with sequence identity >80% and the ssDNAs/dsDNAs used for model validation on 3D structure prediction, there were only 138 DNA structures can be used to parameterize the energy potentials, and the PDB codes of these DNAs are listed in Table A in S1 Text. Since the known DNA structures are generally double helices, the initial parameters from these structures could not be reasonable to describe DNA chains during folding processes. In our previous RNA model, two sets of parameters (Para helical and Para nonhelical ) were calculated from stems and loops in experimental structures, respectively [66,67], and the Paranonhelical ones were used to successfully describe the folding of an RNA from a free chain. However, due to the limitation of the number of loop regions in known DNA structures, obtaining suitable parameters for DNA-free chains directly from these structures is unrealistic. Although we also did MD simulations for unstructured ssDNA and tried to extract the bonded parameters from the conformations (Fig B in S1 Text), because of some differences in optimum values of several angles between experimental and MD simulated structures, we gave them up. Eventually, based on the distributions of bond length/angle/dihedral for nonhelical parts in RNA structures are just slightly broader than that of helical parts [66], we simply set the strengths of DNA bonded potentials in Para nonhelical as one-half of that in Para helical . Note that, the Para nonhelical is used in the folding process, and the Para helical is only used for stems during folded structure refinement. Whereafter, the initial parameters were further optimized through the comparisons between the simulated and experimental bond length/angle distributions [34,77], and in this process, there are only two dsDNAs (PDB codes: 1agh, 3bse) and two ssDNAs (PDB codes: 1ac7, 1jve) were used. For nonbonded potentials, the geometric parameters in base-pairing/stacking functions were obtained from the known structures; see Fig C in S1 Text. The strength of base-stacking was estimated from the combination of the experimental thermodynamics parameters and MC simulations; see Eqs S7-S9 and Fig D in S1 Text. The strength of base-pairing (i.e., ε bp in Eq S6 in S1 Text) was determined by comparing the predicted melting temperatures (T m 's) of four ss-/dsDNAs with corresponding experiments. That is, for two ssDNA hairpins (sequences: GCGCTTTTTGCGC and GGAGCTTTTTGCTGC; ion condition: 1M NaCl; see Table 1) and two dsDNAs (sequences: GCTAGC/GCTAGC and GGGACC/GGTCCC; strand concentration: 0.1mM and 0.4mM, respectively; ion condition: 1M NaCl; see Table 2), we used the present model to predict their T m 's through continuously adjusting the ε bp until the agreement between predicted and experimental data is satisfactory.
The detailed descriptions, as well as the parameters of all the potentials in Eq 1, can be found in S1 Text.

Simulation procedure
During DNA folding from sequence without any preset constraints, it is easy to fall into a metastable state with local minimum energy. To effectively avoid that, the MC simulated annealing algorithm, whose capacity has been proved in protein/RNA folding, was used to sample conformations for ssDNA or dsDNA [66,78,79]. For each DNA, a random chain configuration is generated from its sequence, and for dsDNA, the two chains are separately placed in a cubic box, the size of which is determined by the concentration of a single strand. Afterward, the simulation of a DNA system with a given monovalent/divalent ion condition is performed from a high temperature (e.g., 120˚C) to the target temperature (e.g., room/body temperature). At each temperature, conformational changes are accomplished via the translation and pivot moves, which have been demonstrated to be rather efficient in sampling conformations of polymers [80,81], and the changes are accepted or rejected according to the standard Metropolis algorithm [66,70]. The equilibrium conformations at different temperatures during the cooling process are used to analyze the stability of the DNA. In structure prediction, the last conformation at the target temperature is taken as the initial predicted structure, which can be further refined to better capture the geometry of helical parts by introducing the bonded parameters of Para helical for consecutive base-pairing regions. After structure refinement, an ensemble of structures would be obtained, and the mean RMSD (the averaged value over the whole structure ensemble) and minimal RMSD (corresponding to the structure closest to the native one) calculated over CG beads from the corresponding atoms in the native PDB structure is used to evaluate the reliability of the present model on DNA 3D structure prediction.

Calculation of melting temperature
At each temperature, the fractions of folded (F, consistent secondary structures with predicted at lowest temperature) and unfolded (U, no more than one base pair) states could be fitted to a two-state model through the following equations [9,66]: where T m1 and T Here, f I is the fraction of the number of denatured base pairs when the fraction for the I state is maximum. And then, the df/dT (the first derivative of f with respect to temperature) profile can be calculated to compare with the corresponding experimental data. It should be noted that for simple hairpins and short duplexes used in this work, the I state almost never occurs, and the f I in Eq 6 could be set to 0, which means that f U (T) is approximately equal to 1−f F (T) and only one T m can be obtained.
To improve the simulation efficiency for dsDNA with low strand concentrations c s (e.g., the derivation of which can be found in S1 Text.

Results
Based on the parameterized implicit-solvent/salt energy function and the MC simulated annealing algorithm, the present CG model can be used to predict 3D structures for dsDNA as well as ssDNA at different ion conditions and temperatures from the sequence. In this section, we tested the present model on the 3D structure and stability predictions for extensive DNAs with various lengths/sequences. As compared with the experimental structures and thermodynamics data, the present model can make overall reliable predictions.
DNA 3D structure prediction from sequence For dsDNAs. As described in the section of "Material and methods", for each dsDNA, two random chains were generated from its sequence (e.g., structure A in Fig 2A), which were further randomly placed in a cubic box, ensuring that there is no overlap. To guarantee no significant effect of the box on 3D structures, the strand concentration was set as 1mM (i.e., box side length of 149 Å) for short dsDNA (<10bp) and 0.1mM for longer ones. Due to the lack of the ion conditions for the experimental structures determined by X-ray crystallography, for simplicity, we only predicted the 3D structures for all DNAs at high ion concentrations (e.g., 1M NaCl), regardless of possible ion effects. As shown in Fig 2A, for a dsDNA with a five-adenine bulge loop (PDB: 1qsk; 29nt, 12bp), the energy of the system reduces with the formation of base pairs as the temperature is gradually decreased from 120˚C to 25˚C, and the initial random chain folds into its native-like double-stranded structures (e.g., structure C in Fig 2A). Following that, another MC simulation (e.g., 1×10 5 steps) is performed at target temperature based on the final structure predicted by the preceding annealing process, and the two sets of bonded potential parameters Para nonhelical and Para helical are employed respectively for the single-strands/loops and base-pairing regions to better capture the geometry of the helical part. As shown in the inset of the bottom panel of Fig 2A, the mean and minimum RMSDs of the dsDNA between predicted structures and its native structure are~3.2Å and~1.8Å, respectively, and the corresponding predicted 3D structures are as shown in Fig 3A. According to the above process, we employed the present model to predict the 3D structures of 20 dsDNAs (18nt-52nt) including helix with bulge loops, and the detailed descriptions (e.g., sequence, length, and structure feature) of these dsDNAs are listed in Table C in S1 Text. For the 20 dsDNAs, the overall mean and minimum RMSD values are~3.2 Å and~1.9 Å, respectively; see Fig 3A and Table C in S1 Text, which suggest that the present model can make reliable predictions for 3D structures of dsDNA just from the sequence, despite a certain deviation (especially at the two ends) between the predicted and experimental structures for large dsDNA (e.g., PDB: 1mnm and 5t1j). Fig 3A also shows the predicted 3D structures (ballstick) with the mean and minimum RMSDs and the experimental structures (cartoon) for three typical dsDNAs with different lengths and sequences, intuitively indicating the ability of the model.
For ssDNAs. Compared with most of the existing models, the present model cannot only predict the double helix structure of dsDNA, but it can also make a prediction on the 3D structure for more flexible ssDNA. Similarly, a random chain generated from one ssDNA sequence can fold into native-like structures with temperature dropping; see Fig 2B for an example of a DNA hairpin (PDB: 1jve; 27nt, 12bp), which could primarily benefit from the use of the soft parameters (Para nonhelical ) of bonded potentials and sequence-dependent base-stacking interactions in the present model. As shown in Fig 3D, for 20 ssDNAs (7nt-74nt) used in this work including hairpins with bulge/internal loops (Table D in S1 Text), the overall mean (minimum) RMSD between the predicted and experimental structures is~3.5Å (~2.0Å), which strongly suggested that the present model can successfully predict 3D structures for simple ssDNA.
Since the structures of the largest hairpin (i.e., 6x68_2) from the piggyBac DNA transposon (PDB: 6x68, a synaptic protein-DNA complex) has a significant bending possibly influenced by protein [84], our predictions without regard to protein has a certain deviation (mean RMSDs of 5.6Å) from the experimental structure; see Fig 3D. It is worth noting that beyond DNA hairpins, we also tried to predict the 3D structure for a DNA three-way junction using the present model. As shown in Fig E in S1 Text, the structures (two hairpins at two ends) predicted from the sequence are pretty inconsistent with experimental ones. To find why, we further performed a MC simulation using the present model for the ssDNA starting from its PDB structure, and found that there is no significant difference in energies between predicted and simulated conformations (Fig E in S1 Text), which suggests that some tertiary interactions including the noncanonical base-pairing and base-backbone hydrogen bonding [85] and a more efficient algorithm (e.g., replica-exchanged MC) should be further taken into account in the model.
Comparisons with other models. To further examine the ability of the model on predicting 3D structures of DNAs (ssDNA and dsDNA), we also made comparisons with available results from the existing models. First, we employed the 3dRNA/DNA web server (http://biophy.hust. edu.cn/new/3dRNA/create), which is an automatic, fast, and high-accuracy RNA and DNA tertiary structure prediction method [53][54][55], to predict 3D structures for all DNAs used in this work using the default options (e.g., Procedure: best; Loop Building Method: Bi-residue; # of Predictions: 5) and experimental secondary structures, and calculated the mean RMSD of returned conformations for each DNA over the atoms of P, C4' and N1/N9 from the corresponding atoms in the experimental structures. As shown in Fig 3, for 20 dsDNAs, the overall mean RMSD (~3.2 Å) from the present model is not worse than that (~3.3 Å) from the 3dRNA/DNA, and for 20 ssDNAs, our prediction (overall mean RMSD:~3.5 Å) is slightly smaller than predicted result (~4.0 Å) from the 3dRNA/DNA. Furthermore, we also made comparisons with the predictions from Refs. 56 and 59. Scheraga et al. also proposed a physics-based rigid-body CG model (3-bead) of DNA, and used it to successfully fold 3 dsDNAs (PDBs: 1bna, 3bse, and 2jyk) from complementary strands with only weak constraints between them [59]. The all-bead RMSDs of the three lowest-energy predicted structures with respect to experimental references are 2.1Å, 3.1Å, and 4.2Å, respectively, which are close to the mean RMSDs (2.2Å, 2.6Å, and 4.8Å, respectively) predicted from the present model (Fig 3A). Jeddi and Saiz presented a pipeline that integrates sequentially building ssDNA secondary structure from Mfold, constructing equivalent 3D ssRNA models by Assemble 2, transforming the 3D ssRNA models into ssDNA 3D structures, and refining the resulting ssDNA 3D structures through MD simulations [56]. As shown in Fig 3D, for 15 ssDNA hairpins, the average RMSD (over the sugar-phosphate backbone) for the best structures predicted by the pipeline is~3.7Å, a visibly larger value than the overall mean/minimum RMSD (~3.2Å/~2.2Å) from our predictions. Therefore, the comparisons with the other models fully show that the present model can successfully fold simple dsDNA/ssDNA from the sequence without the help of any secondary structure information.

Stability of various DNAs
Beyond 3D structure predictions, the present model can also be used to predict the thermal stability for dsDNA and ssDNA in ion solutions. In order to verify the effect of the model, we further used it to predict the melt temperatures for extensive DNAs.
For dsDNA with various lengths/sequences. The melting temperature (T m ) of each dsDNA can be calculated by the present model based on 3D structures predicted at different temperatures; see Fig 4A and the section "Material and methods". For example, for the sequence (GGACGTCC) 2 at 1M [Na + ], the melting curve of the dsDNA with a high strand concentration of 1 mM was predicted according to the fractions of unfolded state at different temperatures (Eqs 5-7), and the melting curve, as well as the T m of the dsDNA at low experimental strand concentration (0.1 mM), can be obtained through Eq 8; see Fig 4A and 4B. The predicted T m of the sample sequence at c s = 0.1mM is~56.0˚C, which is only 0.9˚C higher than the corresponding experimental value (~55.1˚C) [71]. We further performed simulations for the dsDNA at c s = 0.1mM to directly predict its T m at experimental strand concentration, and found that there is no significant difference between two melting temperatures, while the melting curve inferred from c s h is slightly broader than that predicted at c s (Fig 4B). In addition, as shown in Fig 4C, the predicted T m 's for three different dsDNAs at different strand concentrations are also in good accordance with the experiments [71], proving that it is feasible to infer the T m at low c s from the high ones (c s h ) [82,83]. To examine the sequence effect, 27 dsDNAs (8-36nt) with different sequences have been studied with the present model. The sequences, strand concentrations, and the predicted/ experimental melting temperatures are listed in Table 1 [71,[86][87][88][89]. Here, all dsDNAs are assumed at 1M [Na + ] to make comparisons with corresponding experimental data. As shown in Table 1, the T m values of extensive dsDNAs from the present model are very close to the experimental measurements with a mean deviation of 1.5˚C and maximal deviations < 3.0˚C, which indicates that the present model with the sequence-dependent base-stacking/pairing potential can make successful predictions on the stability for dsDNA of extensive sequences/ lengths. Furthermore, due to the involvement of coaxial stacking potential, the present model can also provide reliable stability for dsDNA with bulge/internal loops, For example, for 4 dsDNAs with bulge loops and 5 dsDNAs with internal loops, the mean deviation of predicted T m 's from the experiments is only 1.8˚C; see Table 1, and the predicted melting curves for the dsDNAs with internal loops of different lengths are also in line with the experiments [89] ( Fig  4D). Fig 4D also shows that the predicted curves of dsDNA with large internal loops (e.g., N = 6) are slightly broader than the experiments, and the possible reason could be that the way of melting temperature calculation used in the present model ignores the difference between melting curves at low and high strand concentrations; see Fig 4D. For ssDNA with various lengths/sequences. Beyond the dsDNA, the stability of ssDNA can also be captured by the present model [36,[90][91][92][93][94][95][96]. As shown in Fig 5, for DNA hairpins (GCGC(T) N GCGC) with different loop lengths (N = 3-9), the predicted thermal unfolding curves at 0.1M [Na + ] agree reasonably with the experiments, despite that the predicted T m (~78˚C) for the hairpin with a small loop (e.g., N = 3) is rather lower than the experimental value (~80.7˚C), while it is a little higher (~58.1˚C vs~56.2˚C) for large hairpin loops (e.g., PLOS COMPUTATIONAL BIOLOGY N = 9) [92]. Moreover, 24 ssDNAs including pseudoknot are used to verify the ability of the present model for sequence effect on stability; see Table 2. In order to compare with experiments [36,[90][91][92][93][94][95][96], all these predictions are at corresponding experimental ion conditions. As shown in Table 2, for 24 ssDNAs with different sequences and lengths (11-34nt), the mean/ maximal deviation of T m between predicted and experimental is~2.1˚C/~3.8˚C, which suggests that the effect of sequence on ssDNA stability can also be well described by the present model. It is worth noting that due to the lack of stacking interactions between unpaired bases, the present model cannot distinguish the stability of DNAs with same stems but different loop sequences (e.g., GCGC(T) 5 GCGC vs GCGC(A) 5 GCGC), and yet the stability of them generally differ somewhat [9,91,95].
Specifically, we made additional predictions for the stability of two more complex ssDNAs: a pseudoknot and a chain with two hairpins at two ends; see Fig 6A. As shown in Table 2 and Fig 6, for the ssDNA with two hairpins, two melting temperatures (T m1 and T m2 ) of the corresponding transitions are successfully predicted by the present model, with the deviations of 2.1˚C and~1.1˚C from experimental data, respectively. Since the hairpin at 3' end contains fewer G-C pairs than the other (Fig 6A), it melts at a significantly low temperature in comparison to the 5' end hairpin [92]. For the DNA pseudoknot at 0.1M [Na + ], the predicted T m1 and T m2 are~48.8˚C and~72.0˚C, respectively, which also agree well with the experimental data (~52.6˚C and~70.7˚C) [90]; see Table 2, and the comparison between predicted and experimental thermal unfolding curves can be found in Fig 6C. In the predicted curve, the first transition that is from folded pseudoknot state to intermediate hairpin state is more significant than that form experiment. One possible reason is that noncanonical interactions such as triple base interactions between loops and stems and self-stacking in loop nucleotides, which are common in RNA/DNA pseudoknots [69,90], are neglected by the present model, leading to a relatively simple unfolding energy surface. Even so, the comparison with the experiment still suggests that the present model can be reliable in predicting thermal stability for DNA pseudoknots in monovalent ion solutions, and it is noted that the present model can also provide 3D structures for the pseudoknot at different temperatures from the sequences.

Monovalent/Divalent ion effects on stability of dsDNA/ssDNA
Due to the high density of negative charges on the backbone, DNA stability is sensitive to the ionic condition of the solution, while the effect of ions, especially divalent ions (e.g., Mg 2+ ), is generally ignored in the existing DNA CG models [43][44][45][46][47][48][49][50]. Here, we employed the present model to examine the monovalent/divalent ion effects on the thermal stability of dsDNA and ssDNA.
Monovalent ion effect. For each of the three dsDNAs with different lengths (6bp, 10bp, and 15bp), we performed simulations over a broad range of monovalent ion concentrations ([Na + ]: 0.01M-1.0M), and calculated the melting temperatures at different [Na + ]'s. As shown in Fig 7A, the increase of [Na + ] enhances the dsDNA folding stability due to the stronger ion neutralization [62,63], and the predicted melting temperatures for the three dsDNAs are well in accordance with the experiment results [87,97], with the mean deviation of~1.4˚C. Fig 7A  also shows that the [Na + ]-dependence of T m is stronger for longer dsDNA, which could be caused by the larger buildup of negative charges during base pair formation of longer dsDNA [63,76].
Although Table 2 has indicated that the present model can make reliable predictions for ssDNA stability at various [Na + ]'s, we further used a simple DNA hairpin (GCGC(T) N GCGC) with different loop lengths (N = 5, 7, and 9) to test monovalent ion effect on stability in the present model. As shown in Fig 8A, for the hairpin with small loops (e.g., N = 5 and 7), the difference of predicted T m from the experiments over a wide range of [Na + ]'s is very small (e.g., mean/maximal deviation of~1.5˚C/~1.0˚C), and for the loop length of 9, our predictions are slightly larger than the experimental data only at high [Na + ]'s; e.g.,~4.0˚C higher at~0.1M [Na + ] [90]. The results on the stability of ssDNA and dsDNA in monovalent ion solutions reveal that it is a very effective way of involving the electrostatic interaction for DNA in the present model through the combination of the Debye-Huckel approximation and the concept of counterion condensation, which has also been validated by the TIS model [50,51].
Divalent ion effect. Remarkably, one important feature of the present model is that combining the counterion condensation theory and the results from the TBI model; see Eq 3, it can also be used to simulate DNA folding in mixed monovalent/divalent ion solutions. For one Furthermore, the competition between Na + and Mg 2+ on DNA stability can also be captured by the present model. For example, for dsDNA at 0.012M [Na + ] (Fig 7B), when [Mg 2+ ] is very low (e.g., <0.3mM), Na + dominates the stability of the dsDNA, while the increase of [Mg 2+ ] enhances the stability significantly. This is because the bindings of Na + and Mg 2+ are generally anti-cooperative and Mg 2+ -binding is more efficient in stabilizing DNA structures [63,75]. Naturally, as [Na + ] increases, the negative charge on DNA is strongly neutralized, and consequently, the effect of Mg 2+ appears weak. In particular, as shown in Fig 7B, there is a significant deviation between predicted and experimental T m 's for the dsDNA at 1M [Na + ] and various [Mg 2+ ]'s. The possible reason could be that when the ion concentration is high enough (e.g., >1M [Na + ]), the effect of electrostatic interaction on DNA stability is quite negligible, and the competition between Na + and Mg 2+ could be dominated by the entropy changes of ions, which is difficult to be precisely described by the implicit ion model used in the present model [62,67,76].

Discussion
In this work, we have proposed a novel three-bead CG model to predict 3D structure and stability for both ssDNA and dsDNA in ion solutions only from the sequence. As compared with the extensive experiments, we have demonstrated that, (1) The present model can successfully predict the native-like 3D structures for ssDNAs and dsDNAs with an overall mean (minimum) RMSD of~3.4Å (~1.9Å) from corresponding experimental structures, and the overall prediction accuracy of the present model is slightly higher than the existing models; (2) The present model can make reliable predictions on stability for dsDNAs with/without bulge loops and ssDNAs including pseudoknots, and for 51 DNAs with various lengths and sequences, the predicted melting temperatures are in good accordance with extensive experiment data (i.e., mean deviation of~2.0˚C); (3) The present model with implicit electrostatic potential can also reproduce the stability for ssDNAs/dsDNAs at extensive monovalent or mixed monovalent/ divalent ion conditions, with the predicted melting temperatures consistent with the available experiments.
Nonetheless, the present model has several limitations that should be overcome in future model development. For example, the present model failed to predict native-like structures for more complex DNAs such as that with triplexes, quadruplexes or n-way junction and cannot distinguish the stability for DNAs with different loop sequences, which suggest that possible noncanonical interactions (e.g., noncanonical base-pairing, base triple interactions between loops and stems, self-stacking in loop nucleotides and special hydrogen bonds involving phosphates and sugars) should be further taken into account [2,50,85]. Furthermore, a more efficient sampling algorithm such as replica-exchanged MC or MC with umbrella sampling, as well as suitable structure constraints should be introduced to the model for assembly of large DNAs (e.g., nano-architectures) [46][47][48][49]50], and accordingly, an accurate score function like statistical potential used for RNA and protein could be required to evaluate predicted DNA candidate structures [98][99][100][101][102][103]. In addition, the 3D structure predicted by the present model is at the CG level, and it is still necessary to reconstruct all-atomistic structures based on the CG structures for further practical applications. After these further developments, a user-friendly web server would be further freely available, allowing users to predict 3D structure and stability for DNAs in ion solutions from sequence or given constraints.