Abstract
Understanding the three-dimensional (3D) structure and stability of DNA is essential for elucidating its biological functions and advancing structure-based drug design. Here, we present an improved coarse-grained (CG) model for ab initio prediction of DNA folding, integrating a refined electrostatic potential, replica-exchange Monte Carlo simulations, and weighted histogram analysis. The model accurately predicts the 3D structures of DNA with multi-way junctions (e.g., achieving a mean RMSD of ~8.8 Å for top-ranked structures across four DNAs with three- or four-way junctions) from sequence, outperforming existing fragment-assembly and AI-based approaches. The model also reproduces the thermal stability of junctions across diverse sequences and lengths, with predicted melting temperatures deviating by less than 5 °C from experimental values, under both monovalent (Na⁺) and divalent (Mg²⁺) ionic conditions. Furthermore, analysis of the thermal unfolding pathways reveals that the overall stability of multi-way junctions is primarily determined by the relative free energies of key intermediate states. These results provide a robust framework for predicting complex DNA architectures and offer mechanistic insights into DNA folding and function.
Author summary
Beyond the familiar double helix, DNA can fold into complex three-dimensional (3D) structures such as multi-way junctions that are essential in biological processes and widely used in nanotechnology and drug design. Predicting how these structures form and how stable they are, especially under salt conditions, remains a major challenge. In this study, we introduce a coarse-grained model that can accurately predict both the 3D structures and thermal stability of three- and four-way DNA junctions directly from their sequence. By integrating a refined treatment of electrostatic interactions and advanced sampling techniques (e.g., REMC and WHAM), the model reproduces experimental 3D structures and melting temperatures across a wide range of sequences and salt conditions including monovalent (Na⁺) and divalent (Mg²⁺) ions. We also find that the stability of these DNA junctions is governed by the relative energies of intermediate folding states. This work provides a practical tool for researchers to model complex DNA architectures and deepens our understanding of how DNA sequence and ions together control DNA structure and function.
Citation: Wang X, Shi Y-Z (2025) 3D structure and stability prediction of DNA with multi-way junctions in ionic solutions. PLoS Comput Biol 21(8): e1013346. https://doi.org/10.1371/journal.pcbi.1013346
Editor: Amar Singh, KU: The University of Kansas, UNITED STATES OF AMERICA
Received: February 14, 2025; Accepted: July 21, 2025; Published: August 18, 2025
Copyright: © 2025 Wang, Shi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant computational data are within the paper and its Supporting information files. Experimental data are publicly available from published papers (DOI: https://doi.org/10.1039/c7cp08329g) cited in the work. The package of the present model is freely available at https://github.com/RNA-folding-lab/DNAfold2.
Funding: This work was supported by grants from the National Science Foundation of China (No. 11605125 to YZS), the China Scholarship Council (No. 202208420104 to YZS), and the Guizhou Medical University High-Level Talent Scientific Research Startup Fund (No. 26242020163 to XW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
DNA is a fundamental macromolecule in living organisms, essential for the storage and transmission of genetic information [1], regulation of protein synthesis [2], and control of gene expression [3]. Beyond its classical B-form double helix, DNA can adopt a variety of complex 3D structures, such as hairpins, junctions, quadruplexes, and other non-canonical forms, which are increasingly recognized as functional elements involved in gene expression regulation, genome stability, and epigenetic control [4–7]. These dynamic conformations are not only critical for biological processes, e.g., facilitating enhancer-promoter communication or transcription factor binding through structure folding, but also form the structural basis for DNA-based nanotechnology [8]. Among them, DNA with multi-way junctions represents a higher level of structural complexity compared to linear or duplex DNA [9], and is thought to contribute to molecular recognition, signaling, and the assembly of nucleic acid-based nanodevices [10]. Therefore, gaining deeper insights into the 3D structure and stability of such junction-containing DNAs is essential for both understanding their biological relevance and advancing DNA-based material design.
The 3D structures of DNA can be resolved through experimental techniques such as X-ray crystallography, nuclear magnetic resonance spectroscopy, and cryo-electron microscopy [11,12]. However, due to the high cost and technical difficulty of these methods, the number of determined DNA 3D structures deposited in the Protein Data Bank (PDB) remains limited [11–14], especially in contrast to the vast number of DNA sequences available in GenBank [15,16]. To complement experimental approaches, several theoretical and computational methods have been developed to predict/model DNA 3D structures [14,17]. These methods can be classified into three categories: deep learning-based, template-based, and physics-based approaches.
Deep learning-based approaches have achieved remarkable success in protein structure prediction and are increasingly being applied to nucleic acids [18–22]. For example, AlphaFold3 [18,19] has recently demonstrated the ability to model nucleic acid structures by learning from large datasets encompassing proteins, RNA, and DNA. These models use neural network architectures to directly infer structural patterns from sequence data, enabling rapid and scalable predictions. However, their performance on diverse DNA/RNA topologies remains limited due to the relatively sparse and biased training data for nucleic acids (e.g., dominated by canonical double-helical structures), compared to the extensive and diverse datasets available for proteins [23,24]. In contrast, template-based fragment assembly methods offer a flexible framework for constructing 3D structures of DNA with arbitrary topologies, including multi-way junctions, by assembling known structural fragments based on the secondary structure. A representative example is 3dDNA, which extends the original 3dRNA platform to support DNA modeling [25,26]. It assembles DNA 3D structures with high accuracy by aligning input sequence and secondary structure information with a curated template library of experimentally determined structure segments [26]. However, these methods rely heavily on accurate secondary structure input, which remains challenging for DNAs with noncanonical or complex folds [27]. Additionally, the limited diversity of template libraries could restrict their effectiveness in de novo predictions for novel sequences.
Physics-based approaches can model DNA folding and structure formation by simulating fundamental physical interactions, without requiring structure templates or secondary structure information. All-atom molecular dynamics (MD) simulations using well-established force fields such as CHARMM [28,29] and AMBER [30–32] have been widely applied to study the microscopic behavior of dsDNA, including its dynamics, flexibility, and structure transitions. However, their high computational cost restricts these simulations to small DNA fragments and short time scales (falling short of capturing folding processes). To address these limitations, a variety of coarse-grained (CG) models have been developed by reducing the degrees of freedom while retaining essential physical and thermodynamic characteristics [33–43]. For example, the oxDNA model represents nucleotides as rigid bodies with backbone, stacking, and hydrogen bonding interactions, and has been widely used to simulate both single- and double-stranded DNA behavior (e.g., mechanics and thermodynamics), as well as large-scale DNA nanostructures such as origami and nano-tweezers [39]. The 3SPN model adopts a three-site representation (phosphate, sugar, base), incorporating base-pairing and stacking to capture DNA denaturation, persistence length, and local curvature [40]. Similarly, another three-interaction-site model, TIS-DNA, uses sequence-specific stacking energies derived from dimer thermodynamics, enabling accurate modeling of force-extension behavior, elasticity, and melting temperatures of simple DNAs [41]. Despite their successes, these models typically rely on Gō-like potentials that encode native structural information, limiting their capacity for structure prediction directly from sequence. To overcome this, several physics-based CG models have been proposed to fold DNAs without using predefined structure restraints. For instance, NARES-2P, with two CG beads per nucleotide, can reproduce duplex formation, melting temperatures, and mechanical stability of dsDNAs starting from two separate strands [42]. HiRE-RNA, using six to seven beads per nucleotide, enables the study of both structural dynamics and sequence-dependent melting profiles [43]. However, the parameters of these models may need further validation to ensure that the predicted thermodynamics and 3D structures agree with experiments, especially for ssDNA.
In addition, DNA folding and dynamics are further influenced by its polyanionic nature, making ionic conditions (e.g., Na⁺ and Mg²⁺) crucial for accurate structural modeling [44–48]. Several existing CG models, including 3SPN, oxDNA, TIS-DNA, NARES-2P, and HiRE-RNA, incorporate electrostatic interactions using the Debye-Hückel approximation or mean-field multipole–multipole potentials to reproduce monovalent salt-dependent structural properties of DNA, such as persistence length, torsional stiffness or melting temperatures [49–53]. However, these models often neglect the significant influence of divalent ions on DNA/RNA folding. Recent studies have shown that explicitly including divalent cations, like Mg²⁺, enhances the accuracy of ion-nucleic acid and ion-ion interaction predictions, providing a more precise representation of the complex ionic environment around nucleic acids and better insights into the divalent ion effects on folding thermodynamics [54,55]. Very recently, we have also proposed a three-bead CG model to predict 3D structure folding for DNAs in monovalent/divalent ion solutions from sequence [13]. By integrating sequence-dependent base-pairing, base-stacking, and coaxial stacking interactions, along with an implicit electrostatic potential, the model accurately folded 20 dsDNAs (≤52 nt) and 20 ssDNAs (≤74 nt) into native-like structures (mean RMSD < 4 Å), and quantitatively predicted melting temperatures (mean deviation < 3.0 °C) for 27 dsDNAs (including those with bulge loops and internal loops) and 24 ssDNAs (including hairpins and a pseudoknot). Despite these advances, accurately predicting the 3D folding of complex DNAs with multi-way junctions directly from sequence, especially under ionic conditions, remains a significant challenge for current models, including ours.
In this work, we have further refined our previously developed CG model by incorporating a structure-based electrostatic potential, an improved sampling algorithm (replica-exchange Monte Carlo, REMC), and the weighted histogram analysis method (WHAM). Specifically, the updated model includes: (i) a refined electrostatic energy term to capture interactions between DNA and monovalent/divalent ions; (ii) REMC, which samples conformations more efficiently than conventional simulated annealing [13,56,57]; (iii) WHAM to analyze the thermal stability of DNA; and (iv) an all-atom reconstruction algorithm to recover atomistic structures from CG predictions. We applied the model to predict the 3D structures and thermodynamic stability of DNA with multi-way junctions in both monovalent and divalent ion solutions, and further analyzed their thermal unfolding pathways.
Materials and methods
Coarse-grained representation for DNA structures
In the present model, each DNA nucleotide is represented by a set of three CG beads [13]. Specifically, one bead is assigned to the phosphate group (P), another to the sugar moiety centered at the C4ʹ atom (C), and the third to the nucleobase (N), represented by the N1 atom in pyrimidines or the N9 atom in purines (Fig 1A). This streamlined model retains the essential structural and chemical properties of DNA, such as backbone connectivity and base-pairing interactions [55,57]. The three types of beads are modeled as spheres with van der Waals radii of 1.9 Å (P), 1.7 Å (C), and 2.2 Å (N), respectively, and the P bead carries a unit negative charge to account for the polyanionic nature of DNA.
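The three-bead mapping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of the DNAfold2 package: the function name `coarse_grain` and the dictionary layout of a nucleotide are hypothetical, while the bead definitions, radii, and charge are taken from the text.

```python
# Illustrative sketch of the three-bead mapping; names and data layout are
# hypothetical and not taken from the authors' package.
BASE_ATOM = {"A": "N9", "G": "N9", "C": "N1", "T": "N1"}  # purines: N9, pyrimidines: N1
RADIUS = {"P": 1.9, "C": 1.7, "N": 2.2}    # van der Waals radii in Å (from the text)
CHARGE = {"P": -1.0, "C": 0.0, "N": 0.0}   # unit negative charge on the P bead only


def coarse_grain(nucleotide):
    """Map one all-atom nucleotide to its CG beads.

    nucleotide: {"base": "G", "atoms": {"P": (x, y, z), "C4'": (...), "N9": (...)}}
    Beads whose source atom is absent (e.g. a missing 5' phosphate) are skipped.
    """
    atoms = nucleotide["atoms"]
    source = {"P": "P", "C": "C4'", "N": BASE_ATOM[nucleotide["base"]]}
    beads = []
    for bead, atom_name in source.items():
        if atom_name in atoms:
            beads.append({
                "bead": bead,
                "xyz": atoms[atom_name],
                "radius": RADIUS[bead],
                "charge": CHARGE[bead],
            })
    return beads
```

For a guanine, the N bead is placed on N9 even if an N1 coordinate is also present in the input.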
Fig 1. (A) The folding process of DNA is based solely on sequence information, incorporating the coarse-grained representation of DNA, the initial conformation, REMC simulations, and a structure ensemble derived from ten different temperature replicas to explore conformational diversity. (B) Refinement of the DNA structure, beginning with the identification of the most probable conformation through clustering of low-energy conformations, followed by the reconstruction of an all-atom model.
Coarse-grained force field
In the present model, the total energy of a DNA chain is given by:

$$U = U_{b} + U_{a} + U_{d} + U_{exc} + U_{bp} + U_{bs} + U_{cs} + U_{el}, \qquad (1)$$

where $U_{b}$, $U_{a}$, and $U_{d}$ represent the bonded interactions corresponding to bond lengths, bond angles, and dihedral angles, respectively (see Eqs. S2–S4 in S1 Text). These terms were parameterized by statistically analyzing the distributions of corresponding geometric features from experimentally resolved DNA structures (S1 Table). Notably, most resolved DNA structures are dominated by well-formed helical regions, whose geometric characteristics are not ideal for capturing the folding behavior of flexible, unstructured DNA chains. To overcome this limitation, we separately analyzed base-paired helical regions and unpaired loop regions, deriving two distinct sets of bonded parameters: Parahelix and Paraloop (S2 Table), respectively. During the folding simulations, only Paraloop is used to better model the conformational dynamics of single-stranded DNA. In contrast, Parahelix is applied exclusively in the final refinement stage to base-paired regions, ensuring the formation of canonical helical geometries with continuous base pairing.
The remaining terms in Eq. 1 describe non-bonded interactions; detailed formulations and parameterizations are provided in S1 Text. $U_{exc}$ accounts for excluded volume repulsion between CG beads (Eq. S10). $U_{bp}$, $U_{bs}$, and $U_{cs}$ describe base-pairing (Eq. S9), base-stacking (Eq. S6), and coaxial-stacking (Eq. S11) interactions, respectively. The final term, $U_{el}$, accounts for electrostatic interactions between phosphate groups (i.e., P beads), and is modeled using a Debye-Hückel approximation that incorporates both the counterion condensation theory and the tightly bound ion (TBI) model [13,58,59]:

$$U_{el} = \sum_{i<j}^{N} \frac{Q_i Q_j e^{2}}{4\pi\epsilon_0\,\epsilon(T)\, r_{ij}}\, e^{-r_{ij}/l_D}.$$

Here, $e$ denotes the elementary charge, $r_{ij}$ is the distance between the $i$-th and $j$-th P beads, and $N$ is the total number of P beads in a DNA chain. $l_D$ is the Debye length, which characterizes the ionic screening effect and depends on the ionic strength of solution. $\epsilon_0$ and $\epsilon(T)$ denote the vacuum permittivity and the effective temperature-dependent dielectric constant, respectively [41,60]. In the present model, each P group is initially assigned a unit negative charge (−1 e). When cations are present in the environment, the effective charge of the phosphate groups is reduced. The reduced charge fraction $Q_i$ is defined as $Q_i = 1 - f_i$, where $f_i$ denotes the ion neutralization fraction for the $i$-th P bead. Given that complex DNA structures, particularly junctions or loop regions, exhibit distinct charge distributions compared to helical regions, the present model applies a structure-based correction to $f_i$ to more accurately reflect local electrostatic effects. For a solution containing pure $\nu$-valent ions, $f_i^{\nu}$ is calculated from the average neutralization fraction $\bar{f}^{\nu}$ derived from counterion condensation theory, the average charge spacing $b$ along the DNA backbone, the Bjerrum length $l_B$, and the electrostatic potential $\phi_i$ at the $i$-th P bead (see S1 Text). The reduced charge fractions $Q_i$ are determined through an iterative procedure: (1) set $f_i = \bar{f}^{\nu}$, and compute the initial $Q_i$; (2) calculate $\phi_i$ using the current $Q_i$; (3) update $f_i$ and $Q_i$; (4) repeat steps (2) and (3) until convergence. To reduce computational cost, the reduced charge fractions are updated every 100 MC steps unless the secondary structure changes or replica exchange occurs, in which case they are updated immediately to reflect the new conformational environment. For mixed monovalent and divalent ions, $f_i = x f_i^{1} + (1 - x) f_i^{2}$, where $f_i^{\nu}$ is the binding fraction of $\nu$-valent ions, and $x$ denotes the contribution fraction of monovalent ions, which is derived from the TBI model as a function of the bulk concentrations of the monovalent and divalent ions [48]; see more details in S1 Text.
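As a rough illustration of the screened electrostatic term between P beads, the following Python sketch evaluates a Debye-Hückel sum over bead pairs for a given set of reduced charge fractions. It is a minimal sketch under stated assumptions: the function names are hypothetical, the dielectric constant is fixed at a water-like value of 78.4 instead of the temperature-dependent form used in the model, and the reduced charge fractions `q` are taken as input, since the structure-based update of the neutralization fractions is not reproduced here.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
# e^2 / (4 pi eps0) expressed in kcal * Å / mol (about 332)
COULOMB = E_CHARGE ** 2 * N_A / (4.0 * math.pi * EPS0) / 4184.0 * 1e10


def debye_length(ionic_strength, temp_k, eps_r=78.4):
    """Debye length in Å; ionic_strength in mol/L (equals the salt
    concentration for a 1:1 salt). eps_r = 78.4 is an assumed constant."""
    num = EPS0 * eps_r * K_B * temp_k
    den = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength * 1000.0
    return math.sqrt(num / den) * 1e10


def electrostatic_energy(p_coords, q, ionic_strength, temp_k, eps_r=78.4):
    """Screened Coulomb energy (kcal/mol) summed over all pairs of P beads.

    p_coords: list of (x, y, z) in Å; q: reduced charge fraction of each bead.
    """
    l_d = debye_length(ionic_strength, temp_k, eps_r)
    total = 0.0
    for i in range(len(p_coords)):
        for j in range(i + 1, len(p_coords)):
            r = math.dist(p_coords[i], p_coords[j])
            total += COULOMB * q[i] * q[j] / (eps_r * r) * math.exp(-r / l_d)
    return total
```

With these constants the Debye length at 1 M monovalent salt and 25 °C is about 3 Å, so repulsion between phosphates more than a few bead diameters apart is almost fully screened.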
Replica-exchange Monte Carlo simulations
The previous version of our CG model, which used an MC simulated annealing algorithm for conformational sampling, successfully predicted the 3D structures of simple DNA topologies such as hairpins, duplexes, and minimal H-type pseudoknots [13]. However, it often became trapped in intermediate states when applied to more complex structures, particularly those with multi-way junctions (≥ 3-way), due to the rugged energy landscape and limited sampling efficiency [57]. To address this limitation, the present model employs a more efficient replica exchange Monte Carlo (REMC) algorithm [57,61], which enhances conformational sampling by enabling exchanges between parallel simulations (replicas) at different temperatures. Specifically, 10 replicas are simulated in parallel at temperatures ranging from 25°C to 110°C (e.g., 25°C, 31°C, 37°C, 45°C, 54°C, 64°C, 74°C, 86°C, 98°C, and 110°C). Each replica undergoes pivot-based conformational updates according to the Metropolis criterion, and adjacent replicas $i$ and $j$ periodically attempt exchanges with probability $p = \min(1, e^{-\Delta})$, where $\Delta = \beta_i [U(x_j, T_i) - U(x_i, T_i)] + \beta_j [U(x_i, T_j) - U(x_j, T_j)]$, with $\beta = 1/(k_B T)$ and $U(x, T)$ denoting the temperature-dependent potential energy of conformation $x$ at temperature $T$. This generalized Metropolis acceptance criterion ensures that the detailed balance condition is satisfied even when the potential energy function explicitly depends on temperature [62,63].
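A minimal Python sketch of one exchange attempt between adjacent replicas is given below. It assumes the standard generalized Metropolis form for temperature-dependent potentials, in which each conformation is re-evaluated at the other replica's temperature. The callable `u(x, T)` and the function names are hypothetical placeholders; only the temperature ladder comes from the text.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)

# Temperature ladder from the text, converted from °C to K.
REPLICA_TEMPS_K = [t + 273.15 for t in (25, 31, 37, 45, 54, 64, 74, 86, 98, 110)]


def swap_probability(u, x_i, x_j, t_i, t_j):
    """Acceptance probability for exchanging conformations x_i (at t_i) and
    x_j (at t_j); u(x, T) is a hypothetical temperature-dependent energy."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i * (u(x_j, t_i) - u(x_i, t_i))
             + beta_j * (u(x_i, t_j) - u(x_j, t_j)))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swap(u, confs, temps, k, rng=random):
    """Try to exchange replicas k and k+1 in place; return True if accepted."""
    p = swap_probability(u, confs[k], confs[k + 1], temps[k], temps[k + 1])
    if rng.random() < p:
        confs[k], confs[k + 1] = confs[k + 1], confs[k]
        return True
    return False
```

When the energy does not depend on temperature, the expression reduces to the familiar parallel-tempering rule, so a swap that moves the lower-energy conformation to the colder replica is always accepted.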
Identifying top-scored coarse-grained 3D structures
To identify the top-scoring 3D structures from the predicted CG structure ensemble, we first selected the 1000 conformations with the lowest CG energies from the predicted ensemble at the lowest temperature (i.e., 25°C). We then applied a clustering algorithm to group similar structures, based on the RMSD values between all pairs of structures within the selected ensemble. The cluster containing the largest number of structures within a predefined RMSD threshold was identified, and its members were removed from the initial set. This process was repeated iteratively until all structures were assigned to a cluster [64]. A clustering threshold of 0.1 Å times the sequence length (e.g., 5 Å for a 50-residue sequence) was used. Finally, the medoids of the three largest clusters, along with the decoy with the lowest energy, were selected as the initial predictions.
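The iterative largest-cluster procedure can be sketched as follows. This is a minimal illustration that assumes a precomputed pairwise RMSD matrix and uses the structure with the most neighbours as the representative of each cluster, which is a simplification of the medoid choice described above; the function names are hypothetical.

```python
def cluster_threshold(sequence_length):
    """RMSD cutoff in Å: 0.1 Å times the sequence length (from the text)."""
    return 0.1 * sequence_length


def greedy_cluster(rmsd, threshold):
    """Greedy clustering on a symmetric pairwise RMSD matrix (list of lists).

    Repeatedly picks the structure with the most unassigned neighbours within
    `threshold`, removes that cluster, and continues until nothing is left.
    Returns a list of (representative_index, member_indices), largest first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining) if rmsd[i][j] <= threshold]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters
```

Because each pass scans all remaining pairs, the cost grows roughly cubically in the worst case, which is acceptable for an ensemble of about 1000 conformations.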
Rebuilding all-atom structure
For the top-scoring CG structures (top-1 or top-n), a simple structure refinement was performed using MC sampling at room temperature. In this process, the parameters of Parahelix and Paraloop were used to calculate the bonded interactions for base-paired and single-stranded regions, respectively, and the nonbonded interactions were calculated as described earlier. Afterwards, the refined structure was subjected to all-atom reconstruction for practical application.
First, we constructed a small library of five non-redundant all-atom structures for each type of base pair (G-C, A-T, C-G, T-A) as well as for single nucleotides (A, G, C, T), using clustering according to their mutual RMSD values. Then, for each target CG structure, the corresponding nucleotide or base pair was aligned with fragments of the same sequence from the pre-built library based on CG atom positions. The fragment with the smallest RMSD to the CG atoms was selected as the all-atom replacement for that nucleotide or base pair. This process was repeated for all CG nucleotides until the complete all-atom 3D structure was constructed [14]; see S2 Fig. Finally, to eliminate potential steric clashes and chain breaks in the rebuilt all-atom structures, another refinement step was performed using the QRNAS method [65].
Thermal stability analysis using WHAM
In addition to predicting DNA 3D structures, we applied the WHAM [66] to analyze REMC trajectories and assess DNA thermal stability by calculating the population fractions of structural states across temperatures. This approach enabled us to characterize key thermal properties such as melting temperatures and unfolding pathways. For each simulated DNA, we first categorized the conformations in the REMC ensemble into different structural states based on the predicted secondary structures, specifically focusing on the retention of the stems. For instance, a DNA structure with a 3-way junction can be divided into 8 states: folded (F), unfolded (U), and 6 intermediate (I) states corresponding to different stem configurations; see S3 Fig for details. The conformational space was then discretized into bins along two reaction coordinates: structural states $s$ and energy levels $E$. Each bin $(s, E)$ defines a microstate for WHAM analysis.
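The state bookkeeping can be illustrated with a short Python sketch. Each stem is treated as either retained or lost, so a junction with three stems yields 2^3 = 8 structural states. The criterion used here (a stem counts as retained when at least half of its base pairs are formed) and the state labels are assumptions made for illustration; the definitions used in the paper are given in S1 Text and S3 Fig.

```python
from itertools import product


def state_label(mask):
    """Label a tuple of 0/1 stem flags: all formed -> F, none -> U, else I<bits>."""
    if all(mask):
        return "F"
    if not any(mask):
        return "U"
    return "I" + "".join(str(bit) for bit in mask)


def enumerate_states(n_stems):
    """All 2**n_stems structural states, starting with F and ending with U."""
    return [state_label(mask) for mask in product((1, 0), repeat=n_stems)]


def classify(formed_pairs, stems, min_fraction=0.5):
    """Assign a conformation to a structural state.

    formed_pairs: set of (i, j) base pairs present in the conformation.
    stems: list of sets of (i, j) pairs, one set per native stem.
    min_fraction: assumed cutoff for calling a stem retained.
    """
    mask = tuple(
        int(len(formed_pairs & stem) >= min_fraction * len(stem)) for stem in stems
    )
    return state_label(mask)
```

Combining the returned label with a binned CG energy gives the two coordinates used to define a microstate.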
The probability of each microstate at the target temperature $T$ was computed iteratively using [66]:

$$P_T(s, E) = \frac{\sum_{k=1}^{M} n_k(s, E)}{\sum_{k=1}^{M} N_k\, c_k(E)},$$

where $M$ is the number of replicas, $n_k(s, E)$ is the count of microstate $(s, E)$ at $T_k$, $N_k$ is the total number of conformations at $T_k$, and $c_k(E)$ is the temperature bias factor. The relative fraction of each structural state $s$ at any temperature $T$, denoted $f_s(T)$, was obtained by summing over energy bins:

$$f_s(T) = \sum_{E} P_T(s, E).$$
For simplicity, the fractions of the folded state ($f_F$) and unfolded state ($f_U$) were fitted to a two-state transition model [13,67]:

$$f_F(T) = \frac{1}{1 + e^{(T - T_{m1})/dT_1}}, \qquad f_U(T) = 1 - \frac{1}{1 + e^{(T - T_{m2})/dT_2}}.$$

Here, $T_{m1}$ and $T_{m2}$ denote the melting temperatures for the folded-to-intermediate (F → I) and intermediate-to-unfolded (I → U) transitions, respectively, and $dT_1$ and $dT_2$ are adjustable parameters that set the widths of the two transitions. Detailed definitions of the structural states and further information on the WHAM procedure are provided in S1 Text. It should be pointed out that the temperature dependence of the potential function in the present CG model may impact the WHAM results [66]. However, the simplifications in the structural states and the two-state transition fitting likely mitigate its effects.
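To illustrate how a melting temperature can be extracted from state fractions, the sketch below fits a logistic two-state curve, f(T) = 1 / (1 + exp((T − Tm)/dT)), by linearizing it as ln(1/f − 1) = (T − Tm)/dT and applying ordinary least squares. The logistic form, the width parameter dT, and the function names are assumptions made for this example; it is not the fitting routine used in the paper.

```python
import math


def two_state_folded(temp, t_m, d_t):
    """Assumed two-state form for the folded fraction at temperature temp."""
    return 1.0 / (1.0 + math.exp((temp - t_m) / d_t))


def fit_melting_temperature(temps, folded_fractions, eps=1e-6):
    """Least-squares fit of (t_m, d_t) from fractions strictly inside (0, 1).

    Points with f <= eps or f >= 1 - eps are dropped because the logit
    transform is undefined or numerically unstable there.
    """
    xs, ys = [], []
    for t, f in zip(temps, folded_fractions):
        if eps < f < 1.0 - eps:
            xs.append(t)
            ys.append(math.log(1.0 / f - 1.0))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / d_t
    intercept = mean_y - slope * mean_x    # equals -t_m / d_t
    return -intercept / slope, 1.0 / slope
```

The same routine can be applied to the unfolded state by passing 1 − f_U, which then yields the second melting temperature.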
Results and discussion
In this work, we applied our newly developed CG model to predict the 3D structures of complex DNA molecules, including those with three-way and four-way junctions, extending beyond the simpler ssDNA and dsDNA structures. We then evaluated the thermal stability of these complex DNA structures in both monovalent and divalent ion solutions, going beyond the simple ssDNA, dsDNA, and pseudoknot structures explored in our previous work [13]. Finally, we performed a comprehensive analysis of the thermal unfolding pathways for DNA molecules containing three-way and four-way junctions.
Predicting 3D structures of DNA with multi-way junctions
Overview of 3D structure prediction framework.
As shown in Fig 1, the present model predicts DNA 3D structures from sequence through five key steps: (1) An initial random conformation of the DNA chain is generated based on the given sequence, utilizing the bonded potentials and the excluded-volume potential (Eq. 1); (2) This conformation undergoes a REMC simulation for conformational sampling, during which only the parameters of Paraloop for bonded potential are applied to simulate the flexible nature of the DNA chain; (3) Top structures from the REMC ensemble at the lowest temperature (e.g., 25°C) are selected using the CG energy and a clustering algorithm [68]; (4) A further structure refinement (i.e., MC simulation at 25°C) is performed on the top-scoring conformations by applying two distinct sets of bonded potential parameters: Paraloop for the loop regions and Parahelix for the base-pairing regions, enabling a more accurate representation of the helical geometry of the stems [69]; (5) Finally, the refined CG structures are reconstructed into all-atom structures.
To evaluate the performance of the present model, two primary metrics (RMSD and F1-score) are calculated for the predicted structures. The RMSD quantifies the global deviation between the predicted and native structures [70,71]; we report both the RMSD of the top-ranked structure (top-1 RMSD) and the minimum RMSD observed in the predicted structure ensemble (RMSDmin). To assess the accuracy of predicted DNA secondary structures, the F1-score, which measures the consistency of base-pairing interactions between the predicted and native structures [72,73], is also calculated by $F1 = 2 \times PR \times SN / (PR + SN)$, where precision (PR) is defined as $PR = TP / (TP + FP)$ and sensitivity (SN) as $SN = TP / (TP + FN)$, with TP, FP, and FN denoting true positives, false positives, and false negatives, respectively.
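The F1-score over base pairs can be computed as in the following Python sketch. The function name and the representation of base pairs as index tuples are illustrative choices; the definitions of precision, sensitivity, and F1 follow the text.

```python
def f1_score(predicted_pairs, native_pairs):
    """F1-score between predicted and native base pairs.

    Each pair is an (i, j) tuple of nucleotide indices; order within a pair
    is ignored. Returns 0.0 when no true positive exists.
    """
    pred = {tuple(sorted(p)) for p in predicted_pairs}
    nat = {tuple(sorted(p)) for p in native_pairs}
    tp = len(pred & nat)   # pairs present in both
    fp = len(pred - nat)   # predicted but not native
    fn = len(nat - pred)   # native but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity == 0.0:
        return 0.0
    return 2.0 * precision * sensitivity / (precision + sensitivity)
```

For example, recovering three of four native pairs while predicting one spurious pair gives precision and sensitivity of 0.75 each, and therefore an F1-score of 0.75.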
Validation using experimental DNA junction structures
In this work, we predicted 3D structures of four DNAs containing three-way (PDB IDs: 1snj, 3hxq, 7qb3) and four-way (PDB ID: 2f1q) junctions under standard salt conditions (1 M NaCl) [74–76]. As shown in Fig 2, both secondary and tertiary structures were compared with experimentally determined structures. The predicted secondary structures (parsed by DSSR from 3D models with Top-1 RMSD and RMSDmin) largely recapitulate the native base-pairing patterns across all four DNAs, as evidenced by high F1-scores ranging from 0.84 to 0.95. This highlights the ability of the model to recover key structural motifs from sequence alone. However, discrepancies remain in certain local and non-canonical base pairs (e.g., C11-G14/C21-G24 in 1snj, G23-T35 in 3hxq, G17-T23 in 7qb3, and C11-G14/C19-G22/C28-G31 in 2f1q). These deviations mainly arise from model assumptions that restrict base pairing to nucleotides separated by at least three positions and exclude non-canonical interactions such as G-T pairs; both limitations should be addressed in future work.
Fig 2. (A-D) F1 scores and RMSD values for the minimum RMSD and top-1 RMSD predicted structures of DNAs with three-way junctions (A-C) and four-way junctions (D). The predicted 3D structures with minimum RMSD (brown) and top-1 RMSD (pink) are superimposed on the corresponding native structures (blue). Secondary structures were visualized using VARNA [92], and 3D structures were rendered in PyMOL [93].
All four DNA molecules lack long-range tertiary base pairs (Fig 2), which poses a significant challenge for accurately predicting the relative orientations between stems connected by junctions. Nevertheless, the best predicted 3D structures (RMSDmin) achieve RMSD values ranging from 5.6 to 8.3 Å, while the top-ranked models (Top-1) fall within 8.1-9.2 Å. These results likely benefit from the coaxial stacking interactions in the present model, as well as efficient sampling enabled by the pivot move-based algorithm [56,57].
Collectively, these results demonstrate that the present model can accurately capture both secondary and tertiary structures of DNAs with multi-branched architectures, offering a powerful tool for modeling DNA structures beyond simple duplexes/hairpins.
Comparison with existing methods
To benchmark the present model, we compared it with two representative methods for DNA 3D structure prediction: 3dRNA/DNA [77] and AlphaFold3 [18], using only sequence information as input for all approaches. For 3dRNA/DNA, we used its webserver (http://biophy.hust.edu.cn/new/3dRNA/create) with the default ‘RNAfold’ settings for secondary structure prediction and the ‘Optimization’ option for 3D structure refinement. For AlphaFold3, end-to-end structure predictions were obtained via its webserver (https://alphafoldserver.com/).
Fig 3A and 3C display the RMSDs and F1-scores of the top-ranked (Top-1) structures predicted by the three methods for each DNA molecule. For the four DNAs, the RMSDs predicted by the present model are consistently lower than those obtained from 3dRNA/DNA, with two of the predictions outperforming those from AlphaFold3 (Fig 3A). As shown in Fig 3B, for the four DNAs, the average RMSD for the top-ranked (Top-1) predictions from our model is approximately 8.8 Å, representing an improvement of about 43.6% compared to 3dRNA/DNA (~15.6 Å). This improvement is largely attributed to the significantly higher accuracy of our predicted secondary structures (Fig 3C and 3D), highlighting the limitations of fragment assembly approaches that heavily depend on secondary structure inputs [26]. AlphaFold3 achieves an average RMSD of ~9.2 Å, which is comparable to our Top-1 results, but still higher than the best structures predicted by our model (RMSDmin). It is important to note that the experimental structures of these four DNAs were released between 2004 and 2021, while AlphaFold3 was trained on all macromolecular structures available in the PDB prior to January 2023 [18]. This suggests that these DNAs were likely included in its training set, which may explain the near-perfect agreement between its predicted and native base-pairing patterns (Fig 3C and S4 Table). Nevertheless, the performance of AlphaFold3 on these DNAs is notably lower than its typical accuracy (< 3 Å) for protein structure prediction [18,19]. This discrepancy underscores the limited availability of high-quality DNA structural data and highlights the continued need for physics-based modeling approaches.
Fig 3. (A, C) RMSD values (A) and F1 scores (C) for four DNAs with multi-way junctions predicted by our CG model, 3dRNA/DNA, and AlphaFold3. (B, D) Average RMSD values (B) and F1 scores (D) across the four DNAs for each method. Error bars represent the standard deviation of the RMSD and F1 score values.
Predicting thermal stabilities of DNA junctions under ion conditions
DNA functionality depends not only on its 3D structure but also on its thermal stability [13]. Beyond 3D structure prediction, the present model enables quantitative prediction of the thermal unfolding behavior of DNAs containing complex motifs such as three-way (3WJ) and four-way junctions (4WJ) in monovalent and divalent ionic environments.
Predicting thermal stabilities of DNA with multi-way junctions.
We tested the present model on six DNAs, three with 3-way junctions and three with 4-way junctions, whose sequences and structural properties are summarized in Table 1 and S4 Fig. As described in the Materials and Methods section, for each DNA, we combined REMC simulations with the WHAM to compute two distinct melting temperatures: one corresponding to the transition from the folded state to an intermediate state ($T_{m1}$), and the other from an intermediate state to the unfolded state ($T_{m2}$).
As illustrated in Fig 4A–4C, taking a 37-nt DNA (the 3WJ with sequence of 5’-GAAATTGCGCTTTTTGCGCGTGCTTTTTGCACAATTTC-3’) as an example [78], the REMC simulations at 0.1 M NaCl yield predictions of its secondary structure (Fig 4A). Based on the base-pairing configurations observed in REMC trajectories at different temperatures (Fig 4B), the conformational ensemble can be grouped into eight structural states, including the folded state, the unfolded state, and six intermediate states (S3 Fig). WHAM was then used to compute the fractional populations of each state across temperatures. Finally, using the temperature-dependent fractions of the folded and unfolded states ($f_F$ and $f_U$), we fit two-state transition models to extract the two melting temperatures (Fig 4C), representing the folded-to-intermediate and intermediate-to-unfolded transitions, respectively. The two predicted melting temperatures, $T_{m1}$ = 34.4 °C and $T_{m2}$ = 64.6 °C, are close to the experimental data ($T_{m1}$ = 33.6 °C and $T_{m2}$ = 70.3 °C), with deviations of 0.8 °C and 5.7 °C, respectively (Table 1). Fig 4D–4F present the predicted results for another DNA structure featuring a 4-way junction (4WJ, 53 nt; sequence: 5’-GAAATTGCGCTTTTTGCGCATATCTTTTTGATAGGTGCTTTTTGCACAATTTC-3’) [78]. Compared to the 3-way junction case, this system exhibits more complex structural fluctuations (Fig 4E), but follows similar thermal behavior. The predicted melting temperatures are $T_{m1}$ = 28.6 °C and $T_{m2}$ = 72.1 °C, showing notable deviation from the experimental values of $T_{m1}$ = 51.2 °C and $T_{m2}$ = 65.0 °C, especially in the first transition temperature. This discrepancy is likely due to the current simplification of the model, which considers only folded and unfolded states while neglecting the stability differences among intermediate states.
(A, D) Predicted secondary and 3D structures of the 3WJ (A) and 4WJ (D) by our CG model. (B, E) Time evolution of the base-pair fractions for 3WJ (B) and 4WJ (E) at different temperatures (110°C, 74°C, 45°C, 25°C, from bottom to top). (C, F) Temperature-dependent fractions of the folded (F, green) and unfolded (U, red) states for 3WJ (C) and 4WJ (F) in 1M NaCl.
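The extraction of a melting temperature from a temperature-dependent state fraction can be sketched as follows. This is only an illustration under stated assumptions: the logistic form f(T) = 1 / (1 + exp((T - Tm) / w)), the width parameter `w`, the function name `fit_two_state_tm`, and the synthetic data are all hypothetical, and the actual two-state model used in the paper is the one defined in its Materials and Methods section.

```python
import math


def fit_two_state_tm(temps_c, fractions):
    """Estimate (Tm, w) for f(T) = 1 / (1 + exp((T - Tm) / w)).

    Uses the linearisation ln(1/f - 1) = (T - Tm) / w and ordinary least
    squares. Points with f outside (0.02, 0.98) are skipped because the
    logit becomes numerically unstable near 0 and 1.
    """
    xs, ys = [], []
    for t, f in zip(temps_c, fractions):
        if 0.02 < f < 0.98:
            xs.append(t)
            ys.append(math.log(1.0 / f - 1.0))
    if len(xs) < 2:
        raise ValueError("need at least two points inside the transition region")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / w
    intercept = mean_y - slope * mean_x  # equals -Tm / w
    return -intercept / slope, 1.0 / slope


# Synthetic (hypothetical) folded-state fractions with a known midpoint.
temps = list(range(0, 101, 5))
true_tm, true_w = 34.4, 4.0
folded = [1.0 / (1.0 + math.exp((t - true_tm) / true_w)) for t in temps]
tm_fit, w_fit = fit_two_state_tm(temps, folded)
print(f"Tm = {tm_fit:.1f} C, width = {w_fit:.1f} C")
```

For the second transition, the same routine could be applied to one minus the unfolded-state fraction, so that the fitted curve again decreases with temperature.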
Table 1 further summarizes the predictions for four additional DNAs (L-3WJ, R-3WJ, L-4WJ, and R-4WJ) [78]. For the three DNAs with three-way junctions (3WJ, L-3WJ, and R-3WJ), the predicted melting temperatures (Tm1 and Tm2) show good agreement with experimental data. The mean deviations are 1.8°C for Tm1 and 4.9°C for Tm2, suggesting that the model reliably captures the thermal stability of DNA with three-way junctions [78]. For R-3WJ, however, the predicted melting temperature Tm1 of 50.4°C is lower than the experimental value (54.6°C). This could be because the experimental Tm1 corresponds to the melting of two stems, while the model-defined Tm1 refers to the melting of a single stem. To address this discrepancy, we redefined the intermediate state containing two stems as the folded state, yielding a revised model prediction of 54.2 °C, closely matching the experimental result.
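The effect of redefining which states count as "folded" can be illustrated with a small sketch. All populations below are hypothetical numbers chosen for illustration (they are not the paper's data), and the helper names `melting_temperature` and `pooled_fraction` are assumptions; the midpoint is located by simple linear interpolation rather than by the paper's fitting procedure.

```python
def melting_temperature(temps_c, fractions):
    """Temperature at which a decreasing fraction crosses 0.5 (linear interpolation)."""
    points = list(zip(temps_c, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 > f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    raise ValueError("fraction never crosses 0.5")


def pooled_fraction(populations, folded_states):
    """Sum, at each temperature, the populations of all states counted as folded."""
    n_temps = len(next(iter(populations.values())))
    return [sum(populations[s][k] for s in folded_states) for k in range(n_temps)]


# Hypothetical populations at five temperatures (each column sums to 1).
temps = [30, 40, 50, 60, 70]
populations = {
    "F":  [0.95, 0.70, 0.30, 0.05, 0.00],  # all three stems formed
    "I1": [0.04, 0.20, 0.30, 0.25, 0.05],  # two stems formed
    "I2": [0.01, 0.08, 0.30, 0.40, 0.35],  # one stem formed
    "U":  [0.00, 0.02, 0.10, 0.30, 0.60],  # no stems formed
}

tm_strict = melting_temperature(temps, pooled_fraction(populations, ["F"]))
tm_pooled = melting_temperature(temps, pooled_fraction(populations, ["F", "I1"]))
print(f"F only: {tm_strict:.1f} C; F + two-stem intermediate: {tm_pooled:.1f} C")
```

Pooling the two-stem intermediate into the folded state shifts the apparent first transition to a higher temperature, which is the direction of the correction described above for R-3WJ.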
For DNAs with four-way junctions (4WJ, L-4WJ, and R-4WJ), a similar challenge arises. In experiments, the melting transitions often involve the cooperative melting of multiple stems [78], whereas our model treats transitions at the single-stem level. As a result, the predicted Tm1 values are consistently lower than the experimental ones, while the predicted Tm2 values tend to be higher (Table 1). This is because the experimentally determined melting temperatures likely correspond to more stable intermediate states involving more than one stem, whereas our model simplifies these to a single-stem melting event. Nevertheless, the overall thermal transition behavior is well captured, supporting the predictive reliability of the model.
Effect of monovalent ions on the stability of DNAs with multi-way junctions.
DNA 3D structures and their thermal stabilities are highly sensitive to the ionic conditions due to the polyanionic nature of the DNA backbone [44,45,79]. In this work, we examined the effects of monovalent salt (Na+) on DNA stability using two representative systems: a three-way junction (3WJ) and a four-way junction (4WJ) [78].
We further extended our predictions to a wider range of Na+ concentrations (see S5 and S6 Tables). As shown in Fig 5A, the stabilities of both the folded and intermediate hairpin structures increase with rising [Na+], which can be attributed to the stronger electrostatic screening at higher [Na+], particularly beneficial for compactly folded or intermediate conformations. Interestingly, we observed that the increase in Tm2 (unfolding of the intermediate hairpin to a single strand) with Na+ concentration is slightly more pronounced than that of Tm1 (transition from the fully folded state to an intermediate state). This is likely because the difference in charge density between the intermediate hairpin and the fully unfolded state is typically larger than that between the fully folded state and the intermediate state with a single melted stem. As shown in S5A Fig, for the 3WJ, the average radius of gyration (Rg) of the folded state is 16.9 Å, which is only slightly smaller than that (17.6 Å) of the intermediate state with one melted stem. In contrast, the Rg of the fully unfolded conformations (25.2 Å) is significantly larger than that of the intermediate hairpin state (22.9 Å), indicating a more extended and less compact structure. In addition, as shown in S5B Fig, the average ion neutralization fraction (f) of the folded state (0.54) is only marginally higher than that of the one-stem-melted intermediate (0.48), whereas the fully unfolded conformations have a notably lower f value (0.37) than the hairpin intermediate (0.46). Similarly, for the 4WJ (S5C and S5D Fig), the folded state exhibits an average Rg of 19.2 Å, also slightly smaller than that of the one-melted-stem intermediate state (19.8 Å). In contrast, the fully unfolded conformations (31.4 Å) exhibit a much larger Rg compared to the intermediate hairpin state (27.6 Å). Regarding ion neutralization, the folded state shows a slightly higher f (0.62) than the single-melted-stem intermediate (f = 0.60), whereas the fully unfolded state displays a significantly lower f (0.47) relative to the three-stem-melted intermediate (0.52). These results suggest that the electrostatic stabilization effect of Na+ is more prominent during the transition from intermediate to fully unfolded states, thereby explaining the sharper increase in Tm2 with ion concentration.
(A) Melting temperatures of the folded-to-intermediate (F → I, Tm1) and intermediate-to-unfolded (I → U, Tm2) transitions for 3WJ as a function of [Na+]. (B, C) Melting temperatures (Tm1 and Tm2) for 3WJ as a function of [Mg2+] at fixed [Na+] of 10 mM (B) and 100 mM (C). (D) Melting temperatures for 4WJ of the F → I (Tm1) and I → U (Tm2) transitions as a function of [Na+]. (E, F) Melting temperatures of 4WJ as a function of [Mg2+], with [Na+] fixed at 10 mM (E) and 100 mM (F).
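The size and neutralization argument in the Na+ discussion can be checked with simple arithmetic on the values quoted there. The sketch below only restates those quoted numbers; the dictionary layout, the state keys, and the function name `transition_changes` are hypothetical choices made for illustration.

```python
# (Rg in angstroms, ion neutralization fraction) as quoted in the text.
quoted = {
    "3WJ": {"F": (16.9, 0.54), "I_one_stem_melted": (17.6, 0.48),
            "I_hairpin": (22.9, 0.46), "U": (25.2, 0.37)},
    "4WJ": {"F": (19.2, 0.62), "I_one_stem_melted": (19.8, 0.60),
            "I_hairpin": (27.6, 0.52), "U": (31.4, 0.47)},
}


def transition_changes(states):
    """Return (change in Rg, change in f) for the F -> I and I -> U steps."""
    def diff(start, end):
        return (round(states[end][0] - states[start][0], 2),
                round(states[end][1] - states[start][1], 2))
    return {"F->I": diff("F", "I_one_stem_melted"),
            "I->U": diff("I_hairpin", "U")}


changes = {name: transition_changes(states) for name, states in quoted.items()}
for name, result in changes.items():
    print(name, result)
```

For both junctions, the growth in Rg and the drop in neutralization fraction are larger for the I → U step than for the F → I step, which is the asymmetry invoked to explain the stronger [Na+] dependence of the second transition.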
Effect of divalent ions on the stability of DNAs with multi-way junctions.
To further investigate the ionic effects on DNA folding, we analyzed the predicted melting temperatures (Tm1 and Tm2) of DNA with multi-way junctions across a wide range of [Mg2+], while fixing [Na+] at 10 mM (Fig 5B) and 100 mM (Fig 5C). The results reveal that the present model successfully captures the competitive and cooperative interactions between monovalent (Na+) and divalent (Mg2+) ions in modulating the stability of DNA with three-way junctions (3WJ). At low [Mg2+] (e.g., 1 mM), the stability of 3WJ is dominated by the background Na+ concentration, and the predicted temperatures (Tm1 and Tm2) closely resemble those observed in pure Na+ solutions. As [Mg2+] increases (above ~1 mM), the stability of the 3WJ structure is significantly enhanced due to the stronger charge neutralization of Mg2+. This stabilizing effect saturates at higher [Mg2+] (above ~100 mM). This behavior is due to the anti-cooperative binding between Na+ and Mg2+, coupled with the more efficient stabilization provided by Mg2+. A similar trend is observed for four-way junctions (4WJ), as shown in Fig 5D–5F, further supporting the robustness of the model in describing the thermodynamic effects of divalent cations.
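As a rough point of reference for the mixed-salt conditions discussed above, the ionic strength and Debye screening length of a NaCl/MgCl2 mixture can be estimated from textbook Debye-Hückel expressions. This sketch is not the electrostatic treatment used by the model (which relies on counterion condensation theory and the TBI model), and simple screening arguments do not capture ion-specific binding; the function names are hypothetical, and the 0.304/sqrt(I) nm rule assumes water at 25 °C.

```python
import math


def ionic_strength(na_molar, mg_molar):
    """Ionic strength (M) of a NaCl + MgCl2 mixture: I = 0.5 * sum(c_i * z_i**2)."""
    cl_molar = na_molar + 2.0 * mg_molar  # chloride contributed by both salts
    return 0.5 * (na_molar * 1 ** 2 + mg_molar * 2 ** 2 + cl_molar * 1 ** 2)


def debye_length_nm(ionic_strength_molar):
    """Debye length in water at 25 C from the textbook 0.304 / sqrt(I) nm rule."""
    return 0.304 / math.sqrt(ionic_strength_molar)


# Fixed 10 mM Na+ background with increasing Mg2+ (concentrations in M).
for mg in (0.0, 0.001, 0.01, 0.1):
    total = ionic_strength(0.01, mg)
    print(f"[Mg2+] = {mg * 1000:6.1f} mM  I = {total:.3f} M  "
          f"Debye length = {debye_length_nm(total):.2f} nm")
```

In this simple picture the Mg2+ contribution to the ionic strength (three times its concentration for MgCl2) overtakes a 10 mM Na+ background at a few millimolar Mg2+, which is broadly in line with the concentration range where the predicted melting temperatures begin to rise.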
These findings indicate that the present model can quantitatively capture the influence of both monovalent and divalent ions on the thermal stability of complex DNA junction structures. However, recent studies [47,80,81] have shown that at very high ionic concentrations, excessive ion binding can lead to overcharging effects, which in turn destabilize DNA structures. Since the implicit ion model used in our model assumes a maximal screening effect, effectively neutralizing the DNA backbone charges, it is inherently unable to capture such overcharging phenomena. This limitation suggests that incorporating explicit ion representations, particularly for multivalent ions, may be essential for accurately modeling DNA stability under extreme ionic conditions [41,54,55].
Thermally unfolding pathway of DNAs with multi-way junctions
Understanding the unfolding pathways of DNA multi-way junctions is essential for elucidating their structure-function relationships [78]. The present model not only accurately predicts the melting temperatures of DNA three- and four-way junctions but also reveals the detailed thermally induced transitions among folded, intermediate, and unfolded states based on their temperature-dependent fractions calculated from REMC trajectories using WHAM. In this work, we focus on two typical DNAs with multi-way junctions, 3WJ (DNA with a three-way junction) and 4WJ (DNA with a four-way junction), to analyze their unfolding pathways beyond the minimal ssDNA and dsDNA [13]. Here, the intermediate states are denoted as I1, I2, ..., corresponding to the number of melted stems, while states without a prime or those labeled with ‘′’, ‘′′’, ‘′′′’, etc., represent states with the highest, second-highest, third-highest fraction, and so on.
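The state counts used in this section (eight states for the 3WJ, sixteen for the 4WJ) follow from treating each stem as either formed or melted. A minimal sketch of that bookkeeping is given below; the function names `enumerate_states` and `state_label` are hypothetical, and the ranking of same-level states by population (the prime labels) is not reproduced here because it depends on the simulated fractions.

```python
from itertools import combinations


def enumerate_states(n_stems):
    """Map the number of melted stems to every combination of melted stems."""
    stems = range(1, n_stems + 1)
    return {k: list(combinations(stems, k)) for k in range(n_stems + 1)}


def state_label(n_melted, n_stems):
    """F for no melted stems, U for all melted, otherwise I<number melted>."""
    if n_melted == 0:
        return "F"
    if n_melted == n_stems:
        return "U"
    return f"I{n_melted}"


for n_stems in (3, 4):
    states = enumerate_states(n_stems)
    total = sum(len(group) for group in states.values())
    print(f"{n_stems} stems: {total} states")
    for n_melted, group in states.items():
        print(f"  {state_label(n_melted, n_stems)}: {len(group)} state(s), melted stems {group}")
```

For three stems this gives one F, three I1, three I2, and one U state; for four stems it gives one F, four I1, six I2, four I3, and one U state.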
For a DNA with a three-way junction (3WJ).
At 1 M NaCl, the 3WJ structure exhibits a stepwise unfolding mechanism involving three key intermediates (I1, I2, and I2′) (Fig 6). As shown in Fig 6A, at low temperatures (<~30°C), the 3WJ remains predominantly in the fully folded state (F). As the temperature increases from ~30°C to ~50°C, the fraction of F decreases from ~99% to ~33%, while the fractions of intermediate states I1, I2, and I2′ gradually increase to ~31%, ~30%, and ~5%, respectively. Between ~50°C and ~70°C, the fractions of F and I1 further decline to ~2% and ~9%, whereas I2 and I2′ reach their peak values of ~45% and ~10%, respectively. At ~110°C, the system transitions into the unfolded state (U). These observations suggest that the dominant unfolding pathway of the 3WJ follows F → I1 → I2 → U, with I2 being the most populated intermediate at ~70°C. Additionally, two minor pathways are identified: F → I2 → U and F → I1 → I2′ → U, with the former having a slightly higher flux than the latter.
(A) Temperature-dependent population fractions of different structural states: fully folded (F), intermediate (I1, I2, I2′), and fully unfolded (U). F represents the fully folded conformation, I1 corresponds to the intermediate state with Stems 2 and 3 retained (Stem 1 melted), I2/I2′ represent the hairpin intermediates with Stem 2/Stem 3, and U denotes the fully unfolded structure. (B-D) Temperature-dependent free energies of various intermediate states (I1, I1′, I1′′, I2, I2′, and I2′′) calculated using Mfold. (E) Schematic representation of the structure transitions along the unfolding pathway inferred from the state fractions shown in panel (A).
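The way a dominant pathway is read off from state fractions can be sketched with the approximate percentages quoted in the text for the 3WJ at ~50°C and ~70°C. Only those quoted values are used; fractions that the text does not state are omitted rather than guessed, and the function name `most_populated_intermediate` is a hypothetical helper.

```python
# Approximate state fractions (%) quoted in the text for the 3WJ at 1 M NaCl.
# States whose fractions are not given in the text are left out.
fractions = {
    50: {"F": 33, "I1": 31, "I2": 30, "I2'": 5},
    70: {"F": 2, "I1": 9, "I2": 45, "I2'": 10},
}


def most_populated_intermediate(populations):
    """Return the intermediate (label starting with 'I') with the largest fraction."""
    intermediates = {s: p for s, p in populations.items() if s.startswith("I")}
    return max(intermediates, key=intermediates.get)


dominant = {t: most_populated_intermediate(p) for t, p in sorted(fractions.items())}
for temperature, state in dominant.items():
    print(f"~{temperature} C: most populated intermediate is {state}")
```

The most populated intermediate shifts from I1 at ~50°C to I2 at ~70°C, which is the ordering behind the proposed F → I1 → I2 → U route. Reading a pathway from equilibrium populations in this way is an inference about the sequence of dominant states, not a direct measurement of kinetic flux.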
Given that the inferred unfolding pathways of the 3WJ have not been experimentally validated, we employed the well-established nearest-neighbor thermodynamic model (i.e., Mfold [67,82]) to independently estimate the free energies of the eight structural states (see S3 Fig) to further support our proposed mechanism. At ~50°C, the free energy of I1 relative to I1′ was ~-1.2 kcal/mol and relative to I1′′ was ~-2.8 kcal/mol, indicating that I1′ and I1′′ are less stable than I1. This agrees with the higher observed population of I1 compared to I1′ and I1′′ at this temperature. Next, we examined the I2 state (Stems 1 and 3 melted). Its free energy is higher than that of I1 below ~50°C but becomes lower at higher temperatures, with a free energy difference relative to I1 of ~0.5 kcal/mol at 50°C and ~-1.3 kcal/mol at 70°C. This suggests that I2 becomes more populated as temperature increases, supporting the major unfolding pathway F → I1 → I2 → U. We also evaluated the other states with two stems melted, I2′ (Stems 1 and 2 melted) and I2′′ (Stems 2 and 3 melted), which have small populations. Notably, I2′′ exhibited significantly higher free energy than I2 and I2′, consistent with its negligible fraction.
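The link between a free energy difference and relative populations can be made concrete with the Boltzmann relation p_A / p_B = exp(-(G_A - G_B) / RT). The sketch below applies it to the ~-1.2 and ~-2.8 kcal/mol differences quoted at ~50°C, assuming the sign convention G(I1) minus G(alternative state); that convention and the function name `population_ratio` are assumptions. The simulated populations need not match these ratios quantitatively, since the free energies come from a separate nearest-neighbor calculation.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def population_ratio(delta_g_kcal, temp_c):
    """Equilibrium ratio p_A / p_B for delta_g = G_A - G_B (kcal/mol) at temp_c (Celsius)."""
    return math.exp(-delta_g_kcal / (R_KCAL * (temp_c + 273.15)))


ratio_first = population_ratio(-1.2, 50.0)   # I1 versus the second-ranked state
ratio_second = population_ratio(-2.8, 50.0)  # I1 versus the third-ranked state
print(f"p(I1)/p(I1') is roughly {ratio_first:.1f}")
print(f"p(I1)/p(I1'') is roughly {ratio_second:.0f}")
```

Differences of one to a few kcal/mol therefore translate into population ratios ranging from several-fold to nearly two orders of magnitude at this temperature, consistent with I1 being the only single-stem-melted state with an appreciable fraction.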
For a DNA with four-way junction (4WJ).
The 4WJ displays a more complex landscape, with sixteen structural states, fourteen of which are intermediates (S6 Fig). Based on our model predictions, we calculated the population fractions of key structural states across temperatures (Fig 7A). At low temperatures (<~20°C), the 4WJ predominantly adopts the fully folded state (F). As temperature increases to ~40°C, the fraction of F decreases to ~30%, accompanied by a rise in intermediate states I1 (Stem 3 melted), I1′ (Stem 1 melted), and I2 (Stems 1 and 3 melted), reaching ~14%, ~13%, and ~38%, respectively. At ~50°C, I2 becomes the dominant intermediate (~48%), while F, I1, and I1′ nearly vanish. With further heating, I2 transitions into I3 (three stems melted), which peaks (~47%) around 70°C. Eventually, the system reaches the fully unfolded state (U) at ~120°C. Based on these results, the primary unfolding pathway is proposed as F → I2 → I3 → U, with additional minor routes: F → I1 → I2 → I3 → U and F → I1′ → I2 → I3 → U.
(A) Temperature-dependent fractions of the folded state (F), intermediate states (I1, I1′, I2, I3), and unfolded state (U) during thermal unfolding. I1: with Stems 1, 2, and 4; I1′: with Stems 2, 3, and 4; I2: with Stems 2 and 4; I3: with Stem 2. (C-F) Free energies of various intermediate states (I1, I1′, I1′′, I1′′′, I2, I2′, I2′′, I2′′′, I2′′′′, I2′′′′′, I3, I3′, I3′′, and I3′′′) as functions of temperature calculated using Mfold. (G) Schematic representation of DNA structure transitions along the unfolding pathway.
To validate the model-predicted pathway, we computed the free energies of key intermediate states using Mfold [82]. For accurate secondary structure prediction with Mfold, loop sequences were replaced with ‘X’. We first analyzed the single-stem-melted states at 1 M NaCl and ~30°C. The five pairwise relative free energy differences among these states were ~-0.4, ~-1.0, ~-2.5, ~-0.6, and ~-2.2 kcal/mol, indicating that I1′′ and I1′′′ are much less stable than I1 and I1′, which is consistent with their low populations. We then examined the I2 state (Stems 1 and 3 melted). At ~30°C, it is less stable than I1/I1′, but it becomes more favorable at ~50°C (both free energy differences were ~-0.6 kcal/mol), supporting its dominance in the major pathway. Other states with two stems melted (the primed I2 states) or three stems melted (e.g., I3′, I3′′, I3′′′) exhibit significantly higher free energies and negligible populations (Fig 7), consistent with their minor roles. Overall, the free energy landscape confirms that unfolding proceeds primarily via F → I2 → I3 → U, with I2 serving as a key thermodynamic intermediate.
Conclusion
In conclusion, this study presents a significant advancement in modeling the 3D structure and thermal stability of complex DNAs, with a particular focus on multi-way junctions. By refining the electrostatic energy terms of our CG model, incorporating the REMC algorithm, and using WHAM, we have extended the applicability of the model to three-way and four-way DNA junctions under both monovalent and divalent ion conditions. The main contributions of this work are as follows:
- Accurate structure predictions: The refined CG model can predict near-native structures of complex DNAs, including three-way and four-way junctions, from sequence. The predictions align well with results from state-of-the-art models such as 3dRNA/DNA and AlphaFold3, demonstrating the capacity of the model to capture the intricate geometry of DNAs with multi-way junctions.
- Reliable thermal stability profiling: The model can reproduce the thermal stability of DNA junctions with different sequences and topologies. Notably, it successfully captures the effects of both monovalent and divalent ions on DNA junction stability, showing strong agreement with experimental data. This demonstrates the capacity of the model to simulate biologically relevant conditions.
- Insight into thermally unfolding pathways: Our analysis reveals that the thermal unfolding of multi-way junctions proceeds via discrete intermediate states, whose relative stabilities govern the unfolding trajectories. These findings elucidate the thermodynamic landscape of DNA junctions and offer a detailed view of their transition pathways.
Despite these advances, several limitations remain. First, the model does not account for non-canonical base pairings (e.g., G-G, G-T, A-A, A-G) [83,84], which are crucial for the stability and structural features of functional DNA structures such as DNA triplexes and G-quadruplexes [85,86]; consequently, the current version of the model is unable to predict these complex structures. Second, while the model incorporates the effects of mono-/divalent ions through the CC theory and TBI model, the implicit ion model could be insufficient to capture the interactions between multivalent ions (e.g., Mg2+ and Co3+) and DNA, particularly the specific binding of these ions. Such interactions are critical for stabilizing complex structures like DNA G-quadruplexes [87,88], making it necessary to incorporate explicit treatment of divalent ions in future improvements to the model [54,55]. Third, the top-ranked structure predicted by the present model typically does not correspond to the most native-like structure within the predicted ensemble, highlighting the need for further development of a DNA scoring function to identify near-native structure models [89]. Finally, the cellular environment of DNA involves not only ions but also other macromolecules (e.g., proteins and RNAs) and small molecules (e.g., ligands). This crowded environment and its interactions with DNA can influence DNA structural folding. Effectively accounting for these interactions to study DNA folding in a cell-like environment remains a significant challenge [53,90,91].
Although there are limitations, our refined CG model provides a powerful and reliable framework for studying the 3D structures and stability of complex DNA configurations in the presence of physiologically relevant ions. The insights gained from analyzing the thermally unfolding pathways of DNA junctions contribute to a deeper understanding of DNA stability and the mechanisms underlying their biological functions, providing a strong foundation for future studies in the field of DNA biophysics.
Supporting information
S1 Fig.
(A) The schematic diagram for the formation of one base stacking between base pairs (i, j) and (i + 1, j-1); (B) The conformational entropy changes ΔSc for the formation of base-pairs stacking at different location i (symbols), and the average value of ΔSc (line).
https://doi.org/10.1371/journal.pcbi.1013346.s001
(TIF)
S2 Fig. The schematic diagram of the rebuilding of coarse-grained DNA structures into all-atom ones in our present model.
https://doi.org/10.1371/journal.pcbi.1013346.s002
(TIF)
S3 Fig. The eight structural states of 3WJ, including a folded state, six intermediate states, and an unfolded state.
Here, F: Stems 1, 2, and 3 retained; I1: Stem 1 resolved; I1′: Stem 3 resolved; I1′′: Stem 2 resolved; I2: Stems 1 and 3 resolved; I2′: Stems 1 and 2 resolved; I2′′: Stems 2 and 3 resolved; U: all stems resolved.
https://doi.org/10.1371/journal.pcbi.1013346.s003
(TIF)
S4 Fig. The sequence and secondary structure of 6 DNAs with multi-way junctions employed in this paper.
https://doi.org/10.1371/journal.pcbi.1013346.s004
(TIF)
S5 Fig. The average radius of gyration (Rg) and ion neutralization fraction of different structural states for 3WJ and 4WJ.
(A,B) The average Rg (A) and ion neutralization fraction (B) of the folded state (F), intermediate states I1*/I2* (two stems retained/one stem retained), and unfolded state (zero stems retained) for 3WJ. (C,D) The average Rg (C) and ion neutralization fraction (D) of the folded state (F), intermediate states I1*/I3* (three stems retained/one stem retained), and unfolded state (zero stems retained) for 4WJ.
https://doi.org/10.1371/journal.pcbi.1013346.s005
(TIF)
S6 Fig. The sixteen structural states of 4WJ, including a folded state, fourteen intermediate states, and an unfolded state.
Here, F: all stems retained; I1: Stem 3 resolved; I1′: Stem 1 resolved; I1′′: Stem 4 resolved; I1′′′: Stem 2 resolved; I2: Stems 1 and 3 resolved; I2′: Stems 1 and 4 resolved; I2′′: Stems 2 and 3 resolved; I2′′′: Stems 1 and 2 resolved; I2′′′′: Stems 2 and 3 resolved; I2′′′′′: Stems 2 and 4 resolved; I3: Stems 1, 3, and 4 resolved; I3′: Stems 1, 2, and 3 resolved; I3′′: Stems 1, 2, and 4 resolved; I3′′′: Stems 2, 3, and 4 resolved; U: all stems resolved.
https://doi.org/10.1371/journal.pcbi.1013346.s006
(TIF)
S1 Table. The PDB codes of 138 DNAs used in our statistical analysis for CG force field.
https://doi.org/10.1371/journal.pcbi.1013346.s007
(XLSX)
S2 Table. The parameters of bonded potentials of CG force field.
https://doi.org/10.1371/journal.pcbi.1013346.s008
(XLSX)
S3 Table. The parameters for the energy functions of base pairing and base stacking.
https://doi.org/10.1371/journal.pcbi.1013346.s009
(XLSX)
S4 Table. The predicted RMSD and F1-score by three prediction models for 4 DNAs.
https://doi.org/10.1371/journal.pcbi.1013346.s010
(XLSX)
S5 Table. The melting temperatures of 3WJ predicted by the present model over a wide range of ion concentrations.
https://doi.org/10.1371/journal.pcbi.1013346.s011
(XLSX)
S6 Table. The melting temperatures of 4WJ predicted by the present model over a wide range of ion concentrations.
https://doi.org/10.1371/journal.pcbi.1013346.s012
(XLSX)
S1 Text. Detailed description of the coarse-grained force field of the present model and the Weighted Histogram Analysis Method.
https://doi.org/10.1371/journal.pcbi.1013346.s013
(DOCX)
Acknowledgments
We are grateful to Profs. Yang Yu (Guizhou Medical University), Qiude Li (Guizhou Medical University), Zhi-Jie Tan (Wuhan University), and Bengong Zhang (Wuhan Textile University) for valuable discussions, and we would like to acknowledge computing resources from the Super Computing Center of Guizhou Medical University.
References
- 1. Cevallos Y, Nakano T, Tello-Oquendo L, Rushdi A, Inca D, Santillán I, et al. A brief review on DNA storage, compression, and digitalization. Nano Commun Netw. 2022;31:100391.
- 2. Kornberg A. Biologic synthesis of deoxyribonucleic acid: An isolated enzyme catalyzes synthesis of this nucleic acid in response to directions from pre-existing DNA. Science. 1960;131:1503–8.
- 3. Smith GR. DNA supercoiling: another level for regulating gene expression. Cell. 1981;24(3):599–600. pmid:6265097
- 4. Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, et al. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res. 2021;49(3):1497–516. pmid:33450015
- 5. Xavier PL, Chandrasekaran AR. DNA-based construction at the nanoscale: emerging trends and applications. Nanotechnology. 2018;29(6):062001. pmid:29232197
- 6. Zhang D, Lam J, Blobel GA. Engineering three-dimensional genome folding. Nat Genet. 2021;53(5):602–11. pmid:33958782
- 7. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet. 2016;17(11):661–78. pmid:27739532
- 8. Andersson R, Sandelin A. Determinants of enhancer and promoter activities of regulatory elements. Nat Rev Genet. 2020;21(2):71–87. pmid:31605096
- 9. Liu D, Chen G, Akhter U, Cronin TM, Weizmann Y. Creating complex molecular topologies by configuring DNA four-way junctions. Nat Chem. 2016;8(10):907–14. pmid:27657865
- 10. Xia Y, Zheng K-w, He Y-d, Liu H-h, Wen C-j, Hao Y-h, et al. Transmission of dynamic supercoiling in linear and multi-way branched DNAs and its regulation revealed by a fluorescent G-quadruplex torsion sensor. Nucleic Acids Res. 2018;46:7418–24.
- 11. Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 2016;gkw1000.
- 12. Burley SK, Berman HM, Kleywegt GJ, Markley JL, Nakamura H, Velankar S. Protein Data Bank (PDB): the single global macromolecular structure archive. Protein Crystallogr Methods Protoc. 2017:627–41.
- 13. Mu Z-C, Tan Y-L, Zhang B-G, Liu J, Shi Y-Z. Ab initio predictions for 3D structure and stability of single- and double-stranded DNAs in ion solutions. PLoS Comput Biol. 2022;18(10):e1010501. pmid:36260618
- 14. Mu Z-C, Tan Y-L, Liu J, Zhang B-G, Shi Y-Z. Computational Modeling of DNA 3D Structures: From Dynamics and Mechanics to Folding. Molecules. 2023;28(12):4833. pmid:37375388
- 15. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, et al. GenBank. Nucleic Acids Res. 2012;41:D36–42.
- 16. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Ostell J, Pruitt KD, et al. GenBank. Nucleic Acids Res. 2018;46:D41–7.
- 17. Singh A, Maity A, Singh N. Structure and Dynamics of dsDNA in Cell-like Environments. Entropy (Basel). 2022;24(11):1587. pmid:36359677
- 18. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. pmid:38718835
- 19. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. pmid:34265844
- 20. Wang W, Feng C, Han R, Wang Z, Ye L, Du Z, et al. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat Commun. 2023;14(1):7266. pmid:37945552
- 21. Li Y, Zhang C, Feng C, Pearce R, Lydia Freddolino P, Zhang Y. Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction. Nat Commun. 2023;14(1):5745. pmid:37717036
- 22. Shen T, Hu Z, Sun S, Liu D, Wong F, Wang J, et al. Accurate RNA 3D structure prediction using a language model-based deep learning approach. Nature Methods. 2024:1–12.
- 23. Li J, Chiu T-P, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun. 2024;15(1):1243. pmid:38336958
- 24. McDonnell RT, Henderson AN, Elcock AH. Structure Prediction of Large RNAs with AlphaFold3 Highlights its Capabilities and Limitations. J Mol Biol. 2024;436(22):168816. pmid:39384035
- 25. Zhang Y, Wang J, Xiao Y. 3dRNA: 3D Structure Prediction from Linear to Circular RNAs. J Mol Biol. 2022;434(11):167452. pmid:35662453
- 26. Xiong Y, Zhang Y, Wang J, Xiao Y. Using 3dRNA/DNA for RNA and DNA 3D Structure Prediction and Evaluation. Curr Protoc. 2023;3:e770.
- 27. Mittal A, Turner DH, Mathews DH. NNDB: An Expanded Database of Nearest Neighbor Parameters for Predicting Stability of Nucleic Acid Secondary Structures. J Mol Biol. 2024;436(17):168549. pmid:38522645
- 28. Croitoru A, Kumar A, Lambry J-C, Lee J, Sharif S, Yu W, et al. Increasing the Accuracy and Robustness of the CHARMM General Force Field with an Expanded Training Set. J Chem Theory Comput. 2025;21(6):3044–65. pmid:40033678
- 29. Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J Comput Chem. 2010;31(4):671–90. pmid:19575467
- 30. Qiang X-W, Zhang C, Dong H-L, Tian F-J, Fu H, Yang Y-J, et al. Multivalent Cations Reverse the Twist-Stretch Coupling of RNA. Phys Rev Lett. 2022;128(10):108103. pmid:35333091
- 31. Dong H-L, Zhang C, Dai L, Zhang Y, Zhang X-H, Tan Z-J. The origin of different bending stiffness between double-stranded RNA and DNA revealed by magnetic tweezers and simulations. Nucleic Acids Res. 2024;52(5):2519–29. pmid:38321947
- 32. Dickson CJ, Walker RC, Gould IR. Lipid21: Complex Lipid Membrane Simulations with AMBER. J Chem Theory Comput. 2022;18(3):1726–36. pmid:35113553
- 33. Dans PD, Walther J, Gómez H, Orozco M. Multiscale simulation of DNA. Curr Opin Struct Biol. 2016;37:29–45.
- 34. Ratajczyk EJ, Šulc P, Turberfield AJ, Doye JP, Louis AA. Coarse-grained modeling of DNA–RNA hybrids. J Chem Phys. 2024;160.
- 35. Reshetnikov RV, Stolyarova AV, Zalevsky AO, Panteleev DY, Pavlova GV, Klinov DV, et al. A coarse-grained model for DNA origami. Nucleic Acids Res. 2018;46(3):1102–12. pmid:29267876
- 36. Souza PCT, Alessandri R, Barnoud J, Thallmair S, Faustino I, Grünewald F, et al. Martini 3: a general purpose force field for coarse-grained molecular dynamics. Nat Methods. 2021;18(4):382–8. pmid:33782607
- 37. Assenza S, Pérez R. Accurate Sequence-Dependent Coarse-Grained Model for Conformational and Elastic Properties of Double-Stranded DNA. J Chem Theory Comput. 2022;18(5):3239–56. pmid:35394775
- 38. Starr FW, Wang W, Nocka LM, Wiemann BZ, Hinckley DM, Mukerji I. Holliday junction thermodynamics and structure: comparisons of coarse-grained simulations and experiments. Biophys J. 2016;110:178a.
- 39. Poppleton E, Romero R, Mallya A, Rovigatti L, Šulc P. OxDNA.org: a public webserver for coarse-grained simulations of DNA and RNA nanostructures. Nucleic Acids Res. 2021;49(W1):W491–8. pmid:34009383
- 40. Freeman GS, Hinckley DM, de Pablo JJ. A coarse-grain three-site-per-nucleotide model for DNA with explicit ions. J Chem Phys. 2011;135(16):165104. pmid:22047269
- 41. Chakraborty D, Hori N, Thirumalai D. Sequence-Dependent Three Interaction Site Model for Single- and Double-Stranded DNA. J Chem Theory Comput. 2018;14(7):3763–79. pmid:29870236
- 42. He Y, Liwo A, Scheraga HA. Optimization of a Nucleic Acids united-RESidue 2-Point model (NARES-2P) with a maximum-likelihood approach. J Chem Phys. 2015;143(24):243111. pmid:26723596
- 43. Cragnolini T, Laurin Y, Derreumaux P, Pasquali S. Coarse-Grained HiRE-RNA Model for ab Initio RNA Folding beyond Simple Molecules, Including Noncanonical and Multiple Base Pairings. J Chem Theory Comput. 2015;11(7):3510–22. pmid:26575783
- 44. Zhang Y, Zhou H, Ou-Yang ZC. Stretching single-stranded DNA: interplay of electrostatic, base-pairing, and base-pair stacking interactions. Biophys J. 2001;81(2):1133–43. pmid:11463654
- 45. Cruz-León S, Vanderlinden W, Müller P, Forster T, Staudt G, Lin Y-Y, et al. Twisting DNA by salt. Nucleic Acids Res. 2022;50(10):5726–38. pmid:35640616
- 46. Lipfert J, Doniach S, Das R, Herschlag D. Understanding nucleic acid-ion interactions. Annu Rev Biochem. 2014;83:813–41. pmid:24606136
- 47. Maity A, Singh A, Singh N. Differential stability of DNA based on salt concentration. Eur Biophys J. 2017;46(1):33–40. pmid:27165706
- 48. Tan Z-J, Chen S-J. Electrostatic free energy landscapes for nucleic acid helix assembly. Nucleic Acids Res. 2006;34(22):6629–39. pmid:17145719
- 49. Hinckley DM, Freeman GS, Whitmer JK, de Pablo JJ. An experimentally-informed coarse-grained 3-Site-Per-Nucleotide model of DNA: structure, thermodynamics, and dynamics of hybridization. J Chem Phys. 2013;139(14):144903. pmid:24116642
- 50. Knotts TA 4th, Rathore N, Schwartz DC, de Pablo JJ. A coarse grain model for DNA. J Chem Phys. 2007;126(8):084901. pmid:17343470
- 51. Freeman GS, Hinckley DM, Lequieu JP, Whitmer JK, De Pablo JJ. Coarse-grained modeling of DNA curvature. J Chem Phys. 2014;141:165103.
- 52. Ouldridge TE, Louis AA, Doye JPK. Structural, mechanical, and thermodynamic properties of a coarse-grained DNA model. J Chem Phys. 2011;134(8):085101. pmid:21361556
- 53. Hong F, Schreck JS, Šulc P. Understanding DNA interactions in crowded environments with a coarse-grained model. Nucleic Acids Res. 2020;48(19):10726–38. pmid:33045749
- 54. Hayes RL, Noel JK, Mandic A, Whitford PC, Sanbonmatsu KY, Mohanty U, et al. Generalized Manning Condensation Model Captures the RNA Ion Atmosphere. Phys Rev Lett. 2015;114(25):258105. pmid:26197147
- 55. Nguyen HT, Hori N, Thirumalai D. Theory and simulations for RNA folding in mixtures of monovalent and divalent cations. Proc Natl Acad Sci U S A. 2019;116(42):21022–30. pmid:31570624
- 56. Shi Y-Z, Jin L, Wang F-H, Zhu X-L, Tan Z-J. Predicting 3D Structure, Flexibility, and Stability of RNA Hairpins in Monovalent and Divalent Ion Solutions. Biophys J. 2015;109(12):2654–65. pmid:26682822
- 57. Wang X, Tan Y-L, Yu S, Shi Y-Z, Tan Z-J. Predicting 3D structures and stabilities for complex RNA pseudoknots in ion solutions. Biophys J. 2023;122(8):1503–16. pmid:36924021
- 58. Manning GS. The molecular theory of polyelectrolyte solutions with applications to the electrostatic properties of polynucleotides. Q Rev Biophys. 1978;11(2):179–246. pmid:353876
- 59. Tan Z-J, Chen S-J. Nucleic acid helix stability: effects of salt concentration, cation valence and size, and chain length. Biophys J. 2006;90(4):1175–90. pmid:16299077
- 60. Shi Y-Z, Wang F-H, Wu Y-Y, Tan Z-J. A coarse-grained model with implicit salt for RNAs: predicting 3D structure, stability and salt effect. J Chem Phys. 2014;141(10):105102. pmid:25217954
- 61. Hukushima K, Nemoto K. Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Japan. 1996;65:1604–8.
- 62. Okamoto Y. Generalized-ensemble algorithms: enhanced sampling techniques for Monte Carlo and molecular dynamics simulations. J Mol Graph Model. 2004;22:425–39.
- 63. Fukunishi H, Watanabe O, Takada S. On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: application to protein structure prediction. J Chem Phys. 2002;116:9058–67.
- 64. Boniecki MJ, Lach G, Dawson WK, Tomala K, Lukasz P, Soltysinski T, et al. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 2016;44(7):e63. pmid:26687716
- 65. Stasiewicz J, Mukherjee S, Nithin C, Bujnicki JM. QRNAS: software tool for refinement of nucleic acid structures. BMC Struct Biol. 2019;19(1):5. pmid:30898165
- 66. Chodera JD, Swope WC, Pitera JW, Seok C, Dill KA. Use of the Weighted Histogram Analysis Method for the Analysis of Simulated and Parallel Tempering Simulations. J Chem Theory Comput. 2007;3(1):26–41. pmid:26627148
- 67. Xia T, SantaLucia J Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, et al. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry. 1998;37:14719–35.
- 68. Kanungo T, Mount DM, Netanyahu NS, Piatko C, Silverman R, Wu AY. Proceedings of the sixteenth annual symposium on Computational geometry. 2000. p. 100–9.
- 69. Shi Y-Z, Jin L, Feng C-J, Tan Y-L, Tan Z-J. Predicting 3D structure and stability of RNA pseudoknots in monovalent and divalent ion solutions. PLoS Comput Biol. 2018;14(6):e1006222. pmid:29879103
- 70. Zhao Y, Huang Y, Gong Z, Wang Y, Man J, Xiao Y. Automated and fast building of three-dimensional RNA structures. Sci Rep. 2012;2:734.
- 71. Zhang D, Chen S-J. IsRNA: An Iterative Simulated Reference State Approach to Modeling Correlated Interactions in RNA Folding. J Chem Theory Comput. 2018;14(4):2230–9. pmid:29499114
- 72. Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun. 2019;10(1):5407. pmid:31776342
- 73. Cruz JA, Blanchet M-F, Boniecki M, Bujnicki JM, Chen S-J, Cao S, et al. RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction. RNA. 2012;18(4):610–25. pmid:22361291
- 74. Wu B, Girard F, van Buuren B, Schleucher J, Tessari M, Wijmenga S. Global structure of a DNA three-way junction by solution NMR: towards prediction of 3H fold. Nucleic Acids Res. 2004;32(10):3228–39. pmid:15199171
- 75. Huang R-H, Fremont DH, Diener JL, Schaub RG, Sadler JE. A structural explanation for the antithrombotic activity of ARC1172, a DNA aptamer that binds von Willebrand factor domain A1. Structure. 2009;17(11):1476–84. pmid:19913482
- 76. Andrałojć W, Wieruszewska J, Pasternak K, Gdaniec Z. Solution Structure of a Lanthanide-binding DNA Aptamer Determined Using High Quality pseudocontact shift restraints. Chemistry. 2022;28(66):e202202114. pmid:36043489
- 77. Zhang Y, Xiong Y, Yang C, Xiao Y. 3dRNA/DNA: 3D Structure Prediction from RNA to DNA. J Mol Biol. 2024;436(17):168742. pmid:39237199
- 78. Carr CE, Marky LA. Effect of GCAA stabilizing loops on three- and four-way intramolecular junctions. Phys Chem Chem Phys. 2018;20(7):5046–56. pmid:29388988
- 79. Fu H, Zhang C, Qiang X-W, Yang Y-J, Dai L, Tan Z-J, et al. Opposite Effects of High-Valent Cations on the Elasticities of DNA and RNA Duplexes Revealed by Magnetic Tweezers. Phys Rev Lett. 2020;124(5):058101. pmid:32083903
- 80. Zhang C, Tian FJ, Zuo HW, Qiu QY, Zhang JH, Wei W, et al. Counterintuitive DNA destabilization by monovalent salt at high concentrations due to overcharging. Nat Commun. 2025;16:113.
- 81. Sardana D, Alam P, Yadav K, Clovis NS, Kumar P, Sen S. Unusual similarity of DNA solvation dynamics in high-salinity crowding with divalent cations of varying concentrations. Phys Chem Chem Phys. 2023;25:27744–55.
- 82. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406–15. pmid:12824337
- 83. Das J, Mukherjee S, Mitra A, Bhattacharyya D. Non-canonical base pairs and higher order structures in nucleic acids: crystal structure database analysis. J Biomol Struct Dyn. 2006;24(2):149–61. pmid:16928138
- 84. Mukherjee S, Bansal M, Bhattacharyya D. Conformational specificity of non-canonical base pairs and higher order structures in nucleic acids: crystal structure database analysis. J Comput Aided Mol Des. 2006;20(10–11):629–45. pmid:17124630
- 85. van Dongen MJ, Doreleijers JF, van der Marel GA, van Boom JH, Hilbers CW, Wijmenga SS. Structure and mechanism of formation of the H-y5 isomer of an intramolecular DNA triple helix. Nat Struct Biol. 1999;6:854–9.
- 86. Koshlap KM, Schultze P, Brunar H, Dervan PB, Feigon J. Solution structure of an intramolecular DNA triplex containing an N7-glycosylated guanine which mimics a protonated cytosine. Biochemistry. 1997;36(9):2659–68. pmid:9054573
- 87. Lam EYN, Beraldi D, Tannahill D, Balasubramanian S. G-quadruplex structures are stable and detectable in human genomic DNA. Nat Commun. 2013;4:1796.
- 88. Mao S-Q, Ghanbarian AT, Spiegel J, Martínez Cuesta S, Beraldi D, Di Antonio M, et al. DNA G-quadruplex structures mold the DNA methylome. Nat Struct Mol Biol. 2018;25(10):951–7. pmid:30275516
- 89. Tan Y-L, Wang X, Shi Y-Z, Zhang W, Tan Z-J. rsRNASP: A residue-separation-based statistical potential for RNA 3D structure evaluation. Biophys J. 2022;121:142–56.
- 90. Singh A, Singh N. DNA melting in the presence of molecular crowders. Phys Chem Chem Phys. 2017;19(29):19452–60. pmid:28718468
- 91. Mathur N, Singh A, Singh N. Force-induced unzipping of DNA in the presence of solvent molecules. Biophys Chem. 2024;307:107175. pmid:38244296
- 92. Darty K, Denise A, Ponty Y. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics. 2009;25:1974.
- 93. DeLano WL. Pymol: An open-source molecular graphics tool. CCP4 Newsl Protein Crystallogr. 2002;40:82–92.