Simple Elastic Network Models for Exhaustive Analysis of Long Double-Stranded DNA Dynamics with Sequence Geometry Dependence

Simple elastic network models of DNA were developed to reveal the structure-dynamics relationships for several nucleotide sequences. First, we propose a simple all-atom elastic network model of DNA that can explain the profiles of temperature factors for several crystal structures of DNA. Second, we propose a coarse-grained elastic network model of DNA, where each nucleotide is described only by one node. This model could effectively reproduce the detailed dynamics obtained with the all-atom elastic network model according to the sequence-dependent geometry. Through normal-mode analysis for the coarse-grained elastic network model, we exhaustively analyzed the dynamic features of a large number of long DNA sequences, approximately ∼150 bp in length. These analyses revealed positive correlations between the nucleosome-forming abilities and the inter-strand fluctuation strength of double-stranded DNA for several DNA sequences.


Introduction
Elastic network models of proteins, including all-atom models [1,2] and coarse-grained models [3][4][5][6][7][8][9] represent some of the simplest and most powerful types of theoretical models that can accurately reveal structure-dynamics relationships and the mechanisms underlying a protein's functional activities [10][11][12][13][14]. Such models have also been widely employed to accurately reproduce the temperature factors on the crystal structure of a protein via normal-mode analysis. These models can thus demonstrate the large and slow deformations of proteins that are essential for protein functions, but remain difficult to demonstrate via all-atom molecular dynamics simulations.
Along with proteins, DNA is the most important biomolecule for the activities of all living organisms. Recent developments in molecular biology have revealed that DNA contains several functional regions. Therefore DNA is no longer considered to have the sole function in the storage of genetic information, but is now known to be actively involved in gene regulation [15][16][17][18], insulator activity [19][20][21][22][23][24], and in the construction of chromosomal architectures [25][26][27][28] such as heterochromatins and topologically associated domains through nucleosome formation and protein bindings [29][30][31][32][33]. The functional behavior of each strand of DNA is determined not only by chemical aspects of the nucleotides and base pairs but also by its physical characteristics such as the structure and dynamics of the nucleotide sequences in each functional region. However, comprehensive understanding of the physical properties of nucleotide sequences lags far behind the knowledge accumulated of their biochemical aspects [15,16,26].
Since the last century, much progress has been made in revealing the physical aspects of DNA using all-atom normal-mode analysis [34,35] and molecular dynamics simulations [36][37][38][39][40][41][42]. Several coarse-grained models of DNA (and RNA) have also been proposed. Some of these describe the detailed shape of each nucleotide using three or more particles [43][44][45][46][47][48][49], whereas others describe each base pair by simply one or two particles [50][51][52][53][54]. Molecular dynamics simulations and normal-mode analysis of these models have identified the flexibilities, nucleosome-forming abilities, and zip-unzip transitions of the double helices of some specific DNA sequences from tens to a few hundred base pairs in length. Although these methods have proven to be very powerful for the analysis of the physical aspects of DNA, molecular dynamics simulations are not suitable for conducting exhaustive analyses of several sequences simultaneously owing to the high computational costs of such extensive simulations; for example, analyses of whole genomic and whole possible sequences. Moreover, the normal-mode analysis of all-atomic models of long DNA sequences is also computationally heavy.
Alternatively, data-driven methods have been proposed for determining the mechanical properties of DNA with respect to the helical parameters and local flexibilities of base pairs from X-ray crystal structure analysis and all-atom molecular dynamics [55][56][57][58][59][60]. These methods also seem to be powerful and can be applied to the analysis of the mechanical properties of several DNA sequences simultaneously. However, the methods thus far proposed have focused on static and local mechanical properties. Therefore, they are not particularly useful for the study of the functional contribution of the dynamic and correlated motions of DNA.
The objective of this study was to construct a simple coarse-grained elastic network model of DNA to allow for exhaustive analysis of the dynamic correlated motions of several long DNA sequences. For this purpose, we first constructed a simple all-atom elastic network model of short DNA sequences based on the method introduced by Tirion [1] for the modeling of protein dynamics. We confirmed that this model could accurately reproduce the temperature factors of some DNA fragments from data obtained with X-ray crystal structure analysis.
Second, we developed a simple coarse-grained elastic network model of long DNA sequences, where each nucleotide is described by only one node. We confirmed that this simplified model has lower computational costs but can nonetheless reproduce the nucleotide sequence-dependent dynamics revealed by the all-atom elastic network model.
Finally, through the normal-mode analysis of this coarse-grained model, we conducted an exhaustive analysis of the general features of the sequence-dependent structure-dynamics relationships among several DNA sequences. We specifically focused on the dynamic properties of a large number of long DNA sequences, approximately *150 bp in length (with respect to the length of the nucleosome-forming regions), for the genomes of some model organisms as well as for random sequences with several A, T, C, G ratios. Through these analyses, we found that the dynamic aspects of DNA are highly influenced by their sequences, and found positive correlations between the nucleosome-forming abilities and inter-strand fluctuations of doublestranded DNA.
In particular, we focus on the geometry dependencies of the dynamics of several DNA sequences, since a recent study demonstrated that the geometry of DNA sequences has a dominant contribution to their mechanical features [47]. In the following arguments, for simplicity, we use "sequence-dependent" to mean "dynamics that depend on sequence geometry".

Basic structures of Elastic Network Models of DNA
In order to construct elastic network models of DNA, the basic DNA structure first needs to be determined. In the following arguments, we construct two types of models: i) a simple all-atom elastic network model (AAENM) to reproduce the characteristics of the crystal structures of DNA, and ii) a simple coarse-grain elastic network model (CGENM) to reproduce the characteristics of the AAENM. We obtained the basic DNA structures in the following two ways for the respective purposes of constructing each model.
For construction of the AAENM, we employed the atom coordinate sets of several naked DNA crystal structures included in the Protein Data Bank (PDB) as the given basic structures of the model. We used 9 DNA crystal structures containing only DNA and water molecules under different conditions of crystallization, where all temperature factors are given as positive values (Table 1). The suitability of the model was evaluated by comparisons of the temperature factors obtained between the model and those obtained from X-ray crystal structure analysis.
The objective of our constructed arguments was to unveil the sequence-dependent dynamic features of several long DNA sequences simultaneously. In recent crystal structure analysis, only shorter DNA sequences (i.e., less than 12 bp in length) were studied. On the other hand, several types of helical parameter sets, base pair parameters, and base step parameter sets have been proposed through experiments or molecular dynamics simulations [36,[61][62][63][64][65][66] (Table 2,  S1 Table). Here, we used the application X3DNA [67] to obtain the coordinates of each atom for any sequence from these helical parameters, and used these parameters to construct the AAENMs and CGENMs of longer DNA sequences (for detailed methods of the generation of the atom coordinates, see Lu and Olson [67]). In the following arguments, we compare the characteristics of the AAENM and CGENM constructed by X3DNA from three different helical parameter sets: i) a parameter set obtained by an in vitro experiment and X-ray crystal structure analysis [48,[61][62][63] (Table 2), ii) a parameter set obtained only by the X-ray crystal structure analysis, and iii) a parameter set obtained by all-atom molecular dynamics simulations [58,65,66] (S1 Table). Table 1. Information of the analyzed DNA segments. PDB ID, sequences of X-ray crystal structure analysis of DNA fragments, parameter sets of the AAENM for each DNA structure, and correlation coefficients of TF i and MATF i between the AAENM and X-ray crystal structure.

PDB ID
Sequence

All-atom Elastic Network Model of DNA
A simple all-atom elastic network model of double-stranded DNA was constructed based on the model proposed by Tirion [1]. In this model, we regard all of the atoms of given DNA sequences as the nodes comprising the elastic network. For the analysis of the crystal structures of DNA involving water molecules, we also regard the oxygen atoms of water around DNA as the nodes of the elastic network. We define the mass and position of atom i as m i and r i (r i ¼ ðx i ; y i ; z i Þ), respectively. The potential V of all atoms is given as Here, r 0 i is the position of atom i of the basic DNA structure, as described above. The first term indicates the interaction potential among atoms that are spatially closed in the basic DNA structure (Fig 1(a)). Here, R i refers to the Van der Waals radius of atom i, R c is an arbitrary cut-off parameter that models the decay of interactions with distance, and θ indicates the Heaviside function, where θ(z) = 1 (θ(z) = 0) for z ! 0 (z < 0). We assume R c ¼ 2:0Å, which is empirically considered to be an appropriate range for biomolecules, at least for proteins [1]. The results of the arguments presented in this paper were qualitatively unchanged for the appropriate range of R c . In the crystal structures of DNA, the motions of water molecules are also restricted by the crystal packing. Thus, for all atoms, we assume that spatially closed  pairs of atoms are connected by linear springs with their respective natural lengths. The elastic coefficient C a (½kJ=ðÅ 2 molÞ) is a phenomenological constant, which is assumed to be the same for all interacting pairs.
The second term indicates the boundary effects of each DNA and water molecule in each DNA crystal structure. This term plays a crucial role for the analysis of the fluctuations of the crystal structure of DNA, such as temperature factors, since the fluctuations of the nucleotides at the edges of DNA and water molecules are restricted due to the following facts.
In the crystal of DNA, the motion of edges at upper and lower streams of each DNA segment (left and right edges in Fig 1(a)) is influenced by atoms of other adjacent segments in the long-axis direction of DNA segments. Moreover, the motions of water molecules around each DNA segment are influenced by atoms of other adjacent water molecules or DNA segments in the direction perpendicular to the longitudinal axis of the DNA segment. Thus, we need to consider the second term of Eq (1) to implement such effects, where B j indicates the strength of such effects for atoms in the edge nucleotides of each DNA and water molecule, respectively.
Remarkably, as shown in the Results section, the distributions of the temperature factor of atoms exhibited various patterns from the same sequence of crystallized DNA (S1 and S2 Figs) owing to the dependency on the conditions of crystallization. Therefore, in order to compare the characteristics of the present DNA model to those of the crystal structure of DNA, appropriate values of B j need to be assigned to the atom j that belongs to the edge base pairs and water molecules. For simplicity, we assign B j = B B and B w to the atoms j at the edge base pairs and the water molecules, respectively, and B j = 0 otherwise.
It is noted that the internal structure of the crystal of DNA is spatially anisotropic. Thus, it is reasonable to assume that the interactions among local parts of the crystal of DNA exhibit different strengths in different directions. Accordingly, in general, the strengths of the restrictions of atoms belonging to the edge of DNA differ from those of water molecules. Thus, we assume that B B and B w are different values (Table 1).
For simplicity, the mass of each atom is assumed as a constant value (m i = m = 10 −3 /N A [kg], N A = 6.02214129 × 10 23 [/mol] is Avogadro's number). However, we confirmed that the results were almost identical when using the precise masses of the atoms.

Coarse-grained Elastic Network Model of DNA
A simple coarse-grained elastic network model of double-stranded DNA, where each nucleotide is described as one node, was constructed as follows (Fig 1(a)). We define the coordinate of the C1' carbon of nucleotide i, x i (x i ¼ ðx i ; y i ; z i Þ), as the position of nucleotide i, and regard the motion of the C1' carbon as that of the nucleotide. Here, we assume that the mass of the C1' carbon obeys m i = 10 −3 /N A [kg]. The potential V of all nucleotides is given as Here, x 0 i is the position of nucleotide i of the basic structure of DNA, as defined above. We assume that nucleotides i and i 0 belong to the same base pair. For nucleotide i, the sum is restricted to the pair of nucleotides in the same base pair (j = i 0 ), in the neighboring base pairs (j = i + 1, i 0 + 1, i − 1, i 0 − 1), in the next neighboring bases pairs (j = i + 2, i 0 + 2, i − 2, i 0 − 2), and in the next-next neighboring bases in another strand (j = i 0 + 3, i 0 − 3) (Fig 1(b)). The elastic coefficient C g (½kJ=ðÅ 2 molÞ) is a phenomenological constant that is assumed to be the same for all interacting pairs. This model is considered as a simplified version of previously proposed one-site-per-nucleotide models [50,53].

Normal-mode Analysis
An overview of the theory of normal-mode analysis is provided in several recent studies [1][2][3][4][5][6][7][10][11][12][13]. Thus, we here briefly show the results of this analysis. For this analysis, we define qðtÞ (q ¼ ðq 1 ; q 2 ; :::q N Þ, q i ¼ ðx i ; y i ; z i Þ) as a 3N-dimensional position vector, and q 0 as the position vector of the basic structure. Here, q ¼ ðr 1 ; r 2 ; :::r N Þ for the AAENM and q ¼ ðx 1 ; x 2 ; :::x N Þ for the CGENM. The motions of small deviations of qðtÞ from q 0 , dqðtÞ ¼ q À q 0 obey where −(ω k ) 2 and v k ¼ ðv k 1 ; v k 2 ; :: are the k-th largest eigenvalue and its eigenvector of the 3N × 3N Hessian matrix H as We assume that the system is in thermodynamic equilibrium with temperature T. Thus, the amplitude A k is given as with Boltzmann constant k B = 1.3806488 × 10 −23 [m 2 kg/s 2 K]. Using this solution, the mean square fluctuation of the i-th atom in the AAENM (dq i ¼ dr i ) is obtained as with Boltzmann constant, and the temperature factor is displayed as TF i ¼ 8 3 p 2 AF i . Here, <. . .> represents the temporal average. For the CGENM, we define the mean square fluctuation of the n-th nucleotide (dq n ¼ dx n ) as CF n ¼< jdx n j 2 >. To consider the motion of the n-th nucleotide in the AAENM, we define the average nucleotide motions as dR n ¼< dr j > nÀth nucleotide . Using this vector, we define the mean square fluctuation of the n-th nucleotide as NF i ¼< jdR n j 2 >. For the motif mo of the nth nucleotide (mo = sugar, base, and phosphoric acid), the average temperature factor of the motif (MATF (mo in n-th nucleotide) ) is defined as the average of TF j belonging to each motif of each nucleotide, < TF j > ðmo in nÀth nucleotideÞ ¼ 8 3 p 2 < AF j > ðmo in nÀth nucleotideÞ .

Treatment of the Temperature Factor in X-ray crystal Structure Analysis
In order to evaluate the validity of the AAENM, we measured the correlation coefficient between the profile of the temperature factor obtained from PDB data (via X-ray crystal structure analysis) and that obtained from the AAENM based on this crystal structure. It is noted that the temperature factor profiles for some of the PDB data often include unnaturally large or small values. Thus, the correlation coefficients were estimated using data excluding such outliers. In the present evaluations, the value g i was considered as an outlier if |g i − μ| > sσ, where μ and σ are the average and standard deviation of {g i }, respectively, and s = 2.5 is used based on the standard arguments of statistics.

Evaluations of Anisotropic Fluctuations of DNA
We also focused on the relationships between the fluctuations of each nucleotide in the AAENMs and CGENMs in the directions parallel to the base pair axis, parallel to the helix axis, and vertical to both the base pair and helix axes.
Here, we name the nucleotides in one and the other strand constructing the i-th base pair as the i-th and i 0 -th nucleotide. We define the position vectors of the C1' atoms belonging to the i-th and i 0 -th nucleotides as c i and c i 0 , and consider and ( Fig 1(c)). It is noted that b i and s i are not orthogonal in general; however, we confirmed that the angles of these vectors were always sufficiently close to π/2 rad for each i.
and CF s i 0 ¼< jdx i 0 s i j 2 >, and CF t i ¼< jdx i t i j 2 > and CF t i 0 ¼< jdx i 0 t i j 2 >, respectively. Moreover, we consider the mean square fluctuation of the relative base position of each base pair of the CGENM in the three directions listed above given by DF b i ¼< jðdx i À dx i 0 Þb i j 2 >, DF s i ¼< jðdx i À dx i 0 Þs i j 2 >, and DF t i ¼< jðdx i À dx i 0 Þt i j 2 >, respectively.

Evaluations of the Overall Geometry of DNA
The overall geometry of each modeled DNA molecule is characterized by the ratios among the square root of the three principal components of the populations of atoms, ffiffiffiffi ffi l 1 p , ffiffiffiffi ffi l 2 p , and ffiffiffiffi ffi l 3 p (λ 1 > λ 2 > λ 3 > 0). Here, λ 1 , λ 2 , and λ 3 are obtained as eigenvalues of the covariant matrix where (Δx i , Δy i , Δz i ) = (x i − x CM , y i − y CM , z i − z CM ), (x i , y i , z i ) is the position of i-th atom, (x CM , y CM , z CM ) is the position of the center of mass of a given DNA molecule, and <. . .> i indicates the average for all is. We evaluate the overall geometry of a given DNA molecule using the linearity s 1 ¼ ffiffiffiffi ffi l 1 p = ffiffiffiffi ffi l 2 p and the line symmetry with respect to the λ 1 -axis s 2 ¼ ffiffiffiffi ffi l 3 p = ffiffiffiffi ffi l 2 p . Here, σ 1 is large when the DNA is straightened, and σ 2 is large (small) when the DNA forms a three (two)-dimensional curve with wide (flat) envelope.

Evaluations of Correlations Among the Results of AAENM, CGENM, and Experiments
We employ Pearson's correlation coefficients, ρ, to evaluate the correlations among the profiles of temperature factors and several anisotropic fluctuations of atoms obtained by AAENM, CGENM, and experiments.

Comparisons of Fluctuations between the AAENM and the Crystal Structure of DNA
The fluctuations of atoms of the AAENMs of several short DNA sequences were measured with normal-mode analysis. Here, the basic structures of the DNAs are given by the crystal structures of the naked DNAs with the following PDB IDs: 1BNA, 7BNA, 9BNA, 1D91, 1DC0, 122D, 123D, 181D, and 330D (Table 1 [68][69][70][71][72][73][74][75]). To confirm the validities of the AAENMs, the correlations between the distribution profiles of the temperature factor of atoms (TF i ) and the average temperature factor of the motifs (MATF i ) in the crystal structures and those in the corresponding AAENMs were measured. In the following arguments, we employ the optimal values of C a , B B , and B w for each crystal structure (Table 1), which were manually found to maximize ρ of TF i between the results of the crystal structure analysis and those of the AAENM.
By choosing the appropriate conditions for the atoms of the two edge base pairs and water molecules (B B and B W ) for each PDB ID, TF i of each AAENM exhibited a similar profile to that of the crystal structure with a significant correlation coefficient ρ (Fig 2(a), Table 1, and S1 Fig). Therefore, the AAENM seemed to reproduce the overall structure of the temperature factor profile of each crystal structure well, although the detailed profiles among atoms showed some deviations.
Furthermore, we focused on the average temperature factor for the motifs, MATF i , as recently discussed [35]. We found that the MATF i obtained with the AAENMs could accurately reproduce those of the corresponding crystal structures with high correlation coefficients ρ, where ρ > *0.7 was obtained for most cases (Fig 2(b), Table 1, and S2 Fig). These results demonstrate that the AAENM is a suitable model for describing the sequence-dependent fluctuations of the nucleotide motifs of several double-stranded DNA sequences, despite the simplicity of model construction and its easy implementation. Moreover, these results show that the sequence-dependent forms of DNA have a dominant contribution to their overall flexibilities and fluctuations.
Nevertheless, the present AAENM could not accurately reproduce MATF i well for some crystal structures. These deviations are considered to have arisen from the following primary assumption: we only considered the effects of the restrictions by the packing of DNAs in crystal form for two edge base pairs and water, whereas such effects have an influence on all atoms. Therefore, obtaining and incorporating more detailed knowledge of the restrictions of bulk sequences caused by crystal packing should help to achieve a more accurate reproduction of the molecular fluctuations for all cases. Simple Elastic Network Models of Long Double-Stranded DNA Dynamics with Sequence Geometry Dependence

Comparison between the AAENMs and CGENMs of DNA
The main objective of this study was to unveil the sequence-dependent dynamic correlated motions of several long DNA sequences. We next describe these dynamics of DNA sequences with longer lengths than considered in the previous subsection. For this purpose, we also constructed a coarse-grained model, which is often useful for focusing on the slow and large-scale dynamics of molecules that essentially influence their function. Thus, to propose a coarsegrained model of long double-stranded DNA, we evaluated whether the CGENM proposed provides an appropriate coarse-grained model of the present AAENM. We performed normal-mode analysis of the AAENM and corresponding CGENM for 500 randomly chosen 50-bp sequences, and compared the mean square fluctuations of the i-th nucleotide (NF i and CF i ) in the directions parallel to the base pair axis (NF b i and CF b i ), parallel to the helix axis (NF s i and CF s i ), and in the torsional direction (NF t i and CF t i ). Here, C a ¼ 1:29kJ=ðÅ 2 molÞ is employed, as in the previous study [1], and C g ¼ 7:7kJ=ðÅ 2 molÞ is assumed, which was manually found to provide the best fit of fluctuation profiles between AAENM and CGENM. Here, the overall fluctuation profiles of CGENM are independent of the value of C g since C g influences only on the absolute values of fluctuations. Independent of the sequences and employed helical parameters, we found that the fluctuations of each nucleotide in several directions were very similar between the AAENMs and CGENMs when these models are constructed with the same helical parameters, with average correlation coefficients ρ > 0.98 (Fig 3,  Table 3, S2 Table, S3 and S4 Figs).
It is noted that the present CGENM contains only one node per nucleotide, whereas the AAENM contains 19 * 22 nodes (atoms) per nucleotide. This fact demonstrates that the computational costs of the CGENMs are much lower than those of the AAENMs, although the accuracies of the obtained statistical aspects are almost identical between these two models. Thus, this CGENM could be used for exhaustive analysis and comparisons of the dynamic features of several sequence-dependent DNAs related to protein binding affinities, functions of transcription regulation sequences, and nucleosome positioning [16-26, 46, 55-60]. In the next subsection, we provide an example of such an analysis to determine the relationships between the nucleosome-forming abilities of several double-stranded DNA sequences and their inter-strand dynamic features.

Exhaustive Analysis of the Sequence-dependent Behaviors of 150-bp DNAs with the CGENM
Nucleosome positioning is important not only for compacting DNA but also for appropriate gene regulation. Several recent studies have been performed for genome-wide nucleosome mapping and the identifications and predictions of nucleosome-forming and -inhibiting sequences for some model organisms [57,59,[76][77][78][79][80].
As an example of the applications of the CGENM to an exhaustive analysis of the sequencedependent behavior of DNA, we compared the dynamic features of DNA sequences of *150 bp that were predicted as nucleosome-forming or nucleosome-inhibiting sequences in the genome of budding yeast Saccharomyces cerevisiae (5000 forming sequences and 5000 inhibiting sequences of 150 bp) [57], nematode Caenorhabditis elegans (2567 forming sequences and 2608 inhibiting sequences of 147 bp), Drosophila melanogaster (2900 forming sequences and 2850 inhibiting sequences of 147 bp), and Homo sapiens (2273 forming sequences and 2300 inhibiting sequences of 147 bp) [59]. The histograms of the average relative fluctuations of DNAs for the three directions . .> i indicates the average for all is.) for the nucleosome-forming  Table 3. Comparisons between the AAENM and CGENM. Average and standard deviation of the correlation coefficients of 500 random samples of 50-bp sequences between NF i and CF i , NF b i and CF b i , NF s i and CF s i , and NF t i and CF t i . Helical parameter set (i) ( Simple Elastic Network Models of Long Double-Stranded DNA Dynamics with Sequence Geometry Dependence sequences and the nucleosome-inhibiting sequences are shown in Fig 4 and S5 Fig * S7 Fig. Here, we employed the helical parameter set (i) ( Table 2) that was used in the coarse-grained molecular dynamics simulations by Freeman et al., which exhibited consistent results to some experiments [48,[61][62][63].
The histograms for budding yeast showed that that nucleosome-forming sequences tend to exhibit larger fluctuations in several directions compared to the inhibiting sequences. In particular, the histogram of < DF b i > i for the nucleosome-forming sequences showed a clear shift in the direction toward larger values compared to that for the nucleosome-inhibiting sequences (Fig 4). For the other organisms, most of the histograms showed few differences between the nucleosome-forming and -inhibiting sequences. However, similar to the case of yeast, the distribution of < DF b i > i for the nucleosome-forming sequences shifted largely in the direction of larger values compared to that for the nucleosome-inhibiting sequences in these organisms (S5 Fig * S7 Fig). These results indicate that the nucleosome-forming ability is highly correlated to the fluctuations of the inter-strand distances of DNAs, in which sequences with larger fluctuations tend to form the nucleosome.
Similar to the previous arguments, we compared the dynamical features among the random 150-bp DNA sequences varying in average GC-contents; the histograms, averages, and stan- and < DF t i > i were measured from 10,000 random sequences for each GC content (Fig 5 and S8 Fig). In this case, < CF i > i exhibited a minimum at a GC content of *0.2, which indicates that the AT-rich sequences tend to be more rigid than the GC-rich sequences. However, the fluctuations of sequences consisting of only A or T were as large as those of the GC-rich sequences. Moreover, the GC content dependencies of < DF i > i , < DF b i > i , < DF s i > i , and < DF t i > i showed different characteristics for GC contents larger or smaller than 0.6 * 0.7. In particular, the results for < DF b i > i were similar for cases with a large GC content (0.7 * 1) but monotonically decreased with a GC content with little variance. A recent experimental study showed that the probability of nucleosome formation tends to increase with increases in the GC content ratio [81]. Thus, the present results indicate that sequences with larger fluctuations of inter-strand distances tend to form the nucleosome, which is consistent with the results described above from the analysis of the genomes of the four model organisms.
Finally, we focus on the relationships between the overall geometries and fluctuations of the considered DNA sequences. The overall geometries of several DNA sequences, the nucleosome-forming and -inhibiting DNA sequences for four model organisms and random DNA sequences with different GC contents, were evaluated using scatter plots of σ 1 (linearity) and σ 2 (line symmetry) (Fig 6 and S9 Fig). It is noted that σ 1 and σ 2 are highly correlated. The average, standard deviation, and distributions of σ 1 and σ 2 exhibited slight but not significant deviations between the nucleosome-forming and -inhibiting sequences.
For the random sequences, the average value of σ 1 exhibited similar variations to < CF i > i with an increase in GC content. In particular, both values decreased with increases in GC content for GC contents 0.2, whereas they increased with increases in GC content for GC contents !0.3 (Figs 5(c) and 6(c)). In fact, σ 1 and < CF i > were highly correlated for random DNA sequences, regardless of their CG content. The Pearson correlation coefficient for the 110,000 sequences analyzed above with GC contents = 0.0 * 1.0 was 0.9433. It is noted that σ 1 showed large dispersion for each GC content, and there were significant overlaps among σ 1 distributions with different GC contents (Fig 6(b) and 6(c)). This fact indicates that different DNA sequences can often show similar geometries, and such sequences also tend to show similar overall fluctuations. On the other hand, the fluctuations of inter-strand distances < DF b i > that may correlate to the nucleosome-forming ability did not correlate significantly to either σ 1     or σ 2 , with Pearson correlation coefficients of 0.2250 and 0.1967. This fact indicates that the nucleosome-forming ability of DNA sequences are not only determined by the overall DNA geometries but also by their dynamic properties.

Summary and Conclusion
In this study, simple elastic network models of double-stranded DNAs were developed in order to perform an exhaustive analysis of several sequence-dependent dynamical features. First, we constructed a simple all-atom elastic network model that could reproduce the fluctuations of the motifs of each nucleotide (sugar, phosphoric acid, and bases) of several crystal structures of short DNA sequences. Second, we proposed a simple coarse-grained elastic network model that could reproduce the dynamic features of the long DNA sequences obtained by the allatom elastic network model. Through exhaustive analysis of the dynamic features of several DNA sequences with normal-mode analysis of the presented coarse-grained elastic network model, we found that the dynamic aspects of DNA are highly influenced by the properties of nucleotide sequence such as GC content. We also found that the nucleosome-forming abilities of double-stranded DNA exhibited positive correlations with their sequence-dependent interstrand fluctuations.
In the present study, we demonstrated the sequence-dependent dynamic features for several *150-bp DNA sequences to evaluate the relationships between the nucleosome-forming abilities and DNA dynamics. Of course, DNA sequences longer than *150 bp can also be analyzed using the presented coarse-grained model. Moreover, coarse-grained molecular dynamics simulations can be performed to consider the large deformations of DNA, such as formation of a super helix and nucleosome that are the basic structures of higher-order chromosome architectures, using the presented elastic network models with the excluded effect of the volume of each atom or each nucleotide. We are currently conducting these molecular dynamics simulations, and the results will be reported in the future. We did not consider the effects of solvents such as temperature and salt concentrations in the present elastic network models. Therefore, we are also planning to attempt modifications of the models so that several solvent conditions can be incorporated in future work.
Supporting Information S1 Table. Helical parameter sets. (a) Helical parameter sets (ii) obtained by X-ray crystal structure analysis, [61,62], and (b) helical parameter sets (iii) obtained by all-atom molecular dynamics simulations. [58,65,66]. (EPS) S2 Table. Comparisons between CGENM and AAENM. Average and standard deviation of the correlation coefficients of 500 samples of random 50-bp sequences between NF i and CF i , NF b i and CF b i , NF s i , and CF s i , and NF t i and CF t i . Helical parameter sets (ii) and (iii) (S1 Table) were used.  Table 1.