DNA Dynamics Is Likely to Be a Factor in the Genomic Nucleotide Repeats Expansions Related to Diseases

Trinucleotide repeat sequences (TRS) are a common type of genomic DNA motif whose expansion is associated with a large number of human diseases. The molecular mechanisms that drive the ongoing dynamic expansion of TRS across generations and within tissues, and the influence of this expansion on genomic DNA functions, are not well understood. Here we report a novel collective breathing behavior of genomic DNA containing tandem TRS, which leads to a propensity for large transient local DNA openings at physiological temperature. Our Langevin molecular dynamics (LMD) and Markov Chain Monte Carlo (MCMC) simulations demonstrate that the patterns of openings of various TRSs depend specifically on their length. The collective propensity for DNA strand separation of repeated sequences serves as a precursor for outsized intermediate bubble states independently of the G/C content. We report that repeats have the potential to interfere with the binding of transcription factors to their consensus sequence through altered DNA breathing dynamics in the proximity of the binding sites. These observations might influence ongoing attempts to use LMD and MCMC simulations for TRS-related modeling of genomic DNA functionality, with the aim of elucidating the common denominators of the dynamic TRS expansion mutation and with potential therapeutic applications.


Introduction
Repetitive DNA sequence elements are abundant in the human and other eukaryotic genomes. They are classified into two large families, the "tandem" and "dispersed" repeats. Trinucleotide repeat sequences (TRS) represent the most common type of tandem microsatellites in vertebrate genomic DNA. Such genomic elements have been found in coding and noncoding DNA, co-localizing with human chromosomal fragile sites that are associated with genomic breakpoints in cancer and with a growing number of devastating human diseases [1,2,3,4,5]. TRS disorders typically have large and variable repeat expansions [6] that result in multiple tissue dysfunction or degeneration. The neurological disorder Friedreich's ataxia (FRDA) coincides with expansion of a genetically unstable (GAA.TTC)N tract in the first intron of the frataxin gene [7,8,9], resulting in transcriptional inhibition of the gene. The (CTG.CAG)N repeat tract in Huntington's disorder (HD) is one of the most highly variable TRSs in the human population [10,11]. In fragile X syndrome (FXS), the (CGG.CCG) expansion in the 5' untranslated region of the FMR1 gene causes transcriptional silencing of the gene [12].
The expression of fragility was found to be dependent upon the TRS expansion beyond a threshold of copies in tandem. DNA replication, transcription and DNA repair are important cis-acting factors in the process of TRS amplification [13,14,15,16]. The exact mechanisms that drive expansion and the TRS specific expansion effect on genomic DNA functions are presently not well understood.
It is commonly accepted that TRS amplification causes the formation of non-B-DNA structures that could disrupt normal cellular processes [17,18]. The formation of such structures starts with transient DNA openings, i.e. local DNA melting and bubbles [19] that extend from a few to hundreds of DNA base pairs. Experimental results with A/T-rich repeats reveal that their expansion is usually initiated by transient local DNA melting (bubble formation) that could then extend into static loops or non-B-DNA structures [16,17,18]. Our recent observations of sequence-specific DNA breathing dynamics suggest that transient DNA bubbles form not only in A/T-rich sequences but also in sequences with relatively high G/C content, owing to the softness of the base pair stacking [20,21,22,23]. Therefore, transient DNA bubbles are expected to form in the G/C-rich (CTG.CAG)n and (CGG.CCG)n TRSs as well as in the (GAA.TTC)n sequences with high A/T content. It is likely that the local base pair dynamics display some specificity for sequence and number of repeats that could underlie the propensity for expansion and possibly the alteration of genomic DNA functions. Local bubble formations that extend from a few to several base pairs could shift from stable to more unstable structures that interact with nuclear components, promoting further TRS expansion.
Using the concept of "intermediate bubble states" and our recently established criterion for DNA base pair "thickness" through the base pair average displacement (BAD) characteristic [22], we compare the breathing dynamics of TRS against random sequences with identical nucleotide composition, as well as repeats with different lengths and G/C content. We report a notable coherent dynamical behavior of the TRS, leading to an enhanced tendency to form large and stable local DNA opening modes at physiological temperatures. The synchronized behavior of the average displacements of the base pairs in TRS from their equilibrium positions suggests a possible advance of extended intermediate states that are known to be strong precursors for transient bubble formation. Our LMD and MCMC simulations of TRS with different G/C content and number of repeats demonstrate the appearance of large transient bubbles that depend on the TRS length. We provide an experimental example of how the TRS bubble spectrum could interfere with protein-DNA interactions. We specifically demonstrate that the flanking TRS has a profound effect on the spectrum of the TATA-box DNA dynamic activity, which could explain the loss of TFIID-TATA binding. We propose that the presence of repeats in noncoding genomic DNA could nucleate the formation of bubbles that directly interfere with specific gene expression by altering protein-DNA binding [12]. Our findings could shed some light on the effect of TRS expansions on gene expression and facilitate functional predictions of that effect.

Expansion of repeats leads to well-pronounced synchronized DNA breathing behavior
To elucidate the effect of repeat amplification on the DNA breathing behavior, we compared the local base pair breathing of the frataxin (GAA.TTC)N TRS with varying numbers of repeats [24]. As a measure we use the BAD criterion, which represents the thermal base pair stability or "softness" due to base pair breathing at physiological temperature and salt content. BADs are directly connected with the DNA melting simulation procedure. In general, DNA denaturation is a "closed-to-open" state transition of the double helix. This transition can be visualized by considering the fraction of intact hydrogen bonds between complementary nucleotides as a function of temperature. It is well known that when DNA melts, i.e. opens, the transition is initiated at lower temperatures in the "soft", i.e. A/T-rich, DNA regions, where there are only two stabilizing hydrogen bonds per base pair. As the temperature increases, the mixed A/T-G/C regions also begin to melt, and finally the G/C-rich regions undergo denaturation. The local openings of the DNA double helix can be measured through UV absorption, which depends on the averaged fraction of open base pairs in the DNA. DNA melting can be simulated by calculating the average fraction of open DNA base pairs at a given temperature, i.e. by calculating the average DNA base pair displacements, the BADs. Here, the BAD profiles [20,21] are calculated from MCMC simulations based on the EPBD model of DNA dynamics [21,25].
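The calculation of average base pair displacements by Monte Carlo sampling can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain Peyrard-Bishop-Dauxois-type energy with a uniform stacking constant (the EPBD model uses sequence-dependent stacking), and every parameter value, function name, and default below is an assumption chosen for illustration only.

```python
import math
import random

# Illustrative parameters (assumed; none of these values come from the text).
# Morse depth D (eV) and inverse width a (1/Angstrom) per base pair type.
MORSE = {"A": (0.05, 4.2), "T": (0.05, 4.2), "G": (0.075, 6.9), "C": (0.075, 6.9)}
K_STACK = 0.025          # eV/Angstrom^2, uniform here; EPBD would vary it per step
RHO, BETA = 2.0, 0.35    # nonlinear stacking parameters
KB = 8.617e-5            # Boltzmann constant in eV/K


def site_energy(y, n, seq):
    """Energy terms involving base pair n: Morse term plus stacking with both neighbours."""
    depth, width = MORSE[seq[n]]
    energy = depth * (math.exp(-width * y[n]) - 1.0) ** 2
    for m in (n - 1, n + 1):
        if 0 <= m < len(seq):
            coupling = 1.0 + RHO * math.exp(-BETA * (y[n] + y[m]))
            energy += 0.5 * K_STACK * coupling * (y[n] - y[m]) ** 2
    return energy


def average_displacements(seq, temperature=310.0, sweeps=2000, burn_in=500,
                          step=0.2, seed=0):
    """Metropolis sampling of displacements; returns one average value per base pair."""
    rng = random.Random(seed)
    n_bp = len(seq)
    y = [0.0] * n_bp
    totals = [0.0] * n_bp
    kept = 0
    for sweep in range(sweeps):
        for _ in range(n_bp):
            n = rng.randrange(n_bp)
            old = y[n]
            e_old = site_energy(y, n, seq)
            y[n] = old + rng.uniform(-step, step)
            d_e = site_energy(y, n, seq) - e_old
            # Reject uphill moves with probability 1 - exp(-dE / kT).
            if d_e > 0 and rng.random() >= math.exp(-d_e / (KB * temperature)):
                y[n] = old
        if sweep >= burn_in:
            kept += 1
            for i in range(n_bp):
                totals[i] += y[i]
    return [t / kept for t in totals]


profile = average_displacements("GAA" * 6, sweeps=300, burn_in=100)
```

The sketch samples a single short chain with free ends; a production calculation would need many more sweeps, several independent runs, and a treatment of fully separated strands, none of which is attempted here.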
The BAD base pair values of (GAA.TTC)N with three different numbers of repeats (N = 6, 40, and 120) are shown in Figure 1. Moreover, Figure 1b compares the BAD profile of the repeat sequence (GAA.TTC)41 with a profile arising from a random sequence with the same length and G+A content. While the magnitudes of the BADs are comparable, there is a clear lack of coherency in the case of the random sequence. This effect is not specific to the sequence shown but occurs for any non-periodic sequence. This observation of TRS length-dependent synchronized BAD behavior is novel, and the resulting collective behavior of the tandem repeats may serve as a precursor for outsized intermediate bubble states.
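One simple way to obtain a random control with the same length and base composition as a repeat tract is to shuffle the tract itself. The short sketch below shows this idea; the function names and the fixed seed are hypothetical, and the text does not state how the random sequence was actually generated.

```python
import random
from collections import Counter


def repeat_tract(unit, copies):
    """Build one strand of a tandem repeat, e.g. ('GAA', 41)."""
    return unit * copies


def shuffled_control(sequence, seed=0):
    """Return a random permutation of the sequence (same length and composition)."""
    rng = random.Random(seed)
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


tract = repeat_tract("GAA", 41)
control = shuffled_control(tract)
```

Because shuffling preserves the base counts exactly, any difference between the two breathing profiles can be attributed to the ordering of the bases rather than to their composition.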

Longer repeats exhibit well pronounced intermediate bubble states
The synchronized BAD behavior in TRS sequences suggests the possible formation, at elevated temperatures, of extended intermediate bubble states that are known to be strong precursors for transient bubble formation [20,21]. The fraction of open base pairs at higher temperature, which is a basic characteristic of the intermediate bubble states, could differ depending on the TRS sequence and the number of repeats. In Figure 2 (panel a, left) we show results from our MCMC simulations together with the experimentally derived, normalized UV-absorption melting curve for (GAA.TTC)41 repeats. The experimental melting conditions are described in the Materials and Methods section. The results for the (GAA.TTC)41 repeats (panel a) demonstrate excellent agreement between our simulations and the experimental data.
The EPBD model, which accurately determines the DNA melting behavior [21], could also be applied to derive the intermediate bubble state parameter s [25].
We initially compared the s values derived from our EPBD-based MCMC simulations for sequences with different numbers of (GAA.TTC)N repeats (N = 6, 40, 120) (Figure 2, panel a, right). The larger and more pronounced peaks for longer repeat segments clearly indicate that the dsDNA repeat tracts sustain larger intermediate bubbles. For these TRSs, the s max values of longer repeat tracts clearly tend to shift toward lower temperatures, which corresponds to a lower melting temperature. No such tendency is observed for the G/C-rich (CAG.CTG)N (panel b) and (CGG.CCG)N (panel c) repeats. For these sequences the s max is notably shifted toward higher temperatures for the longer repeats as compared to the shorter TRSs. The shift to higher melting temperatures correlates with the increased TRS G/C content. Importantly, the s max value (i.e. the fraction of open base pairs forming the maximal intermediate bubble) increases with the number of repeats, in the same fashion as the dynamical instability of the mutations, i.e. accelerating with longer repeat tracts and for generally "softer" repeat sequences.
Interestingly, this acceleration does not depend exclusively on the A/T content, i.e. on the hydrogen bond-governed "softness" of the DNA sequence. The reason for this behavior lies rather in the collective breathing behavior of DNA repeats and the "stacking softness" [21], which triggers their simultaneous opening, although at elevated temperatures for highly G/C-rich repeats. The collective breathing behavior of the repeats causes simultaneous strand separation independently of the high G/C content. To present this more distinctively we plot, in Figure 2, [21,25] i.e. the lifetime of the local bubbles. We applied our LMD simulations [20] to derive this effect and compare it for TRSs with different A/T and G/C content and different numbers of repeats.
We conducted EPBD-based LMD simulations (Figure 3) on the following TRSs: (CAG.CTG)10   (GAA.TTC)6 TRSs. This tendency is present in the (CAG.CTG)45 and (CGG.CCG)240 TRSs as well. Although both the long and the shorter TRSs have identical flanking sequences, such long-lived large openings are absent from the short repeat tracts.
The above data indicate that repeat expansion coincides with significant changes in the local DNA breathing dynamics. The appearance of specific features of the bubble spectrum, viz. long-lived large bubbles, is profoundly influenced by the size of the repeated sequence.
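The notion of large, long-lived bubbles can be made concrete with a small sketch that scans displacement snapshots for runs of open base pairs. It is only an illustration: the opening threshold, the minimum bubble length, the function names, and the crude lifetime measure (consecutive frames containing any bubble) are all assumptions and are not taken from the simulations described above.

```python
def find_bubbles(displacements, threshold=1.0, min_length=3):
    """Return (start, length) for each run of at least min_length consecutive
    base pairs whose displacement exceeds threshold."""
    bubbles = []
    start = None
    for i, y in enumerate(displacements):
        if y > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                bubbles.append((start, i - start))
            start = None
    # Close a run that reaches the end of the sequence.
    if start is not None and len(displacements) - start >= min_length:
        bubbles.append((start, len(displacements) - start))
    return bubbles


def longest_bubble_lifetime(trajectory, threshold=1.0, min_length=3):
    """Longest number of consecutive frames that contain at least one bubble."""
    best = current = 0
    for frame in trajectory:
        if find_bubbles(frame, threshold, min_length):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


closed_frame = [0.0] * 7
open_frame = [0.0, 2.0, 2.0, 2.0, 0.0, 2.0, 2.0]
trajectory = [closed_frame, open_frame, open_frame, closed_frame, open_frame]
```

In this toy trajectory the three-base-pair opening at positions 1 to 3 counts as a bubble, while the two-base-pair opening at the end does not reach the assumed minimum length.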

Repeats interfere with the function of transcription factors binding sites by altered local DNA breathing dynamics
The observed activities are a striking manifestation of how the accumulation of repeats could have a profound effect on the local DNA breathing dynamics. The transient collective openings in the double helix due to TRSs in noncoding genomic DNA may seed the formation of bubbles that could interfere with the DNA-protein interactions involved in gene-specific transcriptional regulation [12]. This notion is supported by the experimental observation that the presence of a certain number of repeats could promote cellular protein-DNA binding [26].
We used a TATA box gene promoter sequence (SCP1) [27] as a test case for binding of the general transcription factor complex TFIID to TATA box DNA in the presence of a (GAA.TTC)15 TRS immediately downstream of the intact wild type TATA box sequence (Figure 4). To probe for TFIID binding to this TATA-TRS flanking variant sequence (RtSCP1), we performed gel shift experiments with a TFIID protein complex purified from HeLa cells (panel b) [28]. As a positive control, we conducted gel shift reactions with the wild type promoter fragment (WtSCP1). Reactions were assembled with equal protein amounts of TFIID. The results clearly suggest that RtSCP1 binds TFIID less tightly than WtSCP1: 100 ng and 50 ng of protein shift significantly less RtSCP1 than WtSCP1 (compare lanes 1, 2, 3 with lanes 6, 7, 8). This result demonstrates that the presence of (GAA.TTC)15 repeats leads to inhibition of TATA box-TFIID binding although the TATA sequence is entirely preserved. The LMD simulations reveal that the bubble spectrum of the wild type promoter is significantly altered when the TATA-box flanking sequence is replaced by (GAA.TTC)15 (panel b), although the replacement does not disturb any of the TFIID-TATA DNA points of contact [29].
The LMD simulation predicts that the flanking TRS has a profound effect on the spectrum of the local TATA-box dynamic activity, which differs significantly from the TATA spectrum in the wild type flanking sequence environment. Importantly, this prediction also coincides with the absence of TFIID binding to the TATA-TRS flanking oligonucleotide. Although the repeats do not disturb any of the TFIID points of contact [29], the altered local TATA box dynamics could explain the loss of TFIID binding.

Discussion
We report a novel coherent DNA breathing behavior in TRSs that is readily calculated using the EPBD-derived values of the base pair average displacements. We describe a synchronized BAD behavior that clearly depends on the length of the TRSs. The expansion of repeats results in measurable, collective, TRS-specific breathing dynamics. The collective behavior leads to the appearance of significantly enhanced DNA intermediate bubble states when compared to sequences with a random nucleotide composition or with much shorter repeat tracts. We propose that the collective propensity of TRS breathing could serve as a precursor for overextended intermediate bubble lengths and lifetimes. Similar behaviors have been previously reported for A/T-rich repeat sequences, but not for G/C-rich TRSs [30].
The correlation between repeat expansion and DNA "stacking softness" is quantified by the calculated value of the intermediate bubble state parameter s. The value of this parameter correlates with the experimentally determined DNA melting values and with the size of the intermediate bubbles [22], which are directly related to the DNA breathing dynamics. Our observation is that the s max increases with the number of repeats, independently of the A/T content of the TRS. The effect corresponds to the collective BAD behavior and is likely to be caused by the TRS periodicity. This striking result connects the average TRS behavior, the BADs, and the maximal intermediate bubble states independently of the A/T content. It is likely that TRS expansion in the disease-related sequences could lead to enhanced coherent DNA openings, i.e. enhanced local strand separations, when compared to the "healthy" sequences with a low number of repeats. This could explain, at least in part, the previously described tendency of sequences with a larger number of repeats to form uncommon non-B-DNA structure conformations [15].
The DNA bubble spectrum, calculated by LMD simulations, also reveals a TRS length-related profile of transient bubble appearance. Based on findings by other groups and the protein-DNA binding results reported here, one could expect that the amplification of repeats might nucleate transient bubbles that selectively alter the binding of proteins involved in repeat expansion while preventing the binding of expansion inhibitors [31,32,33]. Furthermore, TRS expansion and bubble nucleation in noncoding genomic DNA might alter the binding of transcription factors [28], resulting in alterations of specific gene expression [34,35]. Our TFIID-TATA box binding data, together with the recently published observation by the Kunicki group [26], directly support this notion.
The correlation between the transient bubble spectrum and repeat expansion in individual genomes and gene regulatory sequences could be considered a local DNA dynamics "epigenetic" determinant. The proposed novel dynamics-related role of repeat expansion in genomic DNA functionality has far-reaching implications for the interpretation of genomic data in health and disease.

Computer simulations
The EPBD model is an extension of the classical Peyrard-Bishop-Dauxois nonlinear model [36] that includes an inhomogeneous stacking potential [22]. The LMD and MCMC computer simulations are based on the EPBD model [22] as previously described [20,21]. It is important to note that both simulation methods are used to generate equilibrium quantities. LMD generates a number of trajectories; averaging over them allows the determination of temporal information such as the average bubble duration. The MCMC method does not offer access to temporal information but is computationally much faster.
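For orientation, the potential energy of a Peyrard-Bishop-Dauxois-type model is commonly written in the form below. This is a sketch from the general literature on such models rather than a formula quoted from the text, and all symbols are assumptions introduced here: y_n is the stretching of the n-th base pair, D_n and a_n are the depth and inverse width of its Morse potential, and ρ and β control the nonlinear stacking. The sequence-dependent stacking constant k_{n,n-1} stands for the inhomogeneous stacking that distinguishes the extended model; a homogeneous model would use a single constant k. The stacking term is omitted for n = 1.

```latex
V(\{y_n\}) = \sum_{n=1}^{N} \left[
    D_n \left( e^{-a_n y_n} - 1 \right)^2
    + \frac{k_{n,n-1}}{2}
      \left( 1 + \rho\, e^{-\beta (y_n + y_{n-1})} \right)
      \left( y_n - y_{n-1} \right)^2
\right]
```

Both simulation methods sample configurations of the displacements y_n according to an energy of this kind; they differ in whether the sampling follows a physical time evolution (LMD) or not (MCMC).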

Base pair Average displacement (BAD)
BAD is a criterion that was previously introduced to describe the local base pair breathing dynamics [20,21]. It represents an average characteristic of DNA breathing, viz. BADs are the average displacements of the nucleotides from their equilibrium positions. BADs are calculated with MCMC techniques, and the results are equivalent to those derived from LMD simulations [20].

Gel shift reactions
Gel shift reactions were assembled with 20 ng of purified TFIID complex from HeLa cells as previously described [28]. The SCP1 promoter fragment sequence is as in [27]. The sequences of the oligos used in the reactions are: WtSCP1: CGCCCTTATATAAGTACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTCACTGGCCGTCGGCG; RtSCP1: CGCCCTTATATAAGTA(GAA)15GCG.

DNA melting curve
All DNA oligos were synthesized and gel- and HPLC-purified at the Midland Certified DNA Synthesis Facility, and further characterized for melting behavior as previously described [20]. The DNA was dissolved to 200 mM in 30 mM K phosphate buffer pH 7.5, 100 mM KCl, 1 mM EDTA. dsDNA melting curves were collected for 20°C-105°C at 250-280 nm on a Varian Cary 50 Bio UV/Vis spectrometer equipped with a Peltier probe.
Melting data were collected from five independent experiments. The DNA oligonucleotide sequence used in the melting experiments is: CGCG(GAA.CTT)41CGCG.