Twisting Right to Left: A…A Mismatch in a CAG Trinucleotide Repeat Overexpansion Provokes Left-Handed Z-DNA Conformation

Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington’s disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair.


Introduction
Apart from the 'canonical' B-DNA conformation, DNA can also adopt a variety of 'non-canonical' conformations such as hairpin, triplex and tetraplex depending on the sequence and environment. It is well known that formation of such unusual non-B-DNA structures during the overexpansion of trinucleotide microsatellites (tandem repeats of 1-3 nucleotide length) is responsible for at least 22 incurable trinucleotide repeat expansion disorders (TREDs) that are mainly neurological or neuromuscular in nature [1,2,3,4,5]. For instance, occurrence of hairpin structure due to the abnormal increase in the CTG repeat length in the untranslated region of DMPK gene causes myotonic dystrophy type-1 [6,7]. Likewise, hairpin formation in CAG repeat expansion located in the protein-coding region leads to Huntington's disorder and several spinocerebellar ataxias [7]. Direct evidence for the role of such hairpin structure in instigating replication-dependent instability has been demonstrated for the first time in human cells with 5'CTG.5'CAG microsatellite overexpansion [8]. Recently, it has been shown that CAG repeat overexpansion in DNA leads to toxicity by triggering cell death [9,10], thus warranting a detailed investigation of the hairpin structures formed under such abnormal expansion.
Although diverse mechanisms at DNA, RNA and protein levels have been identified for the progression of TREDs [11], until now, the main focus as potential therapeutic targets has been on RNA and protein levels. In fact, crystal structures of RNA duplexes (hairpin stems) containing CUG [12] and CAG [13] repeats that form noncanonical U. . .U [12] and A. . .A [13] base-pairs offer useful information, as the pathogenic CUG and CAG RNA hairpins have a role in misregulating the alternative splicing by MBNL1 [14], leading to neurotoxicity. Though the isosequential DNA also tends to form hairpin structure [15], detailed structural insights about DNA duplexes with CAG and CTG repeats that form A. . .A and T. . .T mismatches respectively are still unavailable. With emerging evidence on 'DNA toxicity' of CAG repeat overexpansion [9,10], such structural information would facilitate the understanding of underlying mechanisms behind repeat instability at the DNA level, which is yet another potential drug target. In this context, we aim here to investigate the structure and dynamics of DNA duplex containing CAG repeat using molecular dynamics (MD) simulation technique. Surprisingly, results of the MD simulations indicate that A. . .A mismatch in a CAG repeat overexpansion induces periodic B-Z junction irrespective of the starting conformation. Thus, we suggest that such an unusual DNA structure of CAG hairpin stem may affect the biological function and may be one of the factors responsible for 'DNA toxicity' [9,10].

A. . .A disfavors anti. . .anti glycosyl conformation
Root mean square deviation (RMSD) calculated over 300ns simulation indicates the existence of three different ensembles (Fig. 1B): the first ensemble persists till ~16.5ns with RMSD centered around 2.8(0.7)Å, the second one persists between 16.5-181ns with an RMSD of 4.7(0.7)Å and the third one persists beyond ~181ns with the highest RMSD of 6.2(0.8)Å.
Intriguingly, a high RMSD of 4.5(0.6)Å observed between 16.5-100ns is associated with a change in glycosyl conformation of mismatched A 23 and A 8 from the starting anti conformation to -syn conformation.
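The RMSD ensembles described above can be illustrated with a short script. What follows is a minimal Python sketch and not the Ptraj workflow used in this study: it assumes that the atomic coordinates of each frame are already available as NumPy arrays, and the function names `kabsch_rmsd` and `window_stats` are hypothetical. It superposes a frame onto a reference structure with the Kabsch algorithm and then summarizes the RMSD inside a chosen time window (for example 16.5-100ns) as a mean and standard deviation.

```python
import numpy as np


def kabsch_rmsd(ref, mob):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    ref = np.asarray(ref, dtype=float)
    mob = np.asarray(mob, dtype=float)
    # Remove translation by centering both sets on their centroids.
    ref_c = ref - ref.mean(axis=0)
    mob_c = mob - mob.mean(axis=0)
    # SVD of the covariance matrix yields the optimal rotation (Kabsch).
    h = mob_c.T @ ref_c
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ mob_c.T).T - ref_c
    return float(np.sqrt((diff ** 2).sum() / len(ref)))


def window_stats(time_ns, rmsd, start, stop):
    """Mean and standard deviation of RMSD for start <= time < stop."""
    time_ns = np.asarray(time_ns, dtype=float)
    rmsd = np.asarray(rmsd, dtype=float)
    mask = (time_ns >= start) & (time_ns < stop)
    return float(rmsd[mask].mean()), float(rmsd[mask].std())
```

Given a per-frame RMSD series, a call such as `window_stats(time_ns, rmsd, 16.5, 100.0)` would yield numbers of the kind quoted in the text as mean(SD).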
Strikingly, the effect of left-handed Z-DNA conformation observed between 16.5-100ns is also reflected in the helical twist angle of the C 7 A 8 .A 23 G 24 step, which favors a low (negative) twist of −4(7°) (Fig. 1G) flanked by high (positive) twists at the neighboring G 6 C 7 (32(4°)) and A 8 G 9 (31(6°)) steps (S1 Fig). These, together with the conformational changes at A 23 . . .A 8 mismatch, are reflected in the helicity of the duplex, which can be clearly seen from the superposition of average structures calculated over 1-100ps and 14.9-15ns (Fig. 1H). While the former is in B-form conformation, the latter shows a change in helicity leading to local Z-DNA formation. Occurrence of a low negative twist due to local Z-DNA formation in the midst of high twists at the G 6 C 7 A 8 G 9 stretch leads to local unwinding of the helix, as can be seen in Fig. 1I. As the A 23 . . .A 8 mismatch site is located exactly in the middle of the DNA (Fig. 1A), the aforementioned distortions lead to a Z-DNA sandwich, viz., a mini Z-DNA embedded in a B-DNA. Essentially, similar features are observed in the B-Z junction formed by L-deoxy guanine and L-deoxy cytosine (S1 Fig).
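The sign of the helical twist is what distinguishes a right-handed step from a left-handed one. The following Python sketch shows one way a twist of this kind could be computed from C1'. . .C1' vectors of successive base pairs. It is an illustration under stated assumptions, not the in-house procedure of this study: the helix axis is assumed to be supplied by the caller, base pairs are assumed to be ordered along the positive direction of that axis, and the function name `c1_twist` is hypothetical.

```python
import numpy as np


def c1_twist(c1_i, c1_i_partner, c1_j, c1_j_partner, axis):
    """Signed twist in degrees between base pairs i and j about `axis`.

    Each base pair is represented by the vector joining the C1' atoms of
    its two nucleotides. A positive value corresponds to right-handed
    rotation and a negative value to left-handed rotation, provided that
    pair j follows pair i along the positive axis direction.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    v_i = np.asarray(c1_i_partner, dtype=float) - np.asarray(c1_i, dtype=float)
    v_j = np.asarray(c1_j_partner, dtype=float) - np.asarray(c1_j, dtype=float)
    # Project both C1'...C1' vectors onto the plane normal to the axis.
    p_i = v_i - np.dot(v_i, axis) * axis
    p_j = v_j - np.dot(v_j, axis) * axis
    # Signed angle from p_i to p_j, measured about the axis.
    y = np.dot(axis, np.cross(p_i, p_j))
    x = np.dot(p_i, p_j)
    return float(np.degrees(np.arctan2(y, x)))
```

With this convention, a step resembling ideal B-DNA gives a twist in the region of +30 to +36°, whereas a negative value at a step, as reported here for C 7 A 8 .A 23 G 24, indicates local left-handedness.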
As the Z-DNA formation happens due to the sugar-phosphate flipping, the hydrogen bond between A 8 and A 23 undergoes only minor changes (S2 Fig). During the first ~16.5ns, the N1(A 8 ). . .N6(A 23 ) hydrogen bond persists, whereas, between 16.5-100ns, the N1(A 23 ). . .N6(A 8 ) hydrogen bond is predominantly favored due to the slight movement of A 23 towards the minor groove. Base extrusion at the mismatch site is also observed during the 100ns simulation.
Detailed analysis indicates that the increase in RMSD to 5.5Å is due to the conformational preference for local Z-DNA structure at and around the A 8 . . .A 23 mismatch site to accommodate the mismatch. In fact, an increase in Z-DNA stretch around the mismatch site is seen (Fig. 3B) during the 300ns simulation. One of the marked changes associated with Z-DNA conformational preference is A 8 adopting high-anti/-syn (287(17°)) glycosyl conformation beyond 36ns (Fig. 3C). Conformational changes at A 8 beyond 36ns enforce -syn glycosyl conformation for neighboring G 9 (248(26°) to 321(32°)) and G 24 (248(25°) to 324(15°)) (S3A Fig). Other notable changes that happen during the early part of the simulation (~9ns) in seeding Z-DNA conformation are the preference for -syn glycosyl conformation by G 21 (from 249(24°) to 296(25°)) (hydrogen bonded with C 10 ) and A 11 (from 257(23°) to 302(32°)) (base paired with T 20 ) that are located in the neighborhood of A 8 . . . Yet another interesting observation is the preference for stacked conformation between the mismatched A 8 and A 23 bases (Fig. 4) that is facilitated by the Z-DNA conformation. It happens in such a way that at 133ns the hydrogen bond lengthens, followed by A 8 and A 23 moving out-of-plane with respect to each other. Subsequently, A 8 stacks on top of A 23 like an intercalator and stays so till the end of the simulation (Fig. 4). During the aforementioned conformational changes, the canonical C 7 . . .G 24 and G 9 . . .C 22 base pairs that are located above and below the A 8 . . .A 23 mismatch respectively remain intact (S3B Fig).
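The glycosyl torsion values quoted above, such as 287(17°), are averages of an angle over many frames. The sketch below, in Python, shows how such a torsion can be computed from four atomic positions and how a mean and spread can be obtained without being misled by the 0°/360° wrap-around. It is a hedged illustration only: for a purine the glycosyl torsion χ is taken as O4'-C1'-N9-C4 following IUPAC nomenclature, the use of circular statistics is an assumption (the text does not state how the reported averages were obtained), and the names `dihedral_deg` and `circular_mean_sd` are hypothetical.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, wrapped to the range 0-360."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)


def circular_mean_sd(angles_deg):
    """Circular mean and circular standard deviation, both in degrees."""
    a = np.radians(np.asarray(angles_deg, dtype=float))
    s, c = np.sin(a).mean(), np.cos(a).mean()
    mean = np.degrees(np.arctan2(s, c)) % 360.0
    # Mean resultant length, clipped to 1 to absorb rounding error.
    r = min(np.hypot(s, c), 1.0)
    sd = np.degrees(np.sqrt(-2.0 * np.log(r)))
    return float(mean), float(sd)
```

For a purine residue, `dihedral_deg` would be called with the coordinates of O4', C1', N9 and C4 for every frame, and the resulting series passed to `circular_mean_sd`.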
It is clear from the above that, as in the previous situation (Fig. 1), formation of local Z-DNA conformation is propagated to the neighboring bases (from C 7 to G 12 ) of A 8 . .  the 100ns simulation. It further shrinks to 8 Å, followed by the stacked conformation of A 8 and A 23 .
Thus, formation of a local Z-DNA conformation accompanied by unwinding of the helix is evident even with a single A. . .A mismatch irrespective of the starting conformation.

duplex
To investigate the effect of periodic occurrence of A. . .A mismatch as in the real situation of Huntington's disorder and several spinocerebellar ataxias, a 300ns MD simulation has been carried out for the d(CAG) 6 .d(CAG) 6 sequence (Fig. 5A). As before, two starting models, with +syn. . .anti and anti. . .anti glycosyl conformations respectively, are considered for all the six A. . .A mismatches.
Aforementioned conformational changes caused by A. . .A mismatch at the CA and GC steps lead to sugar-phosphate backbone flipping, causing helicity reversal that results in the formation of periodic B-Z junction (Fig. 5I). Formation of such B-Z junction is also reflected in the solvation, as both water and ions populate the minor groove more than the major groove (S10 Fig).
Intriguingly, 8 out of 10 G's adopt -syn conformation (S13 Fig). In fact, in one of the strands, all the G's (G 21 , G 24 , G 27 , G 30 and G 33 ) adopt -syn conformation. This is associated with (ε,ζ,α,γ) favoring (g -,g + ,g + ,trans) (>70%) at the GC step (S14 Fig). As before, this is reflected in the helical twists, with CA (9(16°)) and AG (11(9°)) steps confined to lower values (including negative values), while the GC step takes a higher twist (31(10°)), causing frequent left-handedness in the helix (Fig. 6 (bottom), S15 Fig). These indicate the periodic occurrence of B-Z junction in d(CAG) 6 .d(CAG) 6 . The above conformational rearrangements result in a high RMSD of ~8Å at the end of the simulation (S16 Fig). Further, similar to above (S10 Fig), the B-Z junction results in the minor groove of the duplex being occupied with more water and ion molecules compared to the major groove (S17 Fig), a characteristic of the Z-DNA.

Canonical (CTG) 6 .(CAG) 6 duplex retains B-form
RMSD (~3.3(0.9) Å) calculated over the 300ns MD simulation of the (CTG) 6 .(CAG) 6 duplex (Fig. 7A) indicates that the molecule undergoes minimal conformational rearrangement from the starting B-form geometry (Fig. 7B). Strikingly, the overall structure does not show any tendency to adopt Z-form, as can be visualized from Fig. 7C. Instead, it retains the compact B-form geometry.

A. . .A mismatch propels Z-DNA conformation
Structural information about the distortions caused by A. . .A mismatch in a DNA duplex is not yet well defined at the atomistic level. The only structure that has been reported so far with A. . .A mismatch in a DNA is the complex of a DNA duplex and MutS, an E. coli mismatch repair protein, with a significant bending at the mismatch site (PDB ID: 2WTU). NMR and thermodynamic studies of A. . .A mismatch containing DNA duplex offer conflicting results. While some of them suggest that A. . .A mismatch destabilizes the DNA duplex significantly [18,19,20,21], the others do not [22]. Physicochemical studies indicate that A. . .A mismatch in a GAC repeat adopts several distinct conformations in solution including Z-DNA [23,24]. In fact, it has been suggested that A. . .A mismatch in GAC repeat promotes Z-DNA formation [23].
Understanding the structural role of A. . .A mismatch is very important in the context of Huntington's disorder and several spinocerebellar ataxias due to the formation of hairpin structures consisting of noncanonical A. . .A base-pairs. MD simulations carried out in this context reveal the striking observation that A. . .A mismatch in a CAG repeat induces a change in the helicity from right-handed B-DNA to left-handed Z-DNA. Even a single A. . .A mismatch tends to form a local Z-DNA structure leading to a Z-DNA sandwich (Figs. 1,3). When the A. . .A mismatches occur at regular intervals, they lead to local left-handed Z-DNA formation at the mismatch site followed by right-handed DNA at the canonical WC pair site, leading to periodic B-Z junctions (Figs. 5,6). Formation of Z-DNA structure is evident from the preference for (±)syn. . .high-anti/(-)syn glycosyl conformation by A. . .A mismatch and backbone conformational angles (ε,ζ,α,γ) favoring (g -,g + ,g + ,t), (g -,g -,g + ,t) and (g -,g -,g -,g + ) at and around the mismatch site. Additionally, G's prefer -syn conformation. This results in a low helical twist at the CA and AG steps in the midst of high twist at the GC step, a characteristic of B-Z junction (PDB ID 1FV7).
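The backbone preferences summarized above, for example (ε,ζ,α,γ) in (g -,g + ,g + ,t), amount to assigning each torsion to a staggered class and counting how often a combination occurs along the trajectory. A minimal Python sketch of that bookkeeping is given here. The class boundaries (g + for 0-120°, t for 120-240°, g - for 240-360°) are the conventional three-fold ranges and are an assumption, since the exact boundaries used for the contour plots are not stated in the text; the names `torsion_class` and `conformer_fractions` are hypothetical.

```python
from collections import Counter


def torsion_class(angle_deg):
    """Assign a torsion angle in degrees to g+, t or g- (assumed ranges)."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"


def conformer_fractions(eps, zeta, alpha, gamma):
    """Fraction of frames found in each (eps, zeta, alpha, gamma) class tuple.

    The four arguments are equal-length sequences of torsion angles in
    degrees, one value per trajectory frame, for a single dinucleotide step.
    """
    frames = list(zip(eps, zeta, alpha, gamma))
    counts = Counter(
        tuple(torsion_class(x) for x in frame) for frame in frames
    )
    return {state: n / len(frames) for state, n in counts.items()}
```

A population such as the >70% of (g -,g + ,g + ,t) reported at the GC step would then correspond to the value stored under the key `("g-", "g+", "g+", "t")`.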

Base flipping mechanism
A. . .A mismatch adopts two different 'base flipping' pathways to undergo transition from +syn. . .anti to -syn. . .-syn (Fig. 6), accompanied by sugar-phosphate rearrangements. One mode of transition is +syn moving to -syn through cis conformation (via counter-clockwise rotation around the glycosidic bond), while the other is via trans conformation (through clockwise rotation around the glycosidic bond). In general, DNA with +syn. . .anti conformation takes a longer time to undergo the B-Z transition compared to anti. . .anti conformation.

Base pair nonisomorphism is the key factor for inducing Z-DNA conformation by A. . .A mismatch
Reported structural changes provoked by A. . .A mismatch can be attributed to the higher degree of nonisomorphism between A. . .A mismatch and the canonical base pairs. This can be visualized from the larger values of residual twist and radial difference [17,26], the measures of base pair nonisomorphism (S23 Fig). In fact, both residual twist (16°) and radial difference (1.6Å) are quite prominent for A. . .A mismatch with anti. . .anti glycosyl conformation, whereas only the residual twist (16°) is significant and the radial difference is negligible (0.2Å) in the case of +syn. . .anti glycosyl conformation. This may be the reason for the reluctance of A. . .A mismatch to retain anti. . .anti conformation and for its transition to -syn. . .-syn being quite fast compared to the +syn. . .anti starting conformation.
In general, the transition from B-to-Z involves complex mechanisms and a high energy barrier. In fact, several mechanisms have been proposed for B-to-Z transition [27] and a recent adaptively biased and steered MD study demonstrates the coexistence of zipper and stretch-collapse mechanisms in the transition [28]. However, the mechanistic effect that arises from the intrinsic extreme nonisostericity of A. . .A mismatch with the canonical base pairs immediately dictates B-to-Z transition without the influence of any external factors. As the A. . .A mismatch is held by a single hydrogen bond, it exhibits enormous flexibility for base extrusion and flipping, facilitating the formation of Z-DNA through the zipper mechanism. Interestingly, such a conformational change is not seen in the crystal structure of RNA duplex with A. . .A mismatch [13]. Thus, it is clear that the effect of A. . .A nonisomorphism is pronounced in the DNA and not in the RNA.
Several experimental studies have revealed that d(GA) [29], d(GAA) [30], d(GGA) [31] and d(GAC) [23,24] repeats that contain A. . .A mismatches are prone to adopt parallel homoduplex. Such preponderance for parallel duplex by these sequences may be due to the left-handed Z-DNA provoking nature of A. . .A mismatch, which is a high-energy conformation. Hitherto, this aspect has not been realized as there is no DNA duplex structure with A. . .A mismatch available in any sequence context. Earlier low-resolution 1D NMR studies on DNA duplexes comprising A. . .A mismatch [18,19,20,21,22] offer only minimal information, with some of them indicating notable destabilization induced at the A. . .A mismatch site [18,19,20,21]. Strikingly, it has been shown by a circular dichroism study that the spectrum of CAG repeat resembles that of GA homoduplex but not those of CCG and CTG [32]. Propensity of A. . .A mismatch containing DNA to adopt a parallel DNA duplex is also reported [21]. However, the possibility of CAG repeat expansion favoring parallel duplex can be ruled out as it forms hairpin structure [7,8], which eventually leads to antiparallel orientation for the two strands of the DNA hairpin stem. Thus, DNA hairpin stems containing CAG repeat may adopt local Z-DNA conformation at the A. . .A mismatch site leading to 'B-Z junction' as revealed by the current investigation. Our result gains support from earlier surface probing using anti-DNA antibody that demonstrated the presence of Z-DNA structure in CAG and CTG repeat expansions [33]. It can also be recalled that formation of hairpin structure with such a Z-DNA stem has been observed earlier in a different sequence context [34,35,36]. Thus, we envisage that such noncanonical 'B-Z junction' in CAG repeat expansion may be one of the factors responsible for the newly emerging mechanism of 'DNA toxicity' observed in CAG repeat expansion [37].
Thus, for the first time it has been shown here that the A. . .A mismatch in a DNA duplex with CAG repeat is an inducer of local Z-form conformation through a 'zipper mechanism' that stems from backbone flipping and base pair extrusion and flipping, leading to B-Z junction. Such B-Z junction instilled by A. . .A mismatch results from the mechanistic effect intrinsic to the nonisostericity of A. . .A mismatch with the flanking canonical base pairs. With emergence of evidence on 'DNA toxicity' of CAG overexpansion and its role in triggering cell death [9,10], one can envision that occurrence of B-Z junction is the molecular basis for Huntington's disorder and several spinocerebellar ataxias. This further leads to the speculation that B-Z junction binding proteins may have a role in the diseased states. Reported results would further be useful in understanding DNA repair mechanisms involving A. . .A mismatch, thus adding a new dimension to the role of A. . .A nonisostericity on DNA structure.

Modeling of DNA duplex with A. . .A mismatch
Initially, (CTG.CAG) 5 and (CTG.CAG) 6 DNA duplexes containing canonical C. . .G and G. . .C base-pairs with ideal B-form geometry are generated using 3DNA [38]. These models are subsequently manipulated to introduce a non-canonical A. . .A mismatch in the middle of canonical base pairs to generate a 15mer DNA duplex (Fig. 1A) using PyMOL (www.pymol.org, Schrödinger, LLC) molecular modeling software. A. . .A mismatch is modeled so as to form N6(A). . .N1(A) hydrogen bond. For the generation of the model with periodic A. . .A mismatches (18mer, Fig. 3A), T's in the (CTG.CAG) 6 duplex are replaced manually with A's as mentioned above. To establish base-sugar connectivity and to restrain the sugar-phosphate backbone conformation, the models are refined using X-PLOR [39] by constrained-restrained molecular geometry optimization and van der Waals energy minimization. The second conformation for the A. . .A mismatch, viz., N6(A). . .N1(A) hydrogen bond with +syn. . .anti glycosyl conformation, is generated using X-PLOR by applying appropriate restraints. Subsequently, the models are subjected to a total of 1.5μs molecular dynamics (MD) simulations using the Sander module of AMBER 12 package [40].

Molecular dynamics simulation protocol
X-PLOR generated duplex models with A. . .A mismatches and the 3DNA generated canonical (CTG.CAG) 6 duplex are solvated with TIP3P water molecules and net-neutralized with Na + counter ions. Following the protocols described in our earlier papers [17,41,42], equilibration and production runs are pursued for 300ns for the sequences given in Table 1. Simulations are performed under isobaric and isothermal conditions with SHAKE (tolerance = 0.0005 Å) on the hydrogens [43], a 2fs integration time step and a cut-off distance of 9 Å for Lennard-Jones interaction. The ff99SB force field is used and the simulation is carried out at neutral pH. Trajectories are analyzed using the Ptraj module of AMBER 12.0. Helical parameters and conformation angles are extracted from the output of 3DNA using in-house programs. Due to the presence of noncanonical base pairs, helical twist angles are calculated with respect to the C1'. . .C1' vector [17,41,42]. PyMOL is used for visualization and MATLAB software (The MathWorks Inc., Natick, Massachusetts, United States) is used for plotting the graphs.
6 .d(CAG) 6 DNA duplex with +syn. . .anti starting conformation for the mismatch (Fig. 5A). Note that one of the A's moves towards the minor groove and undergoes flipping by rotating in counter-clockwise direction. (MOV)
S6 Movie. Base flipping leading to the formation of B-Z junction at A 11 . . .A 26 mismatch site in d(CAG) 6 .d(CAG) 6 DNA duplex with +syn. . .anti starting conformation for the mismatch (Fig. 5A). Note that prior to flipping, both the A's are moving apart that results in total
Note that the first two columns belong to residues from C 1 to G 18 of the duplex, while the third and fourth belong to the complementary residues (C 19 to G 36 ) of the duplex. While the first and third columns indicate the relationship between ε and ζ (ε in X-axis and ζ in Y-axis), the second and fourth columns illustrate the relationship between α and γ (α in X-axis and γ in Y-axis).
Scaling used for contour density plot is shown in the 4th row. Note the strong preponderance for Z-form geometry by CA and GC steps. (TIF)
S10 Fig. Ion (top) and water (bottom) density around d(CAG) 6 .d(CAG) 6  .
While the first and third columns indicate the relationship between ε and ζ (ε in X-axis and ζ in Y-axis), the second and fourth columns illustrate the relationship between α and γ (α in X-axis and γ in Y-axis). Scaling used for contour density plot is shown in the 4th row. Note the strong preponderance for Z-form geometry by GC step (viz., more than 70% of (ε,ζ,α,γ) in (g -,g + ,g + ,t) conformation). CA step as well shows the tendency for Z-form geometry with (ε,ζ,α,γ) in (g -,g -,g -,g + ) conformation. AG step favors B-form geometry with ~59% of (t,g -,g + ,t), ~23% of (t,g -,g -,t) and 18% of (t,g -,g -,g + ) for (ε,ζ,α,γ).
6 .d(CAG) 6 duplex.

Supporting Information
(ε and ζ) and (α and γ) 2D contour density plots corresponding to (Top) 5'CT/5'AG, (Middle) 5'TG/5'CA and (Bottom) 5'GC/5'GC steps. Note that (ε and ζ) does not exhibit any other conformational preference apart from BI (83%) and BII (17%). Similarly, as in the B-form, (α and γ) favor (g-, g+) or (g+, t) conformations. Exceptionally, at the TG step, (α and γ) also favor (g-, t) conformation, which is also favored by B-DNA. The first two columns belong to one of the strands of the duplex (C 1 to G 18 ), while the third and fourth columns belong to the complementary second strand of the duplex (C 19 to G 36 ). While the first and third columns indicate the relationship between ε and ζ (ε in X-axis and ζ in Y-axis), the second and fourth columns illustrate the relationship between α and γ (α in X-axis and γ in Y-axis). Scaling used for contour density plot is shown in the 4th row. (TIF)