The Impact of the HydroxyMethylCytosine epigenetic signature on DNA structure and function

We present a comprehensive, experimental and theoretical study of the impact of 5-hydroxymethylation of DNA cytosine. Using molecular dynamics, biophysical experiments and NMR spectroscopy, we found that Ten-Eleven translocation (TET) dioxygenases generate an epigenetic variant with structural and physical properties similar to those of 5-methylcytosine. Experiments and simulations demonstrate that 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC) generally lead to stiffer DNA than normal cytosine, with poorer circularization efficiencies and lower ability to form nucleosomes. In particular, we can rule out the hypothesis that hydroxymethylation reverts to unmodified cytosine physical properties, as hmC is even more rigid than mC. Thus, we do not expect dramatic changes in the chromatin structure induced by differences in physical properties between d(mCpG) and d(hmCpG). Conversely, our simulations suggest that methylated-DNA binding domains (MBDs), associated with repression activities, are sensitive to the substitution d(mCpG) ➔ d(hmCpG), while MBD3 which has a dual activation/repression activity is not sensitive to the d(mCpG) d(hmCpG) change. Overall, while gene activity changes due to cytosine methylation are the result of the combination of stiffness-related chromatin reorganization and MBD binding, those associated to 5-hydroxylation of methylcytosine could be explained by a change in the balance of repression/activation pathways related to differential MBD binding.


Introduction
Genomic expression is a finely controlled process regulated by a myriad of external and internal effectors which ensure that genetic information is expressed whenever and wherever required in the interest of the entire organism [1]. External effectors are proteins or RNAs, which can regulate directly DNA transcription activation or repression. Internal effectors are chromatin chemical modifications (epigenetic changes) [2] that can affect DNA expression by recruiting activator or repressor proteins; or by controlling the accessibility of the transcriptional machinery to the target DNA sequences.
In developed organisms, such as mammals, DNA epigenetic changes are highly prevalent and most CpG steps (except for those in CpG islands) are methylated at the 5-position of the pyrimidine ring [3,4]. The presence of 5-methylcytosine (mC) has been traditionally related to gene inactivation, [5][6][7][8] but analysis of methylation changes in Leukaemia strongly suggests that the real connection between methylation and gene expression is probably more complex and differs in each individual gene [9]. It is known that methylated DNA (met-DNA) recruits proteins containing methylated-DNA binding domains (MBD; [10]) triggering protein-specific responses. Furthermore, it has been described that DNA methylation modulates DNA accessibility, [11] mainly by making the DNA stiffer and less prone to wrap around nucleosomes [12,13], leading to nucleosome shifts and accordingly to changes in the pattern of accessible and hidden sequences.
In the past years approaches to detect and differentiate mC and hmC in vitro and in vivo along the genome at base-resolution have been developed [29,30].
Interestingly, for still unknown reasons, hmC is prevalent in specific brain cell types, as well as in embryonic stem cells (ESCs), with consequential reduction during differentiation [31][32][33][34][35]. When present, hmC signals are not randomly distributed, but appear located near transcriptional starting sites and promoters, as well as in certain CpG islands [24,[36][37][38], suggesting a specific role of this epigenetic modification in gene activity. The presence of hmC seems to correlate with a reversion of mC effects [39], but the molecular mechanisms connecting the presence of the hmC with gene activity are unclear. Two main possibilities emerge: i) a change in the DNA physical properties reverting to the unmethylated situation that leads to nucleosome repositioning; ii) a direct protein-mediated mechanism. However, there is still a controversy on the physical/conformational effect of mC and hmC on DNA flexibility and its ability to wrap around histones to form nucleosomes [12,[40][41][42][43][44][45]. Using a variety of experimental and Government AGAUR (SGR2017-134 to M.O.), the Instituto de Salud Carlos III-Instituto Nacional de Bioinformática (ISCIII PT 17/0009/0007 co-funded by the Fondo Europeo de Desarrollo Regional, to M.O.); H2020 European Commision. "BioExcel-2. Centre of Excellence for Computational Biomolecular Research" (823830, to M.O.). Funding was also provided by the MINECO Severo Ochoa Award of Excellence from the Government of Spain (awarded to IRB Barcelona). M.O. is an ICREA (Institució Catalana de Recerca i Estudis Avancats) academia researcher. P.D.D. is a PEDECIBA (Programa de Desarrollo de las Ciencias Basicas) and SNI (Sistema Nacional de Investigadores, Agencia Nacional de Investigación e Innovación, Uruguay) researcher. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
theoretical methods, here we explored this issue to have a complete view on all the sequencedependent effects within the nearest-neighbor framework. We found that changes in flexibility induced by cytosine modifications strongly depend on the sequence of the tetrameric environments, considering CpG as central step. In general, we discovered that physical properties of hmC are significantly different to those of unmodified cytosine, but similar to those of mC. This suggests that the reversion of methylation effects in gene expression by TET action cannot be explained simply by changes in physical properties of DNA involving nucleosome remodelling. No specific hmC-binding proteins have been described, to our knowledge, but the analysis of known MBDs [16,46] show a surprisingly wide range of substrate specificity. Our calculations suggest MBD with dual activation/repression activity shows nearly equal affinity for mC and hmC, while those with exclusively inactivating role have a large preference for the methylated form. This can be the ultimate reason of the differential effect of mC and hmC in gene expression.

The effect of 5-hydroxymethylcitosine on the stability of DNA
We first analysed the changes in stability due to methylation and hydroxymethylation of the model sequence d(CGAC � GTCG)�d(CGAC � GTCG). UV-Melting experiments demonstrated (see Materials and Methods and Table A in S1 File) a slight increase in the melting temperature of hmC with respect to unmodified DNA (2-3 degrees), and a slight decrease (around 1 degree) from mC-DNA. This different becomes significative at higher concentrations while still inside the experimental error at lower concentrations, suggesting a mild effect in stability. Processing of melting profiles confirms the small impact of C➔mC➔hmC substitutions in duplex stability, and the predictable enthalpy/entropy compensation, with the hydrophobicity term favouring (as expected, [47]) the stability of mC (mC >hmC>C, see Table B in S1 File). Processing of melting profiles confirms the small impact of C➔mC➔hmC substitutions in duplex stability, and the predictable enthalpy/entropy compensation, with the hydrophobicity term favouring (as expected, [47]) the stability of mC (mC >hmC>C, see Table C in S1 File).

The effect of 5-hydroxymethylcitosine on dsDNA structure. NMR studies
We studied the impact of hmC at d(CpG) steps on the conformation of DNA by NMR experiments using two model sequences with a central CpG step: d(CGTC � GACG)�d(CGTC � GACG) (central base pair very flexible, [48]) and d(CGCGAC � GTCGCG)�d(CGCGAC � GTCGCG) (central base pair very stiff, [48]). NMR data was collected for C � = C, hmC and C � = C, mC and hmC, respectively; allowing us to obtain, for the first time direct experimental information on the structure of DNA related to epigenetic changes C ➔mC ➔hmC placed on both strands of the duplex (i.e., d(C � pG)�d(C � pG)) (see Materials and Methods and S1 File). Sequential assignments of exchangeable and non-exchangeable proton found small, but significant changes in chemical shifts of the groups involved in H-bonding at the substitution site (chemical shifts at C � amino move from 8.21 ppm (C � = C) to 8.41 ppm (C � = mC) and 8.54 ppm (C � = hmC)) (data in Tables C-H and Figs A-D in S1 File). The NMR signal of-OH in the hmC duplex could not be detected, most probably due to rapid exchange with the solvent, suggesting that OH-N6 hydrogen bond is not highly populated. Angular and distance constraints derived from J-couplings and NOE cross-peaks, respectively, allowed us to determine the solution structures (see Materials and Methods and S1 File) for the mC and hmC containing duplexes in the two distinct tetrameric environments. In all cases structures agree very well with a general B-like duplex (Fig 1). The changes induced by the presence of mC or hmC are small, in general less important than those induced by tetramer-dependency (TC � GA vs AC � GT) and a detailed analysis at the central base pair step (d(CpG); see Table 1) demonstrates that at the very local level the structural impact of epigenetic changes in DNA are modest.

The effect of 5-hydroxymethylcytosine on the physical properties of DNA
Unbiased MD ensembles of d(CGTC � GACG)�d(CGTC � GACG) and d(CGAC � GTCG)�d (CGAC � GTCG) agrees very well with the MD-NMR-biased ones (RMSd deviation around 0.7-1.1 Å from the average NMR structure for the central tetramers; see detailed analysis at the  C � pG step in Fig E in S1 File), supporting the quality of parmbsc1 simulations [49]. This allows us to analyse the impact of C➔mC➔hmC substitutions in a much wider range of sequence environments and in the absence of the artificial restraints used in NMR refinements. Such restrains which are nearly-harmonic bias the intrinsic deformability of the DNA hindering the possibility to sample for example, the bimodality at the central CpG step, a well-known polymorphism found both in MD simulation and in high-resolution experimental structures [48,[50][51][52]. With this aim we collected 0.5 μs trajectories (see Materials and Methods) of 10 dodecamers containing all the unique tetramer environments of the d(C � pG) step (for C � = C, mC or hmC). Additional trajectories were collected for d(C � pG) 6 �d(C � pG) 6 , a poly(CpG) dodecamer mimicking CpG islands, repetitive polymers with unique biological importance. In summary, we collected 16.5 μs of unbiased trajectories of dodecamers containing C � pG tetramers (0.5 μs x 33 structures), which confirm that the introduction of epigenetic variants of cytosine leads to only mild structural alterations in the duplex, and the differences between mC-and hmC-DNAs being very small ( Table I in S1 File). Modifications induced by epigenetic changes are typically smaller than tetramer-induced changes, but there is a clear general tendency for a reduction in twist and an increase in roll when C is substituted by mC or hmC (Fig 2).
In the case of twist, the epigenetic changes induce (in general) a displacement of the bimodal CpG distribution towards low twist values (Fig 3), while in the case of roll, it results in the movement of the normal distribution towards higher values, probably related to the steric hindrance of the cytosine 5-substituent (Fig 3). Note that it is difficult, due to the refinement procedures, to detect changes in twist bimodality in NMR structures, but the high twist to low twist transition detected by MD in flexible sequences has been already detected in a high-resolution crystal structure (PDB ID 4GLH, 4HLI and 4GLC) (see Fig F in S1 File), supporting the validity of our theoretical findings. Finally, and very interestingly, we found that some tetramers show quite unique distributions (for example the high twist of AC � GT) and that polymer effects are not negligible as shown in the d(CGCG)�d(CGCG) tetramer behaves differently when embedded in the reference dodecamer or in a poly(CpG).

PLOS COMPUTATIONAL BIOLOGY
The Impact of the HydroxyMethylCytosine epigenetic signature Stiffness analysis (see Materials and Methods) [53,54] illustrates the impact of epigenetic changes on the elastic deformability of DNA. In general, cytosine methylation increases the stiffness of CpG steps, mainly due to the restriction of roll flexibility. Such a stiffness increase is even more evident for hmC (see diagonal terms of the stiffness matrices in Table 2 and Table J in S1 File, and data in S3 File for the entire stiffness matrices), due the narrowing of the roll distribution (see Fig 3), probably related to the formation of transient hydrogen bond interactions involving the hydroxyl group of hmC (see Fig G in S1 File as examples). The formation of those hydrogen bonds covers only a very small part of the trajectory, consequently they cannot justify completely the stiffness due to the hydroxyl group but combined with the loss of conformational sub-states these results show a narrower sampling of the conformations and restrictions for some movements due to the hydroxyl group interactions. It is worth noting, again, that these general trends are not universal and significant deviations are evident for certain tetramers (for example ACGT and TCGA). Again poly(CpG) has a rather differential behaviour, as for this repetitive sequence methylation increases rather than decrease flexibility (in agreement with previous findings by Zacharias and coworkers [44]), which combined with Table 2

PLOS COMPUTATIONAL BIOLOGY
The Impact of the HydroxyMethylCytosine epigenetic signature results above suggest that epigenetic changes can affect physical properties of CpG islands in a quite different way than in the rest of the DNA. We also investigated the stiffness at the base pair level and calculated the 6x6 stiffness matrices for the intra-base pair movements (translational: Shear, Stretch, Stagger, and rotational: Buckle, Propeller, Opening) (data in S3 File). We found that the translational movements have in general higher force constants for normal cytosine (C>hmC>mC), while the rotational parameters are stiffer when hmC is present and the unmodified cytosine is the most flexible. Also in this case the sequence-dependent variability looks more important that the effect of the epigenetic change.
We calculated the correlations between the neighboring base pair (j and j+1), starting from the central C � G step. As previously reported [55] for unmodified CpG steps we found correlations of shift-shift; twist-twist and tilt-tilt base pair parameters between neighboring steps. However, this correlation going from the base pair C � G to the neighboring step decreases (see Fig H in S1 File) and does not go further than the j+1. The epigenetic modifications clearly diminish the neighboring correlations (Fig 3), as they simplify the conformational landscape by reducing the diversity of possible sub-states of several degrees of freedom.
To validate our claim that epigenetic variants tend to increase the DNA stiffness (in the majority of the sequence contexts), we carried out DNA circularization experiments using the double stranded 21mers (d(GAAAAAACGGGC � GAAAAACGG)� d(TCCCGTTTTTC � GC CCGTTTTT)) with a central C � pG dinucleotide (C � = C, mC, hmC, respectively) as described in Perez et al [12]. Under favorable ligation conditions (see Materials and Methods), the multimers form circles that are as short as allowed by the flexibility of the DNA. Results shown in Fig 4 (and Fig I in S1 File) clearly demonstrate that, as suggested by our MD simulations (Table 2), epigenetic variants increase (in most sequence context) the stiffness of DNA, following the order: hmC>mC>C. In summary, the reversion of methylation signature by hydroxymethylation cannot be explained as a recovery of the properties of the unmethylated DNA. Our experimental and theoretical results can be compared with previous data from the literature. Thus, the impact of methylation reducing DNA flexibility found here agrees well with a myriad of experiments from our own group and others [12,40,41,[43][44][45], but disagrees with the results in a recent study by Zaichuk and Marko [42] that point in the opposite direction, probably due to a different experimental set-up where no full control of the placement and level of methylation exists. The impact of hydroxymethylation is less clear, and our results disagree with previous claims by Ngo et al. [41] derived from circularization experiments using short oligomers. We can speculate that the differences might arise from the use in those experiments [41] of a short oligomer that cannot circularize by harmonic deformations (condition in which stiffness analysis can be applied), and possible phasing problems which can be prevalent when experiments are done using only a single duplex length.
The reduction of flexibility introduced by epigenetic variants of cytosine is expected to increase the energetic cost of wrapping the DNA around the nucleosome core [13], leading to a decrease in nucleosome affinity. To confirm this hypothesis, we compute deformation energies (see Materials and Methods) associated to nucleosome wrapping for DNA containing unmodified CpG, mCpG and hmCpG steps. To include phasing issues into consideration we explored two scenarios: i) 4 epigenetic changes (2 d(CpG) steps modified) "in-phase", i.e., each modification separated by 11 bases (the periodicity of DNA in a nucleosome [56]) and ii) 4 epigenetic changes in "anti-phase", i.e. each modification separated by 5 bps. Results in Fig 5A demonstrate that epigenetic variations increase the energy cost required for nucleosome wrapping, both when introduced in "in-phase" and "anti-phase" (Fig J in S1 File), suggesting that change in flexibility rather than in the intrinsic curvature (roll and tilt) are the main responsible of the epigenetic-induced difference in nucleosome affinity. To validate these theoretical findings, we performed nucleosome reconstitution experiments (see Materials and Methods, Supporting Methods in S1 File) [57] using both "in-phase" and "anti-phase" arrangements (the same sequences as before). The results obtained (Fig 5B), characterized by high standard deviation but statistically significant (pvalue of 0.006), confirm that the C➔mC➔hmC substitution leads to a reduction in the efficiency of nucleosome reconstitution, the difference between methylated and hydroxymethylated DNAs being small. Interestingly, experimental changes in nucleosome reconstitution induced by epigenetic alterations "in-phase" are not dramatically different than those found in "anti-phase", where the effect of changes in intrinsic curvature should cancel, supporting the claims that changes in flexibility are the main responsible for epigenetic-related changes in nucleosome affinity. To our knowledge, this is the first time that hmC impact on nucleosome binding has been analyzed, but our findings on methylation fully agree with previous experimental data [12,40,41,43,44] including recent one with cellular models [45], which inspires confidence on the reliability of our hmC predictions.
Altogether, theoretical and experimental results strongly suggest that C➔mC➔hmC substitution at d(CpG) steps leads to small changes in structure and to a small, and environment specific increase in the stiffness (hmC>mC>C), which is reflected in a reduction of nucleosome affinity. Very interestingly, the behavior of DNA containing hmC is much closer to that of met-DNA than that of unmodified DNA, which means that the "reversion" to the unmodified DNA behavior found when methylated DNA is processed by TET-proteins to yield hmC-DNA [22] cannot be explained by changes in the intrinsic physical properties of DNA and the associated recovering of the nucleosome distribution.

The impact of epigenetic modifications on the recognition properties of DNA
The preferred recognition pattern of DNA changes with epigenetic modifications and this can modulate DNA-protein contacts. Thus, DNA containing CpG steps exhibits a preferred region of interaction with cations along the minor groove (Fig 6) (see Materials and Methods) [58]. Cytosine methylation widens slightly the minor groove, displacing the preferred interaction region to the backbone and hydroxymethylation makes the major groove the preferred interaction region (see examples in Fig K in S1 File), probably because of the presence of a polar group. The polarity of the CH 2 OH group is also visible in the water distributions (see examples in Fig L in S1 File), where the integration of the hydroxyl into highly ordered strings of waters leads in general to a very dense solvation atmosphere along the major groove.
Results above suggest epigenetic modifications alter DNA interaction pattern, which might impact the binding of methylated DNA binding domains (MBDs). We focus here on the four MBD proteins related to the regulation of gene expression: i) MeCP2, a protein with direct (via repression transcription domain; TRD) and indirect (via recruitment of histone deacetylases) silencing activity in mature nerve cell [3,46,59], ii) MBD1 which represses directly (via TRD) or indirectly (via recruitment of histone methylases) gene expression [46], iii) MBD2 which is also coupled to repression (directly through TRD, or indirectly through histone methylation and deacetylation), and which is known to induce NuRD-dependent nucleosome remodeling [46,60,61]and finally iv) MBD3 which lacks TRD and which has been related to a myriad of secondary interactions with other partners such as NuRD, DNMT and TET leading to either gene expression or activation. [32,46,61].
We built models of the four MBD-DNA complexes from the available experimental data (details in Materials and Methods and Supporting Methods S1 File), we used as starting structures well-solved complexes and with a clear definition of the interaction site between the modified cytosine and the protein, without the need to sample the positioning [62]. The complexes were solvated and used as starting conformations for MD simulations, coupled to thermodynamic integration (MD/TI), which combined with standard thermodynamic cycles ( Fig  7) allowed us to determine the impact in binding of methyl-cytosine to hydroxymethyl-cytosine change in protein binding (see Materials and Methods). Quite interestingly, despite the very high sequence similarity the four studied MBD have a different discriminant hmC vs mC power. Based on our simulations, MBD1 has the highest specificity for mC compared to hmC, MBD2 and MeCP2 show a small preference for the methylated DNA, MBD3 seems unable to distinguish between mC and hmC.
Analysis of the binding sites (Fig 8) allowed us to determine the origin of the differential specificity. The four proteins have a large sequence similarity and identical fold. They recognize d(CpG) steps (i.e. d(CpG)�d(CpG) by means of interactions along the major groove (N7 and O6) of the two guanines. Such interactions are made by two Arg in three of the proteins (MeCP2, MBD1 and MBD2), while for MBD3 recognition is made by one Arg and one His, suggesting a lower specificity (Fig 8A). The electrostatic environment of the protein pocket where the hydroxyl group is placed is also different, in the MBD1, the protein with highest affinity for mC, the methyl group is placed in a pocket defined by residues such as VAL and PHE defining a quite hydrophobic cavity. When the methyl is hydroxylated, the OH-group is embedded in a unfavourable apolar environment (see white surface in Fig 8B left panel). On the contrary, in MDB3 the hydroxyl group are placed in a hydrophilic/polar cavities, interacting with ASP and a SER (see red and green surface in Fig 8B). These theoretical results are supported by a myriad of experimental evidences, for example, MBD1 is known to be the protein with the largest specificity for methylated DNA, while MBD3 is known to be the most promiscuous of all MBD [46]. There is also good agreement with in vitro binding estimates by Hashimoto et al. [63], which suggests the same range of mC/hmC specificity than that found here.
In summary, MD/TI simulations strongly suggest that mC➔hmC introduces a general decrease in binding affinity in MBD proteins with repressive activities, reversing then the inactivating signals associated with cytosine methylation. On the contrary, MBD3, which shows that largest sequence variability at the recognition site has no preference between the two epigenetic variants. This suggests that MBD3-associated signaling is not affected, which might yield to specific activation or deactivation of genes. The final consequences of the presence of d(hmCpG) steps should be then not only a general activation of gene activity (reversing methylation signal), but a qualitative gain in the complex and promiscuous MBD3 associated signaling.
Experiments and calculations suggest that the presence of hmC in both strands of a d(CpG) step leads to minor structural changes in the DNA duplex, but to non-negligible changes in physical properties, which are reflected in a general increase in the stiffness of DNA that becomes evident in MD simulations and is validated by circularization experiments. This increase in stiffness implies a reduced ability to wrap around nucleosomes, as suggested by MD simulations and demonstrated by gel reconstitution experiments. Very interestingly, both simulations and experiments suggest that hydroxymethylated DNA is even stiffer than methylated DNA, precluding that physical properties could justify the known reversion to unmethylated behavior which is coupled to the d(mCpG)➔d(hmCpG) transformation catalyzed by TET. In fact, our results suggest a small impact of d(mCpG)➔d(hmCpG) transformation on chromatin conformation. Positive ΔΔG values mean that the complex binding to met-DNA is more favorable than to the hydroxymethylated one. As a control of the quality of our theoretical estimates, free energy calculations were done also for the transition from mC to C. The results suggest a strong destabilization of MDB1 binding (2.2±0.4 kcal/mol), a moderate destabilization of MBD2 and MeCP2 binding (0.8±0.6 and 0.5±0.4) and no appreciable effect on MBD3 binding (0 ±0.6), values which agree well with in vitro experimental estimates [63]. https://doi.org/10.1371/journal.pcbi.1009547.g007

PLOS COMPUTATIONAL BIOLOGY
The Impact of the HydroxyMethylCytosine epigenetic signature The presence of the hydroxymethyl in the major groove of changes the recognition properties of DNA and this is expected to modify binding of many effector properties, modulating then in a complex way DNA activation and repression pathways. In this paper we have centered our attention in MBD, the major elements for recognition of methylation signals. Accurate MD/TI calculations strongly suggest that d(mCpG)➔d(hmCpG) transformation leads to reduction of the affinity of binding for all MBD coupled to repressive action, while binding to MBD3, the only methyl-binding domain with dual (activation/repression) activities is not affected. Simulations suggest then that a significant part of the biological effect of d(mCpG) ➔d(hmCpG) transformation can be related to an unbalance in the recognition of DNA by MBD proteins.

UV-monitored thermal-denaturation studies
Modified and unmodified 8-nucleotide DNAs (CGAC � GTCG) were synthetized (details in Supporting Methods and Table K in S1 File) and their absorbance versus temperature curves of each duplex was measured at 5 μM, 20 μM and 66 μM strand concentration in phosphate buffer (25 mM) containing NaCl (100 mM). Experiments were performed in Teflon-stoppered quartz cells of 1 cm and 1mm length path on a JASCO V-650 spectrophotometer equipped with thermoprogrammer. The samples were heated to 90˚C, allowed to cool slowly to 25˚C, and then warmed during the denaturation experiments at a rate of 0.5˚C min -1 to 90˚C. The absorbance was monitored at 260 nm. The data were analyzed by the denaturation-curve processing program, MeltWin v.3.0. Melting temperatures (T m ) were determined by computer fit of the first derivative of absorbance with respect to 1/T. These values were designed to give uniformly separated data points on ln (C t /4) scale. ΔH, ΔS, and ΔG values were estimated from dependence of melting temperatures on DNA concentrations [64]. The reciprocal values of average melting temperatures were plotted against ln (C t /4) and fitted to linear relationships, 1/T m = (R/ΔH) ln(CT/4) + (ΔS/ΔH).

NMR
Samples of the duplexes with a central CpG step: d(CGTC � GACG)�d(CGTC � GACG) and d (CGAC � GTCG)�d(CGAC � GTCG) with and without epigenetic modifications (synthesis details in Supporting Methods in S1 File) were dissolved in either D 2 O or 9:1 H 2 O/D 2 O in 100 mM NaCl and 10 mM sodium phosphate, pH = 7 buffer. All NMR spectra were acquired in Bruker Avance spectrometers operating at 600 and 800 MHz, equipped with cryoprobes and processed with the TOPSPIN software. DQF-COSY, TOCSY and NOESY experiments were recorded in D 2 O at 25˚C. In the experiments in D 2 O, presaturation was used to suppress the residual H 2 O signal. The NOESY spectra in D 2 O were acquired with a mixing time of 250 ms. TOCSY spectra were recorded with standard MLEV17 spinlock sequence, and a mixing time of 80 ms. NOESY spectra in H 2 O were acquired with 150 ms mixing time at 5˚C to reduce exchange with the water. In 2D experiments in H 2 O, water suppression was achieved by including a WATERGATE module prior to acquisition. The spectral analysis program SPARKY (D. K. TD Goddard and D. G. Kneller, SPARKY v3, University of California, San Francisco) was used for semiautomatic assignment of the NOESY cross-peaks and quantitative evaluation of the NOE intensities. Additional details of the NMR analysis are given in S1 File. Coordinates and NMR restraints have been deposited in the Protein Data Bank, PDB IDs 7OGV, 7OHE, 7OHJ and 7OHM.

Refinement of the experimental NMR structures
Ensembles of structures using atomistic force-fields based on the distance constraints obtained experimentally were calculated using classical and usual annealing procedure similar to that used to refine most NMR structures [65]. Accordingly, ideal fiber B-DNA and A-DNA structures are thermalized (298 K) and equilibrated for 100 ps each (using the same options described previously), applying harmonic restraints of 100 kcal/mol�Å 2 on the DNA. Then, a 500 ps MD simulation is performed where the global restraints are replaced by the specific NMR distance constraints obtained experimentally. To obtain the final ensemble, 50 structures were chosen and minimized in vacuo, keeping the NMR constraints.

HydroxyMethylCytosine parameterization
Geometrical parameters for hmC were derived starting from parmbsc1 [49] cytosine and serine from ff14SB for hydroxymethyl group. Final hmC geometry was obtained by energy minimization [66]. Charges were derived from RESP fit at HF/6-31G � level starting from an optimized hmC capped at the C1' using the procedure described in reference [67]. Library with parameters for hmC are available upon request.

Molecular dynamics simulations
We performed molecular dynamics (MD) simulations for 33 12-mer DNA duplexes of sequence CGCGXC � GYCGCG, being C � = C, mC or hmC, with X and Y selected to represent all the 10 possible unique tetranucleotide environments containing a central CG step, were each simulated for 0.5 μs. Additionally the poli(CpG) sequence C � GC � GC � GC � GC � GC � G being C � = C, mC or hmC were simulated. Starting structures were taken from Arnott canonical B-DNA [68], and oligomers were built using the Nucleic Acid Builder [69], epigenetic modification (mC and hmC) were afterwards added to the duplex. The oligomers were simulated using periodic boundary conditions with a truncated octahedral box and an explicit solvent of TIP3P water molecules [70], with a minimum thickness of 10 Å around the solute. The DNA net charge was neutralized with K + cations and K + Cl − ion pairs substituted water molecules to reach a concentration of *0.15 M. Each oligomer was simulated using the AMBER 18 [71] program suite and considering the 3 epigenetic modifications led to a total of 33 simulations and 16.5 μs of unrestrained trajectories. The simulation was performed using our wellestablished multistep protocol [72][73][74] which involves energy minimizations of the solvent, slow thermalization and a final re-equilibration for 10 ns, prior to the 0.5 μs production runs. All simulations were carried out in the isothermal-isobaric ensemble (T = 298 K, P = 1 atm) using the parmbsc1 force field [49] and Dang parameters for ions [75,76]. Long-range electrostatic effects were treated using the Particle Mesh Ewald method [77] using a cutoff of 10 Å. The bonds involving hydrogen were restrained using SHAKE [78] and The temperature and the pressure, with a coupling constant of 5 ps were controlled using Berendsen algorithm [79]. Trajectories, data, analyses and parameters for methylated and hydroxymethylated cytosines are available in our BigNAsim database [80], following recent recommendations for FAIR trajectory storage [81].

Analysis of the trajectories
All the trajectories were processed with the cpptraj module of the AmberTools 18 package [71]. The ability of DNA to recognize sodium was analysed using our classical molecular interaction potential (cMIP [82]). The electrostatic interaction term was determined by solving the linear Poisson-Boltzmann equation, while the van der Waals contribution was determined using standard AMBER Lennard-Jones parameters. The ionic strength and the reaction-field dielectric constant were set to 0.15 and 78.4 M, respectively, while the dielectric constant for DNA was set to 8 [83]. The molecular plots were generated using either VMD 1.9 [84] or the UCSF Chimera package version 1.8.1 [85].

Analysis of DNA physical properties and Deformation Energy calculation
DNA helical parameters and backbone torsion angles were measured with the Curves+ and Canal programs [86] along the MD simulations and for X-ray crystal structures. From the ensemble of MD simulations, the covariance matrix describing the deformability of the helical parameters for each dinucleotide step was computed and inverted to generate the 6×6 stiffness matrix, resulting in the force constants for helical deformation at the base pair step level [53,54]. To be noted that our method is a simplified model, where we considered a reduce stiffness matrix and that our deformation energy model does not consider explicitly the nearest neighbor correlations but only the tetrameric effect on the central CpG step. Average and force constants were calculated for the last 200ns of each MD simulation. The position of each cation in curvilinear cylindrical coordinates for each snapshot of the simulations with respect to the instantaneous helical axis was determined using Canion, a utility program that reads the ion counts [58]. Deformation energy for the wild type, mCpG and hmCpG DNA sequences was calculated using our mesoscopic energy model [13], with parameters fitted (see above) for each epigenetic modification in each tetramer environment (see above).
Annealing and ligation. The complementary strands were denatured at 90˚C and subsequently annealed by gradually decreasing the temperature to room conditions. Double stranded oligonucleotides were then multimerized with 400U of T4 DNA ligase (New England Biolabs) by an overnight incubation at room temperature. The reaction was stopped by inactivating the ligase at 65˚C for 10 minutes. The DNA was ethanol precipitated and dissolved in 10 μl of DNAse free water.
Two-Dimensional Gel Electrophoresis. The ligation/digestion products were loaded on a 5% polyacrylamide native gel (19:1 acrylamide:bisacrylamide) and electrophoresed at room temperature at 4 V/cm in TBE buffer. After separating the different DNA species on the first dimension, the lanes of interest were excised from the gel and loaded on a two dimensional gel (8% polyacrylamide) to separate circular and linear DNA. The separation of the two families of DNA was facilitated by the presence of chloroquine phosphate (250 μg/ml) that distorts the linear molecules of DNA in a different way from the circular ones (3). The gels were stained with SyBr Safe (Invitrogen) and visualized on the ImageQuant system (GE Healthcare) under UV light (see Results).

In vitro nucleosome reconstitution
Free energies for histone binding in in vitro nucleosome reconstitution were measured for the wild type 601 sequence and the epigenetically mutate versions, mCpG601 and hmCpG601 (methods details in S1 File). The 3 forms of the high affinity 601.2 DNA sequence (sequences in S1 File), with normal C-, mC-, hmC-DNA were chemically synthetized starting from the oligos 65-, 66-, 80-and 81-nucleotide long DNAs on the 0.2 μmol scale on an Applied Biosystems 394 synthesizer (see Supporting Methods in S1 File for details and Table K in S1 File). We introduced 2 CpG base pair steps 5 or 11 base pairs apart: in positions that face the protein core through the minor and major groove or, alternatively, only the major grooves. Such positions explore sites in the nucleosomal DNA of negative and positive opening of the base pairs along their long axis [13]. Bands intensities corresponding to nucleosome-bound or free DNA were measured by densitometry using the PhosphorImager system (GE Healthcare) (Fig M in S1 File). Bands intensities quantitation allowed to calculate the affinity for each sequence to form nucleosome as the ratio of counts in nucleosome bound DNA to counts in free DNA. Experimental normalised affinity was then computed dividing the relative affinity for methylated or hydroxymethylated sequences by the control (unmodified cytosine). Each experiment, for each sequence, was repeated four times.

Thermodynamic integration
We used a thermodynamic cycle to compute the reversible work associated to the alchemical transformation between two DNA sequences (between mC and hmC), both in the bound and in the unbound state for a list of all MDB protein-DNA complexes (Supporting Methods in S1 File). The details on how these calculations were carried out are explained in a previous work in which we used the same method to calculate the variation in free energy adding methylation in a nucleosome complex [13]. In this case, to achieve the alchemical transformation between mC and hmC, our molecular dynamics simulations convert the hydroxyl group in position 5 of the cytosine ring into a hydrogen atom. In each complex for each mutation, the variation of λ is discretized in 21 windows, i.e. dλ = 0.05, and the final ΔG is computed via numerical integration. We used soft-core potentials implemented in GROMACS to avoid singularities in the Lennard-Jones and Coulomb potentials, with α = 0.3, σ = 0.25 and a soft-core power of 1. The initial structures for each window were obtained from a 10 ns simulation in which λ was continuously varied from 0 to 1. From this simulation we extracted frames corresponding to a given λ value, and we relaxed the structures by minimizing the energy of the system in those configurations. We simulated each window at fixed λ value for 1 ns, and we discarded the first 100 ps of simulation. For each window we collected estimates for h(δH)/δλi λ by using blocks of 20 ns, which were then integrated through the entire mutation pathway to obtain mutation free energies (with associated statistical errors). All the sequences used in this work, for computational and/or experimental analyses are listed in Supporting Methods in S1 File.

S1 File. This file contains Supporting Methods and Tables A-K and Figs A-M. Table A.
Melting Temperatures (Tm in˚C) for the oligonucleotide CGAC � GTCG, where C � stands for cytosine (C), methylated-cytosine (mC) and hydroxymethylated-cytosine (hmC) respectively, calculated by UV experiments at different oligo concentrations. Table B. Variation in thermodynamic parameters, enthalpy (ΔH in Kcal/mol), entropy (ΔS in cal/mol � K) and free energy ΔG (Kcal/mol), at 25˚C calculated using Van't Hoff equation (see Methods) from UV data for the oligomer in Table K with cytosine, methylcytosine and hydroxymethycytosine respectively.  Table I. Average parameters (in Å and Degrees) averaged over the last 200 ns for the central step (d(C � pG)�d(C � pG)) in the different tetrameric environments between the different forms of cytosine, Hydroxy-MethylC, MethylC, Cytosine. Table J. Diagonal stiffness constants for translational movements in kcal/mol ang2 for the central C � pG step (C � = C, mC and hmC) in the different tetrameric environments. Table K. Mass spectrometry analysis of synthesized oligonucleotides. Fig A. Regions of NOESY spectra (150 ms mixing time) of hMC (CGA � CGTCG)2 duplexes, � C = hMC (top), 5MC duplex (middle) and C (control) duplex (bottom). H1'-base assignment pathways are indicated. Buffer conditions: 100 mM NaCl, 25 mM sodium phosphate, T = 5˚C, pH 7. Fig B. Imino region of the NOESY spectra in H2O (τm = 150 ms) of (CGA � CGTCG)2 duplexes, � C = hMC (right), 5MC (middle), and C (control) duplex (left). Buffer conditions: 100 mM NaCl, 25mM sodium phosphate, T = 5˚C, pH 7. Fig C. 1H-31P correlation spectra for hMC (right), 5MC (middle), and control duplex (left). Buffer conditions were 100 mM NaCl, 25mM sodium phosphate, T = 5˚C, pH 7. Fig D. Chemical shifts differences for non-exchangeable protons between hMC (top) and 5MC (bottom) with respect to the control duplex. Fig E. Overlap of the central tetramer of the average NMR structures (light blue) with the average structure from MD (green), for the tetramers ACGT, AMGT, AHGT, TCGA and THGA. RMDs calculated between the structures for heavy atoms 0.95, 0.69, 0.98, 1.1, 0.76 Å respectively. Fig F. Twist profile for the hemi-hydroxymethylated sequence in the X-ray crystal structures 4GLH (red line), 4HLI (green line) and 4GLC (blue line). HG/CG steps are characterised by low-twist state. Fig G. Hydrogen bonds detected along the MD simulations among hydroxy group of hmC (OH, hmC position 6) and the flanking bases (guanine 7, adenine 5). We detected the formation of hydrogen bonds between the hydrogen of the hydroxyl group and the neighbouring guanine (G7); in particular between the hydrogen of the hydroxyl group and the nitrogen 7 (panel A) OHN7-G7O6-G7OHAHGT A(N7)-hmC(OH) N7-A5OHABCDH42-hmCOH or/and the oxygen 6 (panel B) of the guanine. We also detected a HB between the hydroxyl hydrogen and the oxygen of the backbone (panel C). It is worth noticing that these hydrogen bonds are formed maximum the 0.039 fraction of the simulation time and they are not stable. Nevertheless, the interaction between hmC and G7 indicates that the base pair can assume a more distorted conformation, that reflects into an opening of the minor groove, high roll. In the AC � GT tetramer (panel D) the hydroxyl group interacts also with the adenine (A6), and not only with the following guanine. This behaviour translates into the stiffer and higher twist of the tetramers XCGY where X is a purine and Y a pyrimidine base. Fig H. Correlations between the neighboring base pair (j-> j+1), starting from the central C � G (j = 6), averaged over all the dodecamers simulated. We found correlation, as previously reported [1], between twist-twist, tilt-tilt, shift-shift (left to right panels) base pair parameters. The correlation going from the central base pair C � G to the neighboring steps decreases and does not go further than the j+1 (step = 7). Fig I. Replicas of the 2D polyacrylamide native gels showing different migrations of linear and circular DNA species oligomers of 21 bp, respectively for Cytosine (C), Methylcytosine (mC) and Hydroxymethylcytosine (hmC) containing fragments. Linear DNA molecules are positioned on the lower diagonal, and circular DNA molecules are positioned on the upper diagonal (see Fig 4). base pairs apart. In both plots a schematic image of the DNA modifications (red dots) when they are positioned 5bps or 11 bps apart, in the minor or major grooves facing the histones (in yellow). Fig K. Cation K+ concentration (molarity) along the minor and major grooves averaged over the last 200 ns of the trajectories. K+ molarity distribution as a function of the angular dependence (in degrees) for the tetramers AC � GT and TC � GA where C � = C,mC,hmC (in blue, green and red respectively). Fig L. Occupancy maps of water molecules in the major and minor groove for the unmodified CpG, mCpG and hmCpG respectively in the two tetramer AC � GG and TC � GA. Fig M. In vitro nucleosome core particle reconstitution. Results of the four replicas of the gel mobility shift assays of nucleosomes reconstituted in vitro with a 147-bp 601 (with normal cytosines, methylated and hydroxylated respectively). The upper bands (Nucleosome) correspond to histone core-bound DNA, and lower bands correspond to unbound DNA (free DNA). Mk: DNA ladder for size band estimation.