Domains of methylated CAC and CG target MeCP2 to tune transcription in the brain

Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the canonical MeCP2 DNA recognition sequence, but additional targets including non-methylated sequences have been reported. Here we use brain-specific depletion of DNA methyltransferase to show that DNA methylation is the primary determinant of MeCP2 binding in mouse brain. In vitro and in vivo analyses reveal that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC. Structural modeling suggests that mCG and mCAC may be interchangeable as minimal structural perturbation of MeCP2 accompanies binding. MeCP2 binding to chromosomal DNA in mouse brain is proportional to mCG + mCAC density and defines domains within which transcription is sensitive to MeCP2 occupancy. The results suggest that MeCP2 interprets patterns of mCAC and mCG in the brain to negatively modulate transcription of genes critical for neuronal function.


Introduction
comprise predominantly promoter regions, showed a 56% reduction.  Fig. 1g-k). Reduced DNA methylation in the conditional mutant brain was confirmed at IAP elements and mouse satellite by bisulfite analysis (Fig. 1i-j) and MeCP2 binding was correspondingly lower (Fig. 1g-h). Extending the analysis to unique regions of the genome, we chose four random sites in a relatively gene dense domain (Supplementary Fig. 1g). In all replicates MeCP2 binding to each site was consistently reduced in each Dnmt1-deficient brain compared with a matched WT littermate (Supplementary Fig. 1h-k). Furthermore, bisulfite analysis confirmed the reduction of DNA methylation at the nucleotide level in each case (Supplementary Fig. 1l-o). The data for 6 randomly chosen regions of the mouse genome provide strong support for the conclusion that DNA methylation is a major determinant of MeCP2 binding in the mouse brain.

MeCP2 recognizes modified di- and tri-nucleotide sequences
The predominant methylated sequence is the di-nucleotide CG, but in adult brain mCA 3 and hmCG 17,35 are implicated as binding partners of MeCP2. A recent study concluded that in addition to mCG, MeCP2 binds both mCA and hmCA 25 and confirmed earlier reports that hmCG is a low-affinity binding site for MeCP2 25,36-38. In addition, MeCP2 has been reported to bind in vitro to DNA in which every cytosine was substituted with hmC 17. Given that neurons only accumulate significant levels of both hmCG and mCH (where H is any base except G) from two weeks postnatally 3, the Dnmt1 KO model is unsuited to investigate additional sequence specificities of methylation dependent N-terminal domain of MeCP2 that contains the MBD. Surprisingly, the data revealed a further constraint on MeCP2 binding, as the third base following mCA strongly affected MeCP2 binding affinity in vitro (Fig. 2a). Probes containing the mCAC tri-nucleotide sequence bound with high affinity to MeCP2, whereas probes containing mCAA, mCAG and mCAT bound much less strongly. This result was confirmed in EMSA experiments using all possible mCXX tri-nucleotide sequences as unlabeled competitors against a labeled mCGG-containing probe (Fig. 2b). Quantification showed that mCAC and, to a lesser extent, mCAT are both effective competitors, but mCAG and mCAA compete no better than non-methylated control DNA (Fig. 2c). All mCGX oligonucleotide duplexes competed strongly, indicating that the base following mCG on the 3' side does not have a large effect on binding, although we note that mCGA was reproducibly a weaker competitor than mCGC, mCGG or mCGT. Neither mCCX nor mCTX tri-nucleotides have a significant affinity for MeCP2 in vitro.

As hmCA is reported to bind MeCP2 in vitro 25, we asked whether the third base is also important for hmC binding. Using hmCXX tri-nucleotides as probes in EMSAs, we found that hmCAC bound with a much higher affinity than hmCAA, hmCAG and hmCAT DNA (Fig. 2d). It is notable that the great majority of hmC in the brain and elsewhere is in the hmCG di-nucleotide, with hmCAC being extremely rare 3. The DNA binding specificity of MeCP2 MBD deduced from these in vitro experiments is summarized in a matrix of di- and tri-nucleotide sequences that bind to MeCP2 (red lettering, Fig. 2e).

MeCP2 binding specificity in vivo
To determine whether the binding specificities established in vitro apply to full-length protein in living cells, we developed a novel assay using transfection followed by ChIP 39. Synthetic DNA duplexes containing specific cytosine presence of the cytosine hydroxyl group would not be accommodated due to the close proximity of this polar group with the guanidinium group of R111 (Supplementary Fig. 3d). Binding of hmCAC is allowed, however, due to tilting of the R133 side-chain and the formation of hydrogen bonds with guanine on the opposite DNA strand (Supplementary Fig. 3e). The presence of a purine in the third position of hmCAG and hmCAA introduces a clash that is not observed with hmCAC or hmCAT (data not shown). Thus the observed tri-nucleotide binding specificity of MeCP2 can theoretically be explained with minimal perturbation of the established X-ray structure of the MBD-DNA complex. The minimal change to the structure of the MBD required to accommodate either mCG or mCAC raises the possibility that the biological consequences of binding either motif will be the same.

A prediction of the modeling is that MeCP2 should bind in only one orientation to mCAC or mCAT, whereas binding to the symmetrical mCG dyad may occur in either orientation. The model requires that R111 and D121 interact with the methyl group of thymine rather than that of 5mC (Fig. 4b). As thymine is effectively 5-methyluracil (Fig. 4c), we replaced it with uracil in the labeled probe and performed EMSA analysis (Fig. 4d). Loss of the thymine methyl group abolished binding to MeCP2, in agreement with the hypothesis that MeCP2 binding to mCAC is confined to one orientation. The data suggest that a symmetrical pair of 5-methyl pyrimidines, one of which is mC, offset by one base pair is an essential pre-requisite for MBD binding to DNA.

DNA sequence specificity of MeCP2 binding in adult mouse brain
To test whether the DNA binding specificities established in vitro and in transfected cells also apply in native tissues, we analyzed MeCP2 ChIP-seq and whole genome bisulfite (WGBS) datasets derived from adult mouse brain (references 3,25,26; WGBS from sorted neurons and from hypothalamus, this study). CG is under-represented in the mouse genome (~4% of CX), but highly methylated (~80%), whereas CA is the most abundant CX di-nucleotide (36% of CX), but even in brain only a small fraction of CA is methylated (<2%) (Fig. 5a- analysis of several MeCP2-ChIP data sets for which the antibody used has been rigorously verified indicates, however, that the profile of Input (specifically, DNA derived from the fragmented chromatin sample used for ChIP) is closely similar to that of MeCP2 (Fig. 5d). Using the published high quality ChIP-seq dataset for hypothalamus 26, we fitted a linear model to predict MeCP2 read coverage from Input reads alone and found a coefficient of determination of 0.84, indicating that MeCP2 is almost uniformly distributed across the genome at this resolution (Fig. 5e). These results are in line with a previous report that the number of MeCP2 molecules in mature neurons is sufficient to almost "saturate" mCG sites in the genome 16. Given the similarity between ChIP and Input, we focussed on regions that show deviations from the Input profile regarding enrichment (purple) or depletion (green) of MeCP2 (Fig. 5f). First, we investigated genomic regions that are depleted of potential binding sites, e.g. unmethylated CpG islands (CGIs). As the ChIP dataset was derived from mouse hypothalamus, we performed WGBS on three biological replicates of this brain region (see Online Methods). Using ChIP and DNA methylation datasets from the same brain region, we observed a pronounced drop of the log2(MeCP2/Input) signal across CGIs for two different data sets, confirming previous analyses 16,25,26 (Supplementary Fig. 4a). We next examined regions where the MeCP2 signal was higher than expected by applying the MACS 41 tool to detect summits of MeCP2 binding peaks relative to Input 25. As expected, the di-nucleotide mCG showed a sharp peak at MeCP2 ChIP summits in the hypothalamus dataset (Supplementary Fig. 4b-e and 25). In addition, the tri-nucleotide mCAC, but not other mCAX tri-nucleotides, coincided strikingly with MeCP2 peak summits, confirming that mCAC provides a focus for MeCP2 binding (Fig. 5g-h Fig. 4i). Regarding targeting of mCAT, which bound relatively weakly in EMSA, but strongly in the transfection assay, the ChIP-seq data suggest that this is a relatively low affinity site in native brain (Fig. 5g-h and Supplementary Fig. 4f-h).
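
The linear model relating MeCP2 coverage to Input coverage can be illustrated with a minimal sketch. It assumes one MeCP2 read count and one Input read count per genomic bin; the function names and the toy counts are hypothetical and are not taken from the study, whose fitting procedure and bin size are not reproduced here.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the least-squares line."""
    slope, intercept = fit_line(x, y)
    mean_y = fmean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy per-bin read counts (illustrative values only, not real data).
input_reads = [10, 20, 30, 40, 50]
mecp2_reads = [12, 19, 33, 41, 48]
print(round(r_squared(input_reads, mecp2_reads), 3))
```

A value close to 1 means that Input coverage alone predicts most of the variation in ChIP coverage, which is the sense in which the reported 0.84 points to a near-uniform MeCP2 distribution at that resolution.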

To establish MeCP2 binding preferences across the whole genome we adopted a sliding window approach. In the Input sample, both mCG density and density of unmethylated CGs strongly correlated with coverage, revealing a methylation independent CG sequencing bias (data not shown). In contrast, the MeCP2 ChIP coverage was DNA methylation sensitive, with mCG being positively correlated, while unmethylated CG density was anti-correlated (data not shown). Importantly, DNA methylation-sensitivity was also observed in the Input-corrected signal (log2(MeCP2/Input)), strongly supporting the view that mCG is targeted by MeCP2 binding (Fig. 5i, Supplementary Fig. 4j, l). This mCG binding preference was independent of the third DNA base (Supplementary Fig. 4n). In agreement with the in vitro and in vivo results reported above, the density of mCAC correlated strongly with increasing MeCP2 enrichment, whereas a much weaker trend was observed for other methylated tri-nucleotide sequences (Fig. 5j, Supplementary Fig. 4k, m). We found no evidence for MeCP2 binding to hmCG or unmethylated C's in any sequence context (Supplementary Fig. 4j, l, o-r). To complement the sliding window approach, we focused on MeCP2 binding within gene bodies and found once again that the Input-corrected signal is strongly correlated with the density of mCG and mCAC (Fig. 5k). The ChIP data therefore sustain the view that MeCP2 binding is determined by the combined density of mCG and mCAC sites.  (Fig. 5l). These regions share common sequence features, in particular high mCG and mCAC densities (Fig. 5m, Supplementary Fig. 4t-u). We also identified a large number of short regions (<1 kb) that have the highest GC content but are strongly depleted in MeCP2 binding. As expected, these regions significantly overlap with CGIs (Supplementary Fig. 4s). Lastly, we found a third group of relatively long regions (10 kb to 1 Mb) which are moderately depleted in MeCP2 binding (MeCP2 enrichment score: <0). These regions are moderately enriched for mCG but lack mCAC (Fig. 5m, Supplementary Fig. 4s-u).
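
The Input-corrected signal and its relationship to methylation density can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the pseudocount, the function names and the assumption that read counts are already normalized for library size are all hypothetical. The default rolling-mean parameters (400 genes, step of 80) follow the values given in the Methods.

```python
import math
from statistics import fmean


def log2_enrichment(chip, inp, pseudocount=1.0):
    """log2(MeCP2/Input) per window; the pseudocount (an assumption) avoids log of zero."""
    return [math.log2((c + pseudocount) / (i + pseudocount))
            for c, i in zip(chip, inp)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rolling_means(values, width, step):
    """Means over consecutive subsets of `width` items, advancing by `step`."""
    return [fmean(values[start:start + width])
            for start in range(0, len(values) - width + 1, step)]


def density_by_enrichment(enrichment, density, width=400, step=80):
    """Sort items by enrichment, then take rolling means of their methylation density."""
    order = sorted(range(len(enrichment)), key=enrichment.__getitem__)
    ranked_density = [density[i] for i in order]
    return rolling_means(ranked_density, width, step)
```

With per-window vectors of enrichment and, for example, mCAC density, `pearson` gives the kind of correlation discussed for the sliding windows, while `density_by_enrichment` produces a smoothed trend of the sort plotted for gene bodies.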

As part of this binding site analysis we re-visited an earlier report that MeCP2 binds preferentially to mCG flanked by an AT-rich run of 4-6 base pairs in vitro 43. To look for this preference in brain, we asked whether isolated mCG Over-expression of MeCP2, however, did not significantly affect total RNA. As the latter is ~98% ribosomal RNA, we asked whether mRNA levels were also reduced in KO hypothalamus. Quantitative RT-PCR (qPCR) using spiked-in Drosophila cells to control for experimental error and normalized to brain cell

DNA methylation-dependent recognition sequences of MeCP2
We comprehensively analyzed the modified DNA sequences that determine MeCP2 binding. To ensure the reliability of our conclusions we used three experimental approaches: in vitro EMSA; in vivo ChIP in transfected cultured cells; and in vivo ChIP-seq of mouse brain. Three methylated DNA motifs consistently recruited MeCP2: mCG, mCAC and hmCAC. Interestingly, mCAC is the predominant methylated non-CG sequence in brain, comprising 15-30% of all methylated cytosine in sorted mouse neurons, probably due to the action of the de novo DNA methyltransferase Dnmt3a 2,4. The hydroxymethylated sequence hmCAC, on the other hand, is reportedly extremely rare, perhaps due to the preference of Tet enzymes for mCG as a substrate 3. Given the inability of MeCP2 to bind hmCG and the apparent extreme rarity of hmCAC, it seems unlikely that hmC is a major contributor to the biological role of MeCP2.

Our modeling provides a structural explanation for the sequence specificity of the interaction between MeCP2 and DNA. We previously observed that the replacement of a mC at a methylated CG di-nucleotide with T, forming a T:G mispair, had a negligible effect upon the binding affinity of MeCP2 36, indicating that hydrogen bonding with the mC/T is not essential and that the interaction is flexible enough to accommodate the T:G wobble geometry. Here we observed that in duplex DNA pyrimidine-methyl groups can be provided by either the mC or T, as demonstrated by the finding that the replacement of T with U, which lacks the T methyl group, results in loss of MeCP2 binding. Remarkably, the observed specificity for mC in either mCG or mCAC sequence contexts can be rationalized with minimal changes to the conformation of the MBD that was established by X-ray crystallography 40. Only the configuration of the side-chain of amino acid arginine 133 needs to be adjusted to account for both permitted and non-permitted interactions. The model has potentially important biological consequences, as it suggests that binding to mCAC or mCG is structurally very similar, making it likely that MeCP2 binding to either of these sequences will lead to the same outcome down-stream. In agreement with this scenario, experiments comparing reporters methylated at mCA or mCG suggest that both modified sequences lead to transcriptional repression 2. Missense mutations that cause Rett syndrome predominantly affect the MBD or the NID, suggesting that the primary role of MeCP2 is to bridge DNA sequences with the NCoR/SMRT co-repressor complexes 13. We propose that mCG and mCAC function identically to facilitate this bridging process. RNA, matching reports in cultured mouse and human neurons 24,46. The mechanism responsible is unknown, but one possibility is that MeCP2 is a direct global activator of transcription 23,24. Arguing against this possibility, we found that most KO down-regulated genes lie in domains of low MeCP2 occupancy. Also, it might be expected that two-fold over-expression of an activator would lead to increased levels of RNA compared to WT, but this is not observed (Supplementary Fig. 5a). An alternative explanation is that reduced RNA reflects reduced cell size, perhaps as a secondary consequence of sub-optimal neuronal gene expression. In this case the change in total RNA and the relative mis-regulation of genes within the RNA

Subtle effects on transcription of many genes
While there is no direct evidence that aberrant gene expression is the proximal cause of Rett syndrome or MeCP2 over-expression syndrome, it is noteworthy that thousands of genes, including many implicated in human neuronal disorders, are sensitive to altered levels of MeCP2. Mild mis-regulation on this scale may destabilize neuronal function 25. It is worth recalling that Rett neurons, though sub-optimal, are viable for many decades. In this sense the biological defect can be seen as mild, despite the profound effects on higher functions of the brain. The challenge now is to determine how brain function might be affected by a multitude of small discrepancies in  Table 3.

Nuclear protein extracts and Western blotting
Whole brain protein extracts were prepared as previously described with  Supplementary Table 1). The primary cytosine of this tri-nucleotide was either non-methylated, methylated, or hydroxymethylated. All oligonucleotides were annealed to their complement, 32P-labelled and electrophoretic mobility previously 43. In competition assays to assess tri-nucleotide-binding preferences of MeCP2 [1-205] a parent 58 bp Bdnf-probe, containing the centrally methylated sequence mCGG, was 32P-labelled and co-incubated with a 2000-fold excess of cold-competitor DNA bearing one of the sequences described in Fig. 2b-c. Bound complexes were resolved as described above

Structural modeling
Modeling was based on the X-ray structure of MeCP2 (PDB code 3C2I) using the program COOT 55. Atomic coordinates for DNA bases were generated using the 'mutate' option. To optimize hydrogen-bonded and van der Waals Alignments were then filtered to remove non-unique and blacklisted reads. HTseq-count v0.6.0 was used to quantify read counts over gene exons in the union mode.

Correlation between MeCP2 ChIP-Seq and Input read counts (Fig. 5e-f)

Correlation between log2(MeCP2/Input) and DNA methylation in genomic windows (Fig. 5i-j

Rolling mean plots for genic regions (Fig. 5k)
Genes were sorted according to their MeCP2/Input enrichment and rolling means of methylation density were applied over subsets of 400 genes with a step of 80 genes. Datasets used are 26 and hypothalamus WGBS (this study).

MSR (Fig. 5 l-m