A Minimal Set of Tissue-Specific Hypomethylated CpGs Constitute Epigenetic Signatures of Developmental Programming

Background
Cell-specific states of the chromatin are programmed during mammalian development. Dynamic DNA methylation across the developing embryo guides a program of repression, switching off genes in most cell types. Thus, the majority of the tissue-specific differentially methylated sites (TS-DMS) must be unmethylated CpGs.

Methodology and Principal Findings
Comparison of expanded Methyl Sensitive Cut Counting data (eMSCC) among four tissues (liver, testes, brain and kidney) from three C57BL/6J mice identified 138,052 differentially methylated sites, of which 23,270 contain CpGs unmethylated in only one tissue (TS-DMS). Most of these CpGs were located in intergenic regions, outside of promoters, CpG islands or their shores, and up to 20% of them overlapped reported active enhancers. Indeed, tissue-specific enhancers were up to 30-fold enriched in TS-DMS. Testis showed the highest number of TS-DMS, but paradoxically their associated genes do not appear to be specific to germ cell functions; rather, they are involved in organism development. In the other tissues the differentially methylated genes are associated with tissue-specific physiological or anatomical functions. The identified sets of TS-DMS quantify epigenetic distances between tissues, generated during development. We applied this concept to measure the extent of reprogramming in the liver of mice exposed to in utero or early postnatal nutritional stress. Different protocols of food restriction reprogrammed the liver methylome in different but reproducible ways.

Conclusion and Significance
Thus, each identified set of differentially methylated sites constituted an epigenetic signature that traced the developmental programming or the early nutritional reprogramming of each exposed mouse. We propose that our approach has the potential to outline a number of disease-associated epigenetic states. The composition of differentially methylated CpGs may vary with each situation, behaving as a composite variable, which can be used as a pre-symptomatic marker for disease.

* The number of short sequences (reads) aligned with one or no mismatches; ** the number of reads aligned to unique loci; *** multi-alignments refer to reads aligning to loci that are repeated in the genome. The three biological replicates are represented by the abbreviations rep1, rep2 and rep3.

Figure S1
The influence of random sampling and systematic bias cancels out during pair-wise site-by-site comparisons. Differences in the level of methylation between two samples can be detected in a site-by-site multiple-comparison approach. A) Mouse genomic DNA samples were spiked with unmethylated foreign DNA (λ gDNA). Scatter plots represent pair-wise comparisons of the number of reads aligned to the 1,202 surveyed CpG sites in the λ gDNA. Although the measurements come from equally unmethylated sites, not all sites were detected with similar efficiency: the digestion frequencies at certain positions are affected by systematic biases, which introduce methylation-independent variation in the final counts. However, the tendency of certain CpGs to be over- or underestimated is systematic and reproducible among the different experiments. B) Scatter plot of liver and testis digestion frequencies scored for 3,228 GCGC sites from chr5. The solid red line represents the result of a linear regression; the dashed lines define the 95% prediction interval. Data outside this interval could represent tissue differentially methylated CpGs. The inset shows that points located on one of the two axes represent the most marked differences. C) The variable "differential methylation" (Δmet) was defined according to: $\Delta met = \log_2(df_T / df_L)$, where $df_T$ and $df_L$ represent the number of reads aligned to a particular CpG site in the testes and liver samples, respectively. Despite the existence of T-DMS, the distribution of methylation is expected to be highly similar for most CpGs in the two tissues. Thus, when the variable Δmet is computed for several pairs of sites with similar levels of methylation, the results should oscillate around zero. Figure S1-C depicts the frequency histogram of Δmet calculated for 256,394 HinP1I sites in the liver and testes libraries.
The red curve represents a single fitted Gaussian, indicating that the variable Δmet follows a normal distribution with the log2 ratios scattering around zero. Differences in the library sequencing depths slightly shifted the central value from zero. Overall, these results show that the systematic error associated with the sequence environment of each CpG or with the inter-site distances is similar for identical genomes, and therefore its effects are largely canceled out during the pair-wise comparisons.
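The site-by-site comparison in panel C can be illustrated with a short sketch. It assumes that Δmet is the log2 ratio of testis to liver read counts, as the reference to log2 ratios in the caption suggests. The pseudocount, the function names and the example counts are hypothetical and are not taken from the study; the pseudocount only keeps the ratio defined for sites with zero reads.

```python
import math
from statistics import median

def delta_met(df_t, df_l, pseudocount=1.0):
    """Log2 ratio of testis to liver read counts for one CpG site.

    The pseudocount is an assumption made for this sketch so that
    sites with zero reads in one tissue still give a finite value.
    """
    return math.log2((df_t + pseudocount) / (df_l + pseudocount))

def delta_met_profile(testis_counts, liver_counts, pseudocount=1.0):
    """Compute delta_met for every site, pairing the two tissues by position."""
    return [delta_met(t, l, pseudocount)
            for t, l in zip(testis_counts, liver_counts)]

def central_shift(values):
    """Median of the log2 ratios; a non-zero value would reflect a global
    offset such as a difference in sequencing depth between libraries."""
    return median(values)

# Invented counts for five sites (illustration only).
testis = [10, 20, 0, 40, 15]
liver = [10, 19, 30, 41, 15]

values = delta_met_profile(testis, liver)
shift = central_shift(values)
```

Sites with similar counts in both tissues give values near zero, while the third site, with reads only in liver, gives a strongly negative value and would be a candidate for differential methylation.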

Figure S2
Validation of the assumption of normality for the distribution of digestion frequencies (methyl sensitive cut counts) in CpGs from both lambda and mouse replicates.
Experimentally determined digestion frequencies contain deviations from an unobservable function that relates methyl sensitive counts to the level of methylation of a site in a particular sequence environment. We studied the distribution of these deviations across replicates in a large set of sites in both the mouse genome and the phage lambda genome.
In MSCC the standard deviation of replicates varies from one site to another; thus, to compare deviations across multiple data points, we first standardized the residual (SR) according to: $SR_i = (df_i - \overline{df}) / sd$, where $df_i$ represents the digestion frequency for site $i$ in the sample; $\overline{df}$ is the average digestion frequency among all "i" sites included in this analysis; and $sd$ is the standard deviation associated with that average.
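The standardization described above, together with the pairing of observed and theoretical quantiles that underlies a quantile-quantile comparison, can be sketched as follows. This is a minimal illustration: the use of the sample standard deviation, the plotting positions (i + 0.5)/n, the function names and the example values are all assumptions made here, not details given in the text.

```python
from statistics import NormalDist, mean, stdev

def standardized_residuals(dfs):
    """Return (df_i - mean) / sd for each digestion frequency.

    Uses the sample standard deviation (an assumption; the text does
    not say whether the sample or population form was used).
    """
    m = mean(dfs)
    sd = stdev(dfs)
    if sd == 0:
        raise ValueError("standard deviation is zero; residuals are undefined")
    return [(x - m) / sd for x in dfs]

def qq_points(residuals):
    """Pair sorted residuals with standard normal quantiles.

    Plotting positions (i + 0.5) / n are one common convention,
    chosen here for illustration.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Invented digestion frequencies (illustration only).
example = [2, 4, 4, 4, 5, 5, 7, 9]
sr = standardized_residuals(example)
points = qq_points(sr)
```

If the residuals are approximately normal, the pairs returned by `qq_points` fall close to the identity line when plotted.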
These SR were depicted in quantile-quantile plots to compare the distribution of observed data against theoretical-ex-

Figure S4
Functional enrichment analysis results for genes proximal to TS-DMS located at intergenic regions.

TESTIS
These TS-DMS are located more than 3,000 bp from any known TSS. The association between the genes used in this analysis and the above-mentioned TS-DMS has been described as "weak" (Data Set S4).

DATA SETS
Data Set S1: Methyl Sensitive Cut Counting Results for CpG sites surveyed in the mouse genome

Column #   Header   Description
1 or A     chrm     Identity of the chromosome containing the restriction site described in the row.
2 or B     pos      Position on the chromosome of the restriction site described in the row.
3 or C     BN       Digestion frequencies scored for the restriction site described in the row. The value represents the normalized number of sequences aligned at this position of the chromosome. The results correspond to the first replicate of the brain samples.
4 or D     BR       Same as the previous column, but second replicate of the brain samples.
5 or E     BV       Same as the previous column, but third replicate of the brain samples.
6 or F     KN       Digestion frequencies scored for the restriction site described in the row. The value represents the normalized number of sequences aligned at this position of the chromosome. The results correspond to the first replicate of the kidney samples.
7 or G     KR       Same as the previous column, but second replicate of the kidney samples.
8 or H     KV       Same as the previous column, but third replicate of the kidney samples.
9 or I     LN       Digestion frequencies scored for the restriction site described in the row. The value represents the normalized number of sequences aligned at this position of the chromosome. The results correspond to the first replicate of the liver samples.
10 or J    LR       Same as the previous column, but second replicate of the liver samples.
11 or K    LV       Same as the previous column, but third replicate of the liver samples.
12 or L    TN       Digestion frequencies scored for the restriction site described in the row. The value represents the normalized number of sequences aligned at this position of the chromosome. The results correspond to the first replicate of the testes samples.
13 or M    TR       Same as the previous column, but second replicate of the testes samples.
14 or N    TV       Same as the previous column, but third replicate of the testes samples.
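As an illustration of how the column layout of Data Set S1 could be used, the sketch below averages the three replicate columns of each tissue for every site. The column headers follow the description above; the tab-delimited format with a header row, the helper names and the sample row are assumptions made for this example and do not come from the data set.

```python
import csv
import io
from statistics import mean

# Replicate columns per tissue, following the Data Set S1 headers.
TISSUE_COLUMNS = {
    "brain": ("BN", "BR", "BV"),
    "kidney": ("KN", "KR", "KV"),
    "liver": ("LN", "LR", "LV"),
    "testes": ("TN", "TR", "TV"),
}

def tissue_means(row):
    """Average the three replicate digestion frequencies of each tissue."""
    return {
        tissue: mean(float(row[col]) for col in cols)
        for tissue, cols in TISSUE_COLUMNS.items()
    }

def read_sites(handle):
    """Yield (chromosome, position, per-tissue means) for each site.

    Assumes tab-delimited text with a header row (hypothetical format).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row["chrm"], int(row["pos"]), tissue_means(row)

# One invented row, used only to exercise the functions.
HEADER = ["chrm", "pos", "BN", "BR", "BV", "KN", "KR", "KV",
          "LN", "LR", "LV", "TN", "TR", "TV"]
ROW = ["chr5", "3000100", "12", "10", "11", "9", "9", "9",
       "0", "0", "0", "30", "33", "27"]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

sites = list(read_sites(io.StringIO(SAMPLE)))
```

Working from per-tissue means keeps the later tissue comparisons independent of the replicate column order.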