Mosaic Epigenetic Dysregulation of Ectodermal Cells in Autism Spectrum Disorder

DNA mutational events are increasingly being identified in autism spectrum disorder (ASD), but the potential additional role of dysregulation of the epigenome in the pathogenesis of the condition remains unclear. The epigenome is of interest as a possible mediator of environmental effects during development, encoding a cellular memory reflected by altered function of progeny cells. Advanced maternal age (AMA) is associated with an increased risk of having a child with ASD for reasons that are not understood. To explore whether AMA involves covert aneuploidy or epigenetic dysregulation leading to ASD in the offspring, we tested a homogeneous ectodermal cell type from 47 individuals with ASD compared with 48 typically developing (TD) controls born to mothers of ≥35 years, using a quantitative genome-wide DNA methylation assay. We show that DNA methylation patterns are dysregulated in ectodermal cells in these individuals, having accounted for confounding effects due to subject age, sex and ancestral haplotype. We did not find mosaic aneuploidy or copy number variability to occur at differentially-methylated regions in these subjects. Of note, the loci with distinctive DNA methylation were found at genes expressed in the brain and encoding protein products significantly enriched for interactions with those produced by known ASD-causing genes, representing a perturbation by epigenomic dysregulation of the same networks compromised by DNA mutational mechanisms. The results indicate the presence of a mosaic subpopulation of epigenetically-dysregulated, ectodermally-derived cells in subjects with ASD. The epigenetic dysregulation observed in these ASD subjects born to older mothers may be associated with aging parental gametes, environmental influences during embryogenesis or could be the consequence of mutations of the chromatin regulatory genes increasingly implicated in ASD. The results indicate that epigenetic dysregulatory mechanisms may complement and interact with DNA mutations in the pathogenesis of the disorder.

having a child with ASD for reasons that are not understood. To explore whether AMA involves covert aneuploidy or epigenetic dysregulation leading to ASD in the offspring, we tested an homogeneous ectodermal cell type from 47 individuals with ASD compared with 48 typically developing (TD) controls born to mothers of ≥35 years, using a quantitative genome-wide DNA methylation assay. We show that DNA methylation patterns are dysregulated in ectodermal cells in these individuals, having accounted for confounding effects due to subject age, sex and ancestral haplotype. We did not find mosaic aneuploidy or copy number variability to occur at differentially-methylated regions in these subjects. Of note, the loci with distinctive DNA methylation were found at genes expressed in the brain and encoding protein products significantly enriched for interactions with those produced by known ASD-causing genes, representing a perturbation by epigenomic dysregulation of the same networks compromised by DNA mutational mechanisms. The results indicate the presence of a mosaic subpopulation of epigenetically-dysregulated, ectodermally-derived cells in subjects with ASD. The epigenetic dysregulation observed in these ASD subjects born to older mothers may be associated with aging parental gametes, environmental influences during embryogenesis or could be the consequence of mutations of the chromatin regulatory genes increasingly implicated in ASD.
The results indicate that epigenetic dysregulatory mechanisms may complement and interact with DNA mutations in the pathogenesis of the disorder. 4

AUTHOR SUMMARY
Older mothers have a higher than expected risk of having a child with an autism spectrum disorder (ASD). The reason for this increased risk is unknown. The eggs of older mothers are more prone to abnormalities of chromosome numbers, suggesting this as one possible mechanism of the increased ASD risk. Age is also associated with a loss of control of epigenetic regulatory patterns that govern gene expression, indicating a second potential mechanism. To test both possibilities, we sampled cells from the same developmental origin as the brain, and performed genome-wide tests looking for unusual chromosome numbers and DNA methylation patterns. The studies were performed on individuals with ASD and typically developing controls, all born to mothers at least 35 years of age at the time of birth. We found the cells from individuals with ASD to have changes in DNA methylation at a number of loci, especially near genes encoding proteins known to interact with those already implicated in ASD.
We conclude that epigenetic dysregulation occurring in gametes or early embryonic life may be one of the contributors to the development of ASD.

INTRODUCTION
Progress in understanding the genetic basis of ASD has been substantial in recent years, with the development of microarray technologies allowing the identification of copy number variants associated with the disorder [1] and massively-parallel sequencing focused on protein-coding exons allowing insights into smaller mutational events disrupting gene function [2,3]. The emerging picture is of rare rather than common genetic variants mediating most of the risk [4].
Less progress has been made in understanding the mechanism by which environmental factors influence the risk of ASD [5]. Epigenetic mediation of such environmental influences has been proposed [6], but a clear association between the phenotype and epigenetic dysregulation has proven elusive in genome-wide studies [7]. Three recent studies have lent support to an association of epigenetic dysregulation with ASD. One study found indications of distinctive chromatin features in the ASD brain [8], another tested peripheral blood leukocytes and found DNA methylation differences to characterize monozygotic twins affected by ASD compared with their unaffected twin [9], while a third study tested brains from 19 subjects with ASD and 21 controls, also finding differential DNA methylation [10].
Interestingly, chromatin regulatory genes have been described to be significantly enriched as targets of mutational events [11], suggesting that epigenomic dysregulation secondary to mutational events may mediate pathophysiological dysfunction in some individuals with ASD.
We were also interested in the parental age effect in ASD, which for advanced paternal age appears to be substantially attributable to mutational events in the male germline [12,13]. The mechanism of the independent maternal age effect on ASD prevalence [14] has substantially less evidence for such underlying genetic events and remains mechanistically unclear. As advanced maternal age (AMA) has long been associated with increased rates of chromosomal non-disjunction [15], while cellular aging in general is increasingly recognized to involve 6 epigenetic dysregulation [16], we therefore evaluated both of these mechanisms as potential contributors to the increased risk of ASD in a cohort of individuals born to older mothers.
A major component of the current study was the minimization of the potential confounding effects now increasingly recognized to affect epigenome-wide association studies (EWAS) [17,18]. We therefore ensured that we reduced effects due to cell type and subpopulation heterogeneity, chromosomal aneuploidy, copy number variability, genetic polymorphism, age, sex and technical influences, and defined differential DNA methylation using advanced analytical approaches. Our intent was to perform a study representing the best of current practices, maximizing our chances of identifying epigenetic regulatory patterns associated with a complex disorder like ASD.

Minimizing the effects of confounding influences on the DNA methylation data
The conclusions of our study are based on Co-methylation Network Analysis approaches described later, but we also included more mainstream analytical techniques to demonstrate that these also yield information about differential DNA methylation characterizing the ASD group. As a first pass analytical approach, we performed on our preprocessed dataset the type of non-parametric F-testing typically performed for Illumina 450K DNA methylation microarrays (IMA [19]), identifying 3,560 differentially methylated CGs (p<0.001) discriminating ASD and TD individuals. We were, however, concerned that this kind of analysis would be subject to confounding influences and prove to be misleading. Such confounding influences include subject age [16,20,21], gender [20,22], and genetic sequence polymorphism at the tested site or acting in cis [23][24][25]. We therefore sought to understand the contribution of these covariates on our dataset. To accomplish this, we adapted an approach used in conjunction with principal 7 components analysis [26]. We assessed all known technical and biological covariates, and then further determined ancestral haplotype of each individual by analyzing genotype data from cohort subjects and applying the HAPMIX algorithm (Figure 1, Supplemental Table S3). We then performed linear regression of each principal component on each covariate, to determine significant correlations (Figure 2). We found that inter-array differences, reflected by bisulphite conversion and DNA loading controls on the microarray, accounted for the majority of confounding variation between samples, even after applying stringent preprocessing algorithms.
We also found that subject age and ancestry contributed significantly to DNA methylation variation.
To strengthen the quality of our dataset further, we used our genotype data to explore the presence of detectable mosaic aneuploidy and copy number variation in the subjects. We applied the Mosaic Alteration Detection (MAD) algorithm to test for mosaicism for chromosome aneuploidy [27], finding no evidence for such an event in our cases or controls. We were therefore able to focus on the analysis of DNA methylation in the same samples knowing that the results would not be confounded by AMA-induced aneuploidy.

Identifying Differentially Methylated Regions (DMRs)
To identify DNA methylation changes specific to the ASD subjects, we used two approaches, the identification of differentially methylated regions (DMRs) using the bump-hunting approach [28], and, as we describe in the next section, Co-methylation Network Analysis. DMR identification has the advantage of being conceptually easy to understand and lends itself to single locus verification studies, as it focuses on the definition of discrete loci with DNA methylation differences. Instead of interrogating individual sites, bump-hunting combines probe information over short regions of hundreds of basepairs, and therefore minimizes single locus false positives that can occur due to polymorphism at individual CGs. Importantly, unlike 8 approaches like IMA [19], bump-hunting considers the many potential sources of confounding variation in epigenomic analysis and includes analytical approaches to account for and remove unwanted covariates. Bump-hunting was previously developed for tiling microarray data [28]; our use of the algorithm for the Infinium 450K microarray platform represents an extension of the approach that has proven successful in another study of ASD [10]. Applying the default 300 bp window [28], this allows ~27% of the probes on the Infinium 450K microarray to be informative.
Since we had confirmed effects of age and ancestry on DNA methylation in our subjects, we included them as covariates in the bump-hunting model, estimating ancestry as the percent of European (CEU) and of African (YRI) alleles genome-wide. Removing CGs that overlap known common SNPs from further analysis is a commonly performed measure to reduce artifactual effects of unrecognized sequence variants [29], but is insufficient to account for genotype-driven differences, as recent studies have demonstrated the ability of haplotypes in cis with tested CGs to affect their methylation status [20, 23,24]. Although we did not observe significant variation due to sex, it was also included as a covariate as a conservative precaution given its previously documented effects [22]. We confirmed that bump-hunting accounted for and removed all of the technical variation by repeating our PCA and association analysis on the bump-hunting output (Supplemental Figure S4).
We discovered 15 DMRs at 14 genes distinguishing the ASD and TD samples (Supplemental Table S4). As it is known that the incidence of genic copy number variation (CNV) is increased in individuals with ASD [2], we used CNVision [30] to identify large CNVs from our genotyping microarray data and to test whether any of the 15 DMRs overlapped CNVs. Three DMRs (one at the MAPK8IP3 gene and two at the CYP2E1 gene) were found to overlap CNVs in multiple cases and/or controls. We therefore excluded the individuals with CNVs at those loci and re-ran the DNA methylation bump-hunting analysis. The two candidate DMRs at CYP2E1 were no 9 longer identified in this re-analysis, indicating that the CYP2E1 DMR assignation was likely due to the presence of the multiple CNVs.
As an extra layer of stringency, we then re-ran the data preprocessing and analysis for a total of 4 iterations, testing to see which of the remaining 13 loci remained stably predicted as DMRs.
We found four (at the KCNQ5, NRG2, LOC643802 and ZG16B genes) to be predicted in only a subset of the iterative analyses, but the other 9 DMRs to remain stably predicted, generating higher confidence predictions for these 9 loci as a consequence. The instability of prediction of certain loci is likely to be due to ComBat generating slightly different outputs every time it is run, requiring this kind of iterative analysis to test for stability of predictions.
The candidate DMRs from the genome-wide analysis were associated with genes, of which many have already been implicated in previous studies with ASD ( Table 1). Of the 9 genes defined by the candidate DMRs in buccal epithelial cells, all are expressed in the brain according to the Human Brain Transcriptome (HBT) database [31]. Furthermore, 6 of the 9 genes were ascribed by the HBT to co-expression modules; all 6 belong to 2 modules involved specifically in post-natal synaptic transmission [31]. This includes the OR2L13 gene, which, unlike other olfactory receptor genes, is not located in the module associated with olfactory function.
To confirm differential DNA methylation at these candidate DMRs, we used bisulphite PCR, multiplexed robotic library preparation and massively-parallel sequencing averaging >15,000x depth for each locus tested, with >99.99% bisulphite conversion efficiency. We show in Figure   3 the concordance of this massively-parallel bisulphite sequencing with the microarray results for 2 of the 3 stably predicted DMRs, at the OR2L13 and FAM134B genes, both showing decreased DNA methylation at these promoter loci. Of these, OR2L13 has changes in DNA methylation that reaches statistical significance (p<0.05) using a t test comparing the group of CGs tested by bisulphite sequencing in the region between the ASD and TD groups, with FAM134B also showing the decrease in DNA methylation predicted by the microarray study and trending towards but not reaching statistical significance (p=0.0957). Interestingly, OR2L13 has been found in two previous studies to demonstrate either altered DNA methylation in blood [9] or gene expression in brain [32] in individuals with ASD.
We also explored the effect of the subjects' age in more detail, identifying 306 potential DMRs (Supplemental Table S6) associated with the age of the individuals at the time of sample collection (which in our study ranged from 1-28 years, Supplemental Table S1). Gene ontology analysis demonstrated that the genes associated with these DMRs in buccal epithelium are significantly enriched in pathways especially related to development and differentiation but also include pathways involved in neuronal development (Supplemental Figure S7). It is notable that buccal epithelium should be informative for these patterns and supports the use of this cell type as a surrogate for studies of events occurring in the central nervous system.

Co-methylation Network Analysis
While DMR identification is useful for the reasons described above, the exclusion from consideration of the majority of probes on the microarray in a bump-hunting approach and the failure of such approaches to consider that multiple dispersed loci may change DNA methylation patterns concordantly prompted our focus on a network-based analytical approach. Previous work has demonstrated the utility of employing such a network-based approach to detect expression and DNA methylation differences potentially missed by conventional statistical difference testing [32,33]. We therefore chose to use Weighted Gene Co-expression Network Analysis (WGCNA) [34] as a complement to bump-hunting, identifying instead of DMRs the patterns of co-methylation that distinguish the ASD subjects. We found co-methylation modules 11 associated with all of our known biological covariates, including age, gender, and percentage of local ancestry. We selected modules significantly associated only with ASD case/control status after stringent Bonferroni correction. We obtained two modules that met these criteria; the full list of association p-values is provided in Supplemental Table S7. The larger co-methylation module significantly associated with ASD contained 116 CGs (Figure 4a). Genes in proximity to these CGs (Supplemental Table S8) include NF1, whose dysfunction is strongly associated with syndromic autism [35]. Ontology analysis of these genes revealed significant enrichment for functional categories including negative regulation of cell migration, locomotion, cellular compartment movement and cell proliferation, categories previously implicated in ASD subjects [2]. However, because of the recent recognition that gene ontology analysis can be biased towards genes with a greater number of probes [36], we used the number of probes per gene defined in the Illumina manifest to generate a weight for each gene, and recalculated enriched gene ontologies using the R package GOseq [37]. The p-values of the resulting candidate ontology categories did not survive correction for multiple testing, indicating that this kind of ontology analysis needs to be interpreted with caution, and may prompt reanalysis of prior, published gene ontological associations in ASD and other phenotypes.
As a different means of assessing the functional significance of the ASD-associated comethylation modules, we tested whether they were functionally and non-randomly connected to known ASD risk genes. Using data from only the more stringent protein physical interaction databases [38], we created a protein-protein interaction network using the genes from the two significant WGCNA modules and a reference list of known ASD genes [39]. Figure 4b displays the resulting highly interconnected network, demonstrating that many of our module genes directly interact with genes previously found to be mutated in ASD. To test whether this degree of interaction was non-random, we performed the Degree-Aware Disease Gene Prioritization (DADA) approach [40] previously used to study associations of sporadic gene mutations with 12 ASD [12]. We found that our module genes were significantly enriched in ranking (Mann Whitney U = 7.1 x 10 -3 ), indicating the non-random selection of epigenetically-dysregulated genes for those within pathways previously implicated in ASD.

DISCUSSION
Despite using a surrogate ectodermal cell type, this study revealed a pattern of altered DNA methylation at genes expressed in the brain, encoding proteins enriched for the post-natal synaptic transmission functions previously implicated in the pathogenesis of ASD, and significantly interactive with proteins encoded by genes previously described to be mutated in subjects with ASD. These findings combine with those of other recent studies to support a contribution of epigenetic dysregulation to the pathogenesis of ASD [8,9].
A current concern with the EWAS approach is that the generally small changes in DNA methylation found [17] may not be substantially in excess of the noise introduced by technical or biological effects influencing DNA methylation that have no relationship to the phenotype being tested. We have an increasing appreciation of the types of influences on DNA methylation in studies of human subjects, allowing us to design and execute studies more rigorously now than was appreciated to be necessary in the past. The current study represents the largest epigenome-wide analysis to date testing a single cell type in ASD. The use of such an homogeneous cell type in this study should minimize the problems recognized to be associated with samples of mixed cellularity [18]. Our parallel SNP genotyping allowed not only the detection and elimination of CNVs and mosaic aneuploidy as influences, it also allowed us to map within chromosomes the ancestral haplotype in which each site tested for DNA methylation was located, helping to control for genetic influences on DNA methylation. Analytically, we accounted for these sources of variability and used the SVA approach [41] from within bump- 13 hunting [28] to identify and account for known influences (age [16] and sex [22]) as well as additional, otherwise unrecognized influences. The analysis also included iterative use of the data preprocessing and analysis steps prior to using bump-hunting to highlight the DMRs that are stably predicted, and gene ontology studies that address the newly-recognized concern that the number of microarray probes representing the gene can influence the outcome of analysis [36]. These measures, which were preceded by stringent pre-processing of the microarray data, appear to represent the currently necessary level of stringency for EWAS studies.
The partial changes in DNA methylation that are typical of EWAS imply an underlying property of the cell populations being studied. Unlike gene expression, which can vary quantitatively in an individual cell, DNA methylation is either present or absent on an allele, so that small changes of DNA methylation reflect allelic (presumably cellular) subpopulation mosaicism for the epigenetic changes. Our finding of limited changes in DNA methylation in buccal epithelium suggests a model for an early embryological event in ectodermal cells, perturbing the epigenome of a subpopulation of these cells, detectable postnatally in buccal epithelium but potentially also occurring as a mosaic cell subpopulation in other ectodermally-derived tissues including the brain. Mosaicism for chromosomal abnormalities has been associated with ASD [42][43][44], indicating that problems affecting only subsets of cells in the brain can lead to this condition, as also suggested in a recent review of somatic mosaicism in neurological disorders [45].
The characteristics of the genes at which the DNA methylation changes are occurring in the ASD cohort lend support to our model linking epigenetic disturbances with the pathogenesis of ASD. Despite studying non-neuronal cells, the differentially-methylated genes are those expressed in the brain, enriched for post-natal synaptic transmission function. Network approaches have proven to be an informative means of understanding the diversity of mutational events in ASD [32], prompting us to apply the same approaches to explore DNA 14 methylation changes at the genes identified by WGCNA. We found that the genes in the most significant WGCNA modules are non-randomly enriched for interactions with genes already implicated in ASD. The model that results is of mosaic epigenetic dysregulation affecting the same networks and pathways targeted by mutational mechanisms, creating comparable deleterious effects on neuronal function. The results will need to be replicated to increase confidence in the conclusions, requiring independently-collected buccal epithelial samples from comparable subjects with ASD and suitable control individuals. As this currently represents an unusual source of clinical material, such a resource is not yet available, requiring that study replication will need to be addressed as the next phase of this project.
We conclude that of the two mechanisms we originally proposed for AMA causing ASD, covert aneuploidy occurring at detectable levels (≥20%) [46] is not as likely to be involved as epigenetic dysregulation. The use of mixed cell types in prior epigenetic studies of ASD [7] may have limited the ability to detect such subpopulation effects, although it is encouraging that, in a recent study of monozygotic twins, DNA methylation differences were also observed, implicating a gene that we also found to be targeted for epigenetic dysregulation, OR2L13 [9]. This gene appears to be especially labile in ASD in terms of DNA methylation and expression -mixed leukocytes from the monozygotic twins study shows increased DNA methylation at this locus, whereas there is decreased DNA methylation in the ectodermal cells we studied, and expression of this gene is significantly upregulated in the Brodmann Area (BA) 44/45 and downregulated in the BA9/41 regions of the brain [32]. Whether this olfactory receptor gene is a contributor to the distinctive olfaction in subjects with ASD [47,48] remains to be determined, although it should also be noted that the Human Brain Transcriptome (HBT) studies categorized this gene, alone among all olfactory receptor genes, into a post-natal synaptic transmission coexpression module [31], indicating unique transcriptional properties for this gene among those 15 encoding olfactory receptors. This gene merits further studies as being unusually prone to dysregulation in ASD.
Epigenetic changes observed in our cohort of subjects born to mothers with AMA may be due to aging of the oocyte, but there are additional potential mechanisms. Spermatozoa from the generally older fathers of the ASD subjects may also have epigenetic changes with aging.
Epigenetic dysregulation in ASD could be due to as-yet unrecognized environmental influences during development. Furthermore, an emerging property of the genes in which mutations or sequence variants occur in subjects with ASD is that of chromatin and epigenetic regulatory properties [11], which may be occurring in ASD subjects born to older mothers as a consequence of the generally older paternal age of such individuals and the increased risk of mutational events with age in the male germline [12,13]. The epigenetic dysregulation we observe may therefore represent a common outcome of the effects of mutations in such genes, making epigenetic changes an event secondary to genetic mutations within cohorts of subjects with ASD. Combined genetic and epigenetic analyses of the same subjects will be needed to test these possibilities. 16

Cohorts and sample collection
All patient recruitment and sample collection was performed with the appropriate human subjects protocol approval from the Institutional Review Board at the Albert Einstein College of Medicine. We enrolled two groups of subjects: individuals with a diagnosed autism spectrum disorder (ASD), born to mothers aged 35 and older, and a control group of typically developing (TD) individuals born to mothers aged 35 and older. Of these, we selected 50 ASD and 50 TD subjects for genotyping analysis, of which we took 47 and 48 from each group, respectively, for methylation analysis. The characteristics of the ASD and TD cohorts are summarized in Supplemental Table S1.
We collected buccal epithelium using exfoliative brushing, choosing this accessible cell type for two major reasons. Firstly, as ectodermal in origin, these cells have been found to represent a better surrogate for the brain than other tissues when relating cognitive impairment with tissuespecific levels of mosaic trisomy 21 [49]. Secondly, exfoliative buccal brushing harvests squamous epithelial cells homogeneously, confirmed by microscopy of samples in this study (Supplemental Figure S1), avoiding the potentially confounding effects of heterogeneous cell subtypes on epigenomic profiling [18]. DNA from the buccal epithelial samples was extracted with a modification of the protocol for the Qiagen Gentra Puregene Buccal Cell Kit. We randomized samples between arrays as a precaution against batch effects and ran them on the Illumina HumanOmni2.5-8 BeadChip genotyping platform and the Illumina Infinium HumanMethylation450 Beadchip methylation platform (Supplemental Table S2).

Chromosomal mosaicism analysis
Stringent genotype data preprocessing included filtering for poor quality arrays and probes, as well as probe missingness and allele frequency (Text S1). After this preprocessing, we used the Mosaic Alteration Detection (MAD) algorithm [27] to assess the prevalence of chromosomal mosaicism. We confirmed the ability of MAD to detect mosaicism in samples from 4 individuals with known chromosomal mosaicism (three cell line and one buccal epithelial samples, Supplemental Figure S2).

Local ancestry deconvolution
We used the HAPMIX [50] software to annotate genomic ancestry genome-wide. Since our population comprised individuals with 3-way admixture, we simulated mixed parental ancestry and calculated probabilities of each potential genotype at every probe. We assigned ancestry only when the local probability exceeded 0.5 (Supplemental Table S3).

Differential methylation analysis
Arrays were subjected to stringent preprocessing to correct for technical (probe type, detection p-value), and batch artefacts (Supplemental Figure S3). Due to inclusion of both males and females in our cohort, we removed the sex chromosomes from our dataset and conducted analysis on autosomes only. Detailed information on the pre-processing procedures is included in the Supplementary Information.
After preprocessing, we then performed principal components analysis (PCA) on the M values (logit-transformed Illumina-defined beta values) obtained. We accounted for the possible known confounding influences, including technical (date of DNA extraction, microarray chip, position on chip), microarray-based (all categories of control probes designed by Illumina) and biological (ASD status, age, gender, and ancestry percentage). Ancestry percent was calculated as the 18 proportion for each population of all allele genotyping positions called by HAPMIX. We identified the significant confounding covariates and corrected for them prior to all subsequent analysis.
To identify differentially methylated regions (DMRs), we used the bump-hunting approach, specifically the dmrFind algorithm within the charm package [28]. Based on our PCA data, we input age, gender, percent CEU (European) and percent YRI (African) ancestry as known biological covariates. Those loci called as significant by the dmrFind algorithm were defined as candidate DMRs. Stability of DMR prediction was tested by re-running the bump-hunting on the data a total of 4 times. Additional loci identified through bump-hunting are illustrated in Supplemental Figure S5.

Copy number variant (CNV) calling
We utilized the CNVision algorithm [30] to detect copy number variants in all individuals. We removed samples with CNVs overlapping called DMRs and re-ran bump-hunting to confirm that the DMR was not solely due to the presence of CNVs in the region.

Weighted gene co-expression network analysis (WGCNA)
We used the Weighted Correlation Network Analysis framework WGCNA [34] for our DNA methylation analysis, as has previously been performed [33] to identify networks of comethylated CG dinucleotides associated with ASD status. As input, we used the CG values generated from the Surrogate Variable Analysis (SVA) algorithm called in the dmrFind function performing bump-hunting, thus correcting for all known technical artifacts. We used the WGCNA package in R and built an unsigned co-methylation network. Correlation matrices were raised to the power of 5, as calculated by the scale-free topology criterion on data subsets, and thresholds were set of minimum module connectivity (kME) of greater than 0.7, and minimum height for module merging of 0.1 [32]. We ran the algorithm with block sizes of 40,000 CGs. 19 We assessed module relevance to case/control status with a t-test (two tailed, unequal variance) of module eigengene values with case/control and gender categories. For relationship with the continuous variables of age and percent of YRI and CEU ancestry, we used Pearson correlation coefficients and their Student asymptotic p-values.
For analysis of methylation changes associated with ASD, we selected the 2 modules ("light green", and "dark olive green2") that showed significant correlation only with ASD status and not with any other covariate, to avoid introducing confounding effects. The genes associated with the dark olive green2 module are listed in Supplemental Table S9.

Functional Analysis
The nature of the enriched pathways involved in the list of genes associated with DMRs was investigated using a Cytoscape plugin for Biological Network Gene Ontology Enrichment, BiNGO [51]. The reference set used was set as the whole annotation from Homo sapiens and Biological Processes from Gene Ontology were queried, with a significance level set to a false discovery rate (FDR) corrected value of 0.05.
To assess the functional impact of our WGCNA ASD-associated co-methylated gene modules, we interrogated the genes' relevance in protein-protein interaction (PPI) networks. We combined the genes in the light green and dark olive green 2 modules with a list of previously curated known ASD risk genes, with the addition of exome sequencing candidates (KATNAL2 and CHD8) [39]. We used GeneMania [38] to build a PPI network of this combined list, using only data from physical protein interaction databases. We visualized this network in Cytoscape.
To test for bias due to probe number differences at each gene, we use the GoSeq package To test the significance of these PPI connections, we performed Degree Aware Disease Gene Prioritization (DADA) [40]. We used the ASD seed list mentioned previously, a combined candidate list of the genes related to the ASD-associated WGCNA CGs, and the physical interaction database from the Human Protein Reference Database (HPRD) available through GeneMania. We evaluated if our genes were significantly enriched in ranking using the Mann-Whitney test.

Verification of differential methylation
We bisulphite converted 500 ng of DNA  Figure S6.

Data access
All microarray data generated are deposited into the Gene Expression Omnibus (GEO) database under accession number GSE50759.

ACKNOWLEDGMENTS
Autism Speaks is thanked for helping with patient recruitment.

Text S1
A file containing all supplementary figures and tables, with details on analytical approaches and including software allowing re-analysis of data. 31  1q44 implicated in linkage study [53] Down-regulated in initial cohort, upregulated in replication cohort [32] Hypermethylated in ASD using probes on the Illumina 27K array [9] PAX8 Transcription factor, particularly involved in developing thyroid, CNS, and kidney.