Monozygotic twin pairs discordant for amyotrophic lateral sclerosis carry both common and unique epigenetic differences relevant to disease

Amyotrophic lateral sclerosis (ALS) is a devastating late-onset neurodegenerative disorder in which only a small proportion of patients carry an identifiable causative genetic lesion. Despite high heritability estimates, a genetic etiology for most sporadic ALS remains elusive. Here we report the epigenetic profiling of five monozygotic twin pairs discordant for ALS in whom previous genome sequencing excluded a genetic basis for their disease discordance. By studying cytosine methylation patterns in peripheral blood DNA we identified thousands of large between-twin differences at individual CpGs. While the specific sites of difference were largely idiosyncratic to a twin pair, a proportion (involving GABA signalling) were common to all affected individuals. In both instances the differences occurred within genes and pathways related to neurobiological function and dysfunction. Our findings reveal widespread changes in epigenetic marks in ALS patients, consistent with an epigenetic contribution to disease. These findings may be exploited to develop blood-based biomarkers of ALS and to gain further insight into disease pathogenesis. We expect that our findings will provide a useful point of reference for further large-scale studies of sporadic ALS.

Introduction
Amyotrophic lateral sclerosis (ALS), also known as motor neuron disease, is a lethal adult-onset disease that causes progressive muscle weakness, with death usually 2 to 5 years after initial diagnosis [1]. About 10% of ALS is familial and attributable to germline mutation of specific genes, but in the majority of cases (~90%) no other family member is affected, and the cause of most of this so-called sporadic form of ALS (SALS) remains unknown. Genetic, epigenetic and environmental factors have all been suggested to play a role in SALS, with combinations of these factors proposed to contribute to a multi-staged etiology [2].

Although rare single or multiple genetic variants may underlie some cases of SALS [3,4], much of the heritability of the disease remains to be found [5]. Attention has turned to the possibility that epigenetic factors could contribute to ALS and its associated condition, frontotemporal dementia [6].
The fact that epigenetic changes may be therapeutically modified has driven research in this area [7]. A limited number of unvalidated epigenetic studies of SALS have been undertaken, involving single genes such as SOD1 and VEGF [8], small groups of genes such as those in the metallothionein family (involved in detoxifying heavy metals) [9], and genome-wide methylation analysis using microarray [10]. However, the role of epigenetic variants in SALS remains unclear and largely unexplored [11].

[25]. We found no evidence of hypermethylation at probes within any known ALS gene promoter in any twin in the 450K array data. Further, at 10x coverage our RRBS libraries captured allelic information on the promoters of the same genes in all twin pairs, but none of the affected twins exhibited aberrant methylation at any of these loci (Fig 1A). Patterns of methylation at each known ALS disease locus were almost identical among all individuals, with all autosomal promoters showing little to no methylation, as shown for example in C9orf72 (Fig 1B). Thus, the discordance for ALS in these monozygotic twin pairs is not due to a germline genetic or epigenetic defect in any of the genes commonly associated with familial ALS.

Case-control analysis of methylation implicates GABA receptor signalling as a commonly perturbed epigenetic network in ALS
We next took an unbiased approach to determining whether epigenetic differences may underlie the twin discordance for ALS. Unsupervised hierarchical clustering of RRBS data at 10x did not separate cases and controls, but instead identified five distinct clusters representing the five twin pairs (Fig 2A). This is not surprising given the known influence of genotype on inherited methylation patterns [26]. We then used the statistical package methylKit [27] to ask whether there were any differentially methylated CpG sites (DMCs) in common between all ALS cases versus all unaffected controls. At a significance threshold of q<0.01, this identified 135 CpG sites with ≥ 20% average difference in methylation between the two groups (Fig 2B; Table S2). About one half of these DMCs were in unannotated, intergenic regions of the genome, with the remainder predominantly within intronic regions (Fig 2C). Unsupervised clustering of the 450K data led to a similar clustering by twin pair, not disease status (Fig 2D). Analysis of the array data using minfi [28] failed to identify any significant common DMCs; CpGs with nominal significance, or approaching significance after correction for multiple testing, exhibited only tiny differences in methylation between cases and controls (Fig 2E).

None of the common DMCs identified by the RRBS case-control analysis exhibited changes consistent with a germline event (i.e. affecting most or all cells). On average the differences between cases and controls were ±25%, and while mosaicism for a germline change cannot be ruled out in this study of a single tissue, it is more likely that these modest changes indicate common somatic changes in ALS-affected individuals that are consequent to their disease.
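
The two thresholds applied in the case-control comparison (q<0.01 and a ≥ 20% average methylation difference between groups) can be illustrated with a minimal Python sketch. This is not the methylKit workflow used in the study: it assumes that a q-value has already been computed for each CpG upstream, and the `CpGSite` class, the `call_common_dmcs` function and all example values are hypothetical names and numbers introduced only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CpGSite:
    """One CpG with per-sample methylation (%) and a precomputed q-value."""
    chrom: str
    pos: int
    case_meth: list      # percent methylation in each ALS-affected twin
    control_meth: list   # percent methylation in each unaffected co-twin
    qvalue: float        # adjusted significance, assumed to be given


def call_common_dmcs(sites, q_cutoff=0.01, min_diff=20.0):
    """Return (site, mean difference) pairs that pass both thresholds.

    A site is kept when q < q_cutoff and the absolute difference between
    mean case and mean control methylation is >= min_diff percentage points.
    """
    dmcs = []
    for site in sites:
        diff = mean(site.case_meth) - mean(site.control_meth)
        if site.qvalue < q_cutoff and abs(diff) >= min_diff:
            dmcs.append((site, diff))
    return dmcs


# Hypothetical example: four CpGs measured in five cases and five controls.
example_sites = [
    CpGSite("chr1", 100, [80.0, 85.0, 90.0, 75.0, 80.0],
            [50.0, 55.0, 60.0, 45.0, 50.0], 0.001),   # passes both filters
    CpGSite("chr1", 200, [80.0, 85.0, 90.0, 75.0, 80.0],
            [50.0, 55.0, 60.0, 45.0, 50.0], 0.05),    # q-value too high
    CpGSite("chr2", 300, [60.0, 60.0, 60.0, 60.0, 60.0],
            [50.0, 50.0, 50.0, 50.0, 50.0], 0.001),   # difference too small
    CpGSite("chr3", 400, [20.0, 25.0, 30.0, 15.0, 10.0],
            [45.0, 50.0, 55.0, 40.0, 35.0], 0.005),   # hypomethylated in cases
]
common_dmcs = call_common_dmcs(example_sites)
```

Taking the absolute difference keeps sites that are hypomethylated in cases as well as those that are hypermethylated, consistent with the ±25% average differences reported above.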
Ingenuity Pathway Analysis [29] (IPA) of the genes harbouring DMCs (n=74) revealed enrichment for several pathways, the most significantly enriched being 'GABA receptor signalling' (Fig 2F). IPA also identified four gene networks in which the affected genes function (Fig S1). The network containing genes involved in GABA signalling, shown in Fig 2G, centred around TNF. The other three networks (two headed by cancer, and one by lipid metabolism) (Fig S1) have no obvious pathogenetic link to ALS, but since so little is known about the cause of ALS these networks warrant further investigation.

Outlier analysis of RRBS data reveals characteristic epigenetic differences between ALS-affected and unaffected twins
While the RRBS case-control analyses revealed interesting changes common to all twins, the necessary grouping of individuals for analysis means large changes of potential biological significance in only one or two ALS-affected individuals would be lost to statistical analysis. The 'power of the twin' would also be lost; this is particularly relevant in epigenetic studies, where underlying DNA sequence can influence or even determine epigenetic state [30]. Given the clinical and genetic heterogeneity of ALS, the pathogenesis of motor neuron loss may be distinct in each affected twin. RRBS methylation patterns were therefore compared between each affected and unaffected individual in co-twin analyses.

We began by calculating Pearson's correlation of methylation levels between co-twins. Co-twin CpG methylation was highly correlated overall (r=0.978, range 0.972-0.982), and showed a generally bimodal distribution with most sites being either heavily methylated or largely unmethylated (Fig 3A). CpG sites present at >20x coverage in both twins within a pair were considered for further analysis.
Those CpGs ≥ 5 residuals from the expected value from a linear model of all sites were called as methylation 'outliers' (Fig 3B). The minimum magnitude of difference in methylation at outliers between co-twins at this stringent cut-off was ~40%. Using this approach we identified more than 1,000 methylation outliers in each twin pair (Fig 3C; Table S3). Although there was a preponderance for methylation outliers to be hypomethylated in the ALS twins relative to the non-ALS twins, whole genome levels of 5-methylcytosine as measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS) did not differ between affected and unaffected individuals (Fig 3D), as has been previously suggested for ALS [31].

Genomic annotation of the outliers showed that, relative to all sites captured by RRBS, outlier sites were less likely to be in a CpG island (Fig 3E). Like the common DMCs identified by methylKit, outlier sites were predominantly in intronic and intergenic regions (Fig 3F). The majority of outlier CpGs were idiosyncratic to a twin pair, with little overlap among the twin pairs (Fig 3G). But when considering the genes harbouring the outlier CpG sites, the overlap among twins was greater, with ten genes (ABR, NCOR2, SORCS2, HDAC4, SHANK2, RBFOX3, RXRA, MAD1L1, PTPRN2, GRIN1) harbouring one or more methylation outliers in all five twin pairs (Fig 3H). Despite this overlap at the gene level, at least half of the affected genes were unique to a twin pair.
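
The co-twin outlier call and the gene-level overlap described in this section can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study. It assumes that '≥ 5 residuals' means a residual of at least five residual standard deviations from an ordinary least-squares line, and that methylation in the unaffected twin is the predictor for methylation in the affected twin. All function names and example values are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def call_outliers(unaffected, affected, cov_unaffected, cov_affected,
                  min_cov=20, sd_cutoff=5.0):
    """Return indices of CpGs called as methylation outliers in one twin pair.

    unaffected / affected: percent methylation per CpG in each co-twin.
    cov_*: read depth per CpG; a site is used only if depth > min_cov in both.
    """
    keep = [i for i in range(len(unaffected))
            if cov_unaffected[i] > min_cov and cov_affected[i] > min_cov]
    x = [unaffected[i] for i in keep]
    y = [affected[i] for i in keep]
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom (assumed scale).
    sd = sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return [i for i, r in zip(keep, residuals) if abs(r) >= sd_cutoff * sd]


def genes_in_all_pairs(genes_per_pair):
    """Genes harbouring at least one outlier in every twin pair."""
    return set.intersection(*(set(genes) for genes in genes_per_pair))


# Hypothetical twin pair: 200 well-covered bimodal CpGs that agree between
# co-twins, one of which (index 10) differs by 50 percentage points, plus one
# discordant CpG (index 200) that fails the coverage filter.
unaffected = [float((i % 2) * 100) for i in range(200)] + [0.0]
affected = list(unaffected)
affected[10] = 50.0
affected[200] = 100.0
coverage = [30] * 200 + [10]
outlier_indices = call_outliers(unaffected, affected, coverage, coverage)
```

Because a single regression line is fitted to all sites in a pair, the cut-off adapts to how tightly the two co-twins correlate; a fixed percentage difference would be a simpler alternative but would ignore that pair-specific spread.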

ALS methylation outliers cluster in disease-relevant ontologies and pathways
We next took the genomic coordinates of the outlier CpGs and used the Genomic Regions Enrichment of Annotations Tool (GREAT) [32] to identify the ontologies of the sets of outliers for each twin pair. The molecular functions overrepresented by the outliers had one ontology in common across all twin pairs, 'sequence specific DNA binding' (Table 2). This is not disease-specific, but suggests that genes encoding transcription factors are susceptible to varying in epigenotype between identical genotypes. The significantly enriched biological functions revealed a large number of associated ontologies (Table S4)

Discussion
We have taken advantage of the genetic and early environmental similarity of identical twins discordant for ALS to gain insight into the nature and extent of epigenetic changes in this disease. Together our findings demonstrate that ALS has epigenetic signatures in peripheral blood DNA that could potentially be exploited as biomarkers of disease. Our findings are consistent with widespread disruptions to epigenetic patterns in ALS that either underlie disease etiology, or represent changes consequent to pathology.

Familial ALS is genetically heterogeneous, but clinically very similar to SALS, which prompted us to use our data to first examine methylation at genes known to be mutated in familial ALS. Germline epimutation, characterised by soma-wide aberrant silencing of a gene, can phenocopy a genetic mutation [34], and is usually associated with dense hypermethylation at the promoter of the affected gene. However, none of the individuals exhibited any aberrant methylation at known ALS gene promoters in their peripheral blood. This finding does not necessarily preclude an inborn epigenetic defect as the basis for an affected twin's predisposition to ALS, but it excludes this possibility at known ALS genes.

Unbiased case-control analyses are designed to detect commonalities between groups. It is of particular interest that our RRBS analyses revealed affected twin-concordant methylation changes at genes that cluster in GABA receptor signalling. Cortical hyperexcitability is one of the earliest identifiable changes in patients with ALS, caused at least in part by degeneration of inhibitory cortical circuits and reduced cortical GABA levels [35,36]. Given that ALS is such a heterogeneous disease [37], these epigenetic changes common to all our ALS-affected twins could be secondary to the many pathogenetic pathways found in ALS, rather than being causally related to the disease.
If so, these changes hold the potential to be exploited as blood-based biomarkers for an early diagnosis of ALS.

When considering methylation differences between twins we found a considerable number of differences of large magnitude and defined these as 'methylation outliers'. Based on the magnitude of difference in methylation between co-twins at outliers and the stringent parameters we used to identify them, it is unlikely that these outliers merely reflect experimental noise. We do not expect, however, that all methylation outliers between co-twins will be representative of ALS discordance, since many differences may reflect or underlie other phenotypic discordances, or individual exposure to environmental factors [12]. For example, one of our individuals was a smoker at the time of sample collection and her co-twin was not; in this pair we were able to identify the expected difference in methylation levels at an intronic CpG in the AHRR gene, known to be robustly associated with active smoking [38] (Fig S2). This particular difference fell just under our outlier threshold of ≥ 5 residuals, but given that twin pairs carry thousands of outlier sites of greater magnitude than this, at least some of them would be expected to reflect the discordance for ALS, a supposition supported by the gene ontology and pathway analyses of outliers. Genome-wide analyses of outliers identified in healthy twins (performed in a similar manner [12]) revealed between-twin differences that cluster largely in ontologies related to the tissue being examined; between-twin DMCs in adipose tissue clustered in functions related to lipid metabolism while peripheral blood DMCs clustered in haematological functions [12].

The thousands of outlier sites we identified in each twin pair showed only a modest overlap in genes affected, but all five twin pairs harboured outliers in ten common genes.
Three of these genes have previously been implicated in ALS: SORCS2, RXRA, and HDAC4, which have prominent roles in inflammation and epigenetic regulation [39-41]. GRIN1, another of the ten common genes, encodes a subunit of the glutamate NMDA receptor, the major mediator of excitotoxicity; splicing of GRIN1 requires the RNA binding protein TAF15, another molecule implicated in ALS [42]. The remaining genes, including ABR, SHANK2, RBFOX3 and PTPRN2, have no obvious link to ALS, but are notable for being highly expressed in the central nervous system. The genes affected in all our cases could be considered candidates in follow-up studies of larger SALS cohorts.

The overlap in functional pathways and networks associated with ALS methylation outliers was the most striking finding of this study. Neurobiological functions or pathways relevant to ALS were overrepresented in every twin pair, even with only modest gene overlap and, more importantly, in the tissue that was examined (white blood cells, not CNS). We were not able to adjust for blood cell composition but such differences, if present, would not be expected to result

differences between the co-twins are shown in Table 1. Venous blood samples were taken from an antecubital vein at the same time in each twin pair. DNA was extracted from white blood cells using the QIAamp blood kit (Qiagen) and stored at -20°C until used.

Total 5-methylcytosine (5mc) content
Total 5mc content of each DNA sample was analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Approximately 1 µg of genomic DNA was hydrolysed using DNA Degradase Plus (Zymo). The reaction mixture was incubated at 37°C for two hours to ensure complete digestion prior to LC-MS/MS, as described previously [43].

RRBS case-control analysis
Differentially methylated CpG sites between all cases and controls were identified using the Bioconductor R package methylKit [27]