Burden Analysis of Rare Microdeletions Suggests a Strong Impact of Neurodevelopmental Genes in Genetic Generalised Epilepsies

Genetic generalised epilepsy (GGE) is the most common form of genetic epilepsy, accounting for 20% of all epilepsies. Genomic copy number variations (CNVs) constitute important genetic risk factors of common GGE syndromes. In our present genome-wide burden analysis, large (≥ 400 kb) and rare (< 1%) autosomal microdeletions with high calling confidence (≥ 200 markers) were assessed by the Affymetrix SNP 6.0 array in European case-control cohorts of 1,366 GGE patients and 5,234 ancestry-matched controls. We aimed to: 1) assess the microdeletion burden in common GGE syndromes, 2) estimate the relative contribution of recurrent microdeletions at genomic rearrangement hotspots and non-recurrent microdeletions, and 3) identify potential candidate genes for GGE. We found a significant excess of microdeletions in 7.3% of GGE patients compared to 4.0% in controls (P = 1.8 x 10^-7; OR = 1.9). Recurrent microdeletions at seven known genomic hotspots accounted for 36.9% of all microdeletions identified in the GGE cohort and showed a 7.5-fold increased burden (P = 2.6 x 10^-17) relative to controls. Microdeletions affecting either a gene previously implicated in neurodevelopmental disorders (P = 8.0 x 10^-18, OR = 4.6) or an evolutionarily conserved brain-expressed gene related to autism spectrum disorder (P = 1.3 x 10^-12, OR = 4.1) were significantly enriched in the GGE patients. Microdeletions found only in GGE patients harboured a high proportion of genes previously associated with epilepsy and neuropsychiatric disorders (NRXN1, RBFOX1, PCDH7, KCNA2, EPM2A, RORB, PLCB1). Our results demonstrate that the significantly increased burden of large and rare microdeletions in GGE patients is largely confined to recurrent hotspot microdeletions and microdeletions affecting neurodevelopmental genes, suggesting a strong impact of fundamental neurodevelopmental processes in the pathogenesis of common GGE syndromes.


Introduction
The epilepsies comprise a clinically heterogeneous group of neurological disorders defined by recurrent spontaneous seizures due to paroxysmal excessive and synchronous neuronal activity in the brain [1]. Epilepsy affects about 4% of the general population during their lifetime [2] and about 40% of all epilepsies are thought to have a strong genetic contribution. The genetic generalised epilepsies (GGEs) represent the most common group of epilepsies with predominant genetic aetiology, accounting for 20% of all epilepsies [3]. Their clinical features are characterised by unprovoked generalised seizures with age-related onset, generalised spike and wave discharges on the electroencephalogram and no evidence for an acquired cause [4,5]. Despite their strong familial aggregation and heritability [6][7][8][9], the genetic architecture of common GGE syndromes is likely to display a biological spectrum, in which a small fraction (1-2%) follows monogenic inheritance, whereas the majority of GGE patients presumably display an oligo-/polygenic predisposition with extensive genetic heterogeneity [10]. Although causative mutations for rare GGE with monogenic inheritance have been identified in genes primarily affecting neuronal excitability, synaptic transmission, and neurodevelopmental processes [11,12], the genetic basis of the majority of patients with GGE remains largely unsolved.
In the present genome-wide burden analysis, we used the Affymetrix SNP 6.0 array to screen large (≥ 400 kb) and rare (< 1%) autosomal microdeletions with high calling confidence (≥ 200 markers) in European case-control cohorts of 1,366 GGE patients and 5,234 population controls. We aimed to: 1) assess the genetic burden of large and rare microdeletions in common GGE syndromes, 2) evaluate the contribution of recurrent hotspot and unique microdeletions to the genetic burden of GGE, and 3) identify novel candidate genes for GGE. Specifically, we tested the hypothesis that microdeletions affecting genes involved in neurodevelopmental processes account for a significant fraction of the genetic risk of GGE syndromes.

Gene-set enrichment analyses of neurodevelopmental genes
To test whether neurodevelopmental genes affected by the microdeletions influence the genetic risk of common GGE syndromes, we performed enrichment analyses of the deleted genes, using two previously published sets of genes implicated in neurodevelopmental disorders (ND): 1) ND-related genes (n = 1,547) compiled by literature and database queries [30], and 2) genes implicated in autism spectrum disorder (ASD-related genes) comprising 1,669 brain-expressed genes with an enrichment of deleterious exonic de novo mutations in ASD [31]. Microdeletions carrying at least one ND-related gene were 4.6-fold enriched in the GGE patients as compared to the controls (P = 8.02 x 10^-18; OR = 4.58, 95%-CI: 3.09-6.82) (Table 1). Likewise, microdeletions encompassing at least one ASD-related gene showed a 4.1-fold excess in the GGE patients relative to the controls (P = 1.29 x 10^-12; OR = 4.11, 95%-CI: 2.64-6.40) (Table 1). To explore the impact of neurodevelopmental genes that are not covered by the recurrent hotspot microdeletions, we combined the ND- and ASD-related gene lists [30,31] and removed all genes affected by observed recurrent hotspot microdeletions. Non-recurrent microdeletions carrying at least one of the 2,495 selected ND/ASD-related genes showed a 2.3-fold excess in GGE patients (n = 1,328) compared to control subjects (n = 5,214), when individuals with recurrent hotspot microdeletions were excluded (P = 4.56 x 10^-4; OR = 2.48, 95%-CI: 1.42-4.30). To rule out an artificial enrichment of microdeletions in the GGE patients, we compiled two control gene assemblies comprising: 1) 3,256 randomly selected autosomal protein-coding RefSeq genes, and 2) 3,837 autosomal protein-coding RefSeq genes not expressed in the brain [28]. Neither control gene assembly showed evidence for an increase of the microdeletion burden in GGE patients compared to controls (P > 0.40) (Table 1).
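The gene-set step described above (combine the ND- and ASD-related lists, remove genes covered by recurrent hotspot microdeletions, then count individuals with at least one microdeletion hitting the remaining set) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the placeholder gene symbols (`GENE_A` etc.) and the toy carrier data are all hypothetical, and the additional exclusion of individuals carrying hotspot microdeletions is not modelled.

```python
def build_target_set(nd_genes, asd_genes, hotspot_genes):
    """Union of the two neurodevelopmental lists minus hotspot-covered genes."""
    return (set(nd_genes) | set(asd_genes)) - set(hotspot_genes)


def count_carriers(deletions_by_individual, target_genes):
    """Count individuals with >= 1 microdeletion overlapping >= 1 target gene.

    deletions_by_individual maps an individual ID to a list of microdeletions,
    each given as a list of the gene symbols it covers.
    """
    carriers = 0
    for deletions in deletions_by_individual.values():
        if any(target_genes & set(genes) for genes in deletions):
            carriers += 1
    return carriers


# Hypothetical toy input; gene symbols are placeholders, not real annotations.
nd = {"GENE_A", "GENE_B", "GENE_H"}
asd = {"GENE_A", "GENE_C", "GENE_D"}
hotspot = {"GENE_H"}
targets = build_target_set(nd, asd, hotspot)

patients = {
    "P1": [["GENE_A"]],              # hits a target gene
    "P2": [["GENE_H", "GENE_X"]],    # only a hotspot gene and a non-target gene
    "P3": [["GENE_X"], ["GENE_C"]],  # second deletion hits a target gene
    "P4": [],                        # no microdeletion
}
n_carriers = count_carriers(patients, targets)
print(sorted(targets), n_carriers)
```

The resulting carrier counts in patients and controls would then feed a 2 x 2 comparison of the kind reported in Table 1.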

Functional enrichment and network analyses
The Disease Association Protein-Protein Link Evaluator (DAPPLE v2.0) tool [85] was applied to identify significant physical connectivity among proteins encoded by genes affected by microdeletions. To this end, we separately tested the gene assemblies for the GGE patients and the control subjects. Based on an initial regional query we extracted 191 seed genes from 103 microdeletions found in the GGE patients and 221 seed genes from 214 microdeletions observed in controls. There was an overlap of 61 genes between the two assemblies. DAPPLE network analyses revealed a significant enrichment for direct connections between the seed genes (P = 0.01) in the GGE microdeletion carriers, while the control gene network did not show evidence for an enrichment (P = 0.40). Finally, we found eleven genes with significant connectivity in the GGE patients. Utilising the Enrichr tool [86], functional enrichment analysis of the gene assembly affected by the microdeletions in the GGE patients revealed a significant enrichment of the MGI Mammalian Phenotype term "abnormal emotion/affect behaviour" (MP:0002572; P adj = 1.30 x 10^-3) and the GO biological process term "cognition" (GO:0050890; P adj = 0.012) (Table 4). Enrichr network analysis identified one significant PPI Hub in the GGE patients based on an enrichment of nine deleted genes (ARC, TJP1, MAPK3, MYH11, EXOC3, NRXN1, PARK2, PLCB1, GRM1) among 219 network genes (P adj = 0.018), for which GRIN2B encodes the shared interacting protein.

High burden driven by recurrent hotspot microdeletions
The present burden analysis applied a screening strategy that focused on both large (≥ 400 kb, ≥ 200 markers) and rare (< 1%) autosomal microdeletions to ensure a high calling accuracy [87] and to enrich pathogenic microdeletions among confounding benign copy number polymorphisms [88][89][90]. We found a significant 1.9-fold excess of microdeletions in the patients with GGE compared to the controls (Table 1). Overall, 7.3% of the 1,366 GGE patients carried at least one microdeletion compared to 4.0% in 5,234 controls. These findings highlight the important impact of microdeletions on the genetic susceptibility of common GGE syndromes with an attributable risk of about 3.3%.
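As a quick consistency check on the figures above, the attributable risk appears to correspond to the simple difference between the carrier frequencies in patients and controls, and the 1.9-fold excess to the odds ratio implied by the same two frequencies. The sketch below works only from the rounded percentages quoted in the text, so it is an approximation rather than a re-analysis; the variable names are illustrative.

```python
# Carrier frequencies as quoted in the text (rounded percentages).
case_rate = 0.073     # GGE patients with at least one microdeletion
control_rate = 0.040  # controls with at least one microdeletion

# Risk difference, interpreted here as the "attributable risk" (assumption).
risk_difference = case_rate - control_rate

# Odds ratio implied by the two proportions.
odds_ratio = (case_rate / (1 - case_rate)) / (control_rate / (1 - control_rate))

print(f"risk difference: {risk_difference * 100:.1f}%")
print(f"implied odds ratio: {odds_ratio:.1f}")
```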

Enrichment of microdeletions involving neurodevelopmental genes
In line with our neurodevelopmental hypothesis, we found a significant 4.6-fold excess of microdeletions carrying at least one ND-related gene [30] and a 4.1-fold enrichment of microdeletions affecting at least one ASD-related gene [31] in the GGE patients compared to the control subjects. In contrast, the two control gene assemblies did not show an increase of the microdeletion burden in GGE patients compared to controls (P > 0.40). Accordingly, the intriguing enrichment of ND- and ASD-related genes demonstrates that genes involved in neurodevelopmental processes play an important role in the epileptogenesis of common GGE syndromes. Notably, the moderate overlap of the previously published assemblies of ND- and ASD-related genes implicates a large number of neurodevelopmental genes contributing to the risk of common GGE syndromes and extensive genetic heterogeneity. The emerging overlap of gene-disrupting microdeletions and the rapidly evolving landscape of loss-of-function gene mutations in rare and common epilepsy syndromes will facilitate the prioritisation of causal epilepsy genes and the elucidation of the leading molecular pathways of epileptogenesis [101,102].

Non-hotspot microdeletions implicating potential GGE genes
We identified 27 gene-covering microdeletions in non-hotspot genomic regions that were present only in GGE patients (Table 3 and S3 Fig). These autosomal microdeletions involved several genes previously implicated in epilepsy and neurodevelopmental disorders. Although it remains challenging to distinguish benign and pathogenic microdeletions, several of these contain plausible candidate genes for epilepsy. Of particular interest were seven genes at seven microdeletion loci that have been associated with epilepsy.
Three of the epilepsy-associated microdeletions have been reported in two previous publications demonstrating an association of microdeletions affecting the 5'-terminal exons of the neuronal genes encoding the adhesion molecule neurexin 1 (NRXN1; 2p16.3, chr2: 50,145,642-51,259,673, hg19) and the splicing regulator RNA-binding protein fox-1 homolog (RBFOX1; 16p13.3, chr16: 5,289,468-7,763,341, hg19) [26,27]. The microdeletions involving NRXN1 exons 1-2 were observed in two female GGE patients with genetic absence epilepsies [26]. The 5'-terminal untranslated RBFOX1 exons 1-2 were deleted in a female patient with childhood absence epilepsy [27]. Deleterious mutations and microdeletions of NRXN1 and RBFOX1 have been reported in a large number of patients with a broad range of neuropsychiatric disorders, who were frequently also affected by epilepsy [40,41,54,72,81]. A recent study demonstrated that the splicing regulator Rbfox1 controls neuronal excitation in the mammalian brain and the Rbfox1 knockout in mice results in an increased susceptibility to spontaneous and kainic acid-induced seizures [71]. Furthermore, molecular, cellular, and clinical evidence supports a pivotal role of RBFOX1 in human neurodevelopmental disorders [73,103].
A 3.45 Mb microdeletion harbouring the protocadherin PCDH7 gene (chromosomal location: 4p15.1, chr4: 30,721,950-31,148,422, hg19) was found in a female GGE subject with juvenile myoclonic epilepsy. An international GWAS meta-analysis including 8,696 epilepsy patients and 26,157 controls highlights PCDH7 as a susceptibility gene for epilepsy in general and GGE syndromes in particular [45]. The PCDH7 gene encodes a calcium-dependent adhesion protein that is expressed in neurons of thalamocortical circuits and the hippocampus [46]. PCDH7 has been implicated as a neuronal target gene of MECP2 [47], the gene for Rett syndrome (OMIM #312750), which manifests as a progressive neurodevelopmental disorder with recurrent seizures. Moreover, mutations in the X-chromosomal protocadherin gene PCDH19 cause epilepsy and intellectual disability in females [48]. These lines of evidence suggest an involvement of PCDH7 in epileptogenesis.
A 788 kb microdeletion involving the Shaker-like voltage-gated potassium channel gene KCNA2 (1p13, chr1: 111,136,002-111,174,096, hg19) was identified in a male GGE patient with generalised tonic-clonic seizures starting at the age of 14. The Kv1 subfamily plays an essential role in the initiation and shaping of action potentials, influencing action potential firing patterns and controlling neuronal excitability as well as seizure susceptibility [36,38,39]. De novo loss- or gain-of-function mutations in KCNA2 have been identified to cause human epileptic encephalopathy [39]. Furthermore, Kcna2 knockout mice exhibit spontaneous seizures and have a reduced life span [35,37].
A 582 kb microdeletion encompassing exon 1 of the gene encoding the RAR-related orphan receptor B (RORB; 9q21.13, chr9: 77,112,251-77,303,533, NM_006914, hg19) was found in a male patient with childhood absence epilepsy, overlapping with the critical region of a novel microdeletion syndrome at 9q21.13 characterised by intellectual disability, speech delay, facial dysmorphisms and epilepsy [63]. The RORB gene is a strong candidate for the neurological phenotype because RORB was deleted in all affected individuals [63], it is expressed in the cerebral cortex and thalamus, and genetic associations of RORB with bipolar disorder [64] and verbal intelligence [65] have been reported.
The gene encoding the enzyme phospholipase C-beta 1 (PLCB1; 20p12.3, chr20: 8,112,911-8,865,546, hg19) was partially deleted (exons 1-3, NM_015192, hg19) in a male GGE patient with childhood absence epilepsy. PLCB1 catalyses the generation of inositol 1,4,5-trisphosphate and diacylglycerol from phosphatidylinositol 4,5-bisphosphate, a key step in the intracellular transduction of many extracellular signals. Homozygous microdeletions of chromosome 20p12.3, disrupting the promoter region and first three coding exons of PLCB1, have previously been reported in two consanguineous families with early infantile epileptic encephalopathy [74]. Mutation analysis of a family with severe intractable epilepsy and neurodevelopmental delay revealed compound heterozygous mutations in PLCB1 composed of a 476 kb microdeletion encompassing PLCB1 and a deleterious PLCB1 splice site mutation [75]. Girirajan et al. [54] found an enrichment of microdeletions and duplications involving the PLCB1 gene in individuals with autism. Together, these findings suggest that the PLCB1 gene contributes to the genetic risk of neurodevelopmental disorders including epilepsy.

Summary
Our burden analysis of large and rare autosomal microdeletions (size ≥ 400 kb, frequency < 1%) revealed: 1) a nearly 2-fold excess of microdeletions in GGE patients relative to the population controls, 2) a 7-fold increased burden for known hotspot microdeletions previously associated with neurodevelopmental disorders, and 3) a more than 4-fold enrichment of microdeletions carrying a gene implicated in neurodevelopmental disorders. Recurrent microdeletions at seven genomic rearrangement hotspots accounted for 37% of all microdeletions identified in the GGE patients and predominantly contributed to the excess of microdeletions in GGE patients. Comorbidity of GGE with other neurodevelopmental disorders, such as intellectual disability, ASD and schizophrenia, may result in even higher prevalence of recurrent hotspot microdeletions [17] and emphasises a valuable diagnostic contribution to the clinical management of these severely affected comorbid patients with GGE. The remarkable phenotypic variability observed for the recurrent hotspot microdeletions suggests a shared susceptibility of a wide range of neuropsychiatric disorders and GGE [105]. Several genes affected by microdeletions that were found only in GGE patients highlight novel candidate genes for GGE. Altogether, the present findings reinforce converging lines of evidence that genes affected by microdeletions in GGE patients reside in fundamental neurodevelopmental processes.

Case-control cohorts
The study protocol was approved by the local institutional review boards of the contributing clinical centres. All study participants provided written informed consent. Genomic DNA samples of all study participants were processed by the Affymetrix SNP 6.0 array. For the genome-wide CNV burden analysis, we did not include individuals with excessive CNV counts (> 50 autosomal deletions per individual for deletions spanning > 40 kb in size and covering > 20 markers). In addition, we excluded all Affymetrix SNP 6.0 array data derived from lymphoblastoid cell lines because the clonal source of this DNA is more prone to CNV artefacts than genomic DNA samples derived from blood cells [21]. All study participants were of self-reported North-Western European origin.
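The sample-level exclusion for excessive CNV counts described above can be sketched as a simple threshold check. This is an illustrative sketch only: the `DeletionCall` record and the function name are hypothetical, and the thresholds are taken directly from the text.

```python
from dataclasses import dataclass

AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass
class DeletionCall:
    chrom: str      # chromosome label without "chr" prefix, e.g. "2" or "X"
    size_kb: float  # length of the called deletion in kilobases
    n_markers: int  # number of array markers covered by the call


def has_excessive_cnv_count(calls, max_count=50, min_size_kb=40, min_markers=20):
    """True if an individual exceeds the allowed number of qualifying deletions.

    A deletion qualifies if it is autosomal, spans > 40 kb and covers > 20 markers.
    """
    qualifying = [
        c for c in calls
        if c.chrom in AUTOSOMES and c.size_kb > min_size_kb and c.n_markers > min_markers
    ]
    return len(qualifying) > max_count


# Hypothetical individuals: one with 51 qualifying calls, one with exactly 50.
noisy_sample = [DeletionCall("1", 60.0, 30)] * 51
clean_sample = [DeletionCall("1", 60.0, 30)] * 50
print(has_excessive_cnv_count(noisy_sample), has_excessive_cnv_count(clean_sample))
```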
Unrelated GGE patients of European descent were ascertained through the primary diagnosis of a common GGE syndrome according to the classification of the International League Against Epilepsy [1,4]. The standardised protocols for phenotyping of GGE syndromes as well as inclusion and exclusion criteria are available online at: http://portal.ccg.uni-koeln.de/ccg/research/epilepsy-genetics/sampling-procedure/ [96]. GGE patients with a history of severe major psychiatric disorders (autism spectrum disorder, schizophrenia, affective disorder: recurrent episodes requiring pharmacotherapy or treatment in a hospital), or severe intellectual disability (no basic education, permanently requiring professional support in their daily life) were excluded. The GGE cohort comprised 1,366 patients (853 females, 513 males) with the following age-related GGE syndromes: childhood absence epilepsy (CAE, n = 398), juvenile absence epilepsy (JAE, n = 191), unspecified genetic absence epilepsy (GAE, n = 9), juvenile myoclonic epilepsy (JME, n = 540), epilepsies with generalised tonic-clonic seizures (GTCS) alone predominantly on awakening (EGMA, n = 94), and epilepsies with recurrent unprovoked GTCS alone starting before the age of 26 (EGTCS, n = 134). These 1,366 GGE patients were collected from Austria (n = 142), Belgium (n = 39), Denmark (n = 97), Germany (n = 801) and the Netherlands (n = 287). Notably, 1,052 of the GGE patients and 3,022 population controls investigated in the present study were part of a previous study that investigated six target microdeletions at genomic rearrangement hotspots [14].

CNV analysis and screening of autosomal microdeletions
Genomic DNA samples were investigated by the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA). CNV analysis was performed as previously described [14,22], using the Birdsuite algorithm implemented in the Affymetrix Genotyping Console version 4.1.1. All annotations refer to the genome build GRCh37/hg19. The present genome-wide burden analysis focused on rare and large autosomal microdeletions to ensure a high reliability of the microdeletion calls [87] and to enrich pathogenic microdeletions [88][89][90]. Therefore, we selected autosomal microdeletions with high calling confidence according to the following criteria: a) size ≥ 400 kb, b) coverage of ≥ 200 probe sets, and c) microdeletion frequency < 1% in the entire study sample. The microdeletion size of at least 400 kb was selected because all known pathogenic hotspot microdeletions identified in neurodevelopmental disorders exceed this size in CNV scans with the Affymetrix SNP Array 6.0 [29][88][89][90]. We did not include microduplications in the present burden analysis because the accuracy of CNV detection is lower for microduplications compared to microdeletions [110]. In particular, genomic DNA samples with substantial degradation are prone to spurious microduplication calls. Moreover, microduplications seem to exert pathogenic effects less frequently compared to microdeletions [88]. We excluded microdeletions with an overlap of > 10% with 12 chromosomal regions prone to artificial CNV calls according to a recently published "artefact list" [111]. For all QC-filtered microdeletions identified by SNP array screening, the segmental log2 ratios of the signal intensities and the SNP heterozygosity state were visually inspected by the Chromosome Analysis Suite v1.2.2 (Affymetrix, Santa Clara, CA, USA) to exclude spurious microdeletion calls.
Validation of all 38 recurrent hotspot microdeletions and four GGE-associated microdeletions identified by SNP arrays in the GGE patients was carried out by real-time quantitative PCR (qPCR) according to the manufacturer's instructions (Life Technologies, Carlsbad, CA, USA).
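The call-level selection criteria for microdeletions (size, probe coverage, frequency, and overlap with artefact-prone regions) can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Microdeletion` record, the way frequency is supplied, and the definition of overlap as the fraction of the call covered by an artefact region are all assumptions, and the example coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microdeletion:
    chrom: str
    start: int        # hg19 start position (bp)
    end: int          # hg19 end position (bp)
    n_probes: int     # number of probe sets covered by the call
    frequency: float  # carrier frequency in the entire study sample (assumed precomputed)

    @property
    def size(self) -> int:
        return self.end - self.start


def overlap_fraction(call, region):
    """Fraction of the call covered by a (chrom, start, end) region (assumed definition)."""
    r_chrom, r_start, r_end = region
    if call.chrom != r_chrom:
        return 0.0
    overlap = min(call.end, r_end) - max(call.start, r_start)
    return max(overlap, 0) / call.size


def passes_filters(call, artefact_regions, min_size=400_000, min_probes=200,
                   max_freq=0.01, max_artefact_overlap=0.10):
    """Apply criteria a) to c) from the text plus the artefact-region exclusion."""
    if call.size < min_size or call.n_probes < min_probes:
        return False
    if call.frequency >= max_freq:
        return False
    return all(overlap_fraction(call, r) <= max_artefact_overlap for r in artefact_regions)


# Invented example coordinates for illustration only.
artefacts = [("1", 1_400_000, 2_000_000)]
kept = Microdeletion("1", 1_000_000, 1_500_000, 250, 0.001)     # 20% overlap -> rejected
clean = Microdeletion("2", 1_000_000, 1_500_000, 250, 0.001)    # no overlap -> kept
print(passes_filters(kept, artefacts), passes_filters(clean, artefacts))
```

Calls that pass such a filter would still need the visual inspection and qPCR validation described in the text.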
Specifically, we tested the hypothesis that microdeletions affecting genes involved in neurodevelopmental processes account for a significant fraction of genetic risk of GGE syndromes. Therefore, we investigated two recently published assemblies of genes associated with neurodevelopmental disorders (ND): 1) ND-related genes compiling 1,547 genes that were associated with neuropsychiatric disorders, autism candidate genes and genes of known genomic disorders based on literature and database queries [30], and 2) ASD-related genes comprising 1,669 brain-expressed genes that were selectively enriched for deleterious exonic de novo mutations in ASD individuals relative to their healthy siblings [31]. To evaluate a spurious enrichment of microdeletions in the GGE patients relative to the population controls, we tested two control gene assemblies comprising: 1) 3,256 randomly selected autosomal genes, and 2) 3,837 autosomal genes not expressed in the brain [28], defined by the BrainSpan RNA-Seq transcriptome dataset. ND- and ASD-related genes, genes located in genomic rearrangement hotspots, or the artefact list were removed from the compiled control gene assemblies.

Functional enrichment and network analyses
Functional-enrichment tests, pathway and network analyses were performed with the Disease Association Protein-Protein Link Evaluator version 2.0 program (DAPPLE v2.0; http://www.broadinstitute.org/mpg/dapple/dappleTMP.php; [85]) and the gene-set enrichment tool Enrichr (http://amp.pharm.mssm.edu/Enrichr/index.html; [86]). For this purpose, we compiled two lists of genes affected by microdeletions in either the GGE patients (n = 329 genes; n = 191 regional seed genes) or the controls (n = 428 genes; n = 221 regional seed genes). There was an overlap of 103 genes (n = 61 seed genes) between the two gene lists. To explore potential physical interactions among proteins encoded by deleted genes, DAPPLE uses experimentally validated protein-protein interaction (PPI) databases to identify network and protein connectivity. Empirically, 1,000 random networks were generated by permutation to determine whether the connectivity of each seed protein with the PPI reference network was greater than that expected by chance.
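The general idea of a permutation-based connectivity test can be illustrated with a simplified stand-in. The sketch below is not DAPPLE's algorithm: instead of permuting the reference network, it compares the number of direct edges among the seed genes with the same count for 1,000 randomly drawn gene sets of equal size, and reports an empirical P-value. The toy network, gene labels and function names are hypothetical.

```python
import random
from itertools import combinations


def count_direct_edges(genes, edges):
    """Number of gene pairs within `genes` that are directly connected in `edges`."""
    return sum(1 for a, b in combinations(sorted(genes), 2) if frozenset((a, b)) in edges)


def empirical_connectivity_p(seed_genes, all_genes, edges, n_perm=1000, rng_seed=0):
    """Empirical P-value for the observed direct connectivity among seed genes."""
    rng = random.Random(rng_seed)
    universe = sorted(all_genes)
    observed = count_direct_edges(seed_genes, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(seed_genes))
        if count_direct_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: 30 genes, a fully connected cluster g0..g3,
# plus a few sparse edges elsewhere.
all_genes = [f"g{i}" for i in range(30)]
edges = {frozenset(pair) for pair in combinations(["g0", "g1", "g2", "g3"], 2)}
edges |= {frozenset(("g10", "g11")), frozenset(("g20", "g21"))}

observed, p_value = empirical_connectivity_p({"g0", "g1", "g2", "g3"}, all_genes, edges)
print(observed, p_value)
```

With 1,000 permutations the smallest attainable empirical P-value under this correction is 1/1001, which bounds the resolution of any such test.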
The gene-set enrichment tool Enrichr was applied separately to explore patient and control lists of genes affected by microdeletions for an overlap with pathway gene-set libraries, specifically the database PPI Hub Proteins [112], and gene-set libraries created from Gene Ontology [113] as well as MGI Mammalian Phenotype terms [114]. A pathway or ontology term was considered as significantly enriched if the false discovery rate (FDR, Benjamini-Hochberg) was lower than 5% for an assembly of more than two genes and occurred only in the GGE patients but not in the controls.
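The Benjamini-Hochberg adjustment and the reporting rule stated above (FDR below 5%, more than two genes, significant in patients but not in controls) can be sketched as follows. This is a generic textbook implementation rather than Enrichr's own code; the function names and the example P-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_reportable(adj_p_patients, n_genes, significant_in_controls, fdr=0.05):
    """Reporting rule from the text: FDR < 5%, > 2 genes, not seen in controls."""
    return adj_p_patients < fdr and n_genes > 2 and not significant_in_controls


# Illustrative raw P-values for four hypothetical terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])
```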

Statistical analyses
Burden analysis was performed by comparisons of the frequency of autosomal microdeletions in GGE patients and controls. The P-values and corresponding odds ratios (ORs) with the 95%-confidence intervals were calculated with a two-sided χ² test or Fisher's exact test if appropriate. The Wilcoxon-Mann-Whitney test was applied to compare differences in the genomic size of microdeletions. In addition, the individual burden of microdeletions was assessed for comparisons of microdeletion size. Nominal two-sided P-values < 0.05 were considered significant.
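A 2 x 2 burden comparison of this kind can be sketched with the standard library alone. The sketch below computes the odds ratio, a Woolf (log-based) 95% confidence interval and an uncorrected Pearson χ² P-value with one degree of freedom. Which interval method and whether a continuity correction were used in the study is not stated, so those choices are assumptions, and the carrier counts (100 of 1,366 patients, 209 of 5,234 controls) are back-calculated from the rounded percentages in the text rather than taken from Table 1.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio with a Woolf 95% confidence interval for the table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def chi_square_p(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and two-sided P-value, 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Assumed counts, back-calculated from 7.3% of 1,366 and 4.0% of 5,234.
carriers_gge, noncarriers_gge = 100, 1366 - 100
carriers_ctr, noncarriers_ctr = 209, 5234 - 209

or_, ci_low, ci_high = odds_ratio_ci(carriers_gge, noncarriers_gge, carriers_ctr, noncarriers_ctr)
chi2, p = chi_square_p(carriers_gge, noncarriers_gge, carriers_ctr, noncarriers_ctr)
print(f"OR = {or_:.2f} (95%-CI: {ci_low:.2f}-{ci_high:.2f}), chi2 = {chi2:.1f}, P = {p:.1e}")
```

For tables with small expected cell counts, Fisher's exact test would replace the χ² approximation, as the text notes.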
Supporting Information
S1 Table. Clinical information of microdeletion carriers and details on microdeletion calling and its genomic organisation. GGE: genetic generalised epilepsy, CTR: population control; Chr: chromosome, start/end: genomic start and end position of the microdeletion, hg19; GGE syndromes: CAE: childhood absence epilepsy, JAE: juvenile absence epilepsy, JME: juvenile myoclonic epilepsy, EGMA: epilepsy with generalised tonic-clonic seizures alone predominantly on awakening, EGTCS: epilepsy with generalised tonic-clonic seizures alone, gsw: generalised spike and wave discharges on the electroencephalogram; the number in front of the GGE syndromes refers to the individual age-at-onset of afebrile generalised seizures. Bold gene symbols indicate genes previously implicated in epileptogenesis. Previously published microdeletion: * [14], ** [26], *** [27].