Evaluating whole genome sequence data from the Genetic Absence Epilepsy Rat from Strasbourg and its related non-epileptic strain

Objective The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalised epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalised epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found among the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC represent a powerful opportunity to identify neurobiological factors that are associated with the genetic generalised epilepsy phenotype.
Methods We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also generated whole genome sequence data on four double-crossed (GAERS with NEC) F2 rats selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes.
Results Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the four sequenced F2 progeny (the two with the highest and the two with the lowest seizure burden) identified only the pre-selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats.
Significance This study highlights an approach for using whole genome sequencing to narrow the genetic variants in a complex genetic epilepsy animal model down to a manageable candidate list, and suggests the utility of this sequencing design for investigating other spontaneously occurring animal models of human disease.


Introduction
The genetic generalised epilepsies (GGE) comprise a group of epileptic disorders that include absence seizures as one of the classical seizure phenotypes [1][2][3], with evidence that some cases might be explained by a complex polygenic mode of inheritance [4][5][6][7][8][9][10]. Absence seizures are characterized by diffuse cortical bilateral 3 Hz spike-and-wave discharge (SWD) on electroencephalogram (EEG) [1]. The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are a well-validated rat model of spontaneous GGE with absence seizures that were selectively bred from a Wistar colony that displayed spontaneous absence seizures [11,12]. In parallel, a counter strain was also bred from the same original colony, the Non-Epileptic Control (NEC) rats, which were selectively bred for the lack of seizures and SWDs [12,13]. Analogous to human absence epilepsy, GAERS have spontaneously occurring SWDs that start and end abruptly on a normal EEG background [5,12,13]. GAERS also exhibit a pharmacological profile similar to that of human absence seizures, being suppressed or aggravated by the same anti-epileptic drugs [12][13][14][15][16][17][18]. Moreover, GAERS exhibit depressive-like behaviour and elevated anxiety, analogous to the human behavioural profile of GGE [19][20][21]. Extensive cross breeding studies have demonstrated that the GAERS epileptic phenotype is likely to be complex and polygenic [12]. A genome-wide linkage scan of GAERS x Brown Norway (BN) F2 rats reported three quantitative trait loci (QTL) on chromosomes 4, 7, and 8, associated with specific aspects of the SWD phenotype [22]. Despite three decades of study, only one GGE-associated genetic variant has been implicated in the epilepsy phenotype.
We previously reported that the missense (R1584P) variant in the low-threshold T-type calcium channel CaV3.2 gene (Cacna1h) accounts for up to 33% of the variance in the percentage of time in seizures and the number of seizures, based on data from F2 progeny of a cross between NEC and GAERS rats [23,24].
Taken together, the GAERS and NEC rats represent a powerful opportunity to identify neurobiological factors associated with the GGE phenotype. Here, we performed whole genome sequencing (WGS) on these two strains, which were selectively bred from the same original colony, to characterise the genomic differences between them. We present the sequence variation identified among the GAERS, NEC and their F2 progeny selected for high-seizing and non-seizing phenotypes, using short-read next-generation sequence technology. We report a near-complete catalogue of strain-specific variants between the affected and unaffected strains, each also compared to the reference BN genome.

Ethics statement
All procedures on rats were approved by The University of Melbourne Animal Ethics Committee (ethics number 1011823) and followed the Australian Code of Practice for the care and use of animals for scientific purposes.

EEG electrode implantation surgery
To identify the SWD phenotype, rats were anesthetised with isoflurane, then a single midline incision was made on the scalp, from posterior to the eyes to between the ears. Six holes were drilled through the skull without penetrating the dura, one on each side anterior to the bregma, two on each side posterior to the bregma and two to each side anterior to lambda. Recording electrodes were screwed into each hole. Each recording electrode comprised a 1.3 mm gold connector (Farnell Components, Chester Hill, Australia) soldered onto a nickel alloy jeweller screw. The recording electrodes were fixed in position by applying Vertex dental cement around the electrodes and over the skull. The incision was then sutured using nylon (4/0). Immediately after surgery, each rat received an intraperitoneal injection of 1 ml/kg analgesic solution containing carprofen (5 mg/kg; Rimadyl; Pfizer Australia) in 0.9% sodium chloride. Post-surgical animals were assessed for neurological, weight and overall appearance changes twice daily for the first 5 days following surgery, and then every other day thereafter. Carprofen analgesia was given twice a day during the first 3 days after the surgery. Animals were humanely euthanized if they presented sustained signs of suffering during the course of the experiments. All the animals used in our experiments had a satisfactory and prompt recovery after the surgery.

Animals
We whole-genome sequenced female GAERS (n = 2), NEC (n = 2) and F2 GAERS x NEC (n = 4) rats aged 16 weeks from our breeding colonies at the Royal Melbourne Hospital. The rats were originally obtained from the Strasbourg colony in 2007 (i.e. GAERS STRAS-MELB [24]). The GAERS and NEC lines used for sequencing were at the F82 and F74 filial generations, respectively. All procedures were approved by the University of Melbourne Animal Ethics Committee (#1011823).

Breeding of the F2
The double-cross matings used in this study were bred based on the CaV3.2 R1584P variant [23]. First, GAERS homozygous for the R1584P variant were crossed with NEC rats that were homozygous for the major allele to produce an F1 generation. Then, two F1 generation rats were mated to produce an F2 generation. On average, 25% of the F2 progeny would be expected to be homozygous for a given GAERS or NEC variant, 50% heterozygous for the variant, and 25% to not carry the variant [23].
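The expected 25%/50%/25% split follows from enumerating the four equally likely allele combinations of two heterozygous F1 parents. The short Python sketch below illustrates this; the allele labels "G" (GAERS allele) and "N" (NEC allele) and the function name are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product


def f2_genotype_fractions(parent_a=("G", "N"), parent_b=("G", "N")):
    """Enumerate the equally likely offspring genotypes of two F1 parents.

    Each parent passes on one of its two alleles with equal probability,
    so every allele pair in the Cartesian product is equally likely.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Both F1 parents are heterozygous (one GAERS allele, one NEC allele).
fractions = f2_genotype_fractions()
for genotype, fraction in sorted(fractions.items()):
    print(f"{genotype}: {fraction:.0%}")
```

These are expectations over many progeny; any individual litter can deviate from them by chance.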

EEG recordings and spike and wave discharge analysis
Two weeks after surgery, the animals underwent two 24-hour periods of continuous EEG recording, with at least 2 days between recordings, to confirm the epileptic phenotype. EEG recordings were obtained unfiltered using Profusion 3 software (Compumedics, Australia) and digitized at 512 Hz. EEG recordings were analysed for SWDs to give an electroencephalographic confirmation of absence seizures. The EEG analysis was performed blinded using the SpikeWave Complex Finder software (SWC, PLC van den Broek, Nijmegen University, Netherlands). The seizures detected were confirmed manually using the standard GAERS criteria for the classification of seizures: a SWD complex with an amplitude 3 times greater than the baseline, a SWD frequency of 7-12 Hz and a SWD duration longer than 0.5 s, as previously described [24]. The total number of seizures, total time spent in seizure activity and average seizure duration were analysed. No seizures (or spike-wave discharges) were detected among our NEC rats (evaluated up until 12 months of age) or the non-seizing F2 rats.
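The confirmation criteria above can be written as a simple rule applied to each candidate event. The following Python sketch is illustrative only: it is not the SpikeWave Complex Finder software, the class and field names are hypothetical, and it assumes that "3 times greater than the baseline" means an amplitude strictly above three times the baseline amplitude.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    amplitude: float      # amplitude of the candidate discharge
    baseline: float       # baseline EEG amplitude, same units as amplitude
    frequency_hz: float   # dominant spike-and-wave frequency
    duration_s: float     # event duration in seconds


def meets_swd_criteria(event: CandidateEvent) -> bool:
    """Apply the three stated criteria: amplitude, frequency and duration."""
    return (
        event.amplitude > 3 * event.baseline   # assumption: strict inequality
        and 7 <= event.frequency_hz <= 12
        and event.duration_s > 0.5
    )


def summarise_seizures(events):
    """Return the three summary measures analysed for each animal."""
    seizures = [e for e in events if meets_swd_criteria(e)]
    total_time = sum(e.duration_s for e in seizures)
    mean_duration = total_time / len(seizures) if seizures else 0.0
    return {
        "n_seizures": len(seizures),
        "total_time_s": total_time,
        "mean_duration_s": mean_duration,
    }


# Made-up events for illustration; these are not data from the study.
example_events = [
    CandidateEvent(400.0, 100.0, 9.0, 12.0),  # meets all three criteria
    CandidateEvent(250.0, 100.0, 9.0, 5.0),   # amplitude too low
    CandidateEvent(400.0, 100.0, 9.0, 0.3),   # too short
    CandidateEvent(400.0, 100.0, 8.0, 4.0),   # meets all three criteria
]
summary = summarise_seizures(example_events)
print(summary)
```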

Tissue collection
Two to five days after the last EEG recording session, animals were anesthetized using isoflurane at 5% (Ceva isofluorane, Piramal Enterprises Limited, India), then culled using a lethal intraperitoneal dose of lethabarb (150 mg/kg IP; pentobarbitone sodium; Virbac, Aus.). Liver and brain tissue were collected, snap frozen over liquid nitrogen and stored at -80˚C.

Extraction of genomic DNA
Genomic DNA (gDNA) was extracted from liver using the DNeasy Blood & Tissue Kit (QIAGEN) following the manufacturer's protocol. Samples were incubated for 2 hours at 37˚C with proteinase K (0.4 μg) to digest the tissue and treated with 200 μg of RNase to degrade any contaminating RNA. Once extracted, gDNA was stored at -80˚C.

Genome sequencing
We sequenced the GAERS and NEC genomes using paired-end sequencing libraries on an Illumina HiSeq 2000 machine. We used the rat reference genome, version 3.4 (Rattus norvegicus), which represents >90% of the rat genome. Four female F2 animals were selected for whole genome sequencing based on the number of seizures that they displayed. Two of the F2 animals had no seizures and did not carry the CaV3.2 R1584P variant (non-seizing F2). The other two F2 animals selected were homozygous for the CaV3.2 R1584P variant and displayed the highest number of seizures per hour, 5.8 and 4.6 respectively (high-seizing F2).

Concordance with genotyping chips
Rats from the GAERS and NEC strains were independently genotyped in 2008 by the STAR consortium using a single-nucleotide polymorphism (SNP) rat genotyping microarray [25]. Of the 20,283 STAR consortium SNPs [25], 11,379 and 10,830 SNPs had been determined to be polymorphic between the BN reference and the GAERS and NEC strains, respectively.
At QUAL PHRED consensus score ≥30 and read depth ≥3, the sequencing data from the GAERS and NEC detected 11,292 and 10,648 of the 11,379 and 10,830 SNPs previously assigned as polymorphic (99.24% and 98.32% sensitivity). Of the 8,475 and 8,624 positions previously assigned as non-polymorphic, our sequencing data correctly assigned 8,408 and 8,440 as non-polymorphic (99.21% and 98.29% specificity). These results reflect high concordance rates between the previous genotyping array and our own Illumina HiSeq sequencing data. The minor discordances can be explained by differences in technologies and variant calling platforms resulting in discordant variant calls at the false positive and false negative variant sites. That the animals sequenced are being compared to genotype data from rats maintained in a different laboratory for the last 20 generations would also result in minor discordance. For the 67 (GAERS) and 184 (NEC) apparent false-positive SNPs, the sequencing consensus quality (QUAL) score was high (mean score 58. 22
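Sensitivity and specificity here are simple ratios of the counts given above, so they can be recomputed as a consistency check on the reported percentages. A minimal Python sketch, with hypothetical variable and key names, using only the counts stated in the text:

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# Counts taken from the text: array-polymorphic sites detected by sequencing,
# and array-non-polymorphic sites confirmed as non-polymorphic by sequencing.
counts = {
    "GAERS": {"poly_detected": 11292, "poly_total": 11379,
              "nonpoly_confirmed": 8408, "nonpoly_total": 8475},
    "NEC": {"poly_detected": 10648, "poly_total": 10830,
            "nonpoly_confirmed": 8440, "nonpoly_total": 8624},
}

concordance = {}
for strain, c in counts.items():
    concordance[strain] = {
        # Sensitivity: share of array-polymorphic sites recovered by sequencing.
        "sensitivity": percent(c["poly_detected"], c["poly_total"]),
        # Specificity: share of array-non-polymorphic sites called non-polymorphic.
        "specificity": percent(c["nonpoly_confirmed"], c["nonpoly_total"]),
        # Apparent false positives: non-polymorphic sites called polymorphic.
        "false_positives": c["nonpoly_total"] - c["nonpoly_confirmed"],
    }

for strain, values in concordance.items():
    print(strain, {k: round(v, 2) for k, v in values.items()})
```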

Data access
The data sets supporting the results of this research article are included within the article; the supporting additional files are presented as supplementary tables (S1-S6 Tables).

Sequencing
Whole genome sequencing of rat DNA was performed on the Illumina HiSeq 2000. To determine coverage, all gaps (stretches of N's) in the rat reference genome (NCBI Rattus norvegicus RGSC v3.4 (b4); Ensembl core database release 67) were excluded, resulting in a reference of 2,718,881,021 bases. After accounting for PCR duplicates and reads that did not align to the reference genome, approximately 23.48 million GAERS reads (37.6 Gb) were mapped to the BN reference genome (RGSC v3.4) using BWA [26], and variants were called using SAMtools [27]. This resulted in 15.14-fold average coverage of the GAERS autosomal chromosomes, and 15.65-fold average coverage of the X-chromosome. Similarly, for the NEC rat, 26.48 million reads (42.7 Gb) mapped to the BN reference genome, equating to 17.2-fold average coverage of the NEC autosomal chromosomes, and 17.57-fold average coverage of the X-chromosome (Table 1). In total, within the GAERS sequence, 98.2%, 96.9%, and 84.2% of non-gap, non-N bases of the autosomal reference genome were covered by at least three, five or ten reads, respectively. Similarly, for the NEC sequence, 98.4%, 97.5%, and 90.3% of non-gap, non-N bases of the autosomal reference genome were covered by at least three, five, or ten reads, respectively (Table 1).

Strain-specific variants
To enrich for higher confidence protein-coding calls we took the 3,571 GAERS-specific and 3,224 NEC-specific non-synonymous SNVs and focused on those variants where the sequence data of the corresponding strain achieved all of the following: a) ≥10-fold coverage, b) a QUAL score ≥30, c) a mapping quality (MQ) score ≥40, d) a genotype quality (GQ) score ≥20 and e) ≥80% of total reads supported the variant allele (see S1 Table for GAERS and S2 Table for NEC). We further screened these higher confidence GAERS (n = 2,270) and NEC (n = 2,285) specific non-synonymous SNVs across other known rat variant databases: STAR rat consortium, dbSNP, Ensembl, the SHR rat sequence, and two more recently published consortia sequencing studies [28][29][30][31][32]. Herein, we refer to variants found in GAERS but not NEC, and vice versa, as "specific" variants. Variants that are also absent among the available rat variation reference cohorts are referred to as GAERS or NEC "private" variants. When screening out variants previously reported within external rat strains, the number of GAERS and NEC private SNVs with non-synonymous annotations was 183 and 168 SNVs, respectively (Table 5).
Of the 307 GAERS-specific and 394 NEC-specific protein-coding indel variant calls, 110 and 154 passed the stringent quality control criteria described above for the SNV calls. Of those, 78 GAERS-specific and 123 NEC-specific indels were located outside of repeat regions as defined by the Rattus norvegicus (rn4) RepeatMasker track, accessed from the University of California, Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu/). (See S3 Table for GAERS-specific and S4 Table for NEC-specific protein-coding indels.)
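The five quality criteria applied to the SNV and indel calls can be expressed as a single predicate. The Python sketch below is illustrative and is not the pipeline used in the study: the record class and its field names are hypothetical, and it assumes the variant-allele fraction is computed as variant-supporting reads divided by total read depth.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    depth: int       # total reads covering the site
    qual: float      # QUAL score of the call
    mq: float        # mapping quality (MQ)
    gq: float        # genotype quality (GQ)
    alt_reads: int   # reads supporting the variant allele


def passes_high_confidence(
    call: VariantCall,
    min_depth: int = 10,
    min_qual: float = 30,
    min_mq: float = 40,
    min_gq: float = 20,
    min_alt_fraction: float = 0.80,
) -> bool:
    """Return True only if the call meets all five stated thresholds."""
    if call.depth < min_depth:
        return False  # also guards the division below against zero depth
    alt_fraction = call.alt_reads / call.depth
    return (
        call.qual >= min_qual
        and call.mq >= min_mq
        and call.gq >= min_gq
        and alt_fraction >= min_alt_fraction
    )


# Made-up calls for illustration; these are not data from the study.
good_call = VariantCall(depth=12, qual=45, mq=60, gq=99, alt_reads=11)
mixed_call = VariantCall(depth=12, qual=45, mq=60, gq=99, alt_reads=9)
shallow_call = VariantCall(depth=9, qual=45, mq=60, gq=99, alt_reads=9)
print([passes_high_confidence(c) for c in (good_call, mixed_call, shallow_call)])
```

Requiring a high variant-allele read fraction suits inbred strains, where true strain-specific variants are expected to be homozygous.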

Copy number variants
We used the estimation by read depth with single-nucleotide variants (ERDS) platform, which is based on read depth and paired-end mapping data [34], to identify copy number variants (CNVs) among the GAERS and NEC sequence data. To qualify, CNV calls by ERDS were required to be ≥5,000 bp and to result in either complete loss of sequence (for deletions) or at least a 2-fold increase in copy number (for duplications). As a result, 354 and 475 putative GAERS- and NEC-specific CNVs were identified (S5 Table). Of the GAERS-specific putative CNVs, 39 (29 duplications and 10 deletions) are annotated to overlap with an exonic/splicing (coding) function. Similarly, 30 NEC-specific putative CNVs (22 duplications and 8 deletions) overlap with exonic/splicing coding sequence (S6 Table). CNVs of increasing size, as provided in the field "length" in S6 Table, are of increased confidence; however, all CNVs reported in S5 and S6 Tables represent putative CNV calls and no CNVs have been independently validated by additional sequencing technologies. No GAERS or NEC CNV was found to overlap with a list of genes characterized by a "seizure" associated phenotype in the Mouse Genome Informatics (MGI) database and/or with the list of candidate epilepsy genes as defined by Lemke and colleagues [35].
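The CNV qualification rule can be sketched as a small filter. This Python example is an illustration under stated assumptions, not the ERDS output format: the record fields are hypothetical, coordinates are treated as half-open so that length is end minus start, a deletion qualifies only at an estimated copy number of zero, and a duplication qualifies at twice the reference copy number or more (two copies assumed for a diploid autosomal region).

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    chrom: str
    start: int          # assumption: half-open coordinates, length = end - start
    end: int
    kind: str           # "DEL" or "DUP"
    copy_number: float  # estimated copy number of the segment


def qualifies(call: CnvCall, min_length: int = 5000, reference_copies: int = 2) -> bool:
    """Keep long CNVs that are complete losses or at least 2-fold gains."""
    if call.end - call.start < min_length:
        return False
    if call.kind == "DEL":
        return call.copy_number == 0
    if call.kind == "DUP":
        return call.copy_number >= 2 * reference_copies
    return False


# Made-up calls for illustration; these are not data from the study.
example_calls = [
    CnvCall("chr1", 100_000, 112_000, "DEL", 0),  # long complete loss: kept
    CnvCall("chr2", 500_000, 503_000, "DEL", 0),  # shorter than 5,000 bp: dropped
    CnvCall("chr3", 200_000, 260_000, "DUP", 4),  # 2-fold increase: kept
    CnvCall("chr4", 900_000, 950_000, "DUP", 3),  # less than 2-fold: dropped
    CnvCall("chr5", 300_000, 340_000, "DEL", 1),  # partial loss only: dropped
]
kept = [c for c in example_calls if qualifies(c)]
print([c.chrom for c in kept])
```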

F2 variant screen
GAERS rats were crossed with NEC rats to produce an F1 generation. Heterozygous F1 generation rats were then mated to produce an F2 generation. To further identify variants with potentially large effects, we compared the presence or absence of the 183 GAERS private variants among four whole-genome sequenced F2 rats selected at the extremes of the GAERS and NEC seizure phenotype spectrum (i.e. two high-seizing and two non-seizing F2 rats) and on the presence or absence of the previously reported Cacna1h variant. Of the GAERS private variants, no additional non-synonymous variant beyond the Cacna1h variant was homozygous among the high-seizing F2 rats and absent among the non-seizing F2 rats. Conversely, when we screened the 168 NEC private variants for those that were homozygous in the non-seizing F2 rats and absent from the high-seizing F2 rats, we did not find any candidate variants that could be associated with epilepsy.
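The strict screen described above asks whether a variant is homozygous in both high-seizing F2 rats and absent from both non-seizing F2 rats. A minimal Python sketch of that test follows; genotypes are encoded as the number of variant alleles (0, 1 or 2), and the sample names and the second variant label are hypothetical placeholders rather than data from the study.

```python
def cosegregates_strictly(genotypes, high_seizing, non_seizing):
    """True if homozygous in every high-seizing and absent in every non-seizing rat.

    genotypes maps sample name -> number of variant alleles (0, 1 or 2).
    """
    homozygous_in_high = all(genotypes[s] == 2 for s in high_seizing)
    absent_in_non = all(genotypes[s] == 0 for s in non_seizing)
    return homozygous_in_high and absent_in_non


high = ["F2_high_1", "F2_high_2"]   # hypothetical sample names
non = ["F2_non_1", "F2_non_2"]

variants = {
    # By design of the F2 selection, the Cacna1h variant is homozygous in the
    # high-seizing rats and absent from the non-seizing rats.
    "Cacna1h_R1584P": {"F2_high_1": 2, "F2_high_2": 2, "F2_non_1": 0, "F2_non_2": 0},
    # Hypothetical variant that fails: heterozygous in one high-seizing rat.
    "example_variant": {"F2_high_1": 2, "F2_high_2": 1, "F2_non_1": 0, "F2_non_2": 0},
}

strict_hits = [name for name, gt in variants.items()
               if cosegregates_strictly(gt, high, non)]
print(strict_hits)
```

With only four F2 animals, many unlinked variants can pass or fail such a screen by chance, which is why the result is treated as a prioritisation step rather than a test of association.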
We then relaxed the requirements on carrier zygosity and on absence from published rat strains, to permit the identification of risk variants that may be present among other strains but are not sufficient to cause the seizure phenotype alone. Of the 2,270 higher-confidence GAERS-specific SNVs, thirty-three variants were found exclusively among the two high-seizing F2 rats, irrespective of zygosity (S1 Table). For this zygosity-permissive screen we then required that fewer than ~10% of the previously sequenced rat strains (none of which has been reported to have a GAERS epileptic and behavioural phenotype) shared the variant (http://rgd.mcw.edu/pub/strain_specific_variants/). Thus, to qualify, the segregating GAERS-specific variant must also be observed in fewer than four of the 28 rat genomes published by Atanur et al. [31] and in fewer than five of the 40 strains recently published by Hermsen et al. [32]. Through this more permissive screen we identified six rare GAERS-specific non-synonymous SNVs that were transmitted to the two high-seizing but not the two non-seizing F2 animals, and were observed in fewer than 10% of the other published rat strains. Aside from the pre-selected Cacna1h variant, no variant occurred in a gene of obvious biological relevance. Using similar criteria we found fourteen NEC-specific non-synonymous SNVs transmitted to both non-seizing F2 but not the two high-seizing F2 animals (Table 6). We found that 13 of these 14 NEC-specific non-synonymous SNVs were located within a large block of 52.2 Mbp on chromosome 7 from positions 69181224-121380101 (rn4 reference genome). The relevance of this stretch of sequence to the non-epileptic phenotype is of interest for larger F2 sample sequencing studies.
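The permissive screen combines two conditions: the variant is carried (at any zygosity) by both high-seizing F2 rats and by neither non-seizing F2 rat, and it is seen in fewer than four of the 28 Atanur et al. genomes and fewer than five of the 40 Hermsen et al. strains. The Python sketch below illustrates the logic; the record layout, sample names and example variants are hypothetical, and it assumes "exclusively among the two high-seizing F2" means present in both of them.

```python
def passes_permissive_screen(
    genotypes,
    n_atanur_carriers,
    n_hermsen_carriers,
    high_seizing,
    non_seizing,
    max_atanur=4,    # must be seen in fewer than 4 of 28 genomes
    max_hermsen=5,   # must be seen in fewer than 5 of 40 strains
):
    """Zygosity-permissive segregation plus rarity in published strains."""
    carried_by_high = all(genotypes[s] > 0 for s in high_seizing)
    absent_from_non = all(genotypes[s] == 0 for s in non_seizing)
    rare = n_atanur_carriers < max_atanur and n_hermsen_carriers < max_hermsen
    return carried_by_high and absent_from_non and rare


high = ["F2_high_1", "F2_high_2"]   # hypothetical sample names
non = ["F2_non_1", "F2_non_2"]

# Made-up records: (genotypes, carriers among 28 genomes, carriers among 40 strains).
candidates = {
    "rare_segregating": ({"F2_high_1": 1, "F2_high_2": 2, "F2_non_1": 0, "F2_non_2": 0}, 1, 2),
    "common_segregating": ({"F2_high_1": 2, "F2_high_2": 2, "F2_non_1": 0, "F2_non_2": 0}, 10, 12),
    "rare_not_segregating": ({"F2_high_1": 1, "F2_high_2": 1, "F2_non_1": 1, "F2_non_2": 0}, 0, 0),
}

permissive_hits = [
    name for name, (gt, n_atanur, n_hermsen) in candidates.items()
    if passes_permissive_screen(gt, n_atanur, n_hermsen, high, non)
]
print(permissive_hits)
```

For the NEC-specific screen the same function applies with the two sample lists swapped, so that the variant must be carried by both non-seizing rats and absent from both high-seizing rats.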

Discussion
The genetic architecture underlying some of the most common epilepsies remains undescribed. This study used WGS to identify genetic variants in a validated spontaneous animal model of GGE with absence seizures, the GAERS rat. Like the human condition, the GAERS rat appears to have a complex genetic architecture, with the F2 progeny of the GAERS and NEC supporting polygenic determinants.
Estimates show that approximately 40% of the euchromatic rat genome is part of the eutherian core (orthologous bases to both mice and humans) [36] with 97.5% of human ion channel genes, the most heavily implicated set of genes in epilepsy, having a characterized rat ortholog [37].
This sequencing study identified genetic variants that were found in the GAERS strain but were absent from the Rattus norvegicus reference genome (BN rat). The comparison was independently repeated for the NEC strain, which makes it possible to detect epilepsy risk factors that might be shared between the GAERS and the BN rat. We subsequently compared the GAERS and NEC strains to find genetic variants that differed between the two inbred lines. Both strains originated from the same outbred Wistar rat colony, which was selectively bred based on the presence (GAERS) or absence (NEC) of spontaneously arisen absence seizures and the corresponding SWD signatures.
We then screened strain-specific GAERS and NEC variants against variants found among F2 rats with either high- or non-seizing phenotypes [23]. To both identify candidate variants for the seizure phenotype and remove likely background rat genome variation, we focused on the GAERS-specific and NEC-specific variants that were present in fewer than ~10% of the previously sequenced open-access rat strains [31,32]. Subsequently, to better prioritize variants with potential effects on the epilepsy phenotype, we screened those candidate variants through the whole genome sequence data generated on four F2 rats. We selected two F2 rats at each extreme (two high-seizing F2 and two non-seizing F2) of the GAERS and NEC seizure phenotype spectrum. Through this screen we identified six GAERS-specific variants that were transmitted to the high-seizing but not the non-seizing F2 animals. Conversely, we identified fourteen NEC-specific variants that were transmitted to the non-seizing but not the high-seizing F2 animals. Beyond the pre-selected Cacna1h missense variant [23], this approach did not identify additional candidate variants of immediate interest to the phenotypes.
In the NEC filtering of the specific variants, three additional common variants (seen in over five existing rat strain sequences; S2 Table) that segregate with the two non-seizing F2 rats were found in genes that have been associated with epilepsy: Abat (4-aminobutyrate aminotransferase), Cyp11b3 (11-beta-hydroxylase) and Cyp11b2 (aldosterone synthase) [3,12,38-44]. While we do not yet have data to support any relevance of these observations, the NEC-specific variants are of interest since it is known that many rat strains will develop spontaneous SWDs, particularly during mature adulthood, while inbred NEC do not record any SWDs or absence seizures during the course of their life [13].
Spontaneously occurring animal models of complex human disease present a powerful opportunity to better understand the genetic architecture and disease pathogenesis beyond what is possible from monogenic animal models of disease. This work illustrates the application of whole genome sequencing to an inbred, spontaneously occurring epilepsy rat strain, coupled with sequencing of high- and non-seizing F2 rats, to identify a subset of genetic variants that might individually or collectively associate with the epilepsy phenotype. In this study we cannot rule out contributions from common variants with small to modest effects on absence epilepsy risk that are shared across many of the currently available rat strains. While we were able to narrow down individual candidate variants based on F2 rat sequencing, this study is not equipped to quantitatively assess the contribution of the GAERS and NEC variants to their corresponding phenotypes. Future studies that generate sequence data across the full spectrum of seizure phenotypes observed in the F2 rats are required to perform quantitative assessments of possible oligogenic/polygenic interactions.
The approach used to narrow down candidates is applicable to other animal models with a predisposition to either genetic epilepsies, such as WAG/Rij, or acquired epilepsies. For instance, this whole genome sequence approach can be applied to try to identify susceptibility genes for the development of acquired epilepsy in the FAST (kindling-prone) and SLOW (kindling-resistant) rat strains. Like the GAERS and NEC, the FAST and SLOW rats are the result of selective breeding based on a genetic predisposition to develop focal seizures in the kindling model [45].

Conclusions
Overall, this study illustrates that the sequencing of naturally occurring inbred animal models of human disease, when a related inbred control strain is available, can help address one of the long-standing limitations encountered when investigating polygenic diseases in human populations. Additional F2 sequencing, along with extensive future studies of these SNVs, indels, and structural variants in larger cohorts of carefully characterised GAERS and NEC F2 rats, could identify additional variants that may contribute to the risk of absence epilepsy, and the contribution that each gene, individually and in combination, makes to the variability of seizure characteristics observed in the F2 rats.