Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process.


Introduction
The spinocerebellar degenerative disorders, hereditary ataxias (HA) and hereditary spastic paraplegias (HSP), are heterogeneous disorders causing progressive gait difficulties due to degeneration of the cerebellum, corticospinal tracts, brainstem, and/or spinal cord [1]. These disorders are relatively rare, with an estimated total prevalence of 13.9/100,000 in southeast Norway [2]. HA is characterized by progressive limb and gait ataxia, loss of coordination and disturbances of speech and oculomotor control. HSP is characterized by progressive spasticity and weakness of the lower limbs, the weakness often being mild relative to the spasticity [1,2]. Onset is reported at all ages, and all monogenic modes of inheritance (autosomal dominant, autosomal recessive, and X-linked) have been identified [3]. To date, pathogenic variants in more than 100 genes have been identified in spinocerebellar degenerative disorders [4][5][6][7]. Identifying molecular diagnoses in such genetically heterogeneous disorders is challenging. Usually, multitier, expensive and time-consuming investigations are performed. Nevertheless, a large number of affected individuals remain without a molecular diagnosis.
With the progress in sequencing technologies, several methods are now available to screen hundreds or thousands of genes at once and possibly identify a molecular diagnosis in a shorter time and at lower cost. Gene panel sequencing (GPS), also called targeted high throughput sequencing, whole-exome sequencing (WES), and whole-genome sequencing (WGS) are currently being used by researchers and diagnostic laboratories. These methods have different advantages related to quality and interpretation of data, management of ethical issues, and economic effectiveness. Like other high throughput sequencing methods, GPS has proven successful in several heterogeneous neurological disorders [8][9][10].
In the present study, we have evaluated the use of GPS in 105 clinically well-characterized probands affected with HA or HSP in whom previous extensive investigations had failed to identify a genetic cause. The study provides insights into the value of this diagnostic strategy and illustrates the diversity of genetic causes of spinocerebellar degenerative disorders.

Participants
In 2002, a research study was initiated at the Department of Neurology, Oslo University Hospital, carefully registering patients with HA and HSP in Norway. In 2014 the database consisted of 683 individuals with a diagnosis of HA and HSP, of whom 446 were probands [2]. The database has been designed to comprehensively cover the South-Eastern Norway health region, where 55.8% of the Norwegian population lives. In addition, patients have been referred from the rest of the country since 2002. Main inclusion criteria for HA were cerebellar gait and/or limb ataxia, and for HSP, spasticity in the lower limbs, brisk reflexes and positive Babinski sign [11,12]. In addition, most of the included probands had a known family history of disease. A minority had sporadic disease, which after thorough investigation was considered compatible with a hereditary type of spinocerebellar degenerative disorder. At the start of the present study, 17% of the HA probands and 37% of the HSP probands had an exact genetic diagnosis (Fig 1). Molecular investigations were carried out according to what was diagnostically available at the time of examination. All HA probands were previously screened for SCA1, SCA2, SCA3, SCA6, SCA7, and for Friedreich ataxia in recessive and sporadic cases. HSP probands were screened for variants in the genes linked to SPG4, SPG3A, and most also for SPG31. To detect gene-dosage defects, multiplex ligation-dependent probe amplification (MLPA) was performed in all HSP probands for SPG4 and SPG3A. Additional molecular tests were performed depending on the phenotype and the pedigree structure, including tests for variants in the genes linked to SPG7, SPG1, SPG2, FXTAS, POLG, SCA8, SPG11, AOA1, AOA2, Ataxia Telangiectasia, ARSACS, SPG8, DRPLA, and SPG42. Array comparative genomic hybridization (aCGH) was performed in all probands with cognitive impairment.
Also, biochemical tests for metabolic disorders such as adrenoleukodystrophy and gangliosidosis, as well as measurements of biomarkers such as carbohydrate-deficient transferrin, albumin, cholesterol, gamma globulins, alpha-fetoprotein and vitamin E, were performed when relevant. Brain magnetic resonance imaging (MRI) was performed in most of the probands.
According to the protocol, 105 of the 328 probands without molecular diagnosis in the database could be selected for analysis in the study. They were selected from the database according to the following criteria: 1. Verified family history, 2. Completed thorough investigations, including screening for differential diagnoses and the above-mentioned molecular analyses, and 3. Availability of probands. All probands (n = 89) fulfilling these three criteria were included. In addition, 16 sporadic cases in whom other causes had been excluded, and in whom HSP or HA remained the most likely diagnosis, were included (see Fig 1 and Table 1). To validate our study design, we also included eight samples from the database with known pathogenic variants as positive controls (S1 Table). This project was approved by the Regional Committee for Medical and Health Research Ethics, southeast Norway, under ethical agreement REK 2010/1579a. Written informed consent was obtained from all study participants.

Molecular genetics and bioinformatic analyses
The SureDesign tool (Agilent Technologies, Santa Clara, CA) was used to create a Haloplex custom gene panel targeting 159 genes (including the 10-bp flanking sequence on both sides of each exon) involved in different neurodegenerative disorders. The gene panel included 91 genes (S2 Table) reported to be definitively or possibly implicated in classical HA and HSP presentations at the time of study design (January 2014). Preparation of DNA pools from ten individuals was carried out as described before [13]. Target enrichment was performed according to the instructions of the Haloplex Target enrichment system for Illumina Sequencing Version D.5, May 2013. Paired-end sequencing (100 bp) using a single lane on an Illumina HiSeq2000 instrument (Illumina, San Diego, CA) was performed at the Norwegian Sequencing Centre, Oslo. We also sequenced and analyzed 230 healthy controls using the same approach. The in-house bioinformatic pipeline has been described in detail elsewhere [13]. For the variant filtering process, we considered only nonsense and missense variants, indels, and variants at canonical splice sites, excluding variants with a minor allele frequency greater than 0.01 in any of the following public and local resources: 1000 Genomes data (http://www.1000genomes.org), Exome Sequencing Project (ESP, http://evs.gs.washington.edu/EVS/), Exome Aggregation Consortium (ExAC, http://exac.broadinstitute.org) data, 176 ethnically-matched in-house exomes, and the 230 ethnically-matched internal controls. Moreover, we used the combined annotation dependent depletion (CADD) [14] tool to predict possible functional effects of a variant. We used a cut-off value of Phred-scaled CADD score >12, based on the value found for previously known pathogenic variants in our positive controls (S1 Table), as well as documented elsewhere [15]. The variants were examined by visual inspection of the sequence alignment/map format files to remove sequencing errors.
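The filtering rules described above can be illustrated with a minimal sketch. This is not the in-house pipeline [13]; the `Variant` record, its field names, the resource keys and the example entries are hypothetical placeholders, and only the three criteria (variant class, minor allele frequency not above 0.01 in any resource, Phred-scaled CADD score above 12) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict

# Variant classes retained by the filter (labels are hypothetical).
KEPT_CONSEQUENCES = {"nonsense", "missense", "indel", "canonical_splice"}
MAF_MAX = 0.01   # variants more frequent than this in any resource are excluded
CADD_MIN = 12.0  # Phred-scaled CADD score must be greater than this value


@dataclass
class Variant:
    label: str                 # placeholder identifier, not a real variant
    consequence: str
    cadd_phred: float
    # Minor allele frequency per resource; a missing key means "not observed".
    maf: Dict[str, float] = field(default_factory=dict)


def passes_filter(v: Variant) -> bool:
    """Apply the three filtering criteria in sequence."""
    if v.consequence not in KEPT_CONSEQUENCES:
        return False
    if any(freq > MAF_MAX for freq in v.maf.values()):
        return False
    return v.cadd_phred > CADD_MIN


# Made-up candidates, chosen only to exercise each rule.
candidates = [
    Variant("rare_missense", "missense", 24.1, {"ExAC": 0.00002, "in_house": 0.0}),
    Variant("common_missense", "missense", 25.0, {"1000G": 0.03}),
    Variant("low_cadd_missense", "missense", 8.5),
    Variant("synonymous_change", "synonymous", 15.0),
    Variant("splice_donor", "canonical_splice", 26.3, {"ESP": 0.0001}),
]

kept = [v.label for v in candidates if passes_filter(v)]
print(kept)
```

Variants surviving such a filter would still require visual inspection of the alignments and the downstream classification steps.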
Available non-affected and affected family members were tested for segregation of identified variants in the respective families. For any identified variant, all kinds of phenotypic presentations were considered in order to allow for clinical variability. After the initial filtering process, we followed the guidelines for the interpretation of sequence variants provided by the joint consensus recommendations of the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP). These recommend the use of specific standard terminology to classify sequence variants into different classes: pathogenic, likely-pathogenic, and variants of uncertain significance (VUS) [16]. We will refer to these criteria as the ACMG criteria. All the presented variants have been submitted to the Leiden Open Variation Database (LOVD) server (http://databases.lovd.nl/shared), and any additional information on the sequencing data can be shared on request.

Sanger sequencing
Variants identified by GPS were confirmed and validated by Sanger sequencing (S1 Appendix). Alamut version 2.8.0 (http://alamut.interactive-biosoftware.com) was used to investigate the location of the variants in the genome, and to assign an evolutionary conservation score (PhyloP) and functional predictions from several in silico programs (PolyPhen-2, SIFT, and MutationTaster) to the variants.

Clinical presentation
The clinical characteristics of the 58 HA and 47 HSP probands are described in Table 1. The inheritance pattern was presumed autosomal dominant (AD) in 68 (65%) and autosomal recessive (AR) in 21 (20%). Sixteen (15%) were sporadic (SPO) cases. Of the autosomal recessive probands, seven had consanguinity in the family history. The clinical phenotype was pure in 45% and complex in 55% of cases. Four HA probands presented with episodic ataxia. The average age of onset was 30.7 years, with a range from birth to 79 years of age. 39% of the probands had childhood onset of disease, with first symptoms starting before 18 years of age. Disease duration at inclusion in the database [2] was on average 22 years with a range from 2 to 72 years.

Genetic analysis
High quality sequencing data was obtained, with an average of 99% of bases covered >80x in the targeted regions (S1 Fig). Our bioinformatic analyses identified 1182 variants, including single nucleotide variants and indels. All eight positive controls were identified in the data, confirming the sensitivity of the method used (S1 Table). By applying our filtering criteria and the ACMG guidelines for variant classification, we identified 20 probands (19%) carrying pathogenic and likely-pathogenic variants (Table 2). The allele frequencies of these variants in local and public databases are presented in S3 Table. Ten probands (10%) were identified with VUS (Table 3). Together these accounted for 30 probands (29%). Of these, 16 were from the HA and 14 from the HSP category. Some of the identified variants have been reported previously [17][18][19][20][21][22][23][24], while the rest of the variants found are categorized as novel pathogenic or likely-pathogenic variants (Table 2). Of the pathogenic and likely-pathogenic variant carriers, 12 probands belonged to the childhood-onset category (<18 years), and eight had adult onset, resulting in a diagnostic yield of 29% and 12.5% in the respective categories. The average disease duration in probands with identified pathogenic and likely-pathogenic variants was 31 years (range 4-51 years) (Table 4). Diagnostic rates for different categories such as AD, AR, SPO, consanguinity, pure, and complex forms of the disease are presented in S4 Table. In all 20 families with identified pathogenic or likely-pathogenic variants, the clinical symptoms and findings were concordant with previously published descriptions of the corresponding disorders. The phenotypic details of these 20 probands are documented in Table 4 as well as in S2 Appendix.
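The reported percentages can be reproduced with simple arithmetic. The sketch below assumes that the childhood-onset group comprises 41 probands (39% of 105, rounded) and the adult-onset group the remaining 64; these denominators are inferred from the percentages given in the text and are not stated explicitly.

```python
def yield_pct(diagnosed: int, total: int) -> float:
    """Diagnostic yield as a percentage of the group size."""
    return 100.0 * diagnosed / total


cohort = 105
childhood = round(0.39 * cohort)  # assumption: 39% of probands, rounded to 41
adult = cohort - childhood        # assumption: the remaining 64 probands

results = {
    "pathogenic or likely-pathogenic": yield_pct(20, cohort),
    "VUS": yield_pct(10, cohort),
    "combined": yield_pct(30, cohort),
    "childhood-onset": yield_pct(12, childhood),
    "adult-onset": yield_pct(8, adult),
}

for group, pct in results.items():
    print(f"{group}: {pct:.1f}%")
```

Under these assumptions the childhood-onset and adult-onset yields come out at approximately 29% and 12.5%, in line with the figures given above.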
SPG30. Two novel variants in the KIF1A gene (SPG30, MIM 610357), p.(Ile27Thr) in proband HCT-024 (III-7) and p.(Val8Met) in proband HCT-026 (IV-6), were identified. Both variants segregated with the phenotype in these families with an autosomal dominant inheritance pattern (Table 2; Fig 2a and 2b). In the family of proband HCT-024 there were eight affected individuals in four successive generations (Fig 2a). DNA samples were available from five affected individuals with a pure HSP phenotype for segregation analysis, which revealed that all five carried the variant. Five individuals without subjective symptoms were also tested, of whom one (III-8) carried the variant (Fig 2a). At the age of 31 years this subject had increased reflexes in the lower limbs. This was interpreted as a possible sign of disease, but an extensor plantar reflex was not observed. Both families with KIF1A variants presented with a childhood-onset, slowly progressive spastic paraplegia (Table 4). None of the affected individuals in these families had signs of cognitive impairment, ataxia or neuropathy, which may be present in complex HSP phenotypes.
ARSACS. Two novel homozygous variants, p.(Gly4230Ser) and p.(Leu4221Val), in the SACS gene were identified in proband HCT-106 (V-3), who presented with an autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS, MIM 270550) phenotype (Fig 3a, Table 4). There was consanguinity in this ethnic Norwegian family, and both variants were present in the homozygous state in the only affected member of the family (Fig 3a). It is difficult to determine which variant is causing the disease, or whether both are involved. Both variants are extremely rare and were predicted to possibly affect protein function, although the evidence from several in silico predictions is stronger for the p.(Gly4230Ser) variant (Table 2). Proband HCT-106 experienced slowly progressive clumsiness and unsteadiness from 15 years of age. Brain MRI at ages of 37 and 44 years revealed general cerebellar atrophy with no signs of pontine linear hypointensities, as well as normal cervical cord and corpus callosum (Fig 3b, 3c, 3d and 3e). No retinal changes were found by fundoscopy or optical coherence tomography (Table 4).
Variants of uncertain significance. Furthermore, we identified ten VUS in eight genes (Table 3). In five of the ten probands with VUS, the phenotype was considered to be concordant (Table 3). The phenotypic details of all VUS are described in S2 Appendix. Eight of these ten variants were found with a very low allele frequency in ExAC, including the five variants with concordant phenotypes. Variants located in BEAN1, RTN2, and TTBK2 are placed in this category mainly because a disease mechanism involving missense variants has not previously been established or well consolidated in these genes (Table 3). Further independent reports and/or functional studies are warranted to establish whether these VUS could be relevant to the disease in these probands.

Discussion
The brain is the most complex and sophisticated organ in our body. 84% of the human genes are expressed in the brain [25]. A small perturbation in the expression of genes in the brain could lead to serious consequences and a number of neurological disorders including HA and HSP. Today, routine investigation of these disorders often involves a large number of serial independent molecular tests after the clinical diagnosis has been made. Certain mutations are very common in some populations, thus narrowing down the required number of tests. Other populations show high numbers of rare genotypes, as so far seen in the Norwegian ataxia population [2]. A correct molecular diagnosis is important for affected individuals, providing certainty, preventing unnecessary diagnostic tests and giving access to relevant supportive therapies and genetic counseling.
By using high throughput sequencing methods, the time from disease onset to the identification of a molecular diagnosis may be substantially reduced. The probands diagnosed in this study had a notably long average disease duration of 31 years. Our results therefore support the use of GPS-based diagnostics or similar sequencing methods earlier in the diagnostic process. However, trinucleotide expansion disorders (SCA1, SCA2, SCA3, SCA6, SCA7 and Friedreich ataxia) are relatively frequent in most HA cohorts, and such expansions are generally not detectable by high throughput sequencing techniques [26,27]. As suggested in guidelines, the most frequent trinucleotide expansions should be tested initially in HA [28], and if these are negative, GPS and similar methods may be considered as the next level of investigation. GPS has some advantages compared to WES and WGS. Firstly, this method provides high-quality sequencing data with excellent coverage of the selected genes. This means that the method can reliably identify variants. Previous studies using WES and WGS have demonstrated that a considerable proportion of coding regions of genes harboring disease-related variants are not covered [29][30][31]. Secondly, GPS can limit the genetic incidental findings that can raise issues of ethical approval and communication of the findings to the affected individuals or guardians. Recently, Neveling et al [32] reported that 10% of the families did not provide consent for DNA testing during pre-counseling because of the risk of incidental findings. On the other hand, pre- and post-counseling can be conveniently offered to the small minority of probands or families concerned about incidental findings after WES or WGS analysis, and guidelines and recommendations are available on how to report incidental findings [33].
This study revealed a definitive molecular diagnosis in 19% of probands, a sizeable yield, particularly taking into account that this cohort had previously been extensively investigated by a series of molecular and biochemical analyses. Previous studies have reported widely varying diagnostic yields. In one study, a molecular diagnosis was achieved in 18% of 50 childhood- and adult-onset HA probands studied with GPS [34]. In another study, a diagnostic yield of 25% was attained by GPS in SPG4-negative HSP cases [35]. A diagnostic yield of 21% was achieved by WES in a cohort of sporadic and familial HA cases [36]. Pyle et al [37] reported a diagnosis in 64% by WES in a mixed cohort of HA, although the number of probands screened (n = 22) was very low. Kara et al [38] performed a combination of Sanger and clinical exome sequencing in a cohort of complex HSP cases and found plausible genetic defects in 49%, with SPG11 cases (31%) accounting for the majority. Another clinical exome sequencing study in a cohort of HSP and HA revealed a diagnostic yield in the range of 22-34% [39]. The clinical characteristics of the studied cohort can affect the diagnostic yield, which contributes to the variation between studies. This is demonstrated by the higher diagnostic yield seen in childhood-onset cases (29%) as compared to adult-onset cases (12.5%) in our study, as is also seen in previous studies [40,41]. However, our study cohort consisted of previously extensively investigated probands, which introduces a selection bias compared to naive patient populations.
A large number of cases remained unsolved. Several possible reasons could contribute to this. First, a subset of probands might be explained by causal variants in novel HA/HSP genes that are yet to be identified or were identified during the study period. Such newly identified genes can be added to gene panels on a regular basis. Second, some disease-causing variants might be located in the non-coding part of the genome. Third, somatic variants, including mosaicism, could be the cause in some of the individuals. Fourth, coding variants might have been missed due to problems related to target capture, sequencing, bioinformatic analyses or our data filtering strategy. The DNA pooling strategy used in our study might have reduced the sensitivity for identifying certain variants, although we have found high sensitivity of our protocol [13]. In general, current high throughput sequencing technologies are less efficient at identifying indels and large-scale copy number variations (CNV) than single nucleotide variants, and our chosen study design has limitations in this regard. Of note, in one of the probands in our study, a parallel WES study identified an in-frame deletion in SPTBN2 that was not detected by our bioinformatic analyses, but was visible upon direct inspection of aligned reads. On the other hand, we identified a molecular diagnosis in two probands, HCT-020 (SPG4, SPAST) and HCT-049 (SPG31, REEP1), in whom the pathogenic variant was not identified by previous conventional single gene sequencing, further highlighting the quality and comprehensiveness of the method used here.
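The possible effect of DNA pooling on sensitivity can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the published protocol [13]: it assumes equimolar pools of ten diploid individuals, a heterozygous variant carried by a single pool member, independent read sampling, and a hypothetical calling threshold of five variant reads.

```python
from math import comb


def expected_allele_fraction(pool_size: int) -> float:
    """Expected read fraction for a heterozygous variant carried by one
    diploid individual in an equimolar pool (assumption: no capture bias)."""
    return 1.0 / (2 * pool_size)


def prob_at_least(k: int, depth: int, fraction: float) -> float:
    """Binomial probability of observing at least k variant reads at a given depth."""
    below = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


frac = expected_allele_fraction(10)  # pools of ten individuals, as in the methods
MIN_READS = 5                        # hypothetical calling threshold

# Illustrative depths; the actual per-pool depth is not assumed here.
for depth in (80, 200, 500):
    p = prob_at_least(MIN_READS, depth, frac)
    print(f"depth {depth}x: P(at least {MIN_READS} variant reads) = {p:.3f}")
```

Under these assumptions a single heterozygous carrier contributes only about one read in twenty, so the probability of reaching a fixed read threshold depends strongly on sequencing depth.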
Our bioinformatic analysis was unbiased in the sense that we looked for variants independent of known inheritance patterns. This led to some interesting findings, further expanding and/or confirming the clinical and genetic heterogeneity and phenotypic spectrum for certain entities. The KIF1A gene was initially reported in autosomal recessive HSP (SPG30) [42]. However, several independent reports have recently identified variants in this gene in autosomal dominant forms of disease (MRD9, MIM 614255). Twenty-two probands with de novo variants have been reported with a complicated form of HSP, including a recent case of PEHO syndrome (MIM 260565) [43]. However, a pure HSP phenotype has previously been presented in one family, with a dominantly segregating variant, p.(Ser69Leu) [44]. In a more recent study, two additional dominantly inherited segregating variants, p.(Tyr74Cys) and p.(Gln632*), have been identified [39]. In our study, we have identified two dominantly segregating KIF1A variants, p.(Val8Met) and p.(Ile27Thr), in two independent families. This further confirms the dominant mode of inheritance and allelic heterogeneity associated with KIF1A. Both variants are located within the functional motor domain of the KIF1A protein. Interestingly, affected individuals of both of our families presented with pure HSP with a childhood onset of the disease, concordant with the reported families in which dominantly inherited variants were found. Based on these findings, we suggest testing KIF1A in HSP regardless of the phenotypic variability and inheritance pattern.
We identified one proband with two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) in SACS with a relatively slowly progressive recessive spastic ataxia with onset in the teens. The phenotype was consistent with the mild ARSACS phenotype often seen in non-Quebec-born individuals, with late-onset and absence of the characteristic retinal findings described in Quebec-born ARSACS individuals. Radiologically, the findings were stable over the last seven years with cerebellar atrophy. Remarkably, the brain MRI showed no signs of the previously described characteristic features of ARSACS [45]. This demonstrates that the clinical course was not sufficient for diagnosis, and systematic unbiased methods such as GPS could identify atypical or previously unreported phenotypes.
We have found ten variants that are categorized as VUS. Some uncertainty regarding the involvement of these variants in disease will remain until further individuals are reported from other studies and/or specific functional data from in-vitro or in-vivo studies become available.
It is well established that HSPs and HAs often overlap, both clinically and genetically. When performing molecular diagnosis, the choice of gene panel for these disorders is critical. In most contemporary GPS or clinical exome studies, the gene panel selection has been variable. Our gene panel covered a broad range of genes known to be involved in spinocerebellar degenerative disorders at the time of study. By developing a broad gene panel, one can avoid spending additional costs and time on single gene analyses or on the various limited or sub gene panels that are usually commercially available. Overall, because of the recent advancement in sequencing technologies, cost is less of an issue when it comes to broad gene panels or clinical exome sequencing/WES, as several parallel cheap and efficient sequencing methods are available today. Conversely, repeated updating of the gene panels can increase the total costs as compared to WES, which is a downside of GPS. Moreover, in the case of clinical exome sequencing, with an updated ethical approval the WES data can be re-analyzed later to further explore novel genes responsible for disease in undiagnosed cases. This cannot be done with GPS and is an obvious limitation of the method.
In conclusion, GPS and similar sequencing methods are effective choices for diagnostic procedures in order to reduce the time needed to obtain a correct molecular diagnosis. To date, these procedures are not available or implemented in most clinics in the world, and consequently many affected individuals lack a specific genetic diagnosis. A similar strategy is relevant for other heterogeneous neurological disorders. Affected individuals across different categories (childhood to adult onset, familial to sporadic, and pure to complex phenotypes) can benefit and be diagnosed earlier using modern high throughput sequencing technologies.