Whole-genome sequencing suggests mechanisms for 22q11.2 deletion-associated Parkinson’s disease

Objectives To investigate disease risk mechanisms of early-onset Parkinson’s disease (PD) associated with the recurrent 22q11.2 deletion, a genetic risk factor for early-onset PD. Methods In a proof-of-principle study, we used whole-genome sequencing (WGS) to investigate sequence variants in nine adults with 22q11.2DS, three with neuropathologically confirmed early-onset PD and six without PD. Adopting an approach used recently to study schizophrenia in 22q11.2DS, here we tested candidate gene-sets relevant to PD. Results No mutations common to the cases with PD were found in the intact 22q11.2 region. While all were negative for rare mutations in a gene-set comprising PD disease-causing and risk genes, another candidate gene-set of 1000 genes functionally relevant to PD presented a nominally significant (P = 0.03) enrichment of rare putatively damaging missense variants in the PD cases. Polygenic score results, based on common variants associated with PD risk, were non-significantly greater in those with PD. Conclusions The results of this first-ever pilot study of WGS in PD suggest that the cumulative burden of genome-wide sequence variants may contribute to expression of early-onset PD in the presence of threshold-lowering dosage effects of a 22q11.2 deletion. We found no evidence that expression of PD in 22q11.2DS is mediated by a recessive locus on the intact 22q11.2 chromosome or mutations in known PD genes. These findings offer initial evidence of the potential effects of multiple within-individual rare variants on the expression of PD and the utility of next generation sequencing for studying the etiology of PD.


Introduction
Over the past two decades new knowledge has elucidated a genetic basis for an increasing proportion of patients with Parkinson's disease (PD) [1]. We identified the hemizygous 22q11.2 microdeletion associated with 22q11.2 deletion syndrome (22q11.2DS; OMIM #192430, #188400) as a novel genetic risk factor for neuropathologically confirmed, L-dopa responsive early-onset PD [2]. This structural variant has since been found to be significantly enriched in early-onset PD cohorts and may account for ~0.5% of cases [3]. Genetic variants on the intact 22q11.2 chromosome, or genome-wide outside of the deletion region, may influence the likelihood of expression of PD in 22q11.2DS. Using this genetic model to identify such variants could provide clues to variants that affect susceptibility to idiopathic forms of PD and general disease mechanisms.
As a proof-of-principle to assess this model, we used whole-genome sequencing (WGS) data, adapting a strategy successfully used to study expression of schizophrenia in 22q11.2DS [4]. We compared variants from 22q11.2DS patients with early-onset PD [2] to those without PD. To maximize statistical power in this initial study, we investigated rare variant burden for gene-sets with higher a priori likelihood of contributing to PD risk. These included variants affecting candidate genes in the 22q11.2 deletion region, such as COMT, SEPT5, and six mitochondrial function genes, as well as other genome-wide PD-relevant gene-sets. We also investigated common variant contribution using a polygenic risk score model. We found evidence for rare variants outside the 22q11.2 region perturbing gene networks relevant to PD, supporting the utility of this genetic model for early-onset PD.

Whole-genome sequencing
The overall WGS approach was based on methods used in recent studies of schizophrenia [4] and autism [9]. Detailed laboratory and bioinformatics methods are described elsewhere [4]. In brief, genomic DNA was extracted from whole blood and sequenced using the Complete Genomics platform (pipeline and assembly version 2.2). On average, 99.0% of the genome was covered with at least 5x sequence depth relative to the NCBI build 37 human genome reference sequence. Of the exome, 94.4% and 72.3% were covered with at least 20x and 40x sequence depth, respectively. Only high-quality variants, defined as those with Complete Genomics "high-quality" variant call scores that also met stringent in-house quality criteria [4,9], were included. The effects (e.g., nonsense, missense, or frameshift mutations) and classifications (e.g., in exonic, intronic, or intergenic regions) of variants across the genome were annotated using ANNOVAR (November 2014) software [11]. Rarity was annotated using the three major publicly available reference datasets based on WGS and whole exome sequencing, i.e., 1000 Genomes [12], NHLBI-Exome Sequencing Project [13] and Exome Aggregation Consortium [14], and two in-house platform-matched whole-genome reference datasets. We defined rare variants as those not exceeding the standard 1% alternate allele frequency threshold (minor allele frequency [MAF] < 0.01) in all datasets, including specific ethnic subgroups.
Table 1 notes (recovered fragments): Tetralogy of Fallot (PD2); ventricular septal defect (NPD1); ventricular septal defect and atrial septal defect (NPD4, NPD6). c, Lifetime history; other psychiatric disorders were obsessive compulsive disorder (NPD1), generalized anxiety disorder (NPD5, NPD6).
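The rarity rule in the preceding paragraph (MAF < 0.01 in every reference dataset, including ethnic subgroups) can be illustrated with a minimal Python sketch. The dataset labels, the example frequencies, and the choice to treat an unobserved variant as frequency 0 are assumptions made for illustration only; they are not taken from the study pipeline.

```python
from typing import Mapping, Optional

RARITY_THRESHOLD = 0.01  # MAF < 0.01, as stated in the text


def is_rare(freqs: Mapping[str, Optional[float]],
            threshold: float = RARITY_THRESHOLD) -> bool:
    """Return True when the allele frequency is below the threshold in every dataset.

    None means the variant was not observed in that dataset; counting that
    as frequency 0 is an assumption of this sketch.
    """
    return all((f or 0.0) < threshold for f in freqs.values())


# Hypothetical frequencies keyed by illustrative dataset labels.
variant_a = {"1000G": 0.002, "ESP": None, "ExAC": 0.0004, "ExAC_subgroup_max": 0.006}
variant_b = {"1000G": 0.003, "ESP": 0.001, "ExAC": 0.002, "ExAC_subgroup_max": 0.03}

print(is_rare(variant_a))  # below 1% everywhere
print(is_rare(variant_b))  # exceeds 1% in one subgroup
```

Requiring the threshold to hold in every dataset and subgroup is the conservative choice: a variant that is common in any one population is excluded.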

Non-synonymous and structural variants
We examined rare single-nucleotide sequence variants (SNVs) and in/dels affecting coding genes that included loss-of-function (stop-gain/nonsense, frameshift, and core splice site alterations) and missense mutations categorized as deleterious using standard variant impact predictor tools and genomic conservation indexes, as reported previously [4,9]. Missense variants were predicted damaging (deleterious) when they met at least four of seven criteria: high conservation by PhyloP placental mammal (≥2.3) and PhyloP 100-vertebrate (≥4) [15], and predicted damaging by SIFT (≤0.05) [16], PolyPhen2 (≥0.90) [17], Mutation Assessor (≥1.9) [18], CADD Phred score (≥15) [19], and MutationTaster score (>0.5) [20]. We restricted the analyses to diploid regions of the genome, with the exception of the 22q11.2 deletion, to help manage the risk of false positives.
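The "at least four of seven criteria" rule for missense variants can be sketched as a simple vote count. This is an illustrative sketch, not the authors' pipeline: the score field names are hypothetical, the comparison directions follow the usual conventions for each tool (higher is more damaging except for SIFT), and a missing score is assumed to count as a criterion not met.

```python
import operator

# (hypothetical score field, comparison, cutoff) for the seven criteria in the text
CRITERIA = [
    ("phylop_placental", operator.ge, 2.3),
    ("phylop_100way", operator.ge, 4.0),
    ("sift", operator.le, 0.05),
    ("polyphen2", operator.ge, 0.90),
    ("mutation_assessor", operator.ge, 1.9),
    ("cadd_phred", operator.ge, 15.0),
    ("mutation_taster", operator.gt, 0.5),
]


def criteria_met(scores):
    """Count how many criteria a variant satisfies; missing scores count as not met."""
    return sum(
        1
        for name, compare, cutoff in CRITERIA
        if scores.get(name) is not None and compare(scores[name], cutoff)
    )


def is_predicted_damaging(scores, min_criteria=4):
    """Apply the at-least-four-of-seven rule."""
    return criteria_met(scores) >= min_criteria


# Hypothetical annotation records for two missense variants.
v1 = {"phylop_placental": 2.8, "phylop_100way": 5.1, "sift": 0.01,
      "polyphen2": 0.45, "cadd_phred": 22.0}
v2 = {"phylop_placental": 1.0, "phylop_100way": 4.2, "sift": 0.20,
      "polyphen2": 0.95, "mutation_assessor": 1.2, "cadd_phred": 16.0,
      "mutation_taster": 0.5}

print(criteria_met(v1), is_predicted_damaging(v1))
print(criteria_met(v2), is_predicted_damaging(v2))
```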
Copy number variants and other structural variants detected by the Complete Genomics pipeline and assembly version 2.2 were annotated and filtered as described elsewhere [4,9]. We considered only high-quality rare variants that overlapped a coding gene exon of a RefSeq gene [4,9]. 22q11.2 deletions were confirmed in all patients (Table 1).

Gene-set analyses
The analysis was restricted to a candidate gene approach of rare variants involving coding regions of the genome. Rare coding sequence variants are enriched for those that are deleterious and have a moderate to large effect on disease risk. We have previously shown that this approach, even for the small sample selected for our pilot WGS study of 22q11.2DS, can yield substantial effect sizes for a neuropsychiatric phenotype [4]. Three hypothesis-driven gene-sets were selected a priori for testing based on proposed possible PD mechanisms in 22q11.2DS [2,21]: (1) 22q11.2 deletion region genes (n = 46) [22]; (2) known causative and PD risk candidate genes (n = 43); and (3) genes identified as functionally-relevant to PD (n = 1000) based on physical protein interactions and co-expression with known PD gene candidates, excluding the 22q11.2 region and the known causative and risk PD genes that were examined as separate gene-sets.
The known candidate PD gene-set (S1 Table) included genes described in the scientific literature and/or OMIM entries for Parkinson's disease (#168600), and genes from the PDgene database of genome-wide association study common variant findings (www.pdgene.org; accessed October 2015) [23,24]. We used this approach [25] because rare and common variants can involve the same PD risk gene (e.g., SNCA) [1,23,24]. We opted to assess all genes reported to be involved in Parkinson's disease (e.g., confirmed, unconfirmed, and rarely reported) in order to limit the possibility of a false negative with respect to susceptibility to Parkinson's disease in patients with 22q11.2DS.
The gene-set of 1000 "PD-relevant" genes was generated using the genome-wide candidate gene prioritization tool, Endeavour [26]. Endeavour prioritizes genes by assessing how similar they are to genes already known to be involved in the process of interest [26-28]. To generate a genome-wide ranked list of candidate "PD-relevant" genes, the 43 genes comprising the known PD gene-set (S1 Table) were used as input training genes with data imported from standard annotation databases (Gene Ontology, Kegg, SwissProt), protein-protein interaction databases (BIND, BioGRID, Hprd, InNetDb, Intact, Mint), and a human gene expression database [29]. This yielded 1561 ranked genes mapped to current unique HGNC gene symbols (S2 Table). We a priori restricted the PD-relevant gene-set to the highest-ranked 1000 genes ("top 1000 genes"; S2 Table) to help prevent the inclusion of genes with weak evidence for involvement in PD-related function, while including a sufficient number of genes to perform a proof-of-principle genome-wide burden analysis [4]. As a negative control, we generated 10 000 random gene-sets composed of 1000 "non-PD relevant" genes each from a set of 14 489 genes that met the following criteria: (1) at least one Gene Ontology [30] annotation with ≤500 genes, to ensure that the genes were annotated with a functional term that was sufficiently specific, and (2) were not in the known candidate PD gene-set or the PD-relevant gene-set (top 1000).

Common variant polygenic risk score
We calculated the polygenic score using the approach previously described for 22q11.2DS-associated schizophrenia [4] and the International Schizophrenia Consortium [31]. Our analysis was restricted to the top 10 000 most significant SNPs with original nominal association p-values and odds-ratios that are publicly available from PDgene [23,24]. Of these, 3534 SNPs were mapped to variants passing quality filters in all nine genomes assessed in this study. Allele counts were computed as the number of alleles that matched the allele used for association analysis. The SNP-wise risk score was calculated as the product of the allele count and the log(odds-ratio). The polygenic risk score [4] for each 22q11.2DS subject was calculated as the sum of all respective SNP-wise risk scores using nominal association p-value thresholds of 1e-3 and 1e-5, as well as one more stringent threshold (1e-7), corresponding to 3410, 1315, and 388 SNPs, respectively.
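The score defined in the preceding paragraph, the sum over SNPs of allele count times log(odds-ratio) at a given p-value threshold, can be written as a short Python sketch. The SNP identifiers, odds-ratios, p-values, and genotype below are hypothetical, and the use of the natural logarithm and an inclusive threshold (p ≤ cutoff) are assumptions of this sketch.

```python
from math import log


def polygenic_score(snps, genotype, p_threshold):
    """Sum of allele_count * ln(odds_ratio) over SNPs passing the p-value threshold.

    snps: iterable of (snp_id, odds_ratio, p_value) from the association study.
    genotype: dict mapping snp_id to the count (0, 1 or 2) of the tested allele.
    SNPs absent from the genotype are skipped.
    """
    return sum(
        genotype[snp_id] * log(odds_ratio)
        for snp_id, odds_ratio, p_value in snps
        if p_value <= p_threshold and snp_id in genotype
    )


# Hypothetical association summary statistics and one subject's allele counts.
snps = [("rsA", 1.2, 1e-8), ("rsB", 0.8, 1e-4), ("rsC", 1.5, 1e-6)]
genotype = {"rsA": 2, "rsB": 1, "rsC": 0}

for threshold in (1e-3, 1e-5, 1e-7):
    print(threshold, round(polygenic_score(snps, genotype, threshold), 4))
```

Because alleles with odds-ratios below 1 contribute negative terms, a subject's total can be negative, as with the scores reported for this sample.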

Statistical analyses
We used one-sided independent t-tests in these exploratory analyses to compare rare variant burden counts and common variant polygenic risk scores between groups, performed with SAS version 9.4 software. Statistical significance was defined using nominal uncorrected p<0.05. The false discovery rate (FDR) was calculated using the Benjamini-Hochberg procedure.

Results
Table 1 summarizes the demographic, clinical, and 22q11.2 deletion data for the subjects studied. No feature of 22q11.2DS, apart from the presence of neuropathologically-confirmed PD, was unique to the 22q11.2DS-PD cases. The clinical heterogeneity among the subjects was typical for 22q11.2DS [32-34]. Expression of schizophrenia was similarly common (66.6%) in both groups. Among the subjects with PD, there was a history of borderline intellectual disability and an isolated seizure in one subject and a congenital heart defect in another. Major comorbid conditions among the patients without PD included intellectual disability (n = 2 mild; n = 4 borderline), seizures (n = 3 recurrent; n = 1 single), congenital heart defects (n = 3), and other psychiatric disorders including obsessive compulsive disorder (n = 1) and generalized anxiety disorder (n = 2). Patients in the PD group were diagnosed with 22q11.2DS at an older mean age (49.6±4.9 years) than those without PD (29±14.9 years, p = 0.048). One subject with PD had a 1.4 Mb nested proximal 22q11.2 deletion (S3 Table). The deletion breakpoints were consistent with typical deletions in the other eight subjects (n = 7, 2.6 Mb deletion including two with PD; n = 1, 2.9 Mb deletion; S3 Table). There were no deleterious nonsynonymous variants near the 22q11.2 deletion breakpoints (i.e., within 4 Mb).
WGS results for the 22q11.2 deletion region revealed only a single rare missense variant involving different brain-expressed genes in each of three subjects (Table 2). None involved a putative 22q11.2 region PD candidate gene [2,21] and the variants in cases PD1 and PD3 involved genes TRMT2A and DGCR2, encoding proteins of largely unknown function (Table 2) [35,36].
With respect to the gene-set of 43 putative candidate PD genes (S1 Table), no deleterious nonsynonymous variants (Table 2), or copy number or other structural variants, were identified in any of the nine subjects.
Overall the genome-wide burden of rare deleterious missense variants in coding sequence genes was non-significantly greater in the PD cases than in the patients without PD (Table 2). We assessed a candidate gene-set functionally restricted to the top 1000 genome-wide ranked PD-relevant genes, excluding those in the 22q11.2 deletion region or in known PD genes. This revealed a greater burden of rare deleterious missense variants in the 22q11.2DS-PD cases (nominal p = 0.03; FDR = 10%; Table 2). The result remained significant (p = 0.04) after correcting for the total number of rare deleterious missense variants per subject. To test the specificity of these findings, we assessed between-group differences for burden of rare deleterious missense variants in 10 000 random non-PD relevant gene-sets. We found that <5% of the resulting p-values were less than the p-value yielded from the top 1000 genome-wide ranked PD-relevant genes (p = 0.03), suggesting the results were specific to the PD-relevant gene-set. Results for the few loss-of-function variants were non-significant (Table 2). There were no genome-wide PD-relevant copy number or other structural variants identified [4]. Secondary analyses showed that the genes with these rare missense variants ranked significantly higher in the PD-relevant gene-set in the PD patients (mean rank, 196) than those in the cases without PD (mean rank, 316; p = 0.009), and that burden results remained significant using a higher stringency threshold (top 500 ranked candidates in the PD network gene-set, S2 Table; 22q11.2DS-PD mean = 5.7, SD = 2.1 vs. 22q11.2DS-NPD mean = 2.0, SD = 1.7; p = 0.03).
Table 2 notes (recovered fragments; rows include genome-wide total and loss-of-function variants): e, A false positive missense variant in PARK2 (subject PD3) was not confirmed: Sanger sequencing showed no mutation [2]. f, Top 1000 genome-wide genes ranked as potential PD-relevant genes using the genome-wide candidate gene prioritization tool, Endeavour. https://doi.org/10.1371/journal.pone.0173944.t002
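The two statistical steps named in the Statistical analyses section, a two-group t-test on per-subject burden counts and Benjamini-Hochberg FDR adjustment, can be sketched in plain Python. The counts and p-values below are hypothetical. The equal-variance (pooled) form of the t statistic is an assumption, since the text does not state which variant was used, and the sketch stops at the statistic and its degrees of freedom; the one-sided p-value would come from the upper tail of a t distribution in a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance t statistic for mean(a) - mean(b), with its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-subject burden counts: 3 cases versus 6 comparison subjects.
cases = [8, 6, 4]
controls = [3, 1, 2, 4, 0, 2]
t_stat, df = pooled_t(cases, controls)
print(round(t_stat, 3), df)

# Hypothetical nominal p-values from several gene-set tests.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
```

With groups this small the test has few degrees of freedom, which is one reason the nominal p-values reported here are treated as exploratory.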
Also, a negative control analysis using lifetime history of schizophrenia or seizure(s) in place of PD as the phenotype of interest [4] showed no significant differences in PD network missense variant burden between groups at either stringency threshold (schizophrenia: top 1000 ranked PD-relevant genes, p = 0.81; top 500 genes, p = 0.54; seizure(s): top 1000 ranked PD-relevant genes, p = 0.17; top 500 genes, p = 0.77).
Table 3 shows the PD-relevant genes with rare nonsynonymous variants in the subjects with PD. These included KLF11, a regulator of monoamine oxidase B expression [37], and MAP2, a neuronal cytoskeletal protein found in Lewy bodies in PD patients (Table 3) [38]. There were two genes with variants that affected more than one case with PD (Table 3). The same rare variant (MAF 0.01) in LARS2 (rs116826217) was identified in subjects PD2 and PD3. This gene encodes a mitochondrial aminoacyl-tRNA synthetase reported to be significantly down-regulated (~1.45 fold) in substantia nigra dopaminergic neurons of patients with idiopathic PD [39]. Two different rare variants in TTN were identified in subjects PD1 and PD3. Although ranked in the top 100 candidates in the PD-relevant gene-set, TTN is highly polymorphic and one of the largest genes in the genome; thus, caution is warranted in interpreting the potential pathogenicity of these variants [40].
At the most stringent nominal p-value threshold of 1e-7 (corresponding to just 388 SNPs), the mean polygenic risk score was non-significantly greater in the PD cases (-18.6 vs. -23.7; p = 0.17; S1 Fig). A similar non-significant trend with more modest relative differences was observed at more lenient SNP nominal p-value thresholds.

Discussion
This first-ever study using WGS in PD provides initial proof-of-principle of the utility of next-generation sequencing for studying the etiology of PD, and the potential advantage of using a genetic model, here, 22q11.2DS. The results provide preliminary evidence that genome-wide burden of rare deleterious variants in genes functionally relevant to PD may collectively act to increase the likelihood of expression of PD in the presence of a 22q11.2 deletion. This is consistent with, and extends to PD, findings using this WGS approach for schizophrenia in 22q11.2DS [4]. These findings will require confirmation in adequately powered larger samples.
The results of this proof-of-principle study suggest that hemizygosity of the 22q11.2 deletion region, together with each individual's cumulative genome-wide burden of rare deleterious variants in PD-relevant pathways, with perhaps some modification related to cumulative common variants, may form a "multi-hit" pathway to the expression of PD in 22q11.2DS. Collectively, the findings provide an initial glimpse of the potential for WGS to reveal a more complete view of the complex genetic architecture of PD at the individual level that could potentially be generalizable to other forms of PD. Reduced gene dosage in the 22q11.2 region appears to be a more plausible mechanism for increasing risk of early-onset PD in 22q11.2DS than does the unmasking of a recessive allele on the intact 22q11.2 chromosome.
With this sample size, we could not find evidence of a contribution to risk from pathogenic mutations in well-known or other risk genes for PD, nor loss-of-function mutations affecting genes in the broader PD-relevant network. Notably, enrichment of additional rare ATP13A2 variants was recently reported in LRRK2-associated PD [41]. Larger studies of patients with 22q11.2DS-associated PD will help clarify if additional rare variants in known PD genes could impact penetrance of PD in this genetic population. Though non-significant, the finding of a less negative polygenic score in the 22q11.2DS-PD cases suggests the possibility that 22q11.2DS patients who develop PD may have fewer protective PD risk alleles. These preliminary findings provide hypotheses that await testing in larger, well-powered samples that will become feasible as more patients with 22q11.2DS-PD continue to be identified [2,3,42-49].
The smaller proximal deletion that occurs in about 10% of cases [50,51] was identified in one of the 22q11.2DS patients with PD sequenced in this study [2]. Adequately-powered studies will be needed to investigate the molecular nature of the deletion and the potential impact of neighbouring mutations as a possible source of discordance for PD in 22q11.2DS. Notably, there appears to be no relationship between deletion length and expression of other major neuropsychiatric features in 22q11.2DS [52-54]. We found no evidence that nonsynonymous variants near the 22q11.2 deletion breakpoints impacted expression of PD.
The results of this study appear to be consistent with the polygenic genetic architecture expected for a common, complex neurological disorder such as PD. Studies using a comparable design in this and other genetic models of PD (e.g., LRRK2-associated PD) could help to reveal generalizable mechanisms relevant to the reduced penetrance associated with most mutations. Non-genetic factors may also be important to include in future designs. For example, Vitamin D deficiency may increase PD risk [55], and is a common finding in 22q11.2DS related to inadequate levels of parathyroid hormone secretion [32,33,56]. Prospective studies could help clarify if Vitamin D levels may be involved in mediating PD risk in 22q11.2DS.
The small sample size necessitated analyses focused on a targeted candidate gene-set approach. This included reliance on reported PD-associated genes to seed a network to prioritize rare variants across the genome. More individual PD genes are likely to be discovered, and others may be dropped from such lists as data accrue. Although this initial study produced a nominally significant enrichment of rare putatively damaging missense variants in PD-relevant genes among the PD cases, there was no correction for multiple comparisons. We limited the number of tests to the minimum by considering only gene-sets with relevance to PD. These findings will require replication in adequately powered samples. We were underpowered in this preliminary study to include clinical covariates, including other neuropsychiatric phenotypes, in our analyses. The clinical phenotype of 22q11.2DS patients with PD reported to date appears consistent with the variable expression and reduced penetrance of the associated features that is characteristic of 22q11.2DS [2,3,42-49]. However, there were no significant differences in the PD network variant burden using other major neurological phenotypes (e.g., expression of schizophrenia or seizures) as the grouping variable, suggesting the findings may be specific to PD-relevant genes.
Advances in WGS bioinformatics methods will permit informative analyses of non-coding regions in future studies. The 22q11.2 region includes DGCR8, a key gene in the biogenesis of brain microRNAs, in addition to seven microRNAs [57]. It remains possible that one or more of the cases without PD may go on to develop PD at a later age, limiting the power to determine between-group differences. Evaluating clinical and neuroimaging phenotypes in genetic studies of patients with 22q11.2DS without a diagnosis of PD may help clarify disease risk. We recently found that adults with 22q11.2DS without a diagnosis of PD exhibit olfactory and motor deficits compared with age-matched healthy controls [58]. The true penetrance of PD in adults with 22q11.2DS remains to be reliably estimated, and there are as yet no predictive clinical markers of PD. Longitudinal studies will help resolve such questions. For the six cases without PD, all but one (NPD1) had reached or was beyond the reported age-at-onset range for 22q11.2DS-associated early-onset PD [2,3,42-46]. The absence of PD pathology was confirmed in one case where brain tissue was available (NPD4) [2], and there was no evidence of nigrostriatal dopamine loss in two others (NPD1 and NPD6) on PET neuroimaging using 11C-dihydrotetrabenazine relative to controls [58,59].
The results of this study represent an important first step in appreciating how WGS can help to elucidate the genetic etiology of early-onset PD in 22q11.2DS. These findings may have implications for other genetic models of PD and for idiopathic PD. Eventually such molecular results could help to inform early identification and intervention strategies for individuals at risk.