Influence of Coding Variability in APP-Aβ Metabolism Genes in Sporadic Alzheimer’s Disease

The cerebral deposition of Aβ42, a neurotoxic proteolytic derivate of amyloid precursor protein (APP), is a central event in Alzheimer’s disease (AD)(Amyloid hypothesis). Given the key role of APP-Aβ metabolism in AD pathogenesis, we selected 29 genes involved in APP processing, Aβ degradation and clearance. We then used exome and genome sequencing to investigate the single independent (single-variant association test) and cumulative (gene-based association test) effect of coding variants in these genes as potential susceptibility factors for AD, in a cohort composed of 332 sporadic and mainly late-onset AD cases and 676 elderly controls from North America and the UK. Our study shows that common coding variability in these genes does not play a major role for the disease development. In the single-variant association analysis, the main hits, none of which statistically significant after multiple testing correction (1.9e-4<p-value<0.05), were found to be rare coding variants (0.009%<MAF<1.4%) with moderate to strong effect size (1.84<OR<Inf) that map to genes mainly involved in Aβ extracellular degradation (TTR, ACE), clearance (LRP1) and APP trafficking and recycling (SORL1). These results were partially replicated in the gene-based analysis (c-alpha and SKAT tests), that reports ECE1, LYZ and TTR as nominally associated to AD (1.7e-3 <p-value <0.05). In concert with previous studies, we suggest that 1) common coding variability in APP-Aβ genes is not a critical factor for AD development and 2) Aβ degradation and clearance, rather than Aβ production, may play a key role in the etiology of sporadic AD.


Introduction
The cerebral deposition of Aβ 42 aggregates, insoluble neurotoxic derived of amyloid precursor protein (APP), is likely caused by an imbalance between Aβ production and clearance and represents a key event in Alzheimer's disease (Amyloid hypothesis) [1].
A growing body of evidence has pointed to the critical role of APP-Aβ metabolism in AD pathogenesis. First, the discovery of APP, PSEN1 and PSEN2 mutations showed that familial Alzheimer's disease is linked to Aβ 42 overproduction [2][3][4]. Second, genome-wide association studies (GWASs) identified several susceptibility loci associated with AD (APOE, BIN1, PICALM, CD33, ABCA7, CLU, MS4A6A, EPHA1, CR1, CD2AP, SORL1, CASS4) and regulating APP-Aβ levels [5][6][7][8][9][10][11][12]. Third, next generation sequencing laid the ground for the discovery of TREM2 risk variants, highlighting the possible role of microglia in Aβ clearance [13][14]. Finally, although recent studies have shown that rare coding variability in PSEN1 may influence the susceptibility for apparently sporadic late-onset AD (LOAD) [15][16], increases in Aβ production currently explain a minority of AD cases. By contrast, it is very likely that the majority of AD cases are caused by impaired degradation and clearance of Aβ, which is produced at normal levels throughout life [17][18]. Despite the importance of APP-Aβ metabolism in AD, the role of genes taking part in Aβ production and catabolism as susceptibility factors for AD is still elusive and has not been extensively investigated. Therefore, in this study, we selected 29 genes known to be involved in APP and Aβ processing: ADAM9, ADAM10, ADAM17, MEP1B, BACE1, BACE2, NCSTN, PSENEN, APH1B, LRRTM3, APLP1, APBA1, SORL1, TTR, GPR3, ECE1, ECE2, IDE, CST3, CTSB, CTSD, LYZ, MME, ACE, MMP3, A2M, PLAT, KLK6 and LRP1. We then analyzed the single, independent and the cumulative effect of protein coding variants in these genes from exome and genome sequencing data, in a cohort composed of 332 sporadic and mainly late-onset AD cases and 676 elderly controls from North America and UK.
These genes were chosen on the basis of PubMed based literature search and/or based on predicted protein interactions using STRING (http://string.embl.de/).
The discovery cohort was composed of 332 apparently sporadic AD cases and 676 elderly controls, neuropathologically and clinically confirmed, originating from the UK and North America. The mean age at disease onset was 71.66 years (range 41-94 years) for cases and the mean age of ascertainment was 78.15 years (range 60-102 years) for controls ( Table 1). The majority of the cases (77%) were late-onset (> 65 years at onset).
Among the cases and controls, 42% and 51% were female, respectively. 58% and 47% of the cases and controls carried the APOE ε4 allele, respectively. The APOE ε4 allele was significantly associated to the disease status in the NIH and ADNI series (p-value = 0.02 and 1.19x10 -9 , respectively). Importantly, all the BYU controls from the Cache County Study on Memory in Aging were heterozygous for APOE ε4 allele. However, given the fact that 1) they were elderly (mean age 80.8 years old [range: 75-94.59]) and without any clinical sign of dementia and 2) APOE ε4 allele is a risk but not a causative factor, we still considered them as controls and included in the study.
Written consent for participation was obtained in accordance with institutional review board standards. All samples had fully informed consent for retrieval and were authorized for ethically approved scientific investigation. The UCLH Research Ethics Committee number 10/ H0716/3, BYU IRB, Cardiff REC for Wales 08/MRE09/38+5, REC Reference 04/Q2404/130, National Research Ethics Service (NRES) specifically approved this study.

Exome sequencing
We performed whole exome sequencing on a cohort of 332 sporadic and mainly late-onset AD cases and 477 elderly controls. DNA was extracted from blood or brain both for cases and controls using standard protocols. Library preparation for next generation sequencing used DNA (between 1 μg and 3 μg) fragmented in a Covaris E210 (Covaris Inc.). Following fragmentation, DNA was end-repaired by 5'phosphorylation, using the Klenow polymerase. A poly-adenine tail was added to the 3'end of the phosphorylated fragment and ligated to Illumina adapters. After purification using an AMPure DNA Purification kit (Beckman Coulter, Inc), adapterligated products were amplified. The DNA library was then hybridized to an exome capture library (NimbleGen SeqCap EZ Exome v2.0, Roche Nimblegen Inc. or TruSeq, Illumina Inc.) and precipitated using streptavidin-coated magnetic beads (Dynal Magnetic Beads, Invitrogen). Exome-enriched libraries were PCR-amplified, and then DNA hybridized to paired-end flow cells using a cBot (Illumina, Inc.) cluster generation system. Samples were sequenced on the Illumina HiSeq™ 2000 using 2x100 paired end reads cycles.

Whole Genome sequencing
Genome sequencing was performed in 199 elderly, clinically healthy controls, from the Cache County Study on Memory in Aging. DNA was extracted from blood using standard protocols. All samples were sequenced with the use of Illumina HiSeq technology. Alignment was performed with the use of CASAVA software and variant calling was performed with the use of SAMtools [21] and the Genome Analysis Toolkit GATK [22]. This sequencing and variant calling were performed by our collaborators at Brigham Young University. Bioinformatic Sequence alignment and variant calling were performed against the reference human genome (UCSC hg19). Paired end sequence reads (2x100bp paired end read cycles) were aligned using the Burrows-Wheeler aligner [23]. Format conversion and indexing were performed with Picard (www.picard.sourceforge.net/index.shtml). GATK was used to recalibrate base quality scores, perform local re-alignments around indels and to call and filter the variants [22]. VCFtools was used to annotate gene information for the remaining novel variants. We used ANNOVAR software to annotate the variants [24]. Variants were checked against established databases (1000 Genomes Project and dbSNP v.134). The protein coding effects of variants was predicted using SIFT, Polyphen2 and SeattleSeq Annotation (gvs.gs.washington.edu/ SeattleSeqAnnotation). All variants within the coding regions of 29 candidate genes (A2M ) have been collected and analyzed. (S1 Table) (Further details are provided in the supplementary materials)

Statistical Analysis
For each variant, allele frequencies were calculated in cases and controls and Fisher's exact test on allelic association was performed. All computations were performed in R (version x64 3.0.2, http://www.r-project.org/). The threshold call rate for inclusion of both subjects and variants in the analysis was 95%.
A p-value of 0.05 was set as a nominal significance threshold. Based on simple Bonferroni correction for multiple testing, the thresholds for single variant and gene-based association are defined by p-value = 1.9e -4 (0.05/256 coding variants) and 1.7e -3 (0.05/29 genes), respectively.
For the gene-based analysis we have pooled together coding and non-coding variants with a MAF 0.05 and studied their cumulative effect on the AD trait.
C-alpha test and SKAT are closely related, being both non-burden tests, analyzing and collapsing the effect of genetic variants of different frequency (common and rare), effect (protective, damaging and neutral) and effect size (modest, moderate, strong). SKAT can be considered an expansion of the c-alpha test because overcomes some of its limits. Indeed, SKAT 1) can be applied also to the study of continuous traits; 2) does not need any permutation; 3) applies covariates to the study. In addition, we have used a set of 5-10 size matched genes not linked to APP-Aβ metabolism (based on a Pubmed and STRING search) as negative controls for the genebased association analysis. Moreover, we have assessed the reliability of our results, comparing the total variant frequency of the selected genes in our study with the one reported for the European-American cohort in the Exome Variant Server (EVS)(http://evs.gs.washington.edu/EVS/).

Results
The study population consisted of a total of 332 sporadic and mainly late-onset AD cases and 676 elderly controls of British and North American ancestry ( Table 1).
We do not report any pathogenic mutation in APP, PSEN1 and PSEN2 in our cohort. However, one of the controls was an heterozygous carrier of the protective variant APP p.A673T (MAF 7x10e -4 in our cohort and MAF 5x10e -4 among the European non-finnish, ExAC database, released 13 January 2015).
Moreover, 120 missense variants (46.8%) were described as damaging variants by at least 2 out of 3 in silico prediction softwares (SIFT, Polyphen and Mutation Taster). Importantly, genes involved in Aβ degradation and clearance harbor the highest relative frequency of rare coding and damaging variants (mean = 4.73 rare coding variants/Kbp of coding sequence and 3.16 damaging coding variants/Kbp of coding sequence, respectively). By contrast, genes taking part in APP processing and Aβ production, present the lowest relative frequency of rare coding and damaging variants (mean = 3.59 rare coding variants/Kbp of coding sequence and 1.5 damaging coding variants/Kbp of coding sequence, respectively), suggesting a higher degree of conservation of this last cluster of genes (S4 and S5 Tables).
However, none of the coding variants detected in the studied genes reached the statistical significance, based on a corrected p-value (p-value <1.9e -4 ), in the single-variant association test.
For all these main variants, with the APH1B (p.T27I) and SORL1 (p.D2065V) exceptions, the minor allele was substantially more frequent in cases compared to controls, suggesting a possible role as a risk factor for AD. The study possessed relatively low power to detect any significant association between cases and controls for low frequency and rare variants. Nevertheless, we analyzed these variants because we could not preclude the possibility that high effect risk alleles were present.

Gene-based association test
In addition to single-marker analysis, we performed gene-wide analysis to combine the joint signal from multiple coding and non-coding variants with a MAF 0.05 within a gene and to provide greater statistical power than that for single-marker tests. All the variants  (nonsynonymous, synonymous, UTRs) located within the studied genes and their exon-intron flanking regions were collapsed together and their joint effect has been studied and compared with 5 to 10 size and variant matched gene controls (S6A and S6B Table). Genes involved in Aβ degradation and clearance were enriched for the lowest p-values. The combined effect of variants in ECE1, LYZ and in TTR reached the nominal significance in the c-alpha and SKAT tests, respectively (1.7e -3 <p-value <0.05) (Tables 3 and 4). There was a partial overlap between genes identified in the single-marker analysis and those with SKAT and c-alpha tests. Importantly, TTR was the main finding both in the SKAT and single-marker association analysis. Therefore, suggesting TTR as a promising potential candidate risk gene for AD.

Discussion
The Amyloid cascade hypothesis is the main accepted hypothesis underlying AD pathology. Several genes within the APP-Aβ metabolism pathway have been reported as potential candidate genes for AD. However, coding variability among these has not been extensively investigated. The vast majority of reported studies are based on candidate gene approaches using array-based SNP genotyping and are focused mainly on genes involved in Aβ catabolism (http://www.alzgene.org/). Thus, leaving low frequency and rare coding variants and genes involved in Aβ production largely unexplored. GWASs and chip-based candidate gene approaches have shown that common and generally non-coding variability within these genes does not play a critical role for AD development. The only exceptions to this general rule are represented by SORL1 and ABCA7, which have been reported associated with late-onset apparently sporadic and familial AD both with GWASs, candidate gene approaches and exome sequencing [25][26][27][28].
In this study, we report a screening of genes known to be involved in the APP-Aβ metabolism (APP processing, Aβ production, degradation and clearance). We applied single-marker and gene-based association analyses, to investigate the independent and joint effect of coding variability within these genes in a cohort composed of 332 apparently sporadic and mainly late-onset AD cases and 676 elderly controls from North America and the UK. In our cohort, genes involved in Aβ degradation and clearance harbor the highest relative frequency of rare and predicted damaging variants (mean = 4.73 rare coding variants/Kbp of coding sequence and 3.16 damaging coding variants/Kbp of coding sequence, respectively). Conversely, genes encoding for proteins regulating Aβ production presented the lowest relative frequency of coding and likely damaging variability (mean = 3.59 rare coding variants/Kbp of coding sequence and 1.5 damaging coding variants/Kbp of coding sequence, respectively) (S4 and S5 Tables), suggesting a higher degree of conservation.
In the single-variant association analysis, the main hits were very rare coding variants (MAF 0.009%< MAF<1.4%) with strong effect sizes (1.84<OR<Inf) mapping to genes involved in Aβ extracellular degradation (ACE, TTR, CTSB, MMP3), APP trafficking and recycling (SORL1), and clearance (LRP1). Among the main single-marker hits only 2 genes (APH1B, NCSTN) were component of the γ secretase complex and therefore pivotal for APP processing. Our study was underpowered for the detection of rare and low frequency variants and these variants were nominally significant after Bonferroni correction (1.9e -4 <p-value <0.05). Importantly, in our cohort, TREM2 p.R47H, the second most common risk factor for sporadic AD, has been detected in 6 cases (1.8%) and 4 controls (0.59%) and, given our small sample size, with a MAF = 0.2%, was not significantly associated to AD (p-value = 0.09). Therefore, we suggest that the main variants detected in our study may be functional and warrant a follow up in an extended sample size.
Transthyretin (TTR) is a 55-kDa protein, particularly abundant in the CSF and human plasma, where it transports thyroxin from the peripheral blood circulation to the brain. TTR has been already reported linked to AD dementia. Particularly, it has been hypothesized that TTR may act as a scaffold protein, binding to amyloid and preventing its deposition and aggregation in plaques [29]. Moreover, TTR is a well established biomarker for AD: 1) a decrese in the CSF and serum TTR levels has been associated to an increased AD severity and faster rate of disease progression and 2) 5 SNPs and different TTR haplotypes have been related to hippocampal atrophy [30]. By contrast, TTR overexpression decreases Aβ deposition, protects against Aβ plaques formation and improves the cognitive function in different AD mouse model strains [31][32]. Finally, TTR, likewise other well conserved secreted proteins critically involved in dementia (PRP and PGRN), has no homologous proteins (http://string-db.org/), therefore implying that even subtle changes to its epitopes and/or domains may be functionally relevant and phenotypically manifest. In line with this hypothesis, TTR is the main hit both in the single-variant analysis and, with only 6 variants detected (4 coding and 2 3´UTR variants), in the gene-based analysis (SKAT). Thus, suggesting that either the joint effect of the coding and non-coding variability at the TTR locus is likely to be functional or that the signal is driven by TTR p.T139M, that has a very strong effect size (OR = 6.19, 95% CI = 1.099-63.091).
The second strongest signal in the single variant analysis maps to ACE (p.T916M), encoding angiotensin I converting enzyme 1 (ACE1), a zinc metalloprotease, which regulates blood pressure. Multiple lines of evidence have shown that ACE1 may ameliorate the cognitive decline either regulating the cerebral blood flow and/or converting Aβ 42 to Aβ 40 , a more soluble isoform and its activity is increased in AD brain, in proportion to the parenchymal Aβ load [33][34][35].
SORL1 encodes for Sortilin related receptor (SORLA, also known as LR11), which binds to APOE and mediates several intracellular sorting and trafficking functions through a VPS10 (vacuolar protein sorting protein 10) domain [36]. SORLA is highly expressed in the brain and modulates APP recycling through the retromer complex, thereby influencing levels of Aβ [37][38]. A growing body of evidence suggested SORL1 as an excellent positional and functional candidate for AD [12,25,[39][40][41][42][25][26]. Moreover, decreased expression and DNA methylation changes at the SORL1 locus have been reported associated to AD [43][44]. SORL1 harbors 3 of the top variants identified (p.E270K, p.D2065V and p.R11P). Importantly, p.E270K has been reported in 7 Carribean Hispanic families and in vitro studies showed that SORL1 p. E270K is a functional variant, leading to increased secretion of Aβ, when transfected in HEK293 cell lines and weakening the binding to APP [45]. Therefore, it is plausible that SORL1 (p.E270K) may increase the susceptibility also for apparently sporadic LOAD. Thus, this finding, with the other main SORL1 variants detected (p.D2065V and p.R11P) should be further investigated.
Finally, four of the main hits map to the low-density lipoprotein receptor related protein 1 (LRP1) gene. Importantly, LRP1 is a major Aβ clearance receptor in cerebral vascular smooth muscle cells and disturbance of this pathway contributes to Aβ accumulation in the brain [46]. In addition, LRP1 locus was identified as an AD candidate locus in consanguineous Israeli-Arab community [47].
The c-alpha and SKAT tests reported ECE1 and LYZ and TTR, respectively, as main genes associated with LOAD, although after the Bonferroni correction, the association was only nominally significant. Very interestingly, the main genes identified both with the single-marker and gene-based analysis play a pivotal role in the cardiovascular system and have been already signaled as potential risk factors for cerebral amyloid angiopathy (CAA) or vascular dementia [48]. First, ECE1 and ACE are key components of the renin-angiotensin cascade, that controls blood pressure [49]. Second, ECE1, ACE and TTR are mostly expressed by endothelial cells in the CNS (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html). Third, acute and chronic hypoxia, through the release of the hypoxia inducible factor 1α (HIF-1α), exerts a critical epigenetic regulation on APP processing and Aβ catabolism key genes. Notably, HIF-1α has been reported to increase γ and β cleavage of APP and impair Aβ degradation and clearance mainly through the down regulation of pivotal proteins such as MME, ECE1 and TTR [50][51][52][53].
In summary, our study shows that 1) common coding variability within genes involved in APP-Aβ metabolism does not play a critical role for AD development; 2) genes regulating Aβ production are more conserved than genes playing a key role in Aβ degradation and clearance, thus less frequently involved in sporadic AD; 3) TTR, ACE, SORL1, CTSB and LRP1 harbor rare coding variants with strong effect size, likely to be functional and warrant further investigation in an extended cohort; 4) ECE1, LYZ and TTR play a critical role in the cardiovascular system and the joint effect of their variants may increase the susceptibility to AD. Finally, in concert with previous studies, our results support a potential overlapping biology with shared risk factors between CAA, vascular dementia and AD.
Supporting Information S1  Table. A. Size-matched negative controls for the gene-based association analysis (c-alpha test). B. Size-matched negative controls for the gene-based association analysis (SKAT). (XLSX)