Variations in the FRA10AC1 Fragile Site and 15q21 Are Associated with Cerebrospinal Fluid Aβ1-42 Level

Proteolytic fragments of amyloid and post-translationally modified tau species in cerebrospinal fluid (CSF), as well as cerebral amyloid deposition, are important biomarkers for Alzheimer’s Disease. We conducted a genome-wide association study (GWAS) to identify genetic factors influencing CSF biomarker levels, cerebral amyloid deposition, and disease progression. The GWAS was performed as a meta-analysis of two non-overlapping discovery sample sets. Its aim was to identify genetic variants other than APOE ε4 that predict CSF biomarker levels (Aβ1–42, t-Tau, p-Tau181P, t-Tau:Aβ1–42 ratio, and p-Tau181P:Aβ1–42 ratio) in participants enrolled in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Loci passing a genome-wide significance threshold of P < 5 x 10−8 were followed up for replication in an independent sample set. We also performed a joint meta-analysis of both discovery sample sets together with the replication sample set. In the discovery phase, we identified variants in FRA10AC1 associated with CSF Aβ1–42 level at genome-wide significance (directly genotyped SNV rs10509663 P FE = 1.1 x 10−9; imputed SNV rs116953792 P FE = 3.5 x 10−10), and rs116953792 replicated (P one-sided = 0.04). This association became stronger in the joint meta-analysis (directly genotyped SNV rs10509663 P FE = 1.7 x 10−9; imputed SNV rs116953792 P FE = 7.6 x 10−11). Additionally, we identified the 15q21 locus (imputed SNV rs1503351 P FE = 4.0 x 10−8) as associated with CSF Aβ1–42 level. No variants passed the genome-wide significance threshold for the other CSF biomarkers in either the discovery sample sets or the joint analysis. Gene set enrichment analyses suggested that genes targeted by miR-33, miR-146, and miR-193 were enriched in various GWAS analyses. These findings are particularly important because CSF biomarkers reflect disease susceptibility and may predict the likelihood of disease progression in Alzheimer’s Disease.


Introduction
Alzheimer's Disease (AD) is the most common form of dementia, and to date there is no cure. Understanding the factors that influence cognitive decline in AD will help in the search for therapeutics that intervene in or preempt this process. It is now well established that the CSF biosignature of increased total tau (t-tau) and phospho-tau (p-tau) species, especially tau phosphorylated at threonine 181 (p-tau181p), and decreased amyloid-β 1-42 peptide (Aβ1-42) […] progression in MCI subjects, as measured by changes in the Clinical Dementia Rating-Sum of Boxes (CDR-SB), by performing a GWAS in two independent sample sets, and identified several novel loci that achieved genome-wide significance (intronic SNPs in UBR5 and PARP6, and an intergenic SNP near ACOT11). [14] In this report, we also describe GWAS and GWAS meta-analyses of florbetapir PET, both as a quantitative trait and as a dichotomized measure of amyloid positivity, and of disease progression (rate of cognitive decline) in the late mild cognitive impairment (LMCI) population.

Results
We performed a genome-wide association study via a meta-analysis of two discovery sample sets (discovery sample set 1 genotyped using the Illumina Human610-Quad; discovery sample set 2 genotyped using the Illumina Omni2.5; Table 1 and Table A in S1 File). Its aim was to identify genetic variants other than APOE ε4 that predict CSF biomarker levels (Aβ1-42, t-Tau, p-Tau181P, t-Tau:Aβ1-42 ratio, and p-Tau181P:Aβ1-42 ratio) in participants enrolled in the ADNI study. Any locus passing the genome-wide significance threshold of P < 5 x 10−8 was followed up for replication. In the discovery phase, we identified variants from a single locus associated with CSF Aβ1-42 level at genome-wide significance. The most significantly associated markers in our meta-analysis of the discovery sample sets were the directly genotyped intronic SNV rs10509663 (P FE = 1.1 x 10−9) and the imputed SNV rs116953792 in the putative promoter region (P FE = 3.5 x 10−10). Both lie in an ~60 kb interval on chromosome 10 containing FRA10AC1, the candidate gene for the rare folate-sensitive fragile site FRA10A (Figure B1 in S1 File). This association was replicated (rs116953792 P one-sided = 0.04) in the replication sample set (n = 172, genotyped using the Illumina OmniExpress; Table 1 and Table A in S1 File). We also performed a joint meta-analysis of both discovery sample sets together with the replication sample set. In this joint analysis the FRA10AC1 association became stronger (directly genotyped SNV rs10509663 P FE = 1.7 x 10−9; imputed SNV rs116953792 P FE = 7.6 x 10−11; Figs 1 and 2A).
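The P FE values above come from fixed-effect meta-analysis. The sketch below shows how per-sample-set betas and standard errors for one variant can be pooled. It assumes standard inverse-variance weighting, the usual fixed-effect model, and is an illustration rather than the authors' PLINK pipeline. The one-sided helper mirrors the "P one-sided" replication test in the direction of the discovery effect. Both function names are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas, ses: per-sample-set regression coefficients and standard errors,
    assumed to be coded for the same effect allele in every sample set.
    Returns (pooled beta, pooled SE, two-sided P).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per beta")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se_fe = math.sqrt(1.0 / w_sum)
    z = beta_fe / se_fe
    p_fe = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta_fe, se_fe, p_fe


def one_sided_p(beta, se, expected_sign):
    """One-sided P for a replication effect in the discovery direction.

    expected_sign: +1 or -1, the sign of the discovery-phase effect.
    """
    z = expected_sign * beta / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) under the null
```

For example, two sample sets that each give beta = 0.2 and SE = 0.1 pool to beta = 0.2, SE ≈ 0.071, and a two-sided P of about 0.0047. In real data the alleles must be aligned to a common effect allele before pooling.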
We identified an additional genome-wide significant locus at 15q21 (directly genotyped SNV rs4301994 P FE = 6.5 […] two sample sets with uncorrected significance levels of P = 0.0006 for the Upennbiomk_Human610-Quad sample set and P = 1.86 x 10−6 for the Upennbiomk5_Omni2.5 sample set, and P two-sided = 0.2 (P one-sided = 0.1) for the Upennbiomk6_OmniExpress sample set for the directly genotyped marker rs10509663. The imputed SNV rs116953792, however, achieved P one-sided = 0.04 in the Upennbiomk6_OmniExpress sample set. The association between rs4301994 and CSF Aβ1-42 is supported by all three sample sets, with uncorrected significance levels of P = 0.003 for the Upennbiomk_Human610-Quad sample set, P = 6.6 x 10−5 for the Upennbiomk5_Omni2.5 sample set, and P = 0.03 for the Upennbiomk6_OmniExpress sample set. The regression beta coefficients and least-squares means by minor allele dosage for rs10509663 (FRA10AC1) and rs4301994 (15q21 locus) are shown in the forest plots (Fig 3A and 3B) and Table 2. They display a consistent trend across the three sample sets. Neither directly genotyped marker deviated from Hardy-Weinberg equilibrium (P = 0.38 for rs10509663 and P = 1 for rs4301994, based on Omni2.5 data among cognitively normal controls). Table 3 lists the largely independent variants significantly associated with Aβ1-42 level in our meta-analysis, retaining both the most significant directly genotyped and imputed markers. It also lists other variants with uncorrected P < 1 x 10−6 in any CSF biomarker meta-analysis. The full list of variants meeting this more liberal threshold appears in S1 Table. The CSF Aβ1-42 measurements from Upennbiomk5 (baseline Aβ1-42 for 117 ADNI-GO subjects and 272 ADNI-2 subjects) were not taken at the same visit as the florbetapir PET imaging. Even so, the two measurements were negatively correlated among overlapping subjects (r = -0.72).
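The Hardy-Weinberg equilibrium check reported above can be illustrated with a 1-df chi-square goodness-of-fit sketch. The text does not state which test was used. PLINK's --hardy defaults to an exact test, which is preferable for rare variants, so treat this large-sample approximation as illustrative only. The genotype counts are hypothetical inputs, for example counts tallied among cognitively normal controls.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """1-df chi-square goodness-of-fit P for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the
    heterozygote. This is a large-sample approximation, not an exact test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no evidence against HWE
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Counts of 25/50/25 fit HWE perfectly and return P = 1.0, whereas 50/0/50 (no heterozygotes) gives an extremely small P.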
We expected the cerebral amyloid deposition GWAS to provide evidence complementary to the CSF Aβ1-42 meta-analysis, even though the sample sets are not identical and the two endpoints are not perfectly negatively correlated. In our florbetapir PET GWAS meta-analysis without correction for APOE ε4 dosage, the APOE locus predicted florbetapir PET SUVR (directly genotyped SNV rs429358 P = 7.99 x 10−32), as expected from published results (Figure C1 in S1 File). Some uncommon variants showed suggestive association. One example is rs76117213, an intronic variant in WD repeat and FYVE domain containing 3 (WDFY3), with P = 1.39 x 10−7 without correction and P = 4.08 x 10−6 with correction for APOE ε4 dosage. These results should, however, be interpreted with caution, because the variants have low minor allele frequency and the association statistics rest on small numbers of subjects in the rare genotype groups. Table 4 lists the variants significant in the florbetapir PET GWAS, along with other variants with uncorrected P < 1 x 10−6 in any florbetapir PET GWAS analysis. Most of these variants are independent, except those in the chromosome 19 region. The full list of variants meeting this more liberal threshold appears in S2 Table. The top 15q21 variants associated with CSF Aβ1-42 showed nominal association with florbetapir PET SUVR (P = 0.002 for directly genotyped SNV rs4301994; P = 0.001 for imputed SNV rs1503351). Similarly, the imputed FRA10AC1 variant rs116953792 showed nominal association with florbetapir PET SUVR (P = 0.01). Results for the top CSF Aβ1-42 variant rs10509663 from FRA10AC1 in this and other analyses (S3 Table) are discussed in S1 File.
In the rate of cognitive decline GWAS in the LMCI subgroup, some variants achieved the conventional genome-wide significance threshold (Figure D in S1 File), but these variants were of low frequency (MAF < 0.05). An intronic variant, rs2694777, in the GDNF family receptor alpha 1 gene (GFRA1) is among the common variants with suggestive association with rate of cognitive decline, as measured by the rate of change in CDR-SB. Its P values were 1.2 x 10−5 in the European-ancestry sample set and 6.77 x 10−6 in all races. The full list of variants with P < 10−6 appears in S4 Table. S1 File also reports results from this study for variants previously reported in the literature as associated with CSF biomarkers, florbetapir PET, or disease progression.
Gene set enrichment analysis can detect enriched gene sets in GWAS results even when no individual variant reaches genome-wide significance. Applying INRICH [15] to the CSF biomarker GWAS results showed that miR-33 target genes, among other gene sets, were enriched among suggestive association hits (P < 0.0005) for the p-tau181p:Aβ1-42 ratio (P empirical = 0.0002, P corrected = 0.03; Table 5 and S5 Table). miR-33 has been identified as a potent post-transcriptional regulator of lipid metabolism genes [16,17] and can disrupt cellular cholesterol homeostasis, leading to pathologic processes including AD. Among the potential targets of miR-33, PRKAA1 (protein kinase, AMP-activated, alpha 1 catalytic subunit) mediates an autophagic process by which microglia, the immune cells of the brain, clear extracellular Aβ fibrils. [18] Other potential targets of miR-33 include ARID5B (AT rich interactive domain 5B (MRF1-like)), KCNMA1 (potassium large conductance calcium-activated channel, subfamily M, alpha member 1), and LGI1 (leucine-rich, glioma inactivated). ARID5B was previously […]

Table 3. Summary of CSF biomarker GWAS meta-analyses: SNPs with uncorrected P < 1 x 10−6. Top variants were clumped using the parameters --clump-p1 0.000001 --clump-p2 0.05 --clump-r2 0.2 --clump-range entrez.gene.map --clump-range-border 20.
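The Table 3 caption lists PLINK clumping parameters. The greedy procedure behind --clump can be sketched as follows, under these assumptions:

- p1 = 1e-6, p2 = 0.05, and r² = 0.2 follow the caption.
- The 250 kb window is PLINK's default --clump-kb value, not a setting reported in this study.
- Pairwise r² comes from a caller-supplied, hypothetical function.
- The --clump-range gene annotation step is omitted.

The sketch illustrates the technique and is not PLINK's implementation.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    snp: str
    chrom: str
    pos: int  # base-pair position
    p: float  # association P value


def greedy_clump(variants, ld_r2, p1=1e-6, p2=0.05, r2_min=0.2, kb=250):
    """Greedy LD clumping in the spirit of PLINK --clump.

    ld_r2: hypothetical callable (snp_a, snp_b) -> r^2 between two variants.
    Returns a list of (index_snp, [member_snps]) tuples.
    """
    ordered = sorted(variants, key=lambda v: v.p)
    assigned = set()
    clumps = []
    for index in ordered:
        if index.p > p1:
            break  # sorted by P, so no later variant can start a clump
        if index.snp in assigned:
            continue  # already absorbed into a stronger clump
        assigned.add(index.snp)
        members = []
        for other in ordered:
            if other.snp in assigned or other.p > p2:
                continue
            nearby = (other.chrom == index.chrom
                      and abs(other.pos - index.pos) <= kb * 1000)
            # r^2 threshold treated as inclusive here (an assumption).
            if nearby and ld_r2(index.snp, other.snp) >= r2_min:
                members.append(other.snp)
                assigned.add(other.snp)
        clumps.append((index.snp, members))
    return clumps
```

Each clump keeps its most significant variant as the index and absorbs weaker, nearby variants in LD with it. This is why Table 3 can be read as a list of largely independent signals.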
LGI1 is an extracellular matrix (ECM) molecule that forms a complex with several partners. These are postsynaptic scaffolding proteins (postsynaptic density proteins 95 and 93, and synapse-associated protein 97), presynaptic scaffolding proteins (Ca2+/calmodulin-activated serine-threonine kinase and Lin7), and presynaptic K+ channels (Kv1.1, Kv1.4, and Kvβ1 subunits). LGI1 has been shown to be important in epilepsy. More broadly, the ECM is also relevant to AD. Chondroitin sulfate proteoglycans (CSPGs) are among the most abundant glycanated protein types in the nervous system and a major ECM component. Together with the ECM, they form dense lattice-like structures termed perineuronal nets (PNNs), which are thought to be neuroprotective in AD. [22] Application of Aβ to rodent primary neuronal cultures killed neurons not associated with PNNs, whereas neurons expressing PNNs were unaffected. [23] The same miR-33 target genes were enriched among suggestive hits for other CSF biomarkers, although the corrected enrichment P values exceeded 0.05. In addition, miR-193 and miR-146 target genes were enriched. miR-193 was one of nine miRNAs down-regulated in the brains of an adult-onset AD Drosophila model, [24] and miR-193b was upregulated both in oxidatively stressed (H2O2-induced) primary hippocampal neurons and in different strains of senescence-accelerated mice. [25] miR-146 has been reported to be related to upregulated immune and inflammatory responses in Alzheimer's disease. [26]

Discussion
In this study we analyzed the ADNI cohort to predict CSF and PET biomarker status and rates of cognitive decline from genetic data. We also assessed whether using molecular biomarkers as quantitative traits provides extra power to uncover novel genotype-phenotype relationships in AD. Unequivocally, the APOE ε4 allele is the strongest genetic predictor of CSF biomarker level, cerebral amyloid deposition, and disease progression; the effect sizes of other genetic markers are much smaller.
CSF biomarkers have the advantage of being sensitive across the full range of patient subpopulations, from cognitively normal to AD. Cognitive measures, by contrast, are each sensitive only within particular segments of the patient population. This limits the sample size available for GWAS if the rate of cognitive decline is studied directly in a selected population. For example, the Clinical Dementia Rating-Sum of Boxes (CDR-SB) is most sensitive in mild cognitive impairment (MCI) patients, while the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) is more appropriate for AD patients. In this sample set, variants in FRA10AC1 and at 15q21 were associated with CSF Aβ1-42 at genome-wide significance, and the FRA10AC1 association was replicated. FRA10AC1 (fragile site, folic acid type, rare, fra(10)(q23.3) or fra(10)(q24.2) candidate 1) encodes a nuclear phosphoprotein of unknown function. The 5' UTR of this gene is part of a CpG island and contains a tandem CGG repeat region that normally consists of 8-14 repeats but can expand to over 200 repeats. The expanded allele becomes hypermethylated and is not transcribed, and an expanded repeat region has not been associated with any disease phenotype. The CGG repeat lies ~723 base pairs from rs116953792, the SNP most significantly associated with CSF Aβ1-42 level (P = 2.0 x 10−10; in LD with the directly genotyped variant rs10509663). The extent of LD between the CGG repeat and rs116953792 is unknown, despite their close proximity.

WDFY3 is one of the most biologically interesting genes with suggestive association with the florbetapir PET quantitative trait, with or without correction for APOE ε4 dosage. WDFY3 encodes a phosphatidylinositol 3-phosphate-binding protein that functions as a master conductor of aggregate clearance by autophagy. This protein shuttles from the nuclear membrane to co-localize with aggregated proteins. It then forms complexes with other autophagic components to achieve macroautophagy-mediated clearance of these aggregates. WDFY3 is of particular interest given the proposed synergy between amyloid and tau aggregates in driving AD progression. Another WDFY3 variant (rs17009220, P = 0.00484, http://www.gwascentral.org/marker/HGVM16286779/results?t=2) showed a nominally significant SNP × APOE ε4 interaction in an AD case-control GWAS. [27] The GFRA1 gene, which encodes GDNF family receptor alpha 1, a member of the GDNF receptor family, is among the few genes with suggestive association with rate of cognitive decline in the LMCI subgroup. GDNF family receptor alpha 1 is a glycosylphosphatidylinositol (GPI)-linked cell surface receptor for both glial cell line-derived neurotrophic factor (GDNF) and neurturin (NTN), and it mediates activation of the RET tyrosine kinase receptor. GDNF and NTN are two structurally related, potent neurotrophic factors that play key roles in controlling neuron survival and differentiation. Neuronal loss is a hallmark of AD, a neurodegenerative disease.
Human and mouse experiments have implicated miRNAs in the regulation of Aβ, tau, inflammation, and cell death, the main disease mechanisms of AD [28]. It is intriguing that molecules involved in Aβ autophagy, inflammation, cell death, and proliferation were enriched in our INRICH analysis or appeared among the top hits of our GWAS analyses. APOE ε4 has been the most convincing genome-wide signal in AD. Variants in APOE-APOC1-APOC4-APOC2 and TOMM40-APOE have previously been associated with total cholesterol, LDL cholesterol, and triglyceride concentrations. [29,30] A previous study implicated cholesterol metabolism in the etiology of AD. [31] In the INRICH analysis, miR-33 targets were enriched in the CSF GWAS results. Because miR-33 is critical in regulating cholesterol metabolism, this finding supports an interrelationship between cholesterol metabolism and the AD process.
Studies have compared three groups: (a) non-demented individuals free of substantial Alzheimer's pathology (controls); (b) non-demented individuals with plaque and tangle loads equivalent to those of demented cases ("mismatches"); and (c) demented Alzheimer's cases. These studies observed four main phenotypic differences between the groups. For example, demented cases had significantly higher burdens than "mismatches" of fibrillar thioflavin-S-positive plaques and of oligomeric amyloid-β deposits reactive to the conformer-specific antibody NAB61. [32] Florbetapir PET most likely cannot distinguish between these different forms of amyloid deposition. Future amyloid phenotypes that differentiate the pathological forms of amyloid deposition should therefore help genetic association studies.
Future studies with larger sample sizes or additional replication samples are needed to further dissect the genetic basis of CSF and cerebral biomarkers. Studying AD progression is challenging because patients are likely to be at different stages of the disease continuum, and deterioration is not linear over the course of the disease. Furthermore, different neuropsychological instruments are sensitive at different stages of the disease continuum. Stratifying by disease stage therefore further reduces the sample size available for genetic association studies.

Alzheimer's Disease Neuroimaging Initiative
Data used in this study were obtained from the ADNI database (adni.loni.usc.edu). The ADNI study was launched in 2003 as a $60 million, 5-year public-private partnership. Its founding partners were the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations. The primary goal of the ADNI study has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adni-info.org.
Data from the ADNIMERGE R package dated 2014-06-11, together with genetic data (Figure A in S1 File) (http://www.loni.ucla.edu/ADNI), were used in this study. The adnimerge table merges several key variables from various case report forms and biomarker lab summaries across the ADNI protocols (ADNI1, ADNIGO, and ADNI2). Details of genotype data quality control (QC), imputation, and genetic association analysis are described in S1 File.

Inference of APOE ε4 genotype
A fraction of subjects with OmniExpress genotype data had missing APOE ε4 dosage in the ADNIMERGE database. Genotype dosages for rs429358, the defining variant for APOE ε4, were imputed using IMPUTE2 [33][34][35][36][37] (v2.3.0). The rs429358 genotype was then inferred as the best-guess genotype whenever the probability of that genotype exceeded 90%.
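The 90% best-guess rule can be sketched as below. This assumes the three imputed genotype probabilities for rs429358 have already been ordered by the number of ε4-defining (C) alleles. IMPUTE2 reports probabilities in terms of the two alleles listed in its output, so that alignment step is left to the caller. Function names are hypothetical.

```python
def best_guess_genotype(probs, threshold=0.9):
    """Most likely APOE e4 allele count (0, 1 or 2), or None if uncertain.

    probs: posterior probabilities for 0, 1 and 2 copies of the rs429358
    C (e4-defining) allele, assumed already aligned from imputation output.
    """
    if len(probs) != 3:
        raise ValueError("expected three genotype probabilities")
    best = max(range(3), key=lambda k: probs[k])
    # Call the genotype only when its probability exceeds the threshold;
    # otherwise the subject's e4 dosage stays missing.
    return best if probs[best] > threshold else None


def expected_dosage(probs):
    """Expected e4 allele count (imputed dosage) from the same probabilities."""
    return sum(k * p for k, p in enumerate(probs))
```

A subject with probabilities (0.02, 0.95, 0.03) is called as one ε4 copy. One with (0.10, 0.85, 0.05) is left uncalled under the 90% rule, even though its expected dosage of 0.95 is well defined.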

CSF biomarker GWAS meta-analysis
To minimize differences due to CSF assay batches, three sets of samples (Table 1) were used in the GWAS, referred to as Upennbiomk_Human610-Quad, Upennbiomk5_Omni2.5, and Upennbiomk6_OmniExpress. Sample sizes for CSF biomarkers after sample-level QC and with phenotype data are listed in Table A in S1 File. CSF biomarker data from the ADNI baseline visit were log transformed to approximate a normal distribution. APOE ε4 allele dosage, age, and clinical diagnosis group (CN, EMCI, LMCI, or AD) at the baseline visit were included as covariates. Fixed-effects meta-analyses were carried out using PLINK. [38] Regional association results were plotted using LocusZoom. [39]

Cerebral amyloid deposition (florbetapir PET) quantitative trait GWAS
The cerebral amyloid deposition quantitative trait, as measured by florbetapir PET, was taken from the adnimerge table. The AV45 value in the adnimerge table is the average AV45 SUVR of the frontal, anterior cingulate, precuneus, and parietal cortex relative to the cerebellum. Independent GWA analyses using florbetapir PET SUVR as a quantitative trait were performed for subjects genotyped on the Omni2.5 (N = 661) or OmniExpress (N = 291) platform (see Table B in S1 File for basic demographic characteristics), followed by meta-analysis across the two sample sets. For subjects with more than one florbetapir PET scan, the peak measurement was used. Gender, age, and clinical diagnosis (NL, MCI, or AD) at the time of the peak florbetapir PET measurement were included as covariates. Other auxiliary analyses are described in S1 File.
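The composite AV45 SUVR and the peak-scan rule described above can be sketched as follows. This assumes regional uptake values are already available and uses an unweighted average of the four cortical regions. The actual ADNI processing may define or weight regions differently. The dictionary keys and function names are hypothetical.

```python
from statistics import mean

# The four cortical regions named in the text; key names are hypothetical.
CORTICAL_REGIONS = ("frontal", "anterior_cingulate", "precuneus", "parietal")


def composite_suvr(uptake, reference="cerebellum"):
    """Unweighted mean cortical uptake divided by reference-region uptake."""
    return mean(uptake[region] for region in CORTICAL_REGIONS) / uptake[reference]


def peak_scan(scans):
    """Return the scan with the highest composite SUVR.

    scans: list of dicts, each with an "uptake" mapping plus the covariates
    (e.g. age, diagnosis) recorded at that visit, so covariates are taken
    from the same visit as the peak measurement.
    """
    return max(scans, key=lambda scan: composite_suvr(scan["uptake"]))
```

Dividing the regional mean by the cerebellar value is equivalent to averaging the four regional SUVRs, because all four share the same reference region.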

Rate of cognitive decline GWAS
LMCI subjects (N = 540) with genetic data were included in this analysis. Chip platform is correlated with rate of cognitive decline. Unobserved genotypes were therefore imputed from the subset of markers shared across all samples, so that differences in the marker sets used for imputation could not be confounded with the phenotypic endpoint. The rate of cognitive decline was defined as the yearly rate of CDR-SB score change over one of three windows, chosen as follows:

(1) the first two years after baseline, if month 24 data were available;
(2) otherwise the first year, if month 12 data were available;
(3) otherwise the first 6 months, if only month 6 data were available.

Adding 3.5 to the rate avoids taking the logarithm of negative numbers. The rate was then log transformed with the formula ln(ΔCDRSB/duration + 3.5) to approximate a normal distribution. Gender, age, baseline CDR-SB score, baseline MMSE score, and APOE ε4 allele dosage were included as covariates.
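The outcome definition above can be sketched as follows. The sketch assumes duration is measured in years from nominal visit months (24, 12, or 6) rather than actual visit dates, which the text does not specify. Function names are hypothetical.

```python
import math


def cdrsb_rate(cdrsb_by_month):
    """Yearly CDR-SB change from baseline over the longest eligible window.

    cdrsb_by_month: dict mapping visit month (0 = baseline) to CDR-SB score.
    Windows are tried in the order 24, 12, then 6 months, as in the text.
    Returns None when none of those follow-up visits is available.
    """
    baseline = cdrsb_by_month[0]
    for month in (24, 12, 6):
        if month in cdrsb_by_month:
            duration_years = month / 12.0  # nominal duration (assumption)
            return (cdrsb_by_month[month] - baseline) / duration_years
    return None


def transformed_rate(cdrsb_by_month, offset=3.5):
    """ln(yearly CDR-SB change + 3.5), the transformed outcome."""
    rate = cdrsb_rate(cdrsb_by_month)
    if rate is None:
        return None
    shifted = rate + offset
    if shifted <= 0:
        # The offset only covers yearly decreases (improvements) of
        # less than 3.5 points; larger decreases cannot be logged.
        raise ValueError("rate + offset must be positive for the log transform")
    return math.log(shifted)
```

A subject scoring 2.0 at baseline and 4.5 at month 24 has a yearly rate of 1.25 and a transformed value of ln(4.75). A subject with only a month-6 visit is scored over half a year.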

Gene Set Enrichment Analysis
INRICH is a pathway analysis tool for genome-wide association studies. It is designed to detect enriched association signals of LD-independent genomic regions within biologically relevant gene sets. [15] Reference gene sets used in the INRICH analysis included KEGG, Gene Ontology, and the Molecular Signatures Database. Top variants from the CSF and florbetapir PET SUVR analyses were selected at nominal association P value cut-offs of 0.0005, 0.0001, 0.00005, and 0.00001. Each selection was separately clumped in PLINK into LD-independent genomic intervals using r2 thresholds of 0.2, 0.3, and 0.5. The resulting LD-independent genomic regions were used in INRICH (version 1.0) analyses. No multiple testing correction was applied for running INRICH against multiple reference gene sets or for using multiple parameter settings (P value cut-off and LD threshold).
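The idea behind an interval-based enrichment test can be illustrated with a simplified permutation sketch. INRICH itself matches permuted intervals on properties such as SNP density and gene content, and it derives its corrected P through a further resampling step. Neither feature is reproduced here. Instead, intervals of the same length are placed uniformly at random on a single hypothetical linear genome axis, and the number that overlap target-set genes is counted. Function names are hypothetical.

```python
import random


def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_hits(intervals, target_genes):
    """Number of association intervals overlapping at least one target gene."""
    return sum(1 for iv in intervals if any(overlaps(iv, g) for g in target_genes))


def permutation_enrichment(intervals, target_genes, genome_length, n_perm=1000, seed=0):
    """Observed hit count and empirical enrichment P.

    All coordinates lie on one hypothetical axis [0, genome_length]; each
    permutation re-places every interval uniformly, keeping its length.
    """
    rng = random.Random(seed)
    observed = count_hits(intervals, target_genes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        permuted = []
        for start, end in intervals:
            length = end - start
            new_start = rng.randint(0, genome_length - length)
            permuted.append((new_start, new_start + length))
        if count_hits(permuted, target_genes) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical P strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Here an interval sitting inside a small target gene on a large genome yields a small empirical P. An interval that misses every target gene yields P = 1.0.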

Supporting Information
S1 File. Supporting Information. (DOCX)
S1 Table. Full list of variants with uncorrected P value less than 1 x 10−6 in any CSF biomarker analyses.