The association of clinical phenotypes to known AD/FTD genetic risk loci and their inter-relationship

To elucidate how variants in genetic risk loci previously implicated in Alzheimer's disease (AD) and/or frontotemporal dementia (FTD) contribute to the expression of disease phenotypes, we performed a phenome-wide association study (PheWAS) in two waves. In the first wave, we explored clinical traits associated with thirteen genetic variants previously reported to be linked to disease risk, using both the 23andMe and UK Biobank (UKB) cohorts. In the second wave, we tested 30 additional AD variants in the UKB cohort only. APOE variants defining the ε2/ε3/ε4 alleles and rs646776 were significantly associated with metabolic/cardiovascular and longevity traits. APOE variants were also significantly associated with neurological traits. ABI3 variant rs28394864 was significantly associated with cardiovascular traits (e.g. hypertension, ischemic heart disease, coronary atherosclerosis, angina) and with the immune-related trait asthma. Both the APOE variants and the CLU variant were significantly associated with nearsightedness. The HLA-DRB1 variant was associated with immune-related traits. Additionally, variants in more than 10 AD genes (BZRAP1-AS1, ADAMTS4, ADAM10, APH1B, SCIMP, ABI3, SPPL2A, ZNF232, GRN, CD2AP, and CD33) were associated with hematological measurements such as white blood cell (leukocyte) count, monocyte count, neutrophil count, platelet count, and/or mean platelet (thrombocyte) volume (an autoimmune disease biomarker). Many of these genes are expressed specifically in microglia. The associations of the ABI3 variant with cardiovascular and immune-related traits are among the novel findings of this study. Taken together, these results show that at least some AD and FTD variants are associated with multiple clinical phenotypes, not just dementia. We discuss these findings in the context of causal relationship versus pleiotropy, using Mendelian randomization (MR) analysis.


Introduction
Genome-wide association study (GWAS) is a powerful approach for identifying genetic risk loci. However, the functional elucidation of how a risk locus is related to a disease still requires including samples from UK Biobank using family history as proxy and Alzheimer's Disease Sequencing Project (ADSP) allows identification of additional novel risk loci [29][30][31][32]. In wave 2, we further included 30 variants from the more recent GWAS meta-analyses [29][30][31][32], as well as variants identified earlier but not prioritized in the wave 1 PheWAS. The PheWAS approach has previously been applied to BioVU, Vanderbilt's DNA biobank, where phenotypes are defined by electronic medical records (EMR), namely ICD codes [33]; to the 23andMe research database, where phenotypes are defined by self-report [34]; or to both [35]. PheWAS has the potential to validate targets, nominate treatment indications and/or assess safety signals, especially if the effect of a genetic variant mimics the effect of a pharmacotherapy [36]. Given that genes implicated in AD/FTD are involved in cholesterol metabolism (APOE, CLU, and ABCA7) and immune response (CR1, CD33, CLU, ABCA7, TREM2, SPPL2A, SCIMP, HLA-DRB1), it is foreseeable that some AD variants may also be associated with metabolic/cardiovascular and/or immune-related traits. The recent development of Mendelian randomization (MR) based on GWAS summary statistics, which uses genetic variants as instrumental variables [37], provides another means to distinguish pleiotropy from a causal relationship between related traits.

Study participants
Cohort 1: 23andMe. All individuals included in the analyses were research participants of 23andMe who provided electronic informed consent and DNA samples for genetic testing, and answered surveys online. The study was conducted according to a human subjects protocol, which was reviewed and approved by Ethical & Independent Review Services, a private institutional review board (http://www.eandireview.com). It is also consistent with the procedures involving experiments on human subjects in accord with the ethical standards of the Committee on Human Experimentation of the institution in which the experiments were done or in accord with the Helsinki Declaration of 1975. All data were completely anonymized and de-identified before access by the analyst for data analysis.
As described previously [38], DNA samples were genotyped on one of four genotyping platforms. The v1 and v2 platforms were variants of the Illumina HumanHap550+ BeadChip (Illumina, San Diego, CA, USA), including about 25 000 custom single-nucleotide polymorphisms (SNPs) selected by 23andMe, with a total of about 560 000 SNPs. The v3 platform was based on the Illumina OmniExpress+ BeadChip, with custom content to improve the overlap with the v2 array, with a total of about 950 000 SNPs. The v4 platform is a fully custom array, including a lower-redundancy subset of v2 and v3 SNPs with additional coverage of lower-frequency coding variation, and about 570 000 SNPs. S1 Table shows the 23andMe genotyping platform(s) (v1-v4) on which each tested variant was genotyped. It also shows the imputation statistics for each tested variant, including the average imputation dosages for the first (A) and second (B) alleles (freq.a and freq.b) and the average and minimum imputation quality across all batches (avg.rsqr and min.rsqr). Imputation quality is measured by the r² statistic, which ranges from 0 (worst) to 1 (best). The batch effect test is an F test from an ANOVA of the SNP dosages against a factor representing imputation batch.
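To make the two imputation diagnostics above concrete, the sketch below computes (i) a dosage-based r² in the spirit of the MaCH/Minimac estimator, taken here as the observed variance of imputed dosages divided by the variance 2p(1 - p) expected under Hardy-Weinberg equilibrium, and (ii) the one-way ANOVA F statistic of dosages against imputation batch. The exact r² formula and the function names are illustrative assumptions, not 23andMe's pipeline; a p-value for F would come from the F(k - 1, N - k) distribution (for example via `scipy.stats.f.sf`).

```python
from statistics import fmean


def dosage_rsq(dosages):
    """Dosage-based imputation r^2 (assumed MaCH/Minimac-style form).

    dosages: expected count of the B allele per individual, each in [0, 2].
    Returns var(dosage) / (2p(1 - p)), where p is the estimated B-allele
    frequency: 1 means dosages vary as much as true genotypes would,
    0 means the imputation carries no information.
    """
    p = fmean(dosages) / 2.0
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return 0.0  # monomorphic site: r^2 undefined, report 0
    observed_var = fmean((d - 2.0 * p) ** 2 for d in dosages)
    return observed_var / expected_var


def batch_f_statistic(batches):
    """One-way ANOVA F statistic of SNP dosages against imputation batch.

    batches: list of lists, one list of dosages per imputation batch.
    """
    all_dosages = [d for batch in batches for d in batch]
    n, k = len(all_dosages), len(batches)
    grand_mean = fmean(all_dosages)
    batch_means = [fmean(batch) for batch in batches]
    ss_between = sum(len(b) * (m - grand_mean) ** 2
                     for b, m in zip(batches, batch_means))
    ss_within = sum((d - m) ** 2
                    for b, m in zip(batches, batch_means) for d in b)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, hard-called genotypes [0, 1, 1, 2] give r² = 1, whereas dosages that are all 1.0 at a 50% frequency site carry no information and give r² = 0.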
Only participants enrolled by 2015 were included in this analysis. A similar approach using the same research database was previously described [34]. We tested the association with more than 1000 well-curated phenotypes (S2 Table), which were distributed among different phenotypic categories (e.g. cognitive, autoimmune, psychiatric). GWAS had previously been performed on these well-curated phenotypes and confirmed to replicate known associations without generating spurious false positives. For our standard PheWAS, we restricted participants to individuals with > 97% European ancestry, as determined through an analysis of local ancestry [39]. Briefly, this algorithm first partitioned phased genomic data into short windows of about 100 SNPs. Within each window, we used a support vector machine (SVM) to classify individual haplotypes into one of 31 reference populations. The SVM classifications were then fed into a hidden Markov model (HMM) that accounted for switch errors and incorrect assignments and generated probabilities for each reference population in each window. Finally, we used simulated admixed individuals to recalibrate the HMM probabilities so that the reported assignments were consistent with the simulated admixture proportions. The reference population data are derived from public datasets (the Human Genome Diversity Project, HapMap, and 1000 Genomes; participants provided informed consent and all data were completely anonymized and de-identified before access and analysis), as well as from 23andMe customers who reported having four grandparents from the same country. A maximal set of unrelated individuals was chosen for each phenotype using a segmental identity-by-descent (IBD) estimation algorithm [40]. Individuals were defined as related if they shared more than 700 cM IBD, including regions where the two individuals shared either one or both genomic segments identical-by-descent. This level of relatedness (roughly 20% of the genome) corresponds approximately to the minimal expected sharing between first cousins in an outbred population.
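The relatedness filter above can be illustrated with a simple greedy procedure: two individuals are linked when they share more than 700 cM IBD, and samples are then kept one at a time, skipping anyone related to a sample already kept. The greedy rule, the "fewest relatives first" ordering and the function name are hypothetical; the text does not specify how 23andMe chose among related individuals. The resulting set is maximal (no excluded individual could be added back) but not necessarily the largest possible.

```python
from collections import defaultdict


def maximal_unrelated_set(individuals, ibd_pairs, threshold_cm=700.0):
    """Greedy maximal set of individuals with no pair sharing > threshold_cm IBD.

    individuals: iterable of sample IDs.
    ibd_pairs: iterable of (id1, id2, shared_cm) tuples, where shared_cm is the
        total length shared IBD on one or both haplotypes.
    """
    relatives = defaultdict(set)
    for a, b, shared_cm in ibd_pairs:
        if shared_cm > threshold_cm:
            relatives[a].add(b)
            relatives[b].add(a)
    # Visit individuals with the fewest relatives first, so fewer are dropped;
    # ties are broken by ID to keep the result deterministic.
    order = sorted(individuals, key=lambda i: (len(relatives[i]), i))
    kept = set()
    for ind in order:
        if not relatives[ind] & kept:  # no relative already kept
            kept.add(ind)
    return sorted(kept)
```

In a PheWAS setting this would be run separately for each phenotype, on the individuals with data for that phenotype.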
Imputed dosages, rather than best-guess genotypes, were used for association testing in the PheWAS. Participant genotype data were imputed against the September 2013 release of the 1000 Genomes Phase 1 reference haplotypes, phased with SHAPEIT2 [41]. Genotype data for research participants were generated from four versions of genotyping chips, as described previously [38]. We phased and imputed data for each genotyping platform separately. We phased using Finch, a phasing tool that implements the Beagle haplotype graph-based phasing algorithm [42], modified to separate the haplotype graph construction and phasing steps. Finch extended the Beagle model to accommodate genotyping error and recombination, in order to handle cases where there were no consistent paths through the haplotype graph for the individual being phased. We constructed haplotype graphs for European and non-European samples on each 23andMe genotyping platform from a representative sample of genotyped individuals, and then performed out-of-sample phasing of all genotyped individuals against the appropriate graph.
In preparation for imputation, we split phased chromosomes into segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. We excluded SNPs with Hardy-Weinberg equilibrium p < 10⁻²⁰, call rate < 95%, or large allele frequency discrepancies compared with European 1000 Genomes reference data. Frequency discrepancies were identified by computing a 2 × 2 table of allele counts for European 1000 Genomes samples and 2000 randomly sampled 23andMe customers with European ancestry, and identifying SNPs with a chi-squared p < 10⁻¹⁵. We imputed each phased segment against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 [43], using 5 rounds and 200 states for parameter estimation.
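Two of the pre-imputation steps above are easy to state in code: splitting a chromosome into overlapping windows, and the 2 × 2 chi-squared test used to flag allele frequency discrepancies. The sketch below is a hypothetical illustration (names and the uncorrected Pearson statistic are assumptions, not the documented 23andMe implementation); it uses the identity that the upper tail of a 1-df chi-squared distribution equals erfc(√(χ²/2)).

```python
import math


def segment_bounds(n_snps, max_len=10_000, overlap=200):
    """Split n_snps genotyped SNPs into [start, end) windows of at most
    max_len SNPs, with consecutive windows sharing `overlap` SNPs."""
    step = max_len - overlap
    bounds, start = [], 0
    while True:
        end = min(start + max_len, n_snps)
        bounds.append((start, end))
        if end >= n_snps:
            return bounds
        start += step


def allele_freq_discrepancy_p(ref_a, ref_b, cohort_a, cohort_b):
    """Pearson chi-squared test (1 df, no continuity correction) on the 2x2
    table of allele counts: rows = reference panel / cohort, cols = A / B."""
    n = ref_a + ref_b + cohort_a + cohort_b
    numerator = n * (ref_a * cohort_b - ref_b * cohort_a) ** 2
    denominator = ((ref_a + ref_b) * (cohort_a + cohort_b)
                   * (ref_a + cohort_a) * (ref_b + cohort_b))
    chi2 = numerator / denominator
    # Upper tail of chi-squared with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

With the thresholds in the text, a SNP would be excluded when `allele_freq_discrepancy_p(...) < 1e-15`, and a 25,000-SNP chromosome would be cut into windows (0, 10000), (9800, 19800) and (19600, 25000).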
Association tests were computed using logistic regression for case-control comparisons, linear regression for quantitative traits, and Cox proportional hazards regression for survival traits. We assumed additive allelic effects and included covariates for age, gender, and the top five principal components to account for residual population structure. The reported association p-values were computed using a likelihood ratio test, which has been shown to be a better choice despite its computational demands [44]. We report raw p-values for the PheWAS association results, but interpret them taking into account the number of variants and traits tested. An association with p < 0.05 / (13 × 1,234) = 3.12 × 10⁻⁶ was deemed significant; other associations with FDR < 0.05 were deemed suggestive.
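The significance rules above can be written down directly. The sketch below computes the Bonferroni threshold over all variant-trait tests and Benjamini-Hochberg adjusted p-values; the text reports "FDR < 0.05" without naming the procedure, so the use of Benjamini-Hochberg here is an assumption, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_variants, n_traits):
    """Per-test significance threshold alpha / (variants x traits)."""
    return alpha / (n_variants * n_traits)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

`bonferroni_threshold(0.05, 13, 1234)` evaluates to about 3.12 × 10⁻⁶, and `bonferroni_threshold(0.05, 48, 4593)` to about 2.27 × 10⁻⁷, the thresholds used for the 23andMe and UKB cohorts respectively.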
Cohort 2: UK Biobank (UKB). Pre-computed UK Biobank PheWAS results based on the Neale lab UK Biobank summary statistics were looked up via Open Targets Genetics [28] (genetics.opentargets.org) for side-by-side comparison with the PheWAS results from the 23andMe cohort. Three versions of the UKB results were accessed via Open Targets Genetics: Neale v1 PheWAS results were accessed in November 2019, and Neale v2 PheWAS results (http://www.nealelab.is/uk-biobank) were accessed in July 2020. UKB SAIGE is a third version of the UKB PheWAS results, produced by the University of Michigan (http://pheweb.sph.umich.edu/SAIGE-UKB/about). There are 4593 traits in total in the Neale v2 PheWAS analysis. We report raw p-values for the PheWAS association results, but interpret them taking into account the number of variants and traits tested. An association with p < 0.05 / (48 × 4593) = 2.27 × 10⁻⁷ was deemed significant. No additional adjustment was made for the Neale v1 PheWAS results (a few traits present in v1 were not present in v2) or the UKB SAIGE PheWAS results.

Whole-genome genetic correlations between significant PheWAS traits
For convenience in collecting whole-genome summary statistics, AD summary statistics from the Jansen et al. study [30] were used to calculate genetic correlations with other traits using LD Hub (v1.9.3) [45], a centralized database of summary-level GWAS results with a web interface for LD score regression (LDSC) [46].
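LD Hub runs LD score regression on uploaded summary statistics; the model behind it can be illustrated with its two regression equations. Under LDSC, the expected χ² statistic of SNP j is N·h²·l_j/M + N·a + 1, and for two traits the expected product of z-scores is √(N1·N2)·ρg·l_j/M plus a constant, where l_j is the LD score of SNP j, M the number of SNPs and ρg the genetic covariance; the genetic correlation is rg = ρg/√(h²1·h²2). The sketch below fits these equations with plain unweighted least squares on noise-free synthetic values. It illustrates the model only, with hypothetical function names: the actual LDSC software additionally uses regression weights and a block jackknife for standard errors, so its output will differ.

```python
import math
from statistics import fmean


def ols_slope(x, y):
    """Slope of an unweighted least-squares fit of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ldsc_h2(chi2, ld_scores, n, m):
    """SNP heritability from E[chi2_j] = n * h2 * l_j / m + n * a + 1."""
    return ols_slope(ld_scores, chi2) * m / n


def ldsc_rg(z1, z2, ld_scores, n1, n2, m, h2_1, h2_2):
    """Genetic correlation from E[z1_j * z2_j] = sqrt(n1 * n2) * rho_g * l_j / m + c."""
    products = [a * b for a, b in zip(z1, z2)]
    rho_g = ols_slope(ld_scores, products) * m / math.sqrt(n1 * n2)
    return rho_g / math.sqrt(h2_1 * h2_2)
```

On synthetic χ² values generated exactly from the model with h² = 0.2, `ldsc_h2` recovers 0.2; the intercept (confounding and sample-overlap terms) is absorbed by the fitted intercept and does not bias the slope.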

Directional horizontal pleiotropy vs causal relationship
For the multiple clinical phenotypes associated with the AD/FTD variants identified in the PheWAS analysis, we attempted to untangle the relationship between trait A and trait B, i.e. to determine whether genetic variants affect trait A (called the exposure in the literature) and trait B (called the outcome) independently, or whether the effect of the genetic variants on trait B is mediated by trait A (or vice versa). We applied the MR Egger intercept test [47,48] to test for directional horizontal pleiotropy, where the variants affect both trait A (e.g. CAD) and trait B (e.g. AD) independently. MR uses genetic variants as a proxy for an environmental exposure/trait A (e.g. CAD), assuming that: 1) the genetic variants are associated with the exposure/trait A; 2) the genetic variants are independent of confounders of the exposure-outcome association; and 3) the genetic variants are associated with the outcome only via their effect on the exposure, i.e. there is no horizontal pleiotropy whereby the genetic variants affect the outcome (e.g. AD) independently of their influence on the exposure (e.g. CAD). If the MR Egger intercept test was significant (p < 0.05), i.e. assumption 3 was violated, the pair of traits was excluded from the bi-directional, two-sample MR test using the inverse variance weighted (IVW) method. In this case (Egger intercept p < 0.05), the regression coefficient of the gene-outcome on the gene-exposure associations was instead estimated using MR Egger regression, which corrects for the bias due to directional pleiotropy under a weaker set of assumptions than is typically used in MR [49]. However, neither IVW nor MR Egger regression protects against violation of assumption 2. The MR analysis was also only feasible if MR Base [50] contained sufficient information or if that information could be supplemented by manually adding GWAS results from publications, e.g. the recent AD meta-analysis by Jansen et al. [30].
In the MR analysis, we primarily leveraged variants implicated in a trait from public summary statistics (pre-compiled as a set of instruments from the NHGRI-EBI GWAS Catalog [51] in the MRInstruments R package v0.3.2, https://github.com/MRCIEU/MRInstruments), because an individual variant is unlikely to be a sufficiently powerful instrumental variable unless its effect size is large. Instrumental variables were constructed using the default independent genome-wide significant SNPs (p ≤ 5 × 10⁻⁸) for AD and other diseases/risk factors, except for FTD, where a threshold of p ≤ 6 × 10⁻⁶ was used because of the smaller GWAS sample size [52]. We assessed whether bi-directional causal relationships exist between AD and a number of significant traits identified in the PheWAS analyses. For FTD, only unidirectional MR analysis was performed, using FTD as the exposure, because only top GWAS hits were publicly available. All analyses were performed using the MR-Base 'TwoSampleMR' v0.5.4 package [50] in R, and MR tests with nominal p < 0.05 using the IVW and/or MR Egger method were reported. A p-value below 0.05 divided by the number of PheWAS traits examined was considered significant, while p < 0.05 was considered suggestive.
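The two estimators described above can be sketched from per-SNP summary statistics: bx (SNP-exposure effects), by (SNP-outcome effects) and se_y (standard errors of by). IVW is a weighted regression of by on bx through the origin; MR Egger adds a free intercept after orienting SNPs so that every bx is positive, and a non-zero intercept signals directional horizontal pleiotropy. This is a minimal fixed-effect sketch with hypothetical function names, not the TwoSampleMR implementation used in the study, which may differ in details such as how residual dispersion enters the standard errors; p-values would be obtained from estimate/SE.

```python
import math


def _orient(bx, by):
    """Flip SNPs so every exposure effect is positive (required for MR Egger)."""
    pairs = [(x, y) if x >= 0 else (-x, -y) for x, y in zip(bx, by)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def ivw(bx, by, se_y):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den, 1.0 / math.sqrt(den)


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with a free intercept.

    Returns (intercept, intercept_se, slope, slope_se): the intercept tests
    directional pleiotropy, the slope is the pleiotropy-corrected estimate.
    """
    bx, by = _orient(bx, by)
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, bx))
    swy = sum(wi * y for wi, y in zip(w, by))
    swxx = sum(wi * x * x for wi, x in zip(w, bx))
    swxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    # Residual variance, not allowed to fall below 1 (a common convention).
    rss = sum(wi * (y - intercept - slope * x) ** 2
              for wi, x, y in zip(w, bx, by))
    sigma2 = max(rss / (len(bx) - 2), 1.0)
    return (intercept, math.sqrt(sigma2 * swxx / det),
            slope, math.sqrt(sigma2 * sw / det))
```

With bx = [0.1, 0.2, 0.3, 0.4] and by = 0.02 + 0.5·bx (a constant pleiotropic shift), IVW is biased upward to 0.5 + 0.02/0.3 ≈ 0.567, while MR Egger recovers a slope of 0.5 and an intercept of 0.02, which is why the study switches to MR Egger when the intercept test is significant.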

Results
Thirteen variants were successfully imputed from the four genotyping platforms in the 23andMe cohort, with average imputation quality across batches (avg.rsqr) ranging from 0.96 to 1 (mean across the 13 variants = 0.995) and minimum imputation quality across batches (min.rsqr) ranging from 0.86 to 1 (mean = 0.982) (S1 Table).
AD-risk variants are highly associated with neurological, longevity, metabolic, cardiovascular, eye, and immune-related traits
Selected PheWAS findings are summarized in Tables 1 and 2 for the wave 1 and wave 2 PheWAS, alongside the known associations reported in the NHGRI-EBI GWAS Catalog [51] for the SNPs previously associated with LOAD and/or FTD. PheWAS association results from the 23andMe cohort with FDR < 0.05 are listed in S3 Table, and the full list of PheWAS association results is available in S4 Table. An association in the 23andMe cohort with p < 0.05 / (13 × 1,234) = 3.12 × 10⁻⁶ was deemed significant; other associations with FDR < 0.05 were deemed suggestive. A number of known associations were replicated, and novel associations were also identified. The two SNPs defining the APOE ε2/ε3/ε4 alleles, rs429358 and rs7412, are known to be associated with multiple neurological, longevity, metabolic and cardiovascular traits (Fig 1, Table 1, and S3 Table). Subjects carrying the minor T allele of rs7412 carry the protective APOE ε2 allele, and subjects carrying the minor C allele of rs429358 carry the APOE ε4 risk allele. PheWAS identified significant associations with metabolic traits (high cholesterol or taking drugs to lower cholesterol; body mass index (BMI)), neurological traits (AD family history, AD, cognitive decline, mild cognitive impairment, memory problems), longevity traits (nonagenarian, i.e. at least 90 years old; healthy old, i.e. over age 60 with no cancer or disease; centenarian family), cardiovascular diseases (coronary artery disease (CAD), metabolic and heart disease), eye problems (nearsightedness, glasses usage, myopia vs. hyperopia), and serious side effects from statins (rs429358, p = 7.14 × 10⁻⁷). The direction of association is consistent with the protective vs. risk effects of the two APOE SNPs: the minor allele of rs7412 was associated with lower risk of high cholesterol, while the minor allele of rs429358 was associated with higher risk of high cholesterol (p = 6.6 × 10⁻²⁹⁵). Additional suggestive associations (FDR < 0.05) were identified for age-related macular degeneration (AMD) or blindness (rs429358, p = 9.59 × 10⁻⁵, FDR = 0.004). Interestingly, rs11136000 in CLU was also strongly associated with multiple eye phenotypes (nearsightedness, myopia, glasses, astigmatism) (Table 1, Fig 2, and S3 Table). Subjects carrying the minor allele of rs429358 had a lower chance of nearsightedness (p = 1.4 × 10⁻⁸), while subjects carrying the minor allele of rs11136000 had a higher chance of nearsightedness (p = 4.5 × 10⁻¹⁵). For the overlapping phenotypes, the UK Biobank PheWAS results largely supported the 23andMe PheWAS findings.

PLOS ONE
PheWAS of AD/FTD loci and causal inference of significant PheWAS traits

iqb.alzheimers_fh: cases reported that any of their grandparents, parents, brothers, sisters, aunts, or uncles had ever been diagnosed with AD; Alzheimer: AD, age 55 or older; cognitive_decline: any report of cognitive impairment or memory loss, age 65 and older, excluding AD cases; iqb.mild_cognitive_imp_fh: "Have any of your grandparents, parents, brothers, sisters, aunts, or uncles ever been diagnosed with mild cognitive impairment (MCI)?"; iqb.low_hdl: ever told by a medical professional that your high-density lipoprotein is too low; high_cholesterol: high cholesterol or taking drugs to lower cholesterol.
* Chromosomal position based on genome build GRCh38 coordinates.

** The Alleles column describes the two possible alleles at the variant location, listed in alphabetical order. In this study, the first allele is called the "A allele" and the second the "B allele"; effect (

PheWAS of AD/FTD loci and causal inference of significant PheWAS traits in UKB cohort.
* Chromosomal position based on genome build GRCh38 coordinates.

** The Alleles column describes the two possible alleles at the variant location, listed in alphabetical order. In this study, the first allele is called the "A allele" and the second the "B allele"; effect (β): the effect size, ln(odds ratio [OR]) for binary traits, defined per copy of the B allele.
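Because effects in these tables are reported as β = ln(OR) per copy of the B allele (the alphabetically second allele), converting to odds ratios or re-expressing an effect per copy of the A allele only requires exponentiation and a sign flip. The helper below is a hypothetical illustration that assumes the additive model on the log-odds scale used in the association tests, under which two copies of the B allele multiply the odds by exp(2β).

```python
import math


def odds_ratios(beta_b):
    """Convert beta = ln(OR) per copy of the B allele into odds ratios.

    Returns the per-B-allele OR, the equivalent per-A-allele OR (the sign of
    beta flips when the counted allele is swapped), and the OR for B/B versus
    A/A implied by an additive model on the log-odds scale.
    """
    return {
        "per_B_allele": math.exp(beta_b),
        "per_A_allele": math.exp(-beta_b),
        "BB_vs_AA": math.exp(2.0 * beta_b),
    }
```

Reading the direction of an association therefore also requires knowing whether the B allele is the minor allele at that SNP.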
In addition to the APOE variants and rs646776, ABI3 variant rs28394864 was also associated with cardiovascular traits, such as hypertension (p = 1.60 × 10⁻¹³), ischemic heart disease (p = 4.58 × 10⁻⁹), coronary atherosclerosis (p = 6.47 × 10⁻¹⁰), and angina (p = 6.51 × 10⁻⁹). Although CLU and ABCA7 are both implicated in cholesterol metabolism [22], neither the CLU variant nor the ABCA7 variant was strongly associated with metabolic/cardiovascular traits in the PheWAS analyses in either the 23andMe or the UKB cohort, despite the fairly substantial sample sizes for those traits in both cohorts.
The PheWAS results for the UKB cohort are available in S6

No significant genetic correlation between PheWAS traits and AD
Although multiple traits were associated with the same individual variants in the PheWAS analysis, there was no significant genome-wide genetic correlation between these traits (e.g. LDL/HDL cholesterol, type 2 diabetes, CAD, CeD, RA, UC, multiple sclerosis, and BMI) and AD (S8 Table). As positive controls, IGAP AD [21] and the UKB trait (Neale v1) "Illnesses of mother: AD/dementia" showed significant genetic correlation with the Jansen et al. AD results [30] (rg = 0.901, p = 3.10 × 10⁻¹³ and rg = 0.63, p = 1.09 × 10⁻⁶, respectively).

A causal role of cholesterol in AD revealed by MR analysis
65], CeD [66], multiple sclerosis (MS) [67,68] were obtained. When treating AD as the outcome and using p < 5 × 10⁻⁸ to select variants as instrumental variables, the MR Egger intercept test suggested directional horizontal pleiotropy for extreme height (Egger intercept p = 0.009), total cholesterol (p = 0.02), RA (p = 0.02), and parents' age at death (p = 0.02). MR Egger analysis suggested that higher LDL cholesterol (p = 4.7 × 10⁻⁴) and total cholesterol (p = 9.8 × 10⁻⁵) levels increased the risk of AD, whereas RA had a protective effect, reducing the risk of AD (S9 Table).
Conversely, using the genetic instrument for AD [30], MR suggested that AD possibly had a causal effect on MS (p = 0.0001, IVW method) and coronary heart disease (p = 0.003, MR Egger method; S9 Table). These MR tests would remain significant after correcting for the number of traits tested (n = 15, p < 0.05/15 ≈ 0.0033). For FTD, the results may be inconclusive because few SNPs were included in the instrumental variable and the SNPs chosen were only suggestively significant in a GWAS with a smaller sample size. A full list of MR results is given in S9 Table.

Discussion
The PheWAS showed that the APOE variants defining the ε2/ε3/ε4 alleles, ABI3 variant rs28394864, and rs646776 had significant associations with metabolic/cardiovascular and/or longevity traits. The APOE variants were additionally significantly associated with neurological traits. The HLA-DRB1 variant was associated with immune-related traits. Both the APOE variants and the CLU variant were significantly associated with eye phenotypes. The associations of ABI3 variant rs28394864 with cardiovascular traits (hypertension, ischemic heart disease, coronary atherosclerosis, angina) and asthma are novel findings of this study.
The novel PheWAS associations of the ABI3 variant are of most interest. A rare variant in ABI3 (rs616338, p.Ser209Phe) was previously reported to be associated with AD (p = 4.56 × 10⁻¹⁰, OR = 1.43, MAF in cases = 0.011, MAF in controls = 0.008). ABI3 is specifically expressed in microglia (S2 Fig; its expression pattern in human is similar to that of other AD genes implicated by human genetics, including TREM2, HLA-DRB1, PLCG2, SORL1, SCIMP, and MS4A6A) and is thought to play a role in microglia-mediated innate immunity in AD [69]. Given its role in immune response, the PheWAS association with asthma is not entirely unexpected, and its association with cardiovascular traits might reflect the role of immune dysregulation in those disease processes.
The observed PheWAS associations of APOE variants with metabolic/cardiovascular traits are not surprising. Vascular and metabolic risk factors such as hypertension, hyperlipidemia/hypercholesterolemia, hyperinsulinemia and obesity at midlife, diabetes mellitus (DM), and cardiovascular and cerebrovascular diseases (including stroke, clinically silent brain infarcts and cerebral microvascular lesions) are generally thought to increase the risk of dementia and AD [70][71][72][73]. However, the direction of a factor's impact could be age-dependent: for example, hypertension, obesity and hypercholesterolemia are risk factors at middle age (< 65 years) for late-life dementia and AD, but are protective late in life (age > 75 years) [74]. It may seem odd that AD patients had a lower risk of developing CAD [73], but this is consistent with a meta-analysis [72] that also reported that metabolic syndrome decreases the risk of AD. In the MR analysis of this study, AD increased the risk of CAD (S9 Table), but this result was supported by the MR Egger method only. Taking age into consideration may help to delineate the relationship better. Furthermore, several cardiovascular risk factors were associated with more rapid cognitive decline, as expected; however, recent or active hypertension and hypercholesterolemia were also reported to be associated with slower cognitive decline in AD patients [75]. These epidemiological studies suggest a complex interplay between AD and metabolic/cardiovascular risk factors and conditions, and the occasionally contradictory findings may be due to the age of the population, sampling biases and/or other confounding factors. Nevertheless, the Finnish Geriatric Intervention Study to Prevent Cognitive Impairment and Disability (FINGER) demonstrated that a multidomain intervention (diet, exercise, cognitive training, vascular risk monitoring) had a beneficial effect on the primary outcome, i.e. change in cognition as measured by a comprehensive neuropsychological test battery (NTB), in an at-risk elderly population (aged 60-77) with a CAIDE (Cardiovascular Risk Factors, Aging and Dementia) Dementia Risk Score of at least 6 points and cognition at the mean level or slightly lower than expected for their age group, suggesting that targeting modifiable vascular and lifestyle-related risk factors could improve or maintain cognitive functioning [76]. The PheWAS analysis suggested that the minor allele of rs7412 (defining the ε2 allele), a known protective allele for AD (OR = 0.74), was also protective against high cholesterol, low HDL, heart metabolic disease or CAD. Similarly, the minor allele of rs429358 (defining the ε4 allele), a known risk allele for AD (OR = 2.17), was also a risk allele for high cholesterol, low HDL, heart metabolic disease or CAD. Our MR analysis using MR Egger indicated that LDL and total cholesterol had a causal relationship with the development of AD. This MR result is, however, sensitive to the MR method used, as other methods such as weighted mode, weighted median, or simple mode (not prespecified analyses) provided no evidence or only suggestive evidence for a causal effect of LDL on AD. A recent MR analysis of 24 potentially modifiable risk factors [77] concluded that genetically predicted cardiometabolic factors were not associated with AD, as there was no evidence of a causal relationship after excluding one pleiotropic genetic variant (not disclosed in the publication) near the APOE gene (also near the APOC1 and TOMM40 genes). The evidence we obtained was far weaker than that reported by Larsson et al. for all variants [77]. Although a few SNPs drove the causal evidence in single-variant analysis, the leave-one-out analysis did not differ substantially from the analysis including all variants for the LDL trait, except when rs7412 was left out (S1 Fig).
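The leave-one-out sensitivity analysis mentioned above (S1 Fig) reruns the MR estimate with each instrument SNP removed in turn; if a single SNP drives the result, dropping it shifts the estimate markedly. The sketch below is a hypothetical illustration using a fixed-effect IVW estimator on synthetic numbers, not the study's exact implementation.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP dropped in turn (index order)."""
    estimates = []
    for k in range(len(bx)):
        keep = [i for i in range(len(bx)) if i != k]
        estimates.append(ivw_estimate([bx[i] for i in keep],
                                      [by[i] for i in keep],
                                      [se_y[i] for i in keep]))
    return estimates
```

For instance, with four SNPs of which three follow by = 0.5·bx and the fourth has an inflated outcome effect, only the estimate that omits the fourth SNP equals 0.5, flagging it as the influential variant.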
This study opted to report findings using the inverse variance weighted method when the Egger intercept was not significant, an approach also adopted by Howard et al. [78], who additionally required a minimum of 30 SNPs in the instrumental variable, or the MR Egger regression results when the intercept was significant. We did not filter out analyses with fewer than 30 variants. Both compromises are limitations of this study, so these results should be interpreted with caution. In addition, neither the IVW nor the MR Egger method protects against violation of the MR assumptions when the pleiotropic effects act via a confounder of the exposure-outcome association [49].
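The reporting rule described above can be summarized as a small decision function. The name and signature are hypothetical; `min_snps` mimics the optional minimum-instrument filter used by Howard et al., which this study did not apply (hence the default of `None`).

```python
def choose_mr_method(egger_intercept_p, n_snps, alpha=0.05, min_snps=None):
    """Return which MR estimate to report for one exposure-outcome pair.

    MR Egger is reported when its intercept test indicates directional
    pleiotropy (p < alpha); otherwise IVW. If min_snps is set and the
    instrument has fewer SNPs, nothing is reported (returns None).
    """
    if min_snps is not None and n_snps < min_snps:
        return None  # too few instruments for a reported analysis
    return "MR Egger" if egger_intercept_p < alpha else "IVW"
```

For example, the extreme height instrument (Egger intercept p = 0.009) would be reported with MR Egger under this rule, whereas a pair with a non-significant intercept would be reported with IVW regardless of how many SNPs it uses.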
The observed PheWAS associations of the rs646776 variant with metabolic/cardiovascular traits are also not surprising. SNP rs646776 was reported to be robustly associated with low-density lipoprotein cholesterol (LDL-C, p = 3 × 10⁻²⁹), with each copy of the minor allele decreasing LDL cholesterol concentrations by ~5-8 mg/dl [79]. The association was strengthened in a meta-analysis of ~100,000 individuals of European descent for LDL-C (p = 5 × 10⁻¹⁶⁹) and was also detected for total cholesterol (p = 7 × 10⁻¹³⁰) [80]. However, which gene is causative for the rs646776 effect is less clear, even though the variant was selected for inclusion in the PheWAS analysis based on its association with plasma progranulin levels. At the 1p13 locus, rs646776 was also strongly associated with transcript levels of three neighboring genes: sortilin (SORT1) (p = 3 × 10⁻²⁶), cadherin EGF LAG seven-pass G-type receptor 2 (CELSR2) (p = 2 × 10⁻¹²) and proline and serine rich coiled-coil 1 (PSRC1) (p = 3 × 10⁻¹²) [79]. Conditional analysis suggested that the SORT1 eQTL effect might be dominant [79]. SNP rs599839, which is in LD with rs646776, was also reported to be associated with CAD [81]; the minor allele conferring a lower level of LDL cholesterol also conferred a lower risk of CAD. In a bivariate analysis, rs646776 was also associated with circulating IGF-I and IGF-binding protein-3 (IGFBP-3) (p = 6.87 × 10⁻⁹) in a meta-analysis of 21 studies including 30,884 adults of European ancestry [82]. The growth hormone/insulin-like growth factor (IGF) axis can be manipulated in animal models to promote longevity, and IGF-related proteins including IGF-I and IGFBP-3 have also been implicated in the risk of human diseases, including cardiovascular diseases and diabetes. This is particularly interesting given that rs646776 was associated with longevity in the PheWAS analysis.
It is surprising and puzzling to see effects of AD variants on multiple eye phenotypes, especially myopia, which has its onset in early childhood or the teens. The association with age-related macular degeneration (AMD) was reported previously [83]. The AMD association is interesting because a histopathological hallmark of AMD is amyloid-β (Aβ) in drusen [84]. Drusen of the macula are very small yellow or white deposits that appear in one of the layers of the retina, Bruch's membrane; they are remnants of nondegradable proteins and lipids (lipofuscin) and are the earliest visible sign of dry macular degeneration. In addition to the amyloid phenotype, AMD and AD share other histological features, such as vitronectin accumulation, and immunological features, such as increased oxidative stress and apolipoprotein and complement activation pathways [85]. The common etiopathogenetic and morphological manifestations of AD and age-related eye diseases in amyloidogenesis may have broader implications for understanding the disease mechanism and for identifying new biomarkers and treatments [86]. A recent study showed that the soft drusen area in amyloid-positive patients was significantly larger than that in amyloid-negative patients [87]. Deficits in ocular and visual information processing are other possible biomarkers for AD [88]. It was also recently reported that a thinner retinal nerve fiber layer is associated with an increased risk of dementia, including AD, suggesting that retinal neurodegeneration may serve as a preclinical biomarker for dementia [89]. In our PheWAS results, the AD risk variant rs429358 had a protective effect for AMD and blindness (p = 9.6 × 10⁻⁵, FDR = 0.004), perhaps reflecting an equilibrium of Aβ between brain and retina, similar to that between brain and CSF.
A variety of other visual problems reported in patients with AD have been reviewed in detail [90], including loss of visual acuity (VA), color vision and visual fields; changes in pupillary response to mydriatics; defects in fixation and in smooth and saccadic eye movements; changes in contrast sensitivity and in visual evoked potentials (VEP); and disturbances of complex visual functions, although these have not been studied as risk factors for, or outcomes of, AD. In the MR analysis, we could not directly test the causal relationship between AMD and AD: although a large AMD GWAS is available [83], it did not report the standard errors of the odds ratios.
Subjects carrying the AD risk variant rs1113600 in the CLU gene had a higher chance of being nearsighted, while subjects carrying the AD risk variant rs429358 in the APOE gene had a lower chance of being nearsighted. Wearing reading glasses was reported to correlate significantly with higher scores on the Mini-Mental State Examination for the visually impaired (MMSE-blind) after adjustment for sex and age (OR = 2.14, 95% CI = 1.16-3.97, p = 0.016), but only with borderline significance after further adjustment for education [91]. There was a trend toward correlation between myopia and better MMSE-blind scores (r = -0.123, p = 0.09, Pearson correlation) [91]. On the other hand, myopia may be a surrogate phenotype for intelligence (or education), as a genetic correlation between myopia and intelligence was shown in a small cohort of 1,500 subjects (p < 0.01) [92]. Larsson et al. reported that genetically predicted educational attainment was significantly associated with AD per year of education completed (OR = 0.89, p = 2.4 × 10^-6) and per unit increase in log odds of having completed college/university (OR = 0.74, p = 8.0 × 10^-5), while intelligence had a suggestive association with AD (OR = 0.73, p = 0.01) [77]. Our MR analysis did not provide evidence for a causal effect of the myopia phenotype on AD.
Furthermore, although the genetic variants rs429358 and rs1113600 are associated with multiple phenotypes, these associations are not necessarily independent of each other. In fact, the MR-Egger intercept test did not support independent (pleiotropic) effects except for height and a few other traits. Overall, the relationships among these traits appear complex, and their interpretation requires further examination.
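To make the MR-Egger intercept test mentioned above more concrete, the following is a minimal Python sketch of MR-Egger regression, assuming per-SNP summary statistics (effect on the exposure, effect on the outcome, and standard error of the outcome effect). The names `SnpEffect` and `mr_egger` are hypothetical, this is not the implementation used in this study, and the residual-scale floor of 1 and the normal-approximation p-value are simplifying assumptions. SNPs are first oriented so that their exposure effects are positive; the outcome effects are then regressed on the exposure effects with inverse-variance weights. The slope is the pleiotropy-corrected causal estimate, and an intercept that differs from zero indicates directional pleiotropy.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpEffect:
    """Summary statistics for one instrument SNP (hypothetical container)."""
    beta_exposure: float
    beta_outcome: float
    se_outcome: float


@dataclass
class EggerResult:
    slope: float         # causal estimate allowing for directional pleiotropy
    intercept: float     # average pleiotropic effect; ~0 if none is directional
    se_intercept: float
    p_intercept: float   # two-sided, normal approximation (assumption)


def mr_egger(snps: list[SnpEffect]) -> EggerResult:
    """Weighted regression of outcome effects on exposure effects, with intercept."""
    if len(snps) < 3:
        raise ValueError("MR-Egger needs at least 3 instrument SNPs")
    xs, ys, ws = [], [], []
    for s in snps:
        # Orient each SNP so that its effect on the exposure is positive.
        sign = -1.0 if s.beta_exposure < 0 else 1.0
        xs.append(sign * s.beta_exposure)
        ys.append(sign * s.beta_outcome)
        ws.append(1.0 / s.se_outcome ** 2)  # inverse-variance weights
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual scale, floored at 1 (multiplicative random-effects convention).
    n = len(snps)
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    sigma = math.sqrt(rss / (n - 2))
    se_int = max(1.0, sigma) * math.sqrt(1.0 / sw + x_bar ** 2 / sxx)
    p = math.erfc(abs(intercept / se_int) / math.sqrt(2.0))
    return EggerResult(slope, intercept, se_int, p)
```

A non-significant intercept, as observed for most traits here, means the data do not show directional pleiotropy. It does not prove that such pleiotropy is absent, particularly when the instruments contain only a few SNPs.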
Other limitations of our study design also merit comment. The sample size for PheWAS varies from trait to trait, depending on the prevalence of the trait and the availability of data. For example, the AD cohort in the 23andMe database was not large in 2015 (~640 cases and ~158K controls), when the PheWAS analysis for the 23andMe cohort was performed. This is a limitation of the study, especially for replicating associations with AD, and may explain why some known SNP associations with AD were not replicated or reached only nominal significance in the 23andMe cohort. Furthermore, FTD is not among the self-reported conditions collected in the 23andMe database and therefore could not be tested in the PheWAS analysis. Even if it had been included, the sample size would have been smaller than that for AD, given its lower population prevalence. Similarly, the cohorts for CD (n ~3,600), UC (n ~6,200), bipolar disorder (n ~9,700), and schizophrenia (n ~700) were limited in size, whereas the cohorts for other psychiatric disorders (e.g. depression, anxiety and panic) were sufficiently large (>250K cases). Because of the limited AD cohort, only the APOE variants, the loci with the largest effect sizes, were confirmed to be associated with neurological traits. The sample size for a specific trait should therefore be taken into consideration when interpreting the PheWAS results. PheWAS typically relies on "light" phenotyping (self-report as in 23andMe, surveys deployed by UKB, diagnostic ICD codes, or medication/procedure usage patterns). Such phenotypes are less stringent than clinically ascertained ones, but the tradeoff is the power to survey a large number of diverse phenotypes within a single study.
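One way to see why ~640 AD cases limit power, even with ~158K controls, is the effective sample size commonly used for unbalanced case-control studies, N_eff = 4 / (1/N_cases + 1/N_controls), which approximates the total size of a balanced (50:50) study with similar information about a per-allele effect. The short Python sketch below is an illustration added under that standard approximation (the function name is hypothetical); it is not a calculation reported in this study.

```python
def effective_sample_size(n_cases: float, n_controls: float) -> float:
    """Effective N of an unbalanced case-control study (standard approximation).

    Roughly the total N of a 50:50 case-control study carrying similar
    information about a per-allele log odds ratio.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Approximate 23andMe AD cohort sizes quoted in the text.
base = effective_sample_size(640, 158_000)
more_controls = effective_sample_size(640, 316_000)   # hypothetical: double the controls
more_cases = effective_sample_size(1_280, 158_000)    # hypothetical: double the cases
print(f"N_eff: {base:,.0f}; 2x controls: {more_controls:,.0f}; 2x cases: {more_cases:,.0f}")
```

The effective size for the AD cohort is only roughly 2,550. Doubling the controls barely changes it, whereas doubling the cases roughly doubles it, so power for AD was limited by the number of cases.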
The nominal causal effects of immune-related traits (except multiple sclerosis and rheumatoid arthritis) on AD/FTD would not have remained significant after correcting for the expanded list of diseases and risk factors tested from MR Base. Some of the instrumental variables consisted of a small number of SNPs, which may have weakened any real causal effect. Nevertheless, the observation does not seem to be purely due to chance, especially in light of a report of immune-related genetic enrichment in FTD, which found up to 270-fold genetic enrichment between FTD and RA, up to 160-fold between FTD and UC, and up to 175-fold between FTD and celiac disease (CeD). Overall, the immune overlap seems to be common to both FTD and AD at the genome level (represented by genome-wide significant SNPs used as instrumental variables for AD and the other diseases/risk factors, except for FTD, where a p-value threshold of 5 × 10^-6 was used because of the smaller GWAS sample size), while there could still be specificity of neuroinflammation for risk variants in CR1, CD33, CLU, ABCA7, TREM2, SORL1, MS4A6A, SPPL2A, SCIMP, PLCG2, ABI3, and HLA-DRB1. Because different MR methods rest on different assumptions (which in reality do not always hold, or even rarely hold) and differ in power, causal inference may be inconclusive or only suggestive unless the causal effect is so large that most methods give concordant results. Both the IVW and MR-Egger methods used in this study are vulnerable to false positives when the exposure and outcome traits are both affected by a heritable confounder [49]. Exposure and outcome GWAS may also differ in sample size and in the number of variants with available summary association statistics; for outcome GWAS, this can limit the ability to use proxy SNPs in LD (default r² ≥ 0.8) with the SNPs in the instrumental variable. Both factors affect the strength of the instrumental variable and the power of the MR analysis.
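The vulnerability of IVW to a heritable confounder can be illustrated with a small, deterministic Python sketch. It assumes a fixed-effect IVW estimator built from per-SNP Wald ratios with first-order weights. The function name and all effect sizes are hypothetical and chosen only for illustration, and this is not the MR Base implementation used in the study. In the scenario, the exposure has no causal effect on the outcome, but every instrument SNP acts on a shared heritable confounder U that raises both traits.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate and its SE.

    Combines per-SNP Wald ratios beta_out/beta_exp using first-order weights
    beta_exp**2 / se_out**2, which is equivalent to a weighted regression of
    beta_out on beta_exp through the origin.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, 1.0 / math.sqrt(den)


# Hypothetical scenario: no exposure -> outcome effect; SNPs act only via confounder U.
gamma = [0.02, 0.05, 0.03, 0.08]      # SNP -> U effects (made-up numbers)
a, b = 0.6, 0.3                       # U -> exposure and U -> outcome effects
beta_exp = [a * g for g in gamma]
beta_out = [b * g for g in gamma]     # outcome reached only through U
se_out = [0.005] * len(gamma)

est, se = ivw_estimate(beta_exp, beta_out, se_out)
print(f"IVW estimate = {est:.2f} (SE {se:.3f})")  # spurious estimate equal to b / a
```

The IVW estimate equals b/a (0.5 here) rather than zero. Because the SNP points lie exactly on a line through the origin, an MR-Egger fit would return the same slope with a zero intercept, so its intercept test would not flag the problem either.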
Future reanalysis, once studies with larger sample sizes and more complete summary association statistics become available, will be warranted to interrogate these causal relationships.
Supporting information S1