Polygenic Analysis of Late-Onset Alzheimer’s Disease from Mainland China

Recently, a number of single nucleotide polymorphisms (SNPs) have been associated with late-onset Alzheimer's disease (LOAD) through genome-wide association studies. Identifying SNP-SNP interactions is important for a better understanding of the genetic basis of LOAD. In this study, fifty-eight SNPs were screened in a cohort of 229 LOAD cases and 318 controls from mainland China, and their interactions were evaluated by a series of analysis methods. Seven risk SNPs and six protective SNPs were found to be associated with LOAD. The risk SNPs were rs9331888 (CLU), rs6691117 (CR1), rs4938933 (MS4A), rs9349407 (CD2AP), rs1160985 (TOMM40), rs4945261 (GAB2) and rs5984894 (PCDH11X); the protective SNPs were rs744373 (BIN1), rs1562990 (MS4A), rs597668 (EXOC3L2), rs9271192 (HLA-DRB5/DRB1), and rs157581 and rs11556505 (TOMM40). Among these positive SNPs, the interaction between rs4938933 (risk) and rs1562990 (protective) in MS4A weakened the individual effect of each on LOAD; for the three significant SNPs in TOMM40, their cumulative interaction abolished the effects of the two protective SNPs and aggravated the effect of the risk SNP. Finally, we found that the rs6656401-rs3865444 (CR1-CD33) pair was significantly associated with decreased LOAD risk, while rs28834970-rs6656401 (PTK2B-CR1) and rs28834970-rs3865444 (PTK2B-CD33) were associated with increased LOAD risk. In summary, our study indicates that SNP-SNP interactions exist within the same gene and across different genes, and that they can weaken or aggravate the initial single-SNP effects on LOAD.


Introduction
Alzheimer's disease (AD) is a clinically complex neurodegenerative disorder, affecting up to 81.1 million people worldwide [1]. It is characterized by decline in memory and other cognitive functions, a variety of neuropsychiatric symptoms and restriction in the activities of daily living [2]. According to the age of onset, AD is classified as early-onset AD (EOAD, onset at or before 65 years) and late-onset AD (LOAD, onset after 65 years); the latter is the most common type. The pathogenesis of AD is complex and involves genetic factors, environmental factors and normal aging [3]. To date, only three causative genes have been reported in familial EOAD patients: presenilin 1 (PSEN1), presenilin 2 (PSEN2) and amyloid precursor protein (APP) [4]. Because the majority of patients have LOAD, many genetic association studies have been conducted in recent years to uncover the genetic contributions to LOAD, but the only gene variant considered to be an established LOAD risk factor was the APOE ε4 allele [5].
However, in most cases, the identified single nucleotide polymorphisms (SNPs) have small or moderate effect sizes, and the proportion of heritability they explain is quite modest. As for other complex diseases, polygenic analysis has been suggested as a way to explain the genetic contribution to the pathogenesis of the majority of LOAD cases. We hypothesized that SNPs might interact in subtle ways that lead to substantially greater effects than that of any single SNP. Therefore, in this study, we adopted a series of statistical analysis methods to evaluate SNP-SNP interactions in a Chinese cohort of 547 individuals. We found that SNP-SNP interactions could weaken or aggravate the single-SNP effects on LOAD. In addition, even when individual variants had no effect on LOAD, their interactions could be significantly associated with LOAD.

Sample subjects
A total of 547 subjects were recruited in this study, including 229 patients with LOAD (male 43.1%; age at onset: 75.2±5.0 years) and 318 controls (male 47.8%; age: 71.6±2.5 years). All patients were diagnosed with probable or definite AD according to the NINCDS-ADRDA criteria and were recruited from the Department of Neurology, Xiangya Hospital. Unaffected individuals of matched geographical ancestry were included as healthy controls. The study was approved by the Ethics Committee of Xiangya Hospital, Central South University in China (equivalent to an Institutional Review Board). Written informed consent was obtained from all subjects (if a patient lacked the capacity to understand the study because of cognitive impairment, written informed consent was obtained from the patient's legal guardian).
sequencing. APOE genotype was identified through polymerase chain reaction (PCR), and the primer information is listed in Table A in S1 File. All PCR products were sequenced with Big-Dye terminator v3.1 sequencing chemistry on an ABI 3730xl DNA analyzer (Applied Biosystems). Finally, the DNA sequences were read with Sequencher software.

Statistical methods
Hardy-Weinberg equilibrium was analyzed using SHEsis software [17]. Logistic regression analysis was used to test for associations between each SNP allele and LOAD risk after adjusting for age and gender. We then entered all positive SNPs into a multivariable logistic regression model to evaluate the association between each SNP and LOAD susceptibility. All of the above analyses were performed using SPSS version 17.0. A p value < 0.05 was considered statistically significant.
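As an illustration of the Hardy-Weinberg check described above, the following Python sketch computes the one-degree-of-freedom chi-square goodness-of-fit statistic from genotype counts at a biallelic SNP. It is not the SHEsis implementation; the function name `hwe_chi_square` and any counts passed to it are assumptions made for illustration only.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts at a biallelic SNP
    (hypothetical inputs, not data from the study).
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes whose expected count is zero (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under this sketch, a SNP would be excluded from further analysis when the returned p-value falls below the chosen threshold; the threshold itself is left to the analyst.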
Finally, the Lasso-multiple regression (LMR) method was used to identify interactions between SNP-SNP pairs. The calculation involved two steps. In the first step, the Lasso was used to select candidate SNPs from a regression model containing all pairwise interaction terms, which was fitted with the R package glnetmi. Ten-fold cross-validation was performed in this step. Interaction terms were ranked by the absolute value of their coefficients from high to low, which served as the feature selection. In the second step, multiple linear regression was performed on the candidate SNPs selected in the first step to evaluate the pure interaction effect. The significance of each variable was measured by the p-value of its cross-product term. Stepwise AIC selection was applied to the multiple linear regression to obtain an optimal regression model [18][19][20].
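The first LMR step can be sketched in Python as follows. This is a simplified illustration, not the authors' R implementation: it assumes genotypes coded as 0/1/2 minor-allele counts, uses a squared-error Lasso solved by plain coordinate descent, and omits the ten-fold cross-validation used to choose the penalty. All function names are hypothetical.

```python
from itertools import combinations


def add_pairwise_interactions(genotypes, names):
    """Append one cross-product column per SNP pair to a genotype matrix."""
    pairs = list(combinations(range(len(names)), 2))
    columns = list(names) + [f"{names[i]}*{names[j]}" for i, j in pairs]
    rows = [list(row) + [row[i] * row[j] for i, j in pairs] for row in genotypes]
    return rows, columns


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the Lasso proximal operator)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n)*||y - b0 - Xb||^2 + penalty*||b||_1.

    Columns and response are centred, so the intercept b0 is implicit.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, penalty) / norm
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta


def rank_interactions(beta, columns):
    """Non-zero interaction terms, ordered by absolute coefficient (high to low)."""
    terms = [(name, b) for name, b in zip(columns, beta) if "*" in name and b != 0.0]
    return sorted(terms, key=lambda t: abs(t[1]), reverse=True)
```

In this sketch, the SNPs appearing in the top-ranked interaction terms would be carried forward to the second step, where an ordinary multiple regression with cross-product terms is fitted.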

Results
The frequencies of APOE alleles and genotypes of all subjects are listed in Table 1. The frequency of the ε4 allele was significantly higher in LOAD cases than in controls (p = 0.002), and the frequency of the ε2 allele was significantly lower (p = 0.009). With regard to genotype, the ε3/4 and ε4/4 genotypes differed significantly between the two groups (p = 0.004 and p = 0.047) (Table 1).
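To make the allele-level comparison concrete, the following Python sketch converts genotype counts into allele counts and computes an unadjusted allelic odds ratio between cases and controls. It is only an illustration: the study used logistic regression adjusted for age and gender, and the function names and the allele labels used as dictionary keys are assumptions, not part of the original analysis.

```python
from collections import Counter


def allele_counts(genotype_counts):
    """Convert genotype counts, e.g. {("e3", "e4"): 10}, into allele counts.

    Each individual contributes two alleles, so a genotype observed n times
    adds n to the count of each of its two alleles.
    """
    counts = Counter()
    for (allele_1, allele_2), n in genotype_counts.items():
        counts[allele_1] += n
        counts[allele_2] += n
    return counts


def allele_odds_ratio(case_counts, control_counts, allele):
    """Unadjusted odds ratio of `allele` versus all other alleles.

    Raises ZeroDivisionError when a cell of the 2x2 table makes the
    denominator zero; no continuity correction is applied in this sketch.
    """
    a = case_counts[allele]                  # target allele in cases
    b = sum(case_counts.values()) - a        # other alleles in cases
    c = control_counts[allele]               # target allele in controls
    d = sum(control_counts.values()) - c     # other alleles in controls
    return (a * d) / (b * c)
```

An odds ratio above 1 from such a table would correspond to an allele that is more frequent in cases, and a value below 1 to an allele that is more frequent in controls.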
All candidate SNPs were in Hardy-Weinberg equilibrium except rs2075650 and rs7274581; we therefore excluded both from further analysis. In total, 13 SNPs showed significantly different allelic frequencies between patients and controls after adjusting for age and gender, including seven risk SNPs and six protective SNPs for LOAD. The former consisted of rs9331888 (CLU), rs6691117 (CR1), rs4938933 (MS4A), rs9349407 (CD2AP), rs1160985 (TOMM40), rs4945261 (GAB2) and rs5984894 (PCDH11X); the latter consisted of rs744373 (BIN1), rs1562990 (MS4A), rs597668 (EXOC3L2), rs9271192 (HLA-DRB5/DRB1), and rs157581 and rs11556505 (TOMM40) (Table 2). To explore whether the seven risk SNPs and six protective SNPs interacted with one another, we further analyzed them using a multivariable logistic regression model. After adjusting for age, gender and APOE ε4, eleven SNPs retained an independent effect on LOAD; the exceptions were rs1160985 and rs157581 (Tables A and B in S3 File).
Notably, both risk and protective SNPs were identified in MS4A (rs4938933, risk; rs1562990, protective) and TOMM40 (rs1160985, risk; rs157581 and rs11556505, protective). The two SNPs in MS4A formed four subhaplotypes, none of which differed significantly between cases and controls (Table 3). The three SNPs in TOMM40 formed eight subhaplotypes, of which only G-T-C (rs157581-rs11556505-rs1160985) was significantly associated with increased LOAD risk (Table 4). We then used the LMR method to analyze SNP-SNP pair interactions: in the first step, 20 SNPs were selected as candidates from the 56 SNPs (Tables A and B in S4 File); in the second step, three SNP-SNP pairs remained statistically significant after Bonferroni correction for multiple testing (Table 5). These three pairs, with adjusted p-values of 0.017, 0.018 and 0.032, were rs6656401-rs3865444 (CR1-CD33), rs28834970-rs6656401 (PTK2B-CR1) and rs28834970-rs3865444 (PTK2B-CD33). The pair rs6656401-rs3865444 with genotype combination (AG, GT) was associated with reduced LOAD risk (0.10 relative to (GG, GG)), while the other two SNP pairs were associated with increased risk: genotype (CT, AG) at rs28834970-rs6656401 (10.90 relative to (TT, GG)) and genotype (CT, GT) at rs28834970-rs3865444 (1.54 relative to (TT, GG)).
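Two small calculations underlie the counts and corrections reported above: k biallelic SNPs give 2^k possible subhaplotypes (four for the two MS4A SNPs, eight for the three TOMM40 SNPs), and a Bonferroni adjustment multiplies each p-value by the number of tests. The Python sketch below illustrates both. The allele labels passed in are placeholders chosen for illustration, not the alleles reported in Tables 3 and 4, and the function names are hypothetical.

```python
from itertools import product


def enumerate_haplotypes(alleles_per_snp):
    """List every allele combination across SNPs, joined as 'X-Y-Z'.

    alleles_per_snp is a sequence of (allele_1, allele_2) tuples, one per
    SNP; with k biallelic SNPs the result has 2**k entries.
    """
    return ["-".join(combo) for combo in product(*alleles_per_snp)]


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, calling `enumerate_haplotypes` with three placeholder allele pairs returns eight strings, one of which would be compared between cases and controls for each subhaplotype test.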

Discussion
LOAD is a common disease with a strong genetic component [3]. Although APOE ε4 explains a portion of LOAD risk, other loci, especially those identified through GWAS of LOAD, probably also contribute to the disease process [21,22]. Studying these variants will improve our understanding of their biological functions, which can help in LOAD risk assessment, diagnosis and the development of new therapies. In this study, we conducted a comprehensive analysis of 56 SNPs using univariate analysis, multivariable logistic regression and the LMR method. To our knowledge, this is the most comprehensive genetic analysis of LOAD patients from mainland China.
In the current study, we found that seven risk SNPs in the CLU, CR1, MS4A, CD2AP, TOMM40, GAB2 and PCDH11X genes conferred LOAD risk. CLU, also known as apolipoprotein J, probably increases AD risk through interaction with APOE [23]. A meta-analysis showed that rs9331888 was significantly associated with AD risk in the Caucasian population [24], and we successfully replicated this risk locus. The role of CR1 in AD development has been highlighted because of its involvement in erythrocyte amyloid β42 sequestration and clearance of Aβ from whole blood [25]; the missense variant rs6691117 (Ile→Val) may change the folding of the CR1 protein and affect its structural stability by altering Hsp70 binding, thereby causing functional changes [26]. The third identified risk gene was MS4A, which may be associated with the control of intracellular free Ca2+ concentration and thereby with neuronal death and cognitive decline [27]. The SNP rs4938933-C in MS4A was previously reported to be associated with decreased LOAD risk in a white population [15], whereas in our results rs4938933-C was a susceptibility locus for LOAD. CD2AP is probably linked to the modulation of amyloid β clearance and tau neurotoxicity [28]. In this study, we report for the first time that the SNP rs9349407 in CD2AP is significantly associated with LOAD in the Chinese Han population. This SNP was first identified as significantly associated with AD in individuals of European ancestry, while recent studies failed to replicate the result in Japanese, African-American and Canadian populations [29]. The TOMM40 gene is located adjacent to APOE. The two proteins may interact with each other to affect mitochondrial dynamics, although the precise underlying mechanism is unclear. Interestingly, one study found that the SNP rs1160985 was associated with serum triglyceride concentration [30]. GAB2 is well characterized as a risk gene for the development of AD and probably interacts with APOE ε4 to further modify risk [31].
With regard to PCDH11X, the details of its biological mechanism remain unclear, and more research is needed to clarify its relation to LOAD.
Until now, only two variants have been identified as protecting individuals from AD: rs63750847-A in APP and the APOE ε2 genotype [32]. However, both of these genes are also causative or risk genes for AD. We therefore hypothesized that some risk genes might also carry protective variants. In the current study, six SNPs in BIN1, TOMM40, MS4A, EXOC3L2 and HLA-DRB5/DRB1 were found to be associated with decreased LOAD risk. Five of these six SNPs, all except rs1562990-C in MS4A, were previously identified as risk loci. Our results indicate that the impact of susceptibility genes varies across ethnicities, which could help us better understand the genetic contributions to LOAD in the Chinese Han population. In addition, larger samples are needed to confirm these protective variants.
For both risk and protective SNPs, further analysis using multivariable logistic regression indicated that most of the significant loci might have an independent effect on LOAD, which supports the view that AD is a multifactorial disease with a highly complex genetic background. Moreover, because our positive results span different genes, future work should explore the relation between amyloid-β pathology or tauopathy and the functions of these genes (lipid metabolism, immune response and endocytosis).
Another goal of the current study was to determine whether SNP-SNP interactions are involved in the pathogenesis of LOAD. We first analyzed the interactions between positive SNPs in the same gene. For rs4938933 and rs1562990 in MS4A, although they had independent and opposite effects on LOAD, both effects were lost when the two SNPs were considered together. With regard to rs157581, rs11556505 and rs1160985 in TOMM40, although they were individually associated with decreased or increased LOAD risk, the only significant subhaplotype was G-T-C, which increased LOAD risk. This indicates that the effect of a SNP on LOAD can be influenced by other SNPs in the same gene.
In addition, we used the LMR method to identify three SNP pairs significantly associated with LOAD risk. However, none of the SNPs in these pairs showed a significant effect in the preceding univariate analysis. Among them, the pair rs28834970-rs6656401 from PTK2B-CR1 deserves attention because of its high odds ratio (10.90) for LOAD risk, suggesting that these two genes may have a strong interaction effect in the pathogenic pathway. A previous study identified two significant SNP pairs for LOAD risk: rs11136000-rs670139 from CLU-MS4A4E and rs3865444-rs670139 from CD33-MS4A4E. Taken together, these results offer evidence that SNP-SNP interactions play a pivotal role in LOAD susceptibility. AD is a complex disease involving gene-environment interaction. Therefore, although the three pairwise interactions PTK2B-CR1, PTK2B-CD33 and CD33-CR1 appeared to have strong effects on LOAD, unmeasured environmental factors may participate in these interactions. In the current study, the intermolecular mechanisms behind the three new SNP-SNP pairs remain unresolved. CD33 is a transmembrane protein that has been implicated as a negative regulator of myeloid cells. CR1 is found on myeloid cells as well. Both are key molecules in inflammatory cells such as microglia [33]; we therefore speculate that they may share a similar pathway in the progression of LOAD. One AD research team from the United States is now beginning to map the molecular consequences of CR1 and CD33 variants to uncover how these susceptibility loci are functionally linked [33]. PTK2B, one of the susceptibility genes recently identified in 74,046 individuals from diverse ethnicities [16], is a key component of signaling pathways involved in neurite growth and synapse formation [34]. Although PTK2B is located on chromosome 8 near the CLU gene, we did not find a SNP-SNP interaction between these two genes.
However, we found that PTK2B had a strong interaction with NEDD9, which is important for lymphocyte signaling and migration and plays a role in T cell-mediated inflammation [34]. We therefore speculate that PTK2B might participate in immune inflammation by regulating NEDD9 protein function, a process that also involves CR1 and CD33.
In summary, we examined 56 candidate SNPs in a cohort of Chinese subjects using a series of analysis methods and identified seven risk and six protective variants for LOAD. With regard to SNP-SNP interaction, we first found that SNPs in the same gene can influence each other, weakening or aggravating their single effects. Second, although some variants had a weak effect individually, a strong effect emerged when they interacted with each other.
Supporting Information S1 File.