Gene-Based Rare Allele Analysis Identified a Risk Gene of Alzheimer’s Disease

Alzheimer’s disease (AD) has a strong propensity to run in families. However, the known risk genes other than APOE are not clinically useful. In various complex diseases, genetic studies have targeted rare alleles to account for unexplained heritability. Our study aims to identify previously unknown risk genes for AD by targeting rare alleles. We used data from five publicly available genetic studies from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the database of Genotypes and Phenotypes (dbGaP). A total of 4,171 cases and 9,358 controls were included. The genotype information of rare alleles was imputed using the 1000 Genomes reference panels. We performed gene-based analysis of rare alleles (minor allele frequency ≤ 3%). The genome-wide significance level was defined as meta P < 1.8×10⁻⁶ (0.05/number of genes in the human genome = 0.05/28,517). ZNF628, which is located at chromosome 19q13.42, showed a genome-wide significant association with AD. The association of ZNF628 with AD was not dependent on APOE ε4. APOE and TREM2 were also significantly associated with AD, although not at genome-wide significance levels. Other genes identified by targeting common alleles could not be replicated in our gene-based rare allele analysis. We found that rare variants in ZNF628 are associated with AD. The protein encoded by ZNF628 is known to be a transcription factor. Furthermore, the associations of APOE and TREM2 with AD were highly significant, even in gene-based rare allele analysis, which implies that further deep sequencing of these genes is required in AD heritability studies.


Introduction
Alzheimer's disease (AD) is a leading cause of dementia and is known to have high heritability (as high as 60-80%) [1,2]. Genome-wide association studies (GWAS) have identified several risk genes for AD, such as ABCA7, BIN1, CD33, CD2AP, CLU, CR1, EPHA1, MS4A6A/MS4A4E, and PICALM [3][4][5][6][7]. However, the known risk genes for AD explain only 30% of its heritability [8,9]. Aside from APOE ε4, the reported risk genes have low clinical significance because of their small effect sizes [7]. The common variant hypothesis posits that common diseases are attributable to common variants, and this hypothesis is the conceptual basis of GWAS [10,11]. However, as with other common diseases, the heritability of AD cannot be fully explained by common alleles [12].
Reports of rare variants related to complex diseases are growing [13][14][15][16][17]. In contrast to the common variant hypothesis, the rare variant hypothesis holds that variants with low frequency could be primary causes of common diseases [11,18]. The rationale of the rare variant hypothesis is that variants with low allele frequencies have a higher probability of being functionally significant [12]. A large-scale exome sequencing study indicated that 95.7% of SNPs with functional importance are rare variants [19]. Additionally, the number of loss-of-function variants showed an inverse correlation with minor allele frequency (MAF) [20,21]. Given their functional significance, rare variants may have large effect sizes. Recently, rare alleles in TREM2, APP, and PLD3 have been reported to be associated with AD [22][23][24]. Thus, the identification of more risk or protective rare alleles associated with AD is required.
Although rare alleles are promising targets for genetic association studies of complex diseases, the analysis of rare alleles remains challenging. For example, very large sample sizes are required to detect rare alleles that have modest effect sizes [19], and deep sequencing of large samples is too expensive for typical researchers to perform. Analyzing the mutational load within the same gene, region, or pathway can be an alternative approach [13,25]. However, a large number of candidate rare alleles within a specific region is more difficult to obtain and interpret than genotypes at a few loci. Improvements in imputation methods have allowed accurate inference of rare alleles [26]. According to the 1000 Genomes study [20], the mean squared Pearson correlation coefficients (R²) between rare SNPs (MAF 0.5%-5%) and imputed dosages were 0.7-0.9 in populations of European ancestry. Furthermore, mutational loads of rare alleles within genes obtained from imputation can confer high power [27]. In this study, we aimed to find risk genes for AD using gene-based analysis of rare alleles imputed from the 1000 Genomes reference panels and publicly available GWAS data.

Subjects
We used publicly available GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Genetic Alzheimer's Disease Associations (GenADA) study, Electronic Medical Records and Genomics (eMERGE), the National Institute on Aging Late Onset Alzheimer's Disease (NIA-LOAD) family study, and the Framingham study. ADNI data were obtained from https://ida.loni.ucla.edu. GenADA (dbGaP accession number: phs000219.v1) [28,29], eMERGE (dbGaP accession number: phs000234.v1), NIA-LOAD (dbGaP accession number: phs000168.v1), and the Framingham study (dbGaP accession number: phs000007.v16) data were downloaded from dbGaP (http://www.ncbi.nlm.nih.gov/gap). Subjects with European ancestry were included. After genotypic quality control (QC), exclusion of subjects with missing phenotypic data, and ethnic group selection, 4171 cases and 9358 controls were included in this study. The studies are summarized in Table 1. Additional information on each study is detailed in File S1. The institutional review board of Ilsan Hospital approved our study. Written informed consent was given by participants. In addition, patient records were anonymized prior to analysis.
After estimating haplotypes using SHAPEIT v1.0 [31], imputation with the multi-population reference panels of 1000 Genomes (phase I, release Mar 2012) was performed using IMPUTE2 v2.2 with default parameters [32,33]. We discarded imputed SNPs with INFO < 0.4. The imputed dosage data were used for further analyses. The dosage is the expected genotype score [34].
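The notions of dosage and INFO filtering can be illustrated with a minimal sketch. It assumes that each imputed SNP carries three posterior genotype probabilities per individual, ordered as P(AA), P(AB), P(BB), and that an INFO value has already been computed by the imputation software. The function names and the data layout are hypothetical and are not part of the pipeline used in this study.

```python
from typing import List, Sequence, Tuple

INFO_THRESHOLD = 0.4  # SNPs with INFO below this value are discarded (value from the text)


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected count of the B allele given (P(AA), P(AB), P(BB))."""
    p_aa, p_ab, p_bb = probs
    # 0 copies * P(AA) + 1 copy * P(AB) + 2 copies * P(BB)
    return p_ab + 2.0 * p_bb


def filter_by_info(
    snps: List[Tuple[str, float]], threshold: float = INFO_THRESHOLD
) -> List[str]:
    """Return the identifiers of SNPs whose INFO is not below the threshold."""
    return [snp_id for snp_id, info in snps if info >= threshold]
```

A dosage therefore ranges from 0 to 2 and carries the imputation uncertainty into the association test, rather than forcing a hard genotype call.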

Statistical analyses
In the association study, we adjusted for age, sex, years of education, and significant principal components (PCs) of genetic stratification (File S1). For consistency across studies, years of education were categorized according to the established stratification method of the GenADA study, as follows: 1, ≤4; 2, >4 and ≤10; 3, ≥11 and ≤15; 4, >15 years. Missing years of education were imputed with the mean value. Years of education was treated as a continuous variable. We performed a weighted, Z score-based, fixed-effects meta-analysis using METAL [35]. The effective sample size (N_E) for the meta-analysis is given in terms of the numbers of AD cases (N_AD) and controls (N_C), as follows [35]:

$$N_E = \frac{4}{1/N_{AD} + 1/N_C}$$

The forest plot was drawn using the 'rmeta' R package.
APOE is the strongest risk gene among the known risk genes for AD. In several genome-wide association studies of AD [3], the top-ranked genes may show spurious associations with AD because they lie within the same linkage disequilibrium (LD) block as APOE ε4. In addition, the pathogenesis of AD might differ between carriers and noncarriers of APOE ε4 [36]. Therefore, we examined the dependency on APOE ε4 genotype status in two ways. First, the results were compared after adjustment for APOE ε4 genotype status, defined as the number of APOE ε4 alleles in each individual. eMERGE and the Framingham study did not include data on APOE ε4 genotype status. Therefore, we used imputed dosages of APOE ε4 for these two studies (Table S1 in File S1). Second, the collinearity between the selected genes and APOE ε4 genotype status was examined.
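The weighted, Z score-based combination can be sketched as follows. This is a minimal illustration of the sample-size weighting scheme described for METAL, in which each study is weighted by the square root of its effective sample size. It is not the METAL program itself, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size of an unbalanced case-control study."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def weighted_z_meta(
    z_scores: Sequence[float], sample_sizes: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-study Z scores; return (meta Z, two-sided meta P)."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per Z score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    meta_z = numerator / denominator
    # Two-sided P value of a standard normal deviate.
    meta_p = math.erfc(abs(meta_z) / math.sqrt(2.0))
    return meta_z, meta_p
```

Because the sign of each Z score encodes the direction of effect, studies that disagree in direction partially cancel, while concordant studies reinforce one another.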

Gene-based rare allele analysis
In this study, gene-based rare allele analysis means the accumulation of rare alleles within the same coding region, as implemented in GRANVIL [27]. Gene boundaries were defined according to the UCSC genome browser (build 37). The Framingham study showed inflated type I error and skewed results (Figures S1 and S2 in File S1). Therefore, we needed to adjust for genetic stratification in the Framingham study using another algorithm, implemented in GenABEL v1.69 and ProbABEL v0.30 [30,37] (Figure S2 in File S1). For the gene-based analysis of the Framingham study, we wrote our own program. For the Framingham study, we computed a dosage for each gene (D), analogous to an allele dosage, as follows [27]:

$$D = \frac{\sum_{i=1}^{n} G_i}{n}$$
where G_i is the dosage of the ith SNP and n is the number of rare alleles within a gene that were used in the analysis.
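As a minimal sketch, the gene dosage of one individual is the mean of that individual's dosages over the rare SNPs of the gene. The function name and the input layout (a flat list of per-SNP dosages for one person) are assumptions made for illustration, not the program written for this study.

```python
from typing import Sequence


def gene_dosage(snp_dosages: Sequence[float]) -> float:
    """Average the per-SNP dosages of the rare alleles within one gene."""
    n = len(snp_dosages)
    if n == 0:
        raise ValueError("a gene needs at least one rare SNP")
    return sum(snp_dosages) / n
```

For example, an individual with dosages 0.02, 0.0, 0.1, and 0.0 at four rare SNPs of a gene has a gene dosage of 0.03, which can then be entered into the regression model in place of a single SNP dosage.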
Analyses proceeded in two steps. The overall study scheme is shown in Figure 1. We performed the first meta-analysis to select genes with genome-wide significance. Genome-wide significance was defined as P < 1.8×10⁻⁶ (0.05/number of genes in the human genome in the UCSC genome browser (build 37) = 0.05/28517). However, gene-based rare allele analysis using imputation has three shortcomings. First, the results are difficult to interpret if a gene contains many rare alleles. Second, pooling risk and protective alleles can decrease power; however, taking the direction of effect into account before selecting candidate genes can inflate the type I error. Third, the accuracy of imputation can decrease for rare alleles with very low MAF. We therefore performed a confirmatory analysis (the second meta-analysis) with selected SNPs, for two reasons. First, if genetic risk factors could be tested with a small number of SNPs, genotyping and interpretation would be more convenient. Therefore, we selected several risk SNPs in the finally selected gene according to meta P and meta Z (P < 0.05 and Z > 0) after performing classical SNP-based GWAS and meta-analysis. Second, we excluded rare variants with MAF < 0.5%, because the imputation accuracy decreases at very low MAF [20].

Table 2. The highly ranked seven genes in the first meta-analyses. * A larger absolute Z score represents a smaller P, and the sign of the Z score represents the direction of risk [35]. † The signs are those of the Z score of each study. The question mark represents missing data in the study because of low INFO or high MAF. The order of the signs is ADNI, ADNI2, GenADA, eMERGE, NIA-LOAD, and Framingham study. doi:10.1371/journal.pone.0107983.t002
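The significance threshold and the SNP selection rule of the confirmatory analysis can be sketched as follows. The record type, the field names, and the function name are hypothetical. The cutoffs are those stated in the text: a Bonferroni correction over 28517 genes, meta P < 0.05, meta Z > 0, and 0.5% ≤ MAF ≤ 3%.

```python
from dataclasses import dataclass
from typing import List

N_GENES = 28517  # genes in the UCSC genome browser (build 37), from the text
ALPHA = 0.05
GENOME_WIDE_P = ALPHA / N_GENES  # about 1.75e-6, reported as 1.8e-6


@dataclass
class SnpResult:
    snp_id: str
    meta_p: float  # P value from the SNP-based meta-analysis
    meta_z: float  # Z score; a positive sign indicates risk
    maf: float     # minor allele frequency as a fraction


def select_risk_snps(results: List[SnpResult]) -> List[str]:
    """Keep nominally significant risk SNPs within the usable MAF range."""
    return [
        r.snp_id
        for r in results
        if r.meta_p < 0.05 and r.meta_z > 0 and 0.005 <= r.maf <= 0.03
    ]
```

Restricting the selection to positive Z scores keeps only alleles that increase risk, which avoids pooling risk and protective alleles in the second analysis.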

Dependency on APOE e4 genotype status
We examined the dependencies of the selected genes by adjusting for APOE ε4 (Table 2). The significance of ZNF628 remained even after adjustment. However, the significance of APOE decreased after adjustment for APOE ε4 (the P value of APOE increased to 0.023).
Additionally, the collinearity between ZNF628 and APOE ε4 genotype status was examined using the variance inflation factor (VIF; Table S3 in File S1). The VIFs in all studies were approximately 1, indicating negligible collinearity.
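For intuition, the VIF can be sketched for the simplest case of two predictors, where it reduces to 1/(1 - r²), with r the Pearson correlation between the gene dosage and the APOE ε4 allele count. This is a simplifying assumption: the models in this study also include covariates, so the reported VIFs come from regressing each predictor on all the others. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant predictor has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF of either predictor in a model with exactly two predictors."""
    r_squared = pearson_r(x, y) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)
```

A VIF of 1 corresponds to uncorrelated predictors, and the value grows without bound as the two predictors become collinear.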

Meta-analysis with selected risk SNPs (the confirmatory second analysis)
For a more clinically applicable approach, we identified significant risk SNPs by meta P and meta Z scores. Furthermore, considering the imputation accuracy [20], we selected SNPs with 0.5% ≤ MAF ≤ 3%. The two risk SNPs selected within ZNF628 (dbSNP ID: rs112407198 and rs73057174) were synonymous SNPs (Figure 3). As shown in Figure 2B and Table 3, gene-based rare allele analysis using only the selected SNPs reached genome-wide significance with a moderately high effect size (meta P = 3.7×10⁻⁷ [OR 1.7, 95% CI 1.4-2.0]).

Figure 3. Schematic representation of ZNF628 with locations of SNPs used in gene-based rare allele analysis in this study. ZNF628 is a protein 1059 amino acids long. The domains (boxes) and the locations of SNPs (arrows) are shown in a schematic linear structure of ZNF628. Blue boxes denote C2H2-type zinc finger domains. dbSNP IDs can be found in Table S2 in File S1. SNPs within red boxes were used in the second analysis. doi:10.1371/journal.pone.0107983.g003

Gene-based rare allele analyses for the genes known to be associated with AD
Interestingly, rare alleles in APOE and TREM2 showed highly significant associations with AD (Table 2). Thus, we tested rare alleles of other genes known to be associated with AD. The most highly ranked nine genes in the AlzGene database [38] (ABCA7, PICALM, CLU, MS4A6A/MS4A4E, CD33, BIN1, CR1, and CD2AP) were selected for the test. In the meta-analysis, only BIN1 was nominally significant (meta P = 0.046), but it did not reach the genome-wide significance level (Table 4).

Discussion
We performed a meta-analysis of publicly available genetic studies of AD with imputed rare (MAF ≤ 3%) alleles. ZNF628 was found to be significantly associated with AD. Additionally, our rare allele analysis revealed significant associations of APOE and TREM2 with AD, which suggests that our results are valid and that these genes require further study [39,40].
ZNF628 is a C2H2 zinc finger protein, a type of transcription factor [41], and the gene consists of three exons. C2H2-type zinc finger proteins are known to be essential for normal growth and development [41]. ZNF628 is found in mammals, but not in zebrafish or C. elegans [41]. ZNF628 is evenly expressed in various tissues, including the brain [42,43]. ZNF628 is conserved among mammals and seems to be functionally important [41]. The possible DNA binding site is the sequence motif -C/GA/TA/TGGTTGGTTGC [41]. At this time, the target proteins and related human disorders associated with ZNF628 have not been reported. It is possible that the rare alleles in ZNF628 change the expression levels of certain proteins related to AD pathogenesis.
In the selected allele analysis of ZNF628 (the second, confirmatory analysis), the P and Z values of two SNPs (rs112407198 and rs73057174) met the criteria of P < 0.05 and Z > 0. These SNPs are located outside the C2H2-type zinc finger domains and are synonymous (Figure 3). Synonymous mutations are known to change protein expression level and conformation [44] by affecting mRNA structure [45] or changing the timing of cotranslational folding [46]. The altered expression level or structure of ZNF628 could affect the expression levels of other proteins.
There was no dependency between ZNF628 and APOE ε4 genotype status. ZNF628 is separated from APOE by more than 10⁸ bp, although both are located on chromosome 19. Therefore, ZNF628 is not in the same LD block as APOE ε4. ZNF628 did not lose its significance in the meta-analysis even after adjustment for APOE ε4 genotype status. Therefore, ZNF628 appears to be related to AD independently of APOE ε4. In contrast, the significance of APOE was affected by APOE ε4. The association of the rare alleles in APOE with AD was highly significant (P = 1.4×10⁻⁶), although this significance disappeared after adjusting for APOE ε4. This suggests that rare alleles in the same LD block as APOE ε4 accounted for the significant association of APOE with AD.
Other risk genes that have been found in GWAS targeting common alleles were not replicated in our gene-based rare allele analysis. Only TREM2, which has been identified in previous studies targeting rare alleles, showed high significance levels [39,40]. Common alleles with small effect sizes have been explained by synthetic association of rare alleles [47,48]. Recently, however, this hypothesis was not confirmed in a large-scale study of seven common immune diseases [49]. Similarly, we could not show an association of rare alleles within the known genes with AD.

Table 3. Results of the second meta-analysis (confirmatory analysis).

There are several limitations in this study. First, a replication study with real genotyping is required. However, 1000 Genomes-based imputation can enable us to find refined and novel signals [50]. Furthermore, the sample size and power can be increased by imputation [51] and meta-analysis [52]. Our gene-based rare variant analysis by imputation has power comparable to that of re-sequencing analysis, especially with a large sample size [27]. Second, the rare allele analysis of ZNF628 in this study was performed in White populations. Although this result should be replicated in other populations, it may be difficult to do so. The two important selected SNPs of our study, rs112407198 and rs73057174, have not been reported in Asian populations, whereas higher MAFs have been observed in Black populations (especially in the Bushmen). Third, current methods of rare allele analysis still have problems, and more powerful and consistent methods are needed [53]. Simulation studies using 20 different tools did not generate consistent results [54]. Therefore, simulation studies to identify the methods that generate optimal results are required [53]. Additionally, the directions of effect of SNPs on related diseases are not usually considered [53]. Lastly, SNPs in introns could not be considered because of our limited computational resources.
In conclusion, we observed a novel association between ZNF628 and AD. Considering the biological role of the ZNF628 protein, it may contribute to AD by regulating the expression of various AD-related proteins. Functional studies to elucidate its contribution to AD pathogenesis are required. Additionally, replication studies in different populations are needed to assess the value of the ZNF628 rare alleles as a genetic biomarker of AD.

Supporting Information
File S1. Supplement text, tables, and figures.
However, we did not receive the commercial funds that are shown in this section. Although the data used in our study are publicly available, it was mandatory to show the funding sources for the studies. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
This manuscript was not written in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI.
Some of the data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report.

Author Contributions
Conceived and designed the experiments: JHK SAP. Performed the experiments: JHK SAP. Analyzed the data: JHK HSL J-HL SAP. Contributed reagents/materials/analysis tools: JHK J-HL SAP. Contributed to the writing of the manuscript: JHK PS HSL J-HL JHL SAP. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: JHK PS HSL J-HL JHL SAP.