Fine Mapping of a Region of Chromosome 11q23.3 Reveals Independent Locus Associated with Risk of Glioma

Background A single nucleotide polymorphism (SNP) at locus 11q23.3 (rs498872) in the near 5′-UTR of the PHLDB1 gene was recently implicated as a risk factor for gliomas in a genome-wide association study, and this involvement was confirmed in three additional studies. Methodology/Principal Findings To identify possible causal variants in the region, we genotyped 15 tagging SNPs in the 200 kb genomic region at the 11q23.3 locus in a Chinese Han population-based case-control study with 983 cases and 1024 controls. We found evidence for an association between two independent loci (the PHLDB1 and ARCN1 genes) and a predisposition for gliomas. Among the multiple significant SNPs in the PHLDB1 gene region, the rs17749 SNP was the most significant [P = 1.31×10−6 in a recessive genetic model]. Additionally, two novel SNPs (rs2236661 and rs494560) that were independent of rs17749 were significantly associated with glioma risk in a recessive genetic model [P = 1.31×10−5 and P = 3.32×10−5, respectively]. The second novel locus was within the ARCN1 gene, and it was associated with a significantly reduced risk for glioma. Conclusions/Significance Our data strongly support PHLDB1 as a susceptibility gene for glioma and point to a new potential candidate gene, ARCN1.


Introduction
Gliomas, which originate from glial cells, are the most common primary tumors of the central nervous system (CNS), representing more than 40% of newly diagnosed brain tumors [1]. Asian populations generally show lower incidence rates than populations in Europe and North America. However, in the last few decades, China has experienced rapid increases in malignant brain tumor incidence rates, especially in large cities. In 2000, the annual incidence rate of brain tumors in China was less than 3.9 per 100,000 in men and 2.8 per 100,000 in women [2]. According to China's Health Statistics Yearbook from 2009, the annual mortality rate from gliomas in 2004 and 2005 was approximately 3.13 per 100,000 individuals [3]. Furthermore, glioblastoma multiforme (GBM) is the most common and most malignant astrocytic tumor, with a median survival of only 12-15 months under the current standard of care [4]. The etiology of gliomas is largely unknown. High-dose ionizing radiation is known to increase the risk. However, only a small proportion of exposed individuals develop gliomas, suggesting a genetic predisposition for glioma occurrence.
An association between glioma risk and rs498872 at 11q23.3 was recently identified in a genome-wide association study (GWAS) [5]. This association was confirmed in three additional studies [6,7,8], including our previous replication study in a Chinese Han population. More importantly, the SNP maps to the near 5′-UTR of PHLDB1 (Pleckstrin homology-like domain, family B, member 1) within a 101-kb LD block on 11q23.3, a region that is frequently deleted in patients with neuroblastoma [9]. Although there is no direct evidence of a role for this gene in gliogenesis, based on all the available data, the association between 11q23.3 and glioma appears to be one of the most consistent genetic association findings in complex diseases to date. The current data strongly suggest the presence of a glioma susceptibility locus at 11q23.3.
It is possible that other genetic variants near rs498872 in this region are responsible for the consistent glioma risk observed for this SNP in previous studies. In this study, we performed fine mapping to examine the association between glioma risk and all known common sequence variants in the vicinity of the rs498872 SNP at 11q23.3.

Results
To explore whether additional SNPs in the regions flanking SNP rs498872 are associated with glioma risk, we genotyped 15 tagging SNPs in the 200 kb genomic region at 11q23.3 in a Chinese Han study population. The distributions of selected characteristics among the 983 glioma cases and the 1,024 cancer-free controls are summarized in Table 1. Because of the frequency-matching design of this study, the distributions of age (age at diagnosis for case patients and age at inclusion for control subjects) and sex were comparable between the case patients and the control subjects. The mean age was 42.19 years for cases and 42.22 years for controls. Approximately 59% of both cases and controls were male. Similar to previous studies [7,10,11], the cases were slightly more likely than controls to report a family history of cancer among first-degree relatives (17.5% versus 13.1%; P = 0.003). Among the cases, 306 had glioblastomas and 671 had low-grade gliomas (including 369 astrocytomas and 303 other low-grade gliomas).
The allele distributions of the 15 SNPs at 11q23.3 with their MAFs and associations with glioma risk are shown in Table 2. The genotype distributions in controls for all the variants were in Hardy-Weinberg equilibrium (P > 0.01). To reduce the potential for spurious findings due to multiple testing, P = 6.67×10−4 (0.01/15) was considered the significance threshold using a strict Bonferroni correction [12,13]. In the single-locus analyses, we observed statistically significant differences between case patients and control participants in allele distributions for four SNPs (P = 2.86×10−4 for rs7115634, P = 3.32×10−6 for rs2236661, P = 2.93×10−5 for rs494560 and P = 3.08×10−5 for rs17748). Further logistic regression analyses revealed that individuals carrying the rs7115634 G allele in the ARCN1 gene had a 21% reduction in glioma risk (adjusted OR = 0.79, 95% CI = 0.70-0.89, P = 2.12×10−4). Additionally, rs2236661, rs494560 and rs17748 in the PHLDB1 gene were significantly associated with glioma risk (adjusted P = 1.06×10−5, P = 4.23×10−5 and P = 2.36×10−5, respectively). When tumors were divided by subtype, at a minimum separating glioblastoma from other gliomas, three SNPs (rs2236661, rs494560 and rs17748) were significantly associated with the other types (including astrocytic gliomas, oligodendrogliomas, ependymomas, or mixed gliomas) but not with GBM (adjusted P = 9.22×10−6, P = 1.06×10−4 and P = 7.90×10−5, respectively).
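The Bonferroni threshold used above is simple arithmetic, and it can be checked directly against the four reported allelic P values. The following is a minimal sketch in Python, not part of the original analysis; the function and variable names are illustrative assumptions, and the P values are those quoted in the paragraph above, read in scientific notation.

```python
# Bonferroni-corrected threshold: family-wise alpha divided by the number of tests.
ALPHA = 0.01    # family-wise significance level stated in the text
N_TESTS = 15    # number of tagging SNPs genotyped


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a strict Bonferroni correction."""
    return alpha / n_tests


# Allelic P values as reported in the text for the four significant SNPs.
reported_allelic_p = {
    "rs7115634": 2.86e-4,
    "rs2236661": 3.32e-6,
    "rs494560": 2.93e-5,
    "rs17748": 3.08e-5,
}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
significant = sorted(
    snp for snp, p in reported_allelic_p.items() if p < threshold
)

print(f"threshold = {threshold:.2e}")
print("below threshold:", significant)
```

All four reported values fall below 0.01/15, which is consistent with the statement that these SNPs survive the correction.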
The LD plot of the chr11:117882577-118082577 region is shown in Figure 1 (from HapMap data, release 21, Phases I and II, CHB). Two separate clusters of glioma-associated SNPs were found: the previously reported PHLDB1 locus and a novel locus (ARCN1). Among the multiple significant SNPs in the PHLDB1 gene region, rs2236661 and rs494560 remained significantly associated with glioma risk after adjusting for rs17748 (adjusted P = 4.09×10−6 and P = 8.73×10−6, respectively), suggesting that these SNPs are independent of rs17748.
Genotype frequency distributions of the four identified risk SNPs in the cases and controls are detailed in Table 3. Significant associations were observed for three of these SNPs (P = 1.75×10−5 for rs2236661, P = 2.82×10−4 for rs494560 and P = 1.68×10−6 for rs17748) in a dominant model and for one SNP (P = 1.48×10−4 for rs7115634) in a recessive model, with the model for each SNP chosen by best fit according to Akaike's information criterion (Table 3). Furthermore, we performed stratified analyses by glioma histological type. The rs2236661, rs494560 and rs17748 SNPs were strongly associated with non-GBM gliomas (adjusted P = 2.08×10−5, P = 2.69×10−4 and P = 1.66×10−6, respectively).
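To illustrate how Akaike's information criterion can be used to choose between a dominant and a recessive coding, the sketch below compares the two models on hypothetical genotype counts. It is a simplified, unadjusted illustration and not the covariate-adjusted logistic regression used in this study: with an intercept and a single binary predictor, the fitted probabilities equal the observed case fractions, so the maximised log-likelihood can be written in closed form. All names and counts are assumptions made for the example.

```python
import math


def binary_logistic_loglik(groups):
    """Maximised log-likelihood of a logistic model with one binary predictor.

    `groups` holds one (cases, controls) pair per predictor level. The fitted
    case probability in each level equals the observed case fraction.
    """
    loglik = 0.0
    for cases, controls in groups:
        total = cases + controls
        for count in (cases, controls):
            if count > 0:
                loglik += count * math.log(count / total)
    return loglik


def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def compare_models(case_counts, control_counts):
    """Counts are (major homozygote, heterozygote, minor homozygote)."""
    c0, c1, c2 = case_counts
    k0, k1, k2 = control_counts
    codings = {
        # Dominant: carriers of the minor allele versus non-carriers.
        "dominant": [(c0, k0), (c1 + c2, k1 + k2)],
        # Recessive: minor-allele homozygotes versus everyone else.
        "recessive": [(c0 + c1, k0 + k1), (c2, k2)],
    }
    # Each model has two parameters: an intercept and one slope.
    return {name: aic(binary_logistic_loglik(g), 2) for name, g in codings.items()}


# Hypothetical genotype counts, NOT data from this study.
aics = compare_models(case_counts=(40, 45, 15), control_counts=(60, 30, 10))
best_model = min(aics, key=aics.get)
print(aics, "->", best_model)
```

The model with the smaller AIC is preferred; in a real analysis the same comparison would be made on the adjusted models and would usually include an additive coding as well.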
To understand the cumulative effects of these variants on glioma risk, we created a variable combining the number of risk alleles per individual across the four independent risk variants (rs7115634, rs2236661, rs494560 and rs17748). Overall, glioma risk increased with increasing numbers of risk alleles. Individuals carrying 6-8 risk alleles had a 1.93-fold increased risk of developing glioma compared with those who carried 0-2 risk alleles (adjusted OR = 1.93, 95% CI = 1.46 to 2.55, P = 3.70×10−6, Table 4). Figure 2 shows plots of the pairwise LD (D') values for the 15 SNPs and the LD structure of the 11q23.3 region. Four blocks were defined by 8 SNPs (rs12289253 and rs3741324; rs494560 and rs17748; rs10892251 and rs11216943; and rs4639966 and rs496547). Global score tests showed statistically significant differences in haplotype frequency distributions between the cases and the controls for block 2 and block 3 (P_sim = 5.0×10−6) but not for block 1 and block 4 (Table 5).

Discussion
To explore the contribution of genetic variation at the 11q23.3 locus that was previously identified for gliomas by GWAS and by our previous replication study, we performed fine mapping of this region, including the PHLDB1 gene, using a highly correlated tag SNP approach. After genotyping 15 tagging SNPs from a 200-kb region of LD flanking the initial SNP marker (rs498872), we found that rs17748 was in strong LD with rs498872 (r² = 0.826) and was significantly associated with glioma risk. Additionally, we found two novel independent SNPs (rs2236661 and rs494560) in the PHLDB1 gene and one SNP (rs7115634) in the ARCN1 gene that contributed to glioma risk. PHLDB1 (also known as LL5α) is a protein that was first identified in a bioinformatics screen [14]. It contains a Forkhead-associated (FHA) domain and a C-terminal PH domain [15][16][17]. The database from the Genomics Institute of the Novartis Research Foundation shows that PHLDB1 is highly expressed in the brain and adipose tissues of mice and humans. Although the PH domain of PHLDB1 possesses a potential PI(3,4,5)P3-binding motif, the molecular basis by which PH domains are able to interact with PI(3,4,5)P3 has not been established definitively. Recently, Zhou et al. [18] demonstrated that PHLDB1 binds PI(3,4,5)P3 through its PH domain and that PHLDB1 functions in adipocytes as a positive regulator of Akt activation, where it is required for optimal insulin-induced glucose transport and GLUT4 translocation. However, there is no direct functional evidence of a role for PHLDB1 in the initiation of tumors.
Through fine mapping, the strongest signal was still located in the PHLDB1 gene. In our study, the rs2236661 C allele and the rs17748 T allele were associated with a statistically significant increase in glioma risk (OR = 1.46, 95% CI = 1.23 to 1.72, P = 1.06×10−5 and OR = 1.36, 95% CI = 1.17 to 1.59, P = 2.36×10−5, respectively), and the rs494560 A allele showed a significant protective effect (OR = 0.71, 95% CI = 0.60 to 0.85, P = 4.23×10−5). Individuals with the 'GT' haplotype, which contains the 'G' risk allele of rs494560 and the 'T' risk allele of rs17748, had a 1.30-fold higher risk of developing glioma than individuals with the 'GC' haplotype, which was consistent with the individual SNP association analysis. Rs17748, which is located in the 3′-UTR of PHLDB1, and the previously reported rs498872 were in strong LD (r² = 0.826). Thus, the significant association between these SNPs and glioma risk appeared to be consistent between our two separate studies. It is notable that rs17748 was also reported to be significantly associated with glioma risk in a European population by a recent GWA study [19,6]. Importantly, rs2236661 and rs494560 remained significantly associated with glioma risk after adjusting for rs17748 (adjusted P = 4.09×10−6 and P = 8.73×10−6, respectively), suggesting that these SNPs have a role that is independent of rs17748.
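As a rough consistency check on the odds ratios quoted above, a reported 95% confidence interval can be converted back into a standard error on the log-odds scale, from which a Wald z statistic and an approximate two-sided P value follow. The sketch below does this with the Python standard library; the function name is an assumption, and because the published OR and CI values are rounded to two decimals, the recovered P values should be expected to match the reported ones only approximately.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, Wald z and two-sided P from a 95% CI."""
    # On the log scale the CI is log(OR) +/- z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided


# (OR, lower 95% limit, upper 95% limit) as reported in the text.
reported = {
    "rs2236661 C": (1.46, 1.23, 1.72),
    "rs17748 T": (1.36, 1.17, 1.59),
    "rs494560 A": (0.71, 0.60, 0.85),
}

results = {name: wald_from_ci(*values) for name, values in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, approximate P = {p:.1e}")
```

A positive z corresponds to a risk-increasing allele and a negative z to a protective allele, which matches the direction of the effects described for these three SNPs.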
Recent advances in the understanding of glioma subtypes (e.g. proneural, neural, mesenchymal) based on gene expression [20], somatic mutations (e.g. IDH1) [21] and global patterns of methylation (glioma CpG island methylator phenotype; G-CIMP) [22] suggest that there are important subgroups of glioma that may represent distinct pathological entities. Jenkins et al. [8] reported that specific germline polymorphisms are associated with different glioma subtypes. Similar to previous reports, we note that three SNPs (rs2236661, rs494560 and rs17748) in the PHLDB1 gene were strongly associated with non-GBM gliomas but not with GBM.
Additionally, we detected a novel association that is independent of PHLDB1. Rs7115634 was in low LD (pairwise r² < 0.122, Table S1) with rs17748, rs2236661, and rs494560 in our Chinese population. This newly identified marker SNP maps to an intron of ARCN1, which is located within the region of human chromosome 11q23.3 that is commonly deleted in neuroblastoma patients [14]. ARCN1, also known as δ-COP, is a subunit of the coat protein I (COPI) complex, which binds to dilysine motifs and reversibly associates with Golgi non-clathrin-coated vesicles. This association further mediates biosynthetic protein transport from the ER via the Golgi up to the trans-Golgi network [23][24][25][26]. Mutation of ARCN1 results in phenotypes commonly observed in neurodegenerative disorders, such as abnormal protein accumulation, ER stress, and neurofibrillary tangles [27]. The TCGA (The Cancer Genome Atlas) database (http://cancergenome.nih.gov/cancersselected/glioblastomamultiforme) showed that ARCN1 expression was increased in glioblastomas. However, the exact role of the ARCN1 gene in the development of glioma still needs further investigation. Nevertheless, it is worth noting that the three newly identified variants (rs7115634 in ARCN1, rs2236661 and rs494560 in PHLDB1) and the rs17748 SNP contributed to a cumulative risk effect for glioma susceptibility. Individuals carrying 6-8 risk alleles had a 1.93-fold increased risk of developing gliomas when compared with those who carried between 0 and 2 risk alleles, indicating the importance of combined effects from independent risk variants in the etiology of glioma.
Genetic variants within conventional regulatory regions, such as the 5′-UTR and the 3′-UTR, were given priority in most previous studies; however, accumulating evidence indicates the importance of intronic polymorphisms as markers of disease susceptibility [28][29][30]. It is possible that all identified SNPs (rs17748 in the 3′-UTR, and rs7115634, rs2236661 and rs494560 in introns) in this region affect glioma risk by modulating PHLDB1 and ARCN1 expression levels or function. However, these hypotheses are speculative, and they need to be confirmed by biological assays in future studies.
In summary, through fine mapping of the 11q23.3 region in a large sample from a Chinese population, we identified a novel marker in the ARCN1 gene and two new variants in the PHLDB1 gene. These markers are individually related to glioma susceptibility in this population. Further functional evaluation and larger association studies in ethnically diverse populations are needed to elucidate the role of these causative or marker SNPs in the development of glioma.

Study Population
Using the same recruitment method described elsewhere [8,10,11,31], we recruited 983 patients with histopathologically confirmed gliomas and 1,024 healthy controls between October 2004 and July 2009 from the Department of Neurosurgery at Huashan Hospital, Fudan University (Shanghai, China). There were no restrictions on age, sex, or histologic type, but patients with a self-reported history of cancer and patients with previous radiotherapy or chemotherapy for unknown disease conditions were excluded. Additionally, diagnoses of potentially eligible cases were validated by trained abstractors who reviewed the pathologic and medical records of all cases to confirm that there were no undiagnosed occult primary tumors at the time of recruitment. All controls were frequency-matched to the cases by age (within 5 years), sex, and residential area (urban area or countryside). The controls were recruited from visitors to the trauma outpatient clinic and from persons undergoing annual check-ups at the same hospital. These controls had no known central nervous system-related diseases, no self-reported history of cancer at any site, and no history of radiotherapy/chemotherapy for unknown disease conditions. No evidence of demographic differences was found between the trauma outpatients and the annual check-up subjects. All cases and controls were from Shanghai and the surrounding provinces (Zhejiang, Jiangsu, and Anhui) in eastern China, and all were of Han Chinese ethnic background.
Written informed consent was obtained from all participants or from the patients' representatives. Each subject was interviewed face-to-face by trained personnel using a previously described questionnaire [11] to obtain demographic data, history of environmental exposure to ionizing radiation and overall health characteristics. After the interview, each subject provided 3-5 mL of venous blood. Blood samples and questionnaires were available for 983 cases and 1024 control subjects, representing 92.6% and 88.2% of all eligible case and control subjects, respectively.

Selection of Tagging SNPs and Genotyping
CHB data were obtained from the International HapMap Project (http://www.hapmap.org) using the phase II Nov 08, on NCBI B36 assembly, dbSNP b126 and the phase III Aug 10, on NCBI B36 assembly, dbSNP b132. The minor-allele frequency (MAF) was ≥0.05, the Hardy-Weinberg equilibrium (HWE) cutoff was ≥0.05 and the call rate was ≥95%. Based on a block-based tagging strategy using the HaploView program 4.2, we targeted a 200 kb region (chr11:117882577-118082577, 200,000 bp) that included the MLL (myeloid/lymphoid or mixed-lineage leukemia), TMEM25 (transmembrane protein 25), ARCN1 (archain), PHLDB1, and TREH (trehalase) genes. A total of 17 tagging SNPs were identified to capture (r² ≥ 0.8) all SNPs (119 SNPs) with minor allele frequencies of at least 5% in this region. Two SNPs (rs576950 and rs633308) could not be assayed using our technique, leaving 15 SNPs for genotyping. The previously reported rs498872 SNP was not included in this tagging SNP panel because of the overlap in patient populations between the two studies. However, rs498872 was well tagged by rs17748 in this study (r² = 0.826).
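The idea behind tag SNP selection at a pairwise r² threshold can be sketched with a simple greedy procedure. This is a generic illustration only; it is not the algorithm implemented in HaploView, and the SNP names and r² values below are hypothetical. Each round picks the SNP that captures the largest number of still-uncaptured SNPs at r² ≥ 0.8, until every SNP is either a tag or captured by one.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tags so that every SNP is a tag or has r^2 >= threshold with one.

    `r2` maps frozenset({snp_a, snp_b}) to a pairwise r^2 value; pairs that
    are missing from the mapping are treated as r^2 = 0.
    """
    def captured_by(tag):
        return {
            s for s in snps
            if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold
        }

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Choose the SNP that captures the most still-uncovered SNPs.
        # max() keeps the first maximum, so ties follow the order of `snps`.
        best = max(snps, key=lambda s: len(captured_by(s) & uncovered))
        tags.append(best)
        uncovered -= captured_by(best)
    return tags


# Hypothetical SNPs and pairwise r^2 values, for illustration only.
toy_snps = ["snpA", "snpB", "snpC", "snpD"]
toy_r2 = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpB", "snpC")): 0.85,
    frozenset(("snpC", "snpD")): 0.30,
}

toy_tags = greedy_tag_snps(toy_snps, toy_r2)
print(toy_tags)
```

In this toy example one SNP in the middle of a correlated cluster tags its two neighbours, while the weakly correlated SNP has to be genotyped itself.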
Genomic DNA was extracted from leukocytes using the Qiagen Blood Kit (Qiagen, Chatsworth, California, USA). Then, the genomic DNA was diluted to a final concentration of 15-20 ng/ml for the subsequent assays. Primers for amplification and extension reactions were designed using the MassARRAY Assay Design software, version 3.1 (Sequenom, San Diego, California). Polymorphism-flanking fragments were amplified by polymerase chain reaction. Genotyping was performed on the MassARRAY iPLEX platform (Sequenom) using an allele-specific matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assay [32], with laboratory personnel blinded to case or control status. All assays were carried out in 384-well arrays, and 8 blank controls and 8 random duplicates were used for quality control. The results were more than 98% concordant between the duplicated samples. On average, 98% of the genotypes were successfully assayed for all SNPs.

Statistical Analyses
We used Fisher's exact test to test for deviations from Hardy-Weinberg equilibrium among the controls for each SNP. We used the χ² test to examine the differences in demographic characteristics and frequency distributions of genotypes and alleles between the cases and controls. The most common genotype in the controls was used as the reference group. We performed unconditional logistic regression analysis with adjustments for age, sex, cigarette smoking and family history of cancer to calculate odds ratios and 95% confidence intervals as estimates of the relative risk for each SNP and for multiple SNPs. All statistical tests were two-sided. Akaike's information criterion (AIC) was used to determine the best-fitting model for each SNP [33].
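For readers unfamiliar with Hardy-Weinberg testing, the sketch below shows the logic using the simpler χ² goodness-of-fit approximation with one degree of freedom. This is a swapped-in technique for illustration and not the Fisher's exact test used in this study; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    Assumes the SNP is polymorphic and that counts are large enough for the
    chi-square approximation to be reasonable.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts, NOT data from this study.
chi2_ok, p_ok = hwe_chi_square(360, 480, 160)   # matches HWE for p = 0.6
chi2_bad, p_bad = hwe_chi_square(500, 0, 500)   # no heterozygotes at all
print(chi2_ok, p_ok)
print(chi2_bad, p_bad)
```

A SNP whose control genotypes give a P value below the chosen cutoff would be flagged as deviating from Hardy-Weinberg equilibrium; exact tests are generally preferred when some genotype counts are small.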
We also evaluated the cumulative effects of the risk alleles that were independently associated with glioma risk by counting the total number of risk alleles per individual across the four independent risk variants (rs7115634, rs2236661, rs494560 and rs17748) and grouping the counts into categories (0-2, 3, 4, 5, and 6-8). Statistical analyses were all performed using SPSS 17.0 software, unless indicated otherwise.
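The cumulative risk-allele variable described above can be sketched as follows. The SNP labels, risk alleles and genotypes in this example are hypothetical placeholders rather than the study's actual coding; only the category boundaries (0-2, 3, 4, 5, and 6-8) are taken from the text.

```python
def risk_allele_count(genotypes, risk_alleles):
    """Count risk alleles across SNPs.

    `genotypes` maps SNP label -> two-character genotype string such as 'CT';
    `risk_alleles` maps SNP label -> the single-character risk allele.
    """
    return sum(genotypes[snp].count(allele) for snp, allele in risk_alleles.items())


def risk_category(count):
    """Map a risk-allele count (0-8 for four SNPs) to the categories in the text."""
    if not 0 <= count <= 8:
        raise ValueError("count must be between 0 and 8 for four biallelic SNPs")
    if count <= 2:
        return "0-2"
    if count >= 6:
        return "6-8"
    return str(count)


# Hypothetical risk alleles and one hypothetical individual.
example_risk_alleles = {"snp1": "C", "snp2": "T", "snp3": "G", "snp4": "G"}
example_person = {"snp1": "CC", "snp2": "CT", "snp3": "GG", "snp4": "AA"}

example_count = risk_allele_count(example_person, example_risk_alleles)
example_category = risk_category(example_count)
print(example_count, example_category)
```

The resulting category can then be entered as a categorical predictor in a logistic regression, with the 0-2 group as the reference.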
The pairwise LD parameter D' was calculated using the Haploview program (http://www.broad.mit.edu/mpg), and two SNPs with D' > 0.8 were considered to be in strong LD [34]. The Haplo.Stats package, which runs in the R environment, was used to infer the haplotypes of the genotyped SNPs using the "haplo.glm" function, and an additive model was assumed to estimate the haplotype-specific OR with adjustments for possible confounding variables (i.e., age, sex).
The "haplo.score" function was used to calculate the global and haplotype-specific permutation P values (P_sim, minimal simulation: 10,000 with a significance level less than 0.05) [35].
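The two LD measures referred to in this paper, D' and r², follow from the standard definitions based on haplotype and allele frequencies. The sketch below is a minimal illustration with hypothetical frequencies and an assumed function name; Haploview additionally estimates haplotype frequencies from unphased genotypes, which is not shown here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
    Both SNPs are assumed to be polymorphic (frequencies strictly in (0, 1)).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele B occurs only on haplotypes carrying allele A.
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.4)
print(d, d_prime, r2)
```

In this example D' equals 1 while r² is well below 1, which illustrates why a pair of SNPs can be in strong LD by D' and still tag each other imperfectly by r².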
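The notion of a simulated (permutation) P value can be illustrated with a generic label-permutation test. The sketch below is not the score statistic computed by haplo.score; it uses a simple difference in carrier frequency as the test statistic, with hypothetical data, and the add-one estimator of the P value is a common convention chosen here as an assumption.

```python
import random


def permutation_p_value(case_values, control_values, n_perm=10000, seed=0):
    """Permutation P value for a difference in means between two groups.

    Values are 0/1 indicators (for example, haplotype carrier status). Case and
    control labels are shuffled `n_perm` times and the observed statistic is
    compared with the permuted ones.
    """
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(case_values, control_values)
    pooled = list(case_values) + list(control_values)
    n_cases = len(case_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one estimator so the P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical carrier indicators, NOT data from this study.
p_clear = permutation_p_value([1] * 30 + [0] * 10, [1] * 10 + [0] * 30, n_perm=2000)
p_null = permutation_p_value([0, 1] * 20, [0, 1] * 20, n_perm=2000)
print(p_clear, p_null)
```

With more permutations the smallest attainable P value shrinks, which is why a minimum number of simulations is specified when very small P values need to be resolved.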

Ethics
Written informed consent was obtained from each participant, and the study was approved by the School of Life Sciences of Fudan University Ethics Board.