Sex- and Subtype-Specific Analysis of H2AFX Polymorphisms in Non-Hodgkin Lymphoma

H2AFX encodes a histone variant involved in signaling sites of DNA damage and recruiting repair factors. Genetic variants in H2AFX may influence risk of non-Hodgkin lymphoma (NHL), a heterogeneous group of lymphoid tumors that are characterized by chromosomal translocations. We previously reported that rs2509049, a common variant in the promoter of H2AFX, was associated with risk for NHL in the British Columbia population. Here we report results for 13 single nucleotide polymorphisms (SNPs) in 100 Kb surrounding H2AFX in an expanded collection of 568 NHL cases and 547 controls. After correction for multiple testing, significant associations were present for mantle cell lymphoma (p=0.007 for rs604714) and all B-cell lymphomas (p=0.046 for rs2509049). Strong linkage disequilibrium in the 5 Kb upstream of H2AFX limited the ability to determine which specific SNP (rs2509049, rs7759, rs8551, rs643788, rs604714, or rs603826), if any, was responsible. There was a significant interaction between sex and rs2509049 in the all B-cell lymphomas group (p=0.002); a sex-stratified analysis revealed that the association was confined to females (p=0.001). Neither the overall nor the female-specific association with rs2509049 was replicated in any of four independent NHL sample sets. Meta-analysis of all five study populations (3,882 B-cell NHL cases and 3,718 controls) supported a weak association with B-cell lymphoma (OR=0.92, 95% CI=0.86-0.99, p=0.034), although this association was not significant after exclusion of the British Columbia data. Further research into the potential sex-specificity of the H2AFX-NHL association may identify a subset of NHL cases that are influenced by genotype at this locus.


Introduction
Non-Hodgkin lymphomas (NHL) are a histologically diverse group of neoplasms of lymphoid origin that vary in severity of clinical behavior from indolent to very aggressive. NHLs can be broadly divided into tumors of B-cell or T-cell origin, and each of these can be further classified based on clinical features, pathology, histology and/or genetic indicators into one of over 40 subtypes [1].
A key characteristic of B-cell development is the creation of a diverse repertoire of immunoglobulin receptors able to mount an immune response to a vast assortment of foreign antigens [2]. This diversity is accomplished by three processes that alter immunoglobulin genes: V(D)J recombination, somatic hypermutation and class switch recombination (reviewed in [2]). All of these processes require maintenance of DNA integrity and in particular, both V(D)J and class switch recombination require repair of double stranded DNA breaks [3]. Aberrant resolution of these breaks might lead to oncogenic chromosomal translocations by juxtaposition of genes that confer a growth stimulating or anti-apoptotic effect with DNA elements that lead to high or inappropriately timed expression in lymphoid cells [2]. It is therefore not surprising that reciprocal chromosomal translocations involving the immunoglobulin loci are a characteristic of many NHLs [2]. The tendency for some NHL subtypes to have translocations may imply an underlying defect in the cellular systems that protect against them, such as genes involved in DNA repair or surveillance for damaged DNA. Attenuation of the process of double-stranded break repair due to genetic variation in the genes involved may lead to increased translocations and influence NHL risk.
H2AX is a non-canonical histone which replaces 2-25% of histone H2A molecules that compact DNA into the nucleosome, the basic unit of chromatin organization (reviewed in [3]). H2AX is involved in signaling the presence of double stranded breaks, recruiting DNA repair factors and preventing DNA breaks from progressing to translocations. In a process primarily mediated by activated ATM, double stranded breaks prompt phosphorylation of a highly conserved C-terminal serine residue that is unique to the H2AX histone [4]. Phosphorylated H2AX (γH2AX) recruits DNA damage repair factors to the sites of double stranded breaks and initiates a signal cascade that amplifies and expands the DNA repair signal (reviewed in [3]). H2AX phosphorylation is such an integral part of the double strand break repair process that staining for γH2AX foci is a frequently used indicator for visualizing the sites of DNA damage within a cell. H2afx-deficient mice have reduced DNA repair efficiency and show elevated levels of chromosome instability, DNA repair defects and tumorigenesis [5][6][7]. Specific to B-cell development, H2AX is required for efficient resolution of the double stranded breaks induced during class switch recombination [8] and stabilization of DNA strands to prevent progression to chromosome breaks during V(D)J recombination [9]. Significant roles for H2AX in both V(D)J and class switch recombination suggest that optimal H2AX function may be particularly important in preventing tumor formation in lymphoid cells.
We previously reported that a single nucleotide polymorphism (SNP), rs2509049, in the promoter region of the H2AX gene, H2AFX, was associated with NHL [10]. Specifically, rs2509049 was associated with translocation-prone follicular (FL) and mantle cell (MCL) lymphomas, but not with diffuse large B-cell lymphoma (DLBCL), consistent with a role for H2AX in prevention of translocations. Subsequently, other groups have reported that H2AFX genetic variants are associated with breast cancer [11], glioma [12] and DLBCL [13] but not with bladder cancer [14]. However, it remains unknown which variant or group of variants at the H2AFX locus contributes to the risk of malignancy.
To explore, confirm and characterize the H2AFX association with NHL, we analyzed genotypes of 13 SNPs in a 100 Kb region surrounding H2AFX in constitutional DNA of an expanded collection of 568 NHL cases and 547 control individuals from the British Columbia (BC) population and in four independent NHL sample sets: 1) the Scandinavian Lymphoma Etiology study (SCALE), 2) a population-based case-control study in San Francisco (SF), 3) NHL patients and controls collected as part of the National Cancer Institute-Surveillance, Epidemiology and End Results study (NCI-SEER) in the United States and 4) a population-based case-control study of NHL patients in New South Wales and the Australian Capital Territory, Australia (NSW).

Ethics Statement
This study was approved by the joint University of British Columbia/British Columbia Cancer Agency Research Ethics Board and written informed consent was obtained from all participants.

Study population
The study population has been previously described [15,16], and the case and control samples genotyped in this study have been described in detail [17]. Briefly, DNA was obtained from 797 cases (20-79 years of age) and 790 controls frequency-matched for age, sex and region (Vancouver or Victoria), collected as part of a population-based study of NHL in British Columbia from March 2000 to February 2004.

Genotyping
Initially, 16 SNPs in a 100 Kb region encompassing the H2AFX region were selected for genotyping (Table S1). These included 8 tagSNPs [18] chosen to represent the variation in SNPs genotyped by HapMap in the CEU population, 3 SNPs chosen from literature reports of associations with H2AFX [10,11] and an additional 5 SNPs added to further saturate the regions 5 Kb upstream and downstream of H2AFX. One SNP failed Illumina genotyping design. The remaining 15 SNPs were genotyped at The Centre for Applied Genomics, at the Hospital for Sick Children in Toronto, Canada as part of a larger Golden Gate assay (Illumina, San Diego, CA) which has been described previously [17]. Genotypes were assessed using Genome Studio version 2009.1 (Illumina, San Diego, CA).
Prior to analysis, all SNPs and samples included in the assay were subject to extensive quality control previously described in detail [17]. SNP quality control included exclusions based on: GenCall score (<0.25); GenTrain score (<0.4); poor or abnormal genotype clustering; discrepancies between 53 pairs of duplicate samples; poor call rates (<95%); and deviation from Hardy-Weinberg equilibrium (HWE; p<0.001) in European-ancestry controls. Although all 15 H2AFX SNPs met overall quality control requirements, 2 SNPs (rs28990980 and rs603826) were subsequently excluded from analysis (Table S1). rs28990980 was excluded due to a very low minor allele frequency (0.002) in control samples; and rs603826, although passing multiple testing-corrected HWE cutoffs at the overall quality control stage, had an uncorrected HWE p value suggesting a departure from HWE (p=0.007). Examination of the sequence surrounding this variant revealed the presence of SNP rs10892330 within 3 base pairs of rs603826, which had not been recognized at the time of assay design. As this nearby SNP may interfere with Illumina probe binding and may be responsible for the observed deviation from HWE, rs603826 was excluded from analysis. Thus, after genotyping and quality control, 13 SNPs remained for analysis.
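As an illustration of the HWE filter described above, the following minimal sketch computes a 1-degree-of-freedom Pearson chi-square test of Hardy-Weinberg proportions from genotype counts. The choice of a chi-square test (rather than an exact test), the function name and the example counts are assumptions made for illustration; the text does not state which HWE test was used.

```python
import math

HWE_P_CUTOFF = 0.001  # exclusion threshold quoted in the text for controls


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p) for a 1-df Pearson test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only (not study data): a marker with no heterozygotes.
chi2, p_value = hwe_chi_square(50, 0, 50)
fails_hwe = p_value < HWE_P_CUTOFF
```

A marker whose controls show `fails_hwe` would be dropped under the stated criterion; a marker such as rs603826, with p=0.007, would pass this cutoff but could still be flagged for inspection.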
Sample quality control has been described previously [17]. It included: exclusions based on call rate (<0.98); exclusions based on discrepancies in sex and race between what was reported for a sample and what was supported by sample genotypes; and exclusions based on unexpected relatedness between samples revealed by SNP analysis [17]. These quality control measures resulted in exclusion of 176 samples, leaving 1411 of 1587 samples. One additional case was excluded from analyses due to diagnoses of both B-cell and T-cell lymphomas. All analyses reported here were restricted to 568 NHL cases and 547 controls (1115 samples) who reported that all four grandparents were of European descent and for whom genotype data supported European ancestry [17].

Replication study populations and genotyping
The four study populations used to replicate findings have been described previously. All genotyping platforms used are highly accurate and cases and controls within each study were genotyped in an identical manner.
The Scandinavian lymphoma etiology study (SCALE) is a population-based case-control study of individuals (18-75 years old) collected in Denmark and Sweden between 1999 and 2002 [19]. rs2509049 genotypes were available for 4294 samples genotyped using Sequenom technology and SpectroTYPER RT3.4 software (Sequenom Inc., San Diego, CA) as described [20]. Samples (N=46) who did not report that both parents were born in Europe were excluded, leaving 1871 controls and 2376 NHL cases (2183 of B-cell origin) for analysis.
The San Francisco study (SF) is a population-based case-control study of individuals (20-84 years old) collected in the San Francisco Bay area between 2001 and 2005 [21]. Genotypes for rs2509049 were imputed using the BEAGLE 3.0.3 software [22] based on haplotype information from unrelated HapMap-II CEU samples. SNPs imputed with maximum posterior probability <0.9 were set to missing and those with >10% missing rate were further excluded. The analyses reported here were limited to 737 controls and 664 cases who reported non-Hispanic white race and for whom genotype data supported non-Hispanic white race [21].
The National Cancer Institute-Surveillance, Epidemiology and End Results (NCI-SEER) study is a case-control study of NHL cases (20-74 years old) identified in Detroit, Iowa, Los Angeles, or Seattle SEER registries between 1998 and 2000 and population controls identified by random digit dialing (<65 years) and from Medicare eligibility files (>65 years) [23]. rs2509049 genotypes determined by Fluidigm technology (Fluidigm Corporation, San Francisco, CA) were available for 455 controls and 516 NHL cases. The analyses reported here were confined to 378 controls and 442 NHL cases (373 of B-cell origin) who self-reported non-Hispanic white race.
The New South Wales (NSW) study is a population-based case-control study of NHL cases (20-74 years old) identified in NSW or the Australian Capital Territory (ACT) between 2000 and 2001 and matched controls randomly selected from the NSW and ACT electoral rolls [24]. rs2509049 genotypes determined by Fluidigm technology (Fluidigm Corporation, San Francisco, CA) were available for 268 controls and 245 NHL cases. Analyses reported here were confined to 264 controls and 239 NHL cases (218 of B cell origin) who self-reported non-Hispanic white race.

Statistical analysis
BC cases of each NHL subtype were compared separately to all BC controls. Odds ratios (OR) and corresponding 95% confidence intervals (CI) were estimated by logistic regression performed with SVS Suite 7 (Golden Helix, Bozeman, MT). P values for an additive model were calculated for a full model including the SNP of interest vs. a reduced model which accounted for age group (in 5 year increments), sex and region of residence; uncorrected p values are indicated in tables as p.
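The full-versus-reduced model comparison can be sketched as a likelihood ratio test. The code below is a minimal pure-Python illustration, not the SVS implementation: it fits logistic models by Newton-Raphson, codes the SNP additively (0, 1 or 2 copies of the minor allele), and uses sex as the only covariate, whereas the analysis described above also adjusted for age group and region. All function names and the example counts are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, y, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def additive_lrt(genotypes, covariates, status):
    """Return (per-allele OR, LRT statistic, p) for full vs. reduced model."""
    reduced = [[1.0] + list(c) for c in covariates]           # intercept + covariates
    full = [r + [float(g)] for r, g in zip(reduced, genotypes)]  # + additive SNP term
    _, ll_reduced = fit_logistic(reduced, status)
    beta_full, ll_full = fit_logistic(full, status)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square, 1 df
    return math.exp(beta_full[-1]), stat, p_value


# Illustrative counts only (not study data): status -> counts of genotype 0, 1, 2.
counts = {1: (55, 35, 10), 0: (40, 40, 20)}
genotypes, covariates, status = [], [], []
for y, per_genotype in counts.items():
    for g, n in enumerate(per_genotype):
        for i in range(n):
            genotypes.append(g)
            covariates.append([float(i % 2)])  # hypothetical sex indicator
            status.append(y)

odds_ratio, lrt_stat, p_value = additive_lrt(genotypes, covariates, status)
```

The sex-by-SNP interaction test described later in this section follows the same pattern, with the SNP moved into the reduced model and a product term (sex multiplied by genotype) as the single added column in the full model.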
Full scan permutations carried out in SVS (10,000 permutations) were performed to account for multiple testing; corrected p values are indicated as p adj.
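One common way to implement this kind of correction is a max-statistic permutation: case-control labels are shuffled, the largest test statistic across all SNPs is recorded for each shuffle, and each SNP's adjusted p value is the fraction of shuffles whose maximum meets or exceeds its observed statistic. The sketch below assumes that scheme and substitutes a simple trend statistic for the covariate-adjusted logistic model; whether SVS "full scan" permutation matches this procedure in detail is an assumption, and all names and data are illustrative.

```python
import random


def trend_stat(genos, status):
    """Trend statistic: N times the squared genotype-status correlation."""
    n = len(genos)
    mg = sum(genos) / n
    ms = sum(status) / n
    cov = sum((g - mg) * (s - ms) for g, s in zip(genos, status))
    vg = sum((g - mg) ** 2 for g in genos)
    vs = sum((s - ms) ** 2 for s in status)
    if vg == 0 or vs == 0:
        return 0.0
    return n * cov * cov / (vg * vs)


def max_stat_permutation(snps, status, n_perm=1000, seed=0):
    """snps maps SNP name -> genotype list; returns name -> adjusted p value."""
    rng = random.Random(seed)
    observed = {name: trend_stat(g, status) for name, g in snps.items()}
    exceed = {name: 0 for name in snps}
    shuffled = list(status)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        max_null = max(trend_stat(g, shuffled) for g in snps.values())
        for name in snps:
            if max_null >= observed[name]:
                exceed[name] += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return {name: (exceed[name] + 1) / (n_perm + 1) for name in snps}


# Illustrative data only: 20 cases then 20 controls, two hypothetical SNPs.
status = [1] * 20 + [0] * 20
snps = {
    "snp_linked": [2] * 20 + [0] * 20,          # perfectly tracks status
    "snp_null": [i % 3 for i in range(40)],     # unrelated to status
}
p_adj = max_stat_permutation(snps, status, n_perm=200, seed=1)
```

Because the SNPs in the region are correlated, this approach is less conservative than a Bonferroni correction over 13 tests, which is the usual motivation for permutation-based adjustment.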
Since the association of H2AFX variants with glioma is reportedly stronger in males [12], we hypothesized that the effect of H2AFX genotype on NHL risk may also be influenced by sex. To assess interaction with sex, the SNP with the most significant p value within each NHL subtype was chosen to represent the gene for that subtype [17] and was analyzed by logistic regression comparing a full model including sex*SNP as an interaction term, to a reduced model with sex, age group, region of residence and SNP. For subtypes in which this analysis was significant, the data were stratified by sex and logistic regression was performed separately in the female and male strata (correcting for age group and region of residence) for all 13 SNPs.
Linkage disequilibrium in the cases and controls was determined using Haploview v4.2. Haplotype blocks were predicted with Haploview 4.2 using 95% confidence bounds on D' [25] with the following parameters: CI minima for strong linkage disequilibrium (LD) of 0.7-0.98, upper CI for strong recombination of 0.90, fraction of strong LD in informative comparisons of at least 95% and exclusion of SNPs with a minor allele frequency < 0.10. Haplotype frequencies in cases and controls were determined with SVS Suite 7 using the expectation-maximization method and logistic regression was performed for haplotypes with frequencies >0.01, as described for individual SNPs.
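For readers unfamiliar with the two LD measures used here, the sketch below computes D' and r² for a pair of biallelic SNPs from phased haplotype counts. This is a simplification: Haploview estimates haplotype frequencies from unphased genotypes and additionally computes confidence bounds on D', neither of which is reproduced. The function name and the counts are illustrative assumptions.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (D', r^2) from counts of the four two-SNP haplotypes.

    n11: allele 1 at both SNPs; n12: allele 1 at SNP A, allele 2 at SNP B;
    n21: allele 2 at SNP A, allele 1 at SNP B; n22: allele 2 at both.
    """
    n = n11 + n12 + n21 + n22
    p11 = n11 / n
    pa = (n11 + n12) / n  # frequency of allele 1 at SNP A
    pb = (n11 + n21) / n  # frequency of allele 1 at SNP B
    d = p11 - pa * pb     # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = pa * (1 - pa) * pb * (1 - pb)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Illustrative counts only: one haplotype is absent, so D' is 1 while r^2 is not.
d_prime, r2 = ld_measures(60, 0, 20, 20)
```

The example shows why the two measures can disagree: D' reaches 1 whenever one of the four haplotypes is missing, whereas r² reaches 1 only when the two SNPs are fully interchangeable as predictors of each other.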
Analyses of independent study populations for replication were performed in R version 2.15.1 [26] on individuals of European ancestry or white race for those subtypes or groups for which at least two studies had genotypes for more than 100 samples: the DLBCL and FL subtypes and a group encompassing all B-cell lymphomas. Study-specific ORs and 95% CIs were estimated by logistic regression, with P values for an additive model determined by comparing the full model to a reduced model which included study-specific variables described in Table S2. Heterogeneity between ORs from different studies was assessed using Cochran's Q test performed with rmeta version 2.16 [27]. ORs without significant heterogeneity between studies (Q>0.10) were combined by meta-analysis under a fixed effects model. For analyses with significant heterogeneity in ORs between studies, a random effects model was used.
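The heterogeneity test and fixed effects pooling can be illustrated with a short inverse-variance sketch. This is not the rmeta code: it is a minimal Python rendering of the standard formulas, taking study-level log odds ratios and standard errors as input. The function names and the example inputs are assumptions for illustration, and the random effects branch is not shown.

```python
import math


def chi2_sf(x, df):
    """Survival function of a chi-square variable with positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-half)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        sf += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def se_from_ci(lower, upper):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.959963984540054)


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed effects pooling with Cochran's Q."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    z = pooled / se
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.959963984540054 * se),
        "ci_high": math.exp(pooled + 1.959963984540054 * se),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),        # two-sided Wald p value
        "q": q,
        "p_het": chi2_sf(q, len(log_ors) - 1),          # heterogeneity p value
    }


# Illustrative inputs only (not the study estimates): three hypothetical studies.
example = fixed_effect_meta(
    [math.log(0.85), math.log(0.95), math.log(1.02)],
    [se_from_ci(0.72, 1.00), se_from_ci(0.85, 1.06), se_from_ci(0.84, 1.24)],
)
use_fixed_effects = example["p_het"] > 0.10
```

Under the rule stated above, `use_fixed_effects` being true would mean the pooled fixed effects estimate is reported; otherwise a random effects model would be fitted instead.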

Results
The characteristics of the 568 NHL cases and 547 controls from the BC study population who met quality control criteria and were included in analyses are described in Table 1. Although controls were frequency-matched to cases by age, sex and region in the study overall, cases of European descent were more likely to be male, older and resident of Vancouver than controls of European descent.

Subtype-specific analysis
Subtype-specific association results for 13 SNPs within 100 Kb of H2AFX are summarized in Tables 2 and 3. rs2509049 was associated with the FL and MCL subtypes and with the all B-cell group; however, only the associations with MCL and the all B-cell group remained significant after correction for multiple testing. Additional SNPs in linkage disequilibrium (LD) with rs2509049 (Figure 1) were also associated with the FL and MCL subtypes and the all B-cell group, with the most significant p values observed for the association of rs604714 with MCL. rs1804690, a SNP located more than 40 Kb downstream of H2AFX and not in LD with rs2509049, was associated with NHL in the DLBCL subtype and the all B-cell group, and remained significant after multiple testing correction in the all B-cell group. There were no associations with any of the SNPs for the MZL/MALT or T/NK cell lymphoma subtypes.
This BC dataset included 214 new cases and 164 new controls in addition to 354 cases and 383 controls for which results had been reported previously [10]. From the previously reported data, only samples for which genotypes were confirmed by the Illumina Golden Gate assay were included in these analyses; this excluded 33 cases and 37 controls from our original report [10] for which there was insufficient sample remaining. Analysis confined to the new B-cell lymphoma cases (N=196) and controls (N=164) was significant only for rs1804690 (OR=0.39, 95% CI=0.22-0.69, p=0.0007) and not for rs2509049 (OR=0.93, 95% CI=0.67-1.29, p=0.662). Further subtype analyses confined to the new data were not warranted due to small sample sizes.

Sex-specific analysis
The SNPs with most significant p values in the DLBCL, FL and MCL subtypes were assessed for interaction with sex in that subtype. For the all B-cell group, four SNPs were assessed for interaction with sex: rs2509049, as it had the lowest p value in B-cell lymphomas, and rs1804690, rs7759 and rs604714 as they were being assessed in the subtype analyses (Table S3). No interactions were significant in the subtype analyses; however for the all B-cell group, rs2509049, rs7759 and rs604714 had significant interactions with sex (p=0.002, p=0.003 and p=0.015, respectively). The lowest ORs and most significant p values were observed for the rs2509049-sex interaction in the all B-cell group (OR_interaction=0.56, 95% CI=0.59-0.81, p=0.002). Cases and controls were therefore stratified by sex and assessed for association separately in males and females in the all B-cell group (Table 4). In males, rs1804690 was the only SNP significantly associated with B cell NHL, whereas in the females, 10 SNPs in the H2AFX region were associated with NHL with the most significant association being with rs2509049.

Haplotype analysis
To determine if there is a specific H2AFX haplotype associated with NHL, linkage disequilibrium was examined between the 13 SNPs in all cases and controls in the BC population (Figure 1). Two haplotype blocks were predicted: Block 1, encompassing 2 SNPs in a 16 Kb region 3 Kb downstream of H2AFX and Block 2, encompassing the 7 SNPs in the 6 Kb region surrounding and directly upstream of H2AFX that show the most significant association with NHL. As neither of the SNPs in Block 1 was associated with NHL, this block was not analyzed further. For Block 2, haplotype associations with B-cell NHL as a whole and in females only were assessed (Table S4). In neither analysis was any one haplotype more significantly associated with NHL than the individual SNPs.

Table 2. Subtype-specific association results for 13 SNPs in the BC population.
Table 3. Subtype-specific association results for 13 SNPs in the BC population.
*. The sum of the genotypes is in some cases lower than the total number of samples for a subtype, because some samples failed Illumina genotyping for some markers.

Replication in independent study populations
To replicate the sex- and subtype-specific association of H2AFX SNPs, rs2509049 allele frequencies were examined in 4 additional independent sample sets of NHL patients and controls from studies in the InterLymph Consortium. Details on the samples and genotyping protocols for these studies are summarized in Table S2. The rs2509049 association results for the DLBCL and FL subtypes and the all B-cell group for males and females combined and for the female subset alone in the validation study populations are shown in Table S5 and summarized in Figure 2. The BC population was the only study to show a significant protective effect for the A allele; furthermore, the NCI-SEER population showed an association in the opposite direction that was statistically significant for the female FL subtype. Meta-analysis combining all 5 studies supported a significant but weak protective effect for the A allele only in the all B-cell group; however, this association was not significant with exclusion of the BC population.

Discussion
Analysis of 13 SNPs within 100 Kb of H2AFX in the BC population supports previous reports [10,13] that variants in this region are associated with protection against B-cell NHL. In the BC population the association is confined to the FL and MCL subtypes and also appears to be sex-specific. Meta-analysis of a collective 3,882 NHL cases of B-cell origin from five populations showed a significant association with all B-cell lymphomas only in the combined male and female analysis, an effect that was not significant with exclusion of the BC sample set.
The BC population showed evidence that the association between H2AFX polymorphisms and B-cell lymphoma was sex-specific, present only in females and absent from the male subset. NHL has a higher incidence in males, with an overall male to female incidence rate ratio of 1.6 for B-cell NHL [28], a phenomenon that may be due to a protective influence of female hormones on lymphomagenesis. Epidemiological studies reporting decreased NHL risk with increased parity [29,30], hormonal contraceptive use [30,31] and hormone replacement therapy [32][33][34][35] provide support for this hypothesis. It is conceivable that female hormones directly or indirectly influence expression of H2AFX in an allele-specific manner. DNA repair capacity was noted to significantly decrease in cultured lymphocytes from females but not males older than age 48 [36]; however, this finding was not replicated in a larger sample set [37] and no association with sex was seen with γH2AX response [37].
Sex-specificity was not evaluated in the previously reported associations between H2AFX and NHL [10,13], though the association between H2AFX variants and glioma in the Chinese Han population is reportedly stronger in male subjects [12]. Interestingly, the H2AFX association with glioma occurs in the opposite direction; the rs643788 A allele confers a protective effect for glioma [12], while our results suggest the G allele is protective for NHL. This phenomenon may be due to H2AFX promoter variants having opposing effects in different cell types, or differing roles for H2AX in development of these cancers.
It remains unclear whether there are subtype-specific associations between H2AFX and NHL. In the BC population, the association was significant only in the FL and MCL subtypes, though a trend toward reduced risk was also seen in DLBCL. Chromosomal translocations are found in 85-90% of FL [38] and nearly 100% of MCL tumors [39], but are less frequent in other NHL subtypes; they are present in 30-40% of DLBCL [40] and 10-50% of MZL [41]. The association of H2AFX genetic variants with translocation-prone lymphoma subtypes supports the hypothesis that H2AX is required for optimal resolution of double-stranded breaks introduced during B-cell development. Though the validation datasets only had sufficient numbers for a meta-analysis of DLBCL and FL subtypes, there was no evidence that the trend was stronger in the FL subtype. Furthermore, H2AFX polymorphisms were associated with protection against DLBCL in a Korean population [13] supporting the suggestion that the influence of H2AFX variants may extend to a variety of B-cell lymphoma subtypes.
Testing rs2509049, the SNP with the most significant effect in the all B-cell group, in four independent NHL patient collections did not replicate the observed association. This may indicate the observed association in the BC dataset was due to chance. However, the apparent sex-specificity of the H2AFX-NHL association may also provide an explanation for these differences. Differences in parity, use of hormonal contraceptives, postmenopausal hormone replacement therapy and exposure to estrogenic organochlorine pollutants between study regions could contribute to the differences observed. Fertility rates for the years 1980-85 are lower in Canada (1.63), Sweden (1.65) and Denmark (1.43) than in the USA (1.8) and Australia (1.91) [42]. Alternatively, although all replication populations were of European descent or white race, there may be undetected differences in genetic ancestry between studies that could explain the lack of consistency of association.
Though the effect was strongest in the 1.5 Kb immediately upstream of H2AFX, due to the high LD between SNPs in individuals of European ancestry, we were unable to determine which of the SNPs in this region (rs2509049, rs7759, rs8551, or rs643788), if any, is responsible. It is also possible that these SNPs are in LD with an undetected variant that is responsible for the association. The fact that haplotype analysis did not reveal an association more significant than that of individual SNPs, and that our previous resequencing of the H2AFX gene and upstream region in 95 NHL cases found no evidence for frequent rare mutations [10], make this explanation unlikely, unless the undetected SNP is either downstream or more than 1 Kb upstream of H2AFX. The lower LD between SNPs in this region in different ethnic populations may assist in determining which variant is responsible. For example, the Korean population has high LD (r²=0.86) between rs643788 and rs8551 but lower LD between these SNPs and rs2509049 (r²=0.78 and r²=0.79, respectively); an association with DLBCL in this population was significant for rs8551 and rs643788, but not rs2509049 [13], suggesting that rs8551 and/or rs643788 may be relevant functional variants.
As the variants most strongly associated with NHL are located just upstream of the H2AFX gene, it is tempting to speculate that they influence gene expression by impacting transcription factor binding and altering promoter efficiency. An inspection of the DNA sequence at these sites revealed that the rs643788 G allele disrupts a consensus binding site for Yin Yang 1 (YY1) [43], a transcription factor capable of both activation and repression depending on cellular context [44]. Over-expression of YY1 is associated with tumor progression and poor outcome in NHL [45][46][47], consistent with a hypothesis that attenuated YY1 binding at H2AFX rs643788 is associated with reduction in cancer risk. However, other studies have made different predictions regarding the influence of these variants on binding site capacity: rs643788 is predicted to disrupt CJUN [11]; rs8551 is predicted to influence insulin activator factor [11] and CAP1 [13] binding; and the rs7759 G allele is predicted to disrupt a progesterone receptor binding site [11]. The latter prediction is particularly intriguing given the sex-specificity we report. Functional studies are required to determine which of these binding site predictions are supported by experimental evidence, and whether altered protein binding at these sites influences H2AFX gene expression.
rs1804690 was found to be associated with lymphoma risk in the all B-cell group. This association appears to be driven by a protective effect of the minor A allele in DLBCL and FL subtypes and is not sex-specific. As rs1804690 is more than 40 Kb downstream of H2AFX and not in LD with variants in the H2AFX region, it is unlikely (though not out of the question) that this association reflects an impact of rs1804690 on H2AFX expression or function. rs1804690 is a synonymous SNP located within exon 13 of the HYOU1 gene. It may influence regulation of HYOU1 or other genes in the region or be in LD with a variant that does so. HYOU1 encodes Hypoxia Upregulated 1, an oxygen regulated protein that may act as a molecular chaperone required in the cellular response to hypoxia. HYOU1 overexpression has been reported in breast [48] and colorectal cancer [49] tumors, and is associated with poor prognosis and metastasis into the lymphatic system [49]. Further research into the possible association of rs1804690 and B-cell NHL and a potential role for HYOU1 in lymphomagenesis would be required to confirm this relationship.

Conclusion
The combined results of 5 different NHL case-control studies suggest that overall DNA polymorphisms at H2AFX have a weak but significant association with NHL of B-cell origin; however this effect was largely driven by the BC sample set. The significant result in the BC dataset may be a spurious finding or suggest this variant has an impact unique to the lifestyle factors or genetic background in this population. Given the biological importance of H2AFX in cancer, further research is warranted to understand the effects of genetic variation at this locus, both functionally and in human populations.

Supporting Information
Table S1. SNPs selected for genotyping in the BC population. (XLSX)