Genetic Variation in DNA Repair Pathways and Risk of Non-Hodgkin's Lymphoma

Molecular and genetic evidence suggests that DNA repair pathways may contribute to lymphoma susceptibility. Several studies have examined the association of DNA repair genes with lymphoma risk, but the findings from these reports have been inconsistent. Here we provide the results of a focused analysis of genetic variation in DNA repair genes and their association with the risk of non-Hodgkin's lymphoma (NHL). With a population of 1,297 NHL cases and 1,946 controls, we have performed a two-stage case/control association analysis of 446 single nucleotide polymorphisms (SNPs) tagging the genetic variation in 81 DNA repair genes. We found the most significant association with NHL risk in the ATM locus for rs227060 (OR = 1.27, 95% CI: 1.13–1.43, p = 6.77×10−5), which remained significant after adjustment for multiple testing. In a subtype-specific analysis, associations were also observed for the ATM locus among both diffuse large B-cell lymphomas (DLBCL) and small lymphocytic lymphomas (SLL), however there was no association observed among follicular lymphomas (FL). In addition, our study provides suggestive evidence of an interaction between SNPs in MRE11A and NBS1 associated with NHL risk (OR = 0.51, 95% CI: 0.34–0.77, p = 0.0002). Finally, an imputation analysis using the 1,000 Genomes Project data combined with a functional prediction analysis revealed the presence of biologically relevant variants that correlate with the observed association signals. While the findings generated here warrant independent validation, the results of our large study suggest that ATM may be a novel locus associated with the risk of multiple subtypes of NHL.

Associations between DNA repair genes and lymphoma risk have been reported previously [15][16][17][18][19], however the results in many of these studies either failed to reach the necessary level of statistical significance, or lacked independent validation. In the present study, we attempted to improve on these prior efforts by performing a two-stage case-control analysis of 1,297 NHL cases and 1,946 controls to identify associations between 446 SNPs tagging 81 DNA repair genes and NHL risk. The two-stage design, thorough selection of DNA repair genes, assessment of genetic interactions, and identification of putatively functional variants by using public genomic and expression data are among the major innovations in our study, which provides yet another focused exploration of the role of DNA repair pathways in genetic susceptibility to NHL.

Ethics Statement
All cases were ascertained through Memorial Sloan-Kettering Cancer Center (MSKCC) IRB-approved protocols, or a protocol approved by the IRB at the Dana Farber Cancer Institute (DFCI) or Hadassah-Hebrew University. These protocols required written informed consent either for identified use of specimens for research into the genetic basis of lymphoma, or research use of specimens permanently de-identified prior to genotyping. Controls were part of the New York Cancer Project (NYCP) and all subjects gave written consent for use of samples in genetic studies of any disease state.

Study population
In total, the study involved 1,297 non-Hodgkin's lymphoma (NHL) cases from the combined resources at MSKCC, DFCI and Hadassah-Hebrew University, Israel as well as 1,946 controls collected from the NYCP, a study of 18,000 New York City residents originally designed to assess the role of environment and genetics in cancer risk, and described previously elsewhere [29][30][31]. The NYCP data include age, gender, history of cancers (including lymphoma) and ethnicity. A subset of NHL cases (n = 222) were probands from families with a strong family history (FH) of NHL, described in detail recently [32]. The remaining fraction of NHL patients (n = 1,075), were unrelated and unselected for FH. All cases and controls were of white European ancestry, with a fraction of cases (n = 534, 41.2%) and controls (n = 1,043, 53.6%) of self reported Ashkenazi Jewish (AJ) ancestry. In this study we have employed a two-stage design; the discovery stage (stage 1) consisting of 650 cases and 965 controls, and the replication stage (stage 2) involving 647 cases and 981 controls. The detailed structure, demographic, and clinical information of case/control populations in both stage 1 and 2 is summarized in Table S1. A subset of the patient collection in this study was previously included in a GWAS on lymphoma susceptibility [9]. Of the 944 lymphoma cases that constituted the GWAS phase of that prior study, 515 cases (39.7%) overlap with the 1,297 patients included here. While 1,043 controls (53.6%) overlap between the current study and the validation stage of the prior GWAS, there was no control overlap between this study and the GWAS discovery stage.

Selection of genes and tagging SNPs (tSNPs)
The selection of candidate DNA repair genes was performed as summarized in Figure S1. The initial subset of genes (n = 34) has been identified for their known role in DNA repair processes and queried for their catalytic activities from Gene Ontology (GO) [33] and KEGG [34]. The key networks of DNA repair defined by the catalytic domains in the seed list were further passed to: 1) GO search for genes containing identified catalytic activities and 2) yeast proteome database [35] identifying yeast homologues with experimental evidence demonstrating their effect on UV sensitivity, radiation, and DNA damage response. The yeast genes were subsequently queried for human homologues. The targets from 1) and 2) were crossed for gene overlap and passed to an interactome analysis (GeneGO, Ingenuity) to query interacting partners (defined by at least two independent reports, and confirmed by at least two experimental methods). After merging, 87 DNA repair genes were identified for the study (Table S2). The SNPs tagging 87 selected genes were chosen using Haploview with a haplotype Pearson's correlation coefficient (r 2 ) threshold ,0.6 across selected gene regions (including 5 kb from 39 and 59 UTR), and minor allele frequency (MAF) .0.05. The tagging SNP selection has been performed using the CEU data (120 individuals) of HapMap Phase II, the most accurate resource available for this purpose at the time of the study design and still serves as the most validated reference for capturing the common genetic variation in European populations. In total, 531 tagging SNPs (tSNPs) were selected to tag the 87 DNA repair genes.

Genotyping
For stage 1, the genotyping of 531 tSNPs from 87 selected DNA repair genes on 698 lymphoma cases and 1,041 controls was conducted using Sequenom MassARRAY iPLEX (Sequenom Inc., CA), multiplexed into a 16-plex design as per manufacturer's protocol and as described previously [30]. For quality control (QC), duplicates (8 per each 384-well plate) showed .99% concordance and non-template controls (2 per plate) revealed no evidence of cross-contamination. Thirty-four SNPs were excluded due to low genotyping rate across samples (,85%), poor clustering, or significant departure from Hardy-Weinberg Equilibrium (p,0.001 in control population); an additional 51 SNPs were dropped due to low MAF (,0.05) in our study population. Forty-eight cases and 76 controls were dropped due to low genotyping rate (,85%). After QC, in total 446 SNPs, tagging 81 DNA repair genes and n = 650 cases and n = 965 controls remained for the association analysis in stage 1. Twenty-eight SNPs associated with NHL (p,0.05) from stage 1 were passed to the validation analysis in stage 2 on an additional 684 cases and 1,042 controls. In order to perform downstream haplotype and imputation analyses, in stage 2 we also included 81 additional SNPs tagging the genes captured by the 28 significant loci in stage 1. A total of 109 SNPs were passed to the replication analysis in stage 2 using a re-plexed Sequenom design (iPLEX). While all 109 SNPs passed the QC in stage 2, 37 cases and 61 controls were dropped due to low genotyping rates across SNPs (,85%), resulting in the genotyping data on 109 SNPs in 647 cases and 981 controls. Data can be made available to other researchers: please contact the authors for details.

Statistical association analyses
Single SNP associations with NHL risk were tested using a logistic regression model under a per-allele test, calculating odds ratios (OR) and 95% confidence intervals (95% CI), adjusted for age, gender and AJ status. The associations were analyzed for stage 1, stage 2, and the aggregate sample set including both stages. In the aggregate sample set we also used a per-allele logistic regression model to test fifteen SNP associations within the three most common NHL subtypes in our dataset: diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL) and small lymphocytic lymphoma/chronic lymphocytic leukemia (SLL/ CLL). Quantile/quantile (Q/Q) plots were produced using ggplot2 in R, and inflation factors (l) were calculated based on 90% of least significant SNPs. Statistical analyses were performed using PLINK [36]. The main analysis examining single SNP associations with NHL risk and the associations among subtypes were controlled for multiple testing using Bonferroni adjustment. The Bonferroni level of significance was defined as p,0.000154, accounting for 247 independent SNPs in stage 1, 33 independent SNPs in stage 2, and 15 SNPs tested among three NHL subtypes (number of tests = 247+33+(15*3) = 325; p = 0.05/ 325 = 0.000154). Independent SNPs were defined under Pearson's correlation coefficient (r-square) ,0.5 calculated among our sample population.
Haplotypes were visualized by Haploview and haplotype associations were performed by logistic regression analysis adjusted for age, gender, and AJ status. We have also tested the homogeneity of the odds ratios between these three major subtypes by calculating the Breslow-Day statistics for each SNP.
In order to explore possible epistatic interactions associated with NHL risk, logistic regressions were modeled by adding an interaction term between the genotypes of each SNP pair. First, pairwise comparisons were performed between each SNP from the list of fifteen associations with NHL (105 tests), followed by pairwise comparisons of fifteen SNPs with the additional 94 SNPs included in the aggregate analysis (1,410 tests, Bonferroni adjusted p-value: 0.05/(105+1410) = 3.3610 25 ). This analysis was performed using PLINK [36]. For the targeted analysis of the MRN complex (MRE11A, NBS1), the multiplicative two-way gene-gene interactions were estimated using multiple logistic regression models. For each SNP pair, a logistic regression model was built to test case/control association based on the indicator variables (sex, age and AJ status) and the 2-SNP variable, for a total of 5 variables and an intercept. The 2-SNP variable was defined separately under three genetic models, based on the number of risk alleles in individual subjects.
For the assessment of SNP-gene association by incorporating expression information (expressed quantitative trait loci -eQTL) we used Genevar [37]. The eQTL associations were calculated by Spearman's rank correlation tests.

Imputation and functional prediction
The imputation of genotypes from 1,297 NHL cases and 1,946 controls was performed using IMPUTE2 [38] with a reference panel consisting of the 1,000 Genomes Project (1KG) data freeze from November 2010 for low-coverage genomes, May 2011 for high-coverage exomes, and the phased haplotypes released March 2012 (n = 1,092 individuals). Imputed loci were filtered based on the quality metric score (info score) .0.7 [39], chosen based on inflation estimates from a Q/Q analysis; the imputed SNPs with info score ,0.7 showed significant inflations in our data ( Figure  S2). The functional annotation of the associated tagging SNPs and their correlated imputed SNPs was performed by ANNOVAR [40], focusing on 8 functional categories: coding regions, conserved transcription factor (TF) binding sites, TF binding sites based on ChIP-Seq data (using ENCODE database), enhancer sites based on H3K4me1 chromatin marks (using ENCODE database), DNase I hypersensitivity clusters (using ENCODE database), known CNVs, and 39 UTR, and 59 UTR.

Sample population
The demographic and clinical composition of our sample population is summarized in Table S1. No significant difference was observed in the distribution of demographic variables or lymphoma subtypes between stage 1 and stage 2. Also, no significant difference between demographic characteristics or subtype distribution has been noted between cases with FH and NHL cases unselected for FH. However, there was a difference in the proportion of age, AJ ancestry and gender between the NHL cases and controls. Hence, age, AJ status and gender were used as covariates in all subsequent statistical analyses.

Single SNP associations with NHL risk
In this study we applied a two-stage design: in stage 1 we performed the association analysis on 446 SNPs tagging 81 DNA repair genes in the population of 650 cases and 965 controls. There was no significant inflation in observed versus expected associations (l = 1.07), indicating no detectable genotyping arti-facts or population substructures impacting our findings ( Figure  S3).
In the aggregate analysis of stage 1 and 2, fifteen SNPs showed associations with NHL risk (Table 1). These included two SNPs in the ATM locus: rs227060 and rs611646 (OR = 1.27, 95% CI: 1.13-1.43, p agg = 0.00007; OR = 1.26, 95% CI: 1.12-1.43, p agg = 0.00015, respectively, where p agg is a p-value from aggregate analysis). Importantly, the associations for both loci remained significant after Bonferroni correction, as detailed in Material and Methods (p adj = 0.022; p adj = 0.049, respectively, where p adj is the Bonferroni corrected p-value for each SNP). The two SNPs show incomplete LD (r 2 = 0.642, Figure S4). In order to test whether the associations in ATM were independent we conditioned the analyses by the status of rs227060 and found rs611646 and rs419716 no longer significant (data not shown), suggesting the association signals observed for ATM SNPs are correlated. No other SNP associations in the aggregate analysis passed Bonferroni correction. To further explore the structure of associated loci, we also examined common haplotypes (MAF. 0.05) for association with NHL risk. The strongest risk effect has been observed for a haplotype in ATM. Other loci have also shown specific haplotypes associated with NHL risk, however none were more significant than the associations from single SNP analyses (Table S4). As our study population includes a large fraction of AJ ancestry, we have investigated potential association differences between AJ and non-AJ samples. The majority of the 109 SNPs (n = 94) in the aggregate analysis show no more than a 5% difference in minor allele frequency (MAFs) between AJ and non-AJ subsets ( Figure S5). Three SNPs (rs4150606, rs7149962, rs7562048) that present with .5% difference in MAF also show association with NHL. The analysis stratified by AJ status indicated that most associations, including our two most significant ATM SNPs, appear to be largely driven by non-AJ subsets which are predominant in our case population (Table S5). However, for other SNP associations such as rs4150606 (GTF2H1), rs4952887 (MSH2), and rs702019 (POLQ), both AJ and non-AJ subsets contribute to the association signal. Therefore, all analyses in the study were also adjusted by AJ status.

The associations of DNA repair genes with NHL subtypes
We have further tested the associations identified in the NHL pooled analysis among NHL subtypes on fifteen SNPs that associated with overall NHL risk in the aggregate analysis. We focused on the three most common NHL subtypes in our study population in order to maintain analytical power: diffuse large Bcell lymphomas (DLBCL), follicular lymphomas (FL), and small lymphocytic lymphoma/chronic lymphocytic leukemia (SLL/ CLL). As shown in Table 2, associations were identified for the three ATM tSNPs in DLBCL (n = 412), with the strongest effect for rs611646 (OR = 1.37, 95% CI 1.14-1.64, p = 0.0008). Associations for rs611646 and rs227060 were also observed in SLL/CLL (n = 164). No ATM tSNPs were associated with risk in the FL subset (n = 301). In contrast, the strongest effects in FL was observed for CHEK1 (rs565416, OR = 0.71, 95% CI: 0.58-0.87, p = 0.001) and TDP1 (rs7149962, OR = 1.64, 95% CI: 1.14-2.35, p = 0.007), which were not associated with DLBCL or SLL/CLL. In SLL/CLL, the strongest association effect was found for rs4150606 tagging GTF2H1 (OR = 0.51, 95% CI: 0.39-0.69, p = 4.84610 26 ); this association was not seen in FL or DLBCL sub-analyses. The association of rs4150606 with SLL/CLL remains significant after Bonferroni correction for multiple testing (p adj = 0.0016). The SNP/MAF plot between AJ and non-AJ ( Figure S5) identified rs4150606 as an outlier. Despite the MAF difference it appears that both AJ and non-AJ ancestries contribute to the observed association risk effect of rs4150606 (Table S5).
We have also tested the association heterogeneity among the three major subtypes in our analysis of fifteen SNPs ( Table 2). The Breslow-Day test results showed an association for rs227060 (p = 0.037), indicating heterogeneity in the odds ratios between the DLBCL, SLL/CLL, and FL. Heterogeneity has also been observed for an additional two SNPs with subtype-specific associations: rs4150606 (GTF2H1, p = 0.001) and rs565416 (CHEK1, p = 0.006).
The epistatic SNP-SNP interactions in DNA repair genes and NHL risk Using the additive model in pairwise SNP-SNP interaction analysis, we found several associations with NHL risk among the top fifteen SNPs from the aggregate analysis. However, none of these associations survived the adjustment for multiple testing (Table S6). Nonetheless, we noted several interactions of variants in MRE11A with SNPs in NBS1 associated with NHL risk. Interestingly, both genes biologically interact in the MRN complex, a centerpiece of double strand break repair machinery, which prompted us to examine these interactions more closely using different genetic models. While all interactions involve rs1805812 in NBS1, for MRE11A there are four different variants which contribute to these associations (rs10831227, rs625245, rs607974, rs557148). After examination of the four pairwise associations (rs1805812 in NBS1 with each of the four MRE11A SNPs) using three genetic models (Models 1-3 detailed in Table  S7), we focused on two pairwise interactions with the strongest effects, rs1805812 (NBS1) x rs625245 (MRE11A) and rs1805812 (NBS1) x rs607974 (MRE11A). As shown in Table 3, the strongest interaction association was observed for rs1805812 x rs607974 under Model 3, with the strongest effect for heterozygotes or homozygotes for the minor allele on both gene loci (MRE11A, NBS1). The association observed for rs1805812 x rs625245 also shows the strongest effect under Model 3 (Table 4). Interestingly, both of these interactions replicate independently in both stage 1 and 2. Imputation and functional predictions using data from the 1,000 Genomes Project In order to identify the SNPs with putative functional impact, we imputed all associated loci from the aggregate analysis using data from the 1,000 Genomes Project (1KG; described in Material and Methods). As seen in Figure 1, the association analysis of imputed data did not yield association effects that were stronger than those observed in the analysis of genotyped SNPs. The imputation, however, identified variants that correlate with genotyped SNPs and show an association effect comparable with single SNP analyses of aggregate data (Figure 1). To investigate possible biological implications of these associations we tested the imputed SNPs using ANNOVAR (Materials and Methods). While only one non-synonymous SNP was found among the imputed variants (rs1381057 in POLQ), other imputed SNPs map within high-impact regulatory regions. In the ATM locus we found numerous putatively functional variants strongly correlated with associated tSNPs (Table 5). Of these, rs228594 merits particular attention; it maps in a DNase I hypersensitivity cluster as well as within junD and FOSL2 binding sites, providing a rationale for future molecular exploration. The detailed list of imputed SNPs with predicted functional impact is in Table S8.

eQTL analysis
Using Genevar [37] we investigated eQTL associations for the ATM region by examining publicly available data collected from lymphoblastoid cell lines (LCL), T-cell lines (TCL), and fibroblast cell lines (FCL) derived from 75 individuals of European ancestry [41]. Out of the three ATM SNPs associated with NHL risk in our data, the most significant eQTL effect was observed for rs227060 in LCL based on data from probe ILMN_1716231, showing that reduced expression correlates with the risk allele (genetic correlation coefficient rho = 0.284, p = 0.0136; Figure S6). Interestingly, for the probe ILMN_1716231, the association of rs227060 with ATM expression was the second most significant eQTL in the region (Figure 2).

Discussion
Extensive published data suggest that DNA repair pathways are associated with lymphoma susceptibility [26,27]. The rare syndromes attributed to inherited mutations in DNA repair genes, such as ataxia talangiectasia (A-T; mutations in ATM) or Nijmegen breakage syndrome (NBS; mutations in NBS1) manifest with early onset lymphomas of various histological subtypes [42,43]. DNA repair plays a central role in B-cell development in germinal centers of primary and secondary lymphoid organs via V(D)J recombination, which is regulated by the genes involved in double strand break repair, mainly in non-homologous end-joining (NHEJ) [26]. The mouse knockout models of different components of NHEJ have serious defects in the V(D)J recombination process and manifest with increased incidence of B-cell specific lymphomas [26,44,45]. Also, inherited immunodeficiency syndromes (e.g. SCID), which are often associated with early onset lymphomas [26] are due to germline mutations in NHEJ genes. This and other evidence points to the importance of DNA repair genes in lymphomagenesis and suggest that these pathways are putative candidates in the susceptibility to lymphoma. In this study we tested common genetic variation in DNA repair genes for its role in susceptibility to NHL. The two-stage design, the thorough computational selection of DNA repair targets, the examination of potential epistatic effects of the associated loci, and the suggested functional implication of associated variants are among the major enhancements of our study design compared to prior efforts.    Table 4. Association analysis of rs1805812 x rs625245 interactions between MRE11A and NBS1 with the risk of NHL. In this study, we report the most significant association with risk of NHL for two SNPs in the ATM locus, remaining significant after adjustment for multiple testing (Bonferroni). Although among FL patients there was no association observed, ATM SNPs did correlate with risk of DLBCL and SLL/CLL. While the FL subset was the second largest in the study (n = 301), many other subtypes were underrepresented (n,100) and hence it was not possible to assess risk among these smaller groups due to the limited power (although a suggestive, yet non-significant effect was observed among Mantle Cell lymphomas; rs611646, OR = 1.45, 95% CI: 1.03-2.05, p = 0.035). It is also important to note that the associations of ATM SNPs are stronger among the pooled NHL analysis (all subtypes) compared to separate associations with DLBCL or SLL/CLL. This evidence suggests for the first time that ATM is a putatively novel candidate NHL susceptibility locus.
In our study we have noted association differences in stratified analyses by AJ status (Table S5). This relates to the two most significant ATM SNPs, which show the significant risk effect only in non-AJ population. While the frequency of the risk allele of both SNPs appears to be similar among AJ and non-AJ cases (for rs227060 MAF = 40%, for rs611646 MAF = 48%), the MAF in AJ and non-AJ controls differs by approximately 4% (for rs227060 MAF = 37% and 33% respectively, for rs611646 MAF = 45% and MAF = 42% respectively). While there is a possibility of underlying genetic substructure [46,47], we believe reduced power of the AJ stratified sub-analysis is the most plausible explanation for the observed differences in the risk effects between AJ and non-AJ subsets, as AJ cases are underrepresented by ,43% compared to non-AJ cases in the study. It is likely that despite the differences in allele frequency of both SNPs in AJ, the risk would be detected by increasing the number of cases to a comparable size of the non-AJ subset. The issue of power reduction contributing to the observed association differences between AJ and non-AJ is further supported by the same directionality of odds ratios in both AJ and non-AJ subsets and aggregate analysis. To explore both possibilities in detail, a larger validation analysis and possibly the detailed fine mapping of the ATM locus in AJ as well as non AJ populations will be needed.
Prior evidence linking ATM with lymphoid malignancies has been largely restricted to the somatic level; ATM somatic mutations were noted in particular lymphoma subtypes of DLBCL [48], CLL [49][50][51][52], and MCL [53,54], consistent with the results of the subtype-specific analysis in our data. Rare germline variants in ATM (MAF,0.05) have previously been associated with CLL susceptibility [25], however as the current study focused on common variants (MAF.0.05), these SNPs are not strongly correlated with tSNPs in our study (r 2 ,0.1). The study by Sipahimalani, et al. performed an extensive analysis exploring the association of six common genetic variants in ATM with lymphoma risk [55]. Notably, while the least significant ATM SNP associated with NHL risk in our data, rs419716, is strongly correlated with one of the SNPs from that prior study, rs664982 (r 2 .0.9), no association was reported for rs664982 by Sipahimalani, et al. [55], although the directionality of the effects for both SNPs is similar. The smaller sample size (798 cases, 793 controls) of this prior report, together with a more heterogeneous population (.15% of Asian ancestry) likely explains the different association outcomes [55]. The different distribution of NHL subtypes in the study by Sipahimalani, et al. provides another possible explanation. While our strongest signals appear to be driven by associations of ATM SNPs in DLBCL and SLL/CLL, the study population from Sipahimalani, et al. had a much higher proportion of FL, for which we did not observe an association in our study. DLBCL specific GWAS studies have been previously Figure 1. The results of association analysis, displayed as Manhattan plot, after imputation of 28 SNPs genotyped in both stage 1 and 2, which tag 11 DNA repair genes that showed association with NHL risk in our study. The SNPs and genes are ordered by chromosomal position (x-axis). The associations are displayed as -log 10 (p-value) for each SNP. Red dots represent fifteen tagging SNPs that were genotyped in our study and were associated with NHL risk. Green dots represent tagging SNPs that were genotyped in our study and that showed no association with NHL. Blue markers represent SNPs imputed by IMPUTE from 1KG. The red dotted line defines the threshold of p-value ,0.05. * indicates an associated SNP with a putatively functional impact; non-synonymous coding change or SNP mapping in: transcription factor binding site, H3K4Me1 chromatin mark, DNaseI hypersensitivity cluster, 59UTR, 39UTR. doi:10.1371/journal.pone.0101685.g001 Table 5. Putatively functional SNPs imputed from the ATM locus associated with the risk of NHL. reported [7,12], and most recently a large GWAS by international consortia has identified novel loci in apoptotic pathways associated with the risk of SLL/CLL [11]. While ATM was not among the reported associations in these prior studies, these reports focused only on the loci that passed the threshold of genome wide level of significance. For independent confirmation, a separate deeper analysis of this published data will be needed in the follow up study to validate the potential association effect of ATM with SLL/CLL and DLBCL in the large populations studied in these scans. Although the replication of our findings in NHL subtypes will be critical as part of the prior and upcoming GWAS studies from large consortia, it will also be very important to test the association effect of identified ATM variants in pooled NHL population, where we observed the most significant associations in our analysis. Such rationale is particularly relevant given the critical role of DNA repair pathways (with ATM as an important cell cycle checkpoint) in early development of progenitor B and T-cell lineages [56][57][58][59] via the process of V(D)J recombination producing diverse immune repertoire [60]. Because of the common origin of these precursor cell populations among different NHL subtypes, it is reasonable to hypothesize that variation in DNA repair networks in these progenitor cells could confer risk effects that are shared among multiple NHL subtypes, as we have suggested in our most recent GWAS scan [9] and as also discussed previously [7]. Additionally, double-stranded DNA break and non-homologous end joining repair mechanisms have been implicated in the occurrence of chromothripsis [61], which has recently been observed in both CLL and DLBCL [62][63][64] and proposed along with ''chromoplexy'' [65] to be involved in a punctuated cancer evolution. ATM is a critical checkpoint impacting both homologous recombination and non-homologous end joining [66]. It is possible that the inherited genetic variants in ATM may affect the capacity of DSB repair in a way that would associate with the patterns of specific large-scale genomic alterations and rearrangements in a subset of cells in the tumor due to yet unknown genetic or microenvironment modifiers. Although in the context of this study it is highly speculative, given the associations with the ATM locus observed here, such a biological scenario is an attractive possibility and should be further investigated in detail on the somatic level of lymphomagenesis. Numerous other studies have previously examined the germline variation in major DNA repair genes for their association with lymphoma risk [15][16][17]19]. However, mostly due to the limited selection of candidates, limited power or lack of independent validation, the results were of marginal significance. These borderline associations included the variants in nucleotide excision repair proteins, such as XRCC1 and XRCC2 [17], cell cycle protein BLM [19], and MGMT, a gene involved in DNA repair of alkylation damage [17]. We examined those previously associated loci, which had perfect proxies in our data and observed a borderline association in stage 1 for a proxy of rs1799782 in XRCC1 [17], (rs3213344, OR = 1.44, 95% CI: 1.03-2.01, p = 0.03), however this SNP did not replicate in our stage 2. MRE11A and NBS1, which were associated with NHL risk in our study, were examined in a previous report on a small subset of NHL population [18]. Although some marginal associations with NHL risk were observed, the SNPs reported in that previous study were in weak correlation with the variants in MRE11A and NBS1 associated with lymphoma risk in our analysis.
One novelty of our study design compared to prior efforts is the exploratory assessment of potential epistatic effects in DNA repair networks contributing to lymphoma risk. Our data suggest that the interactions between several genetic variants in MRE11A and NBS1, critical components of the MRN complex, associate with increased risk of NHL. Interestingly, these findings show that the carriers of at least one copy of the minor (protective) allele inherited in both MRE11A and NBS1 are at more than a 2-fold reduced risk of developing NHL compared to the single SNP effect of each locus separately. While these findings are novel they are also exploratory, given the reduced power of this analysis and the fact that none of these observations reached the significance threshold adjusted for multiple testing. Importantly however, these epistatic interactions were observed in both stage 1 and 2 separately (Table 3 and Table 4) suggesting that the observed SNP-SNP associations merit further attention. Looking forward, the addition of other critical checkpoint DNA repair genes (such as ATM or CHEK1/2) to the MRN interaction model found in our study would be of interest. This analysis, along with the independent validation of epistatic and single SNP associations identified here, will need to be performed in a large consortium as part of the analyses following up on recent reports [11,67].
Our study provides a suggestive link between the associated SNPs and putatively functional variants to be pursued in subsequent molecular studies. By imputing our data from the public resources of 1KG we have identified several functional variants highly correlated with the SNPs associated with NHL risk in our study. Most notably, several of the variants in ATM were located within multiple functional regions, as annotated utilizing ENCODE data from lymphoblastoid cell lines. For example, rs228594, rs228599, and rs189037, which correlate with the associations observed in ATM, map within known transcription factor binding sites as well as chromatin marks, DNase I hypersensitivity clusters, and 59 UTR, strongly supporting a possible impact on expression regulation.
The eQTL association with our top SNP, rs227060, was the second most significant eQTL association within the ATM region, and was detected in lymphoblastoid cell lines but not in fibroblast controls [41] (Figure 2). The observed eQTL effect shows the reduced expression of ATM correlating with the dosage of rs227060 risk allele. The reduction of ATM expression has been strongly linked with radiosensitivity and defective DNA damageinduced ATM-dependent signaling in various experimental studies, and was clearly shown to promote the tumor growth in lymphoma and other cancer models [68]. However, despite the potential biological implications of these associations, more detailed molecular investigations will be needed to link the imputed loci with lymphomageneis. Nonetheless, the in-silico functional predictions as presented here can substantially improve subsequent fine mapping strategies of associated SNPs. At the same time this approach can reduce the need for a large scale re-sequencing by functional prioritizing the target variants for further molecular investigation.
Besides ATM, other loci also showed association effects in pooled NHL or subtype-specific analyses. As power limitations of our study may also be a concern, subsequent validation of these findings should be performed in large consortia. These future studies, with the concomitant collection of epidemiologic data and clinical characteristics, will also allow for a more in-depth analysis of potential gene-environment interactions attributed to DNA repair pathways. At the same time, the utilization of data from completed and ongoing lymphoma GWAS will be critical for the assessment of other interacting molecular pathways that may define the complex genetic susceptibility to NHL. For example, the HLA locus has been consistently replicated as a low-penetrant allele in recent NHL GWAS and follow-up meta-analyses [7,8,10,11,32,69]. It has been shown that the innate immunity pathways are closely connected with particular DNA repair networks (e.g. B-cell maturation in germinal centers) [58,70], suggesting that detailed exploration of such interactions will be important. Our results strongly support the strategies for the pathway analysis of data from current and future GWAS on lymphoma susceptibility, using a deep validation of associations, considering the loci that did not reach genome-wide association thresholds, but may be biologically related. Our observations indicate that the genetic variants in key biological pathways, such as DNA repair, may account for an additional fraction of missing inherited susceptibility to lymphoid neoplasia. The associations observed here can also serve as the basis for further molecular investigations of the biological roles of the implicated loci on both a germline and somatic level. Such investigations will contribute not only to more efficient risk algorithms, but will lead to the improved understanding of lymphomagenesis for more effective targeting of therapy and prevention. Figure S1 Strategy schema for the selection of candidate DNA repair genes in the study. (TIF) Figure S2 Quantile/Quantile (Q/Q) plot comparinglog 10 (expected p-value) vs. -log 10 (observed p-value) under different quality metric scores (info). Based on imputation analysis of 109 genotyped SNPs using IMPUTE2. Q/ Q analysis of imputed data was used to select quality score threshold with least inflation of significant SNPs, which we have set at 0.7. (TIF)