Non-Hodgkin Lymphoma Risk and Variants in Genes Controlling Lymphocyte Development

Non-Hodgkin lymphomas (NHL) are a heterogeneous group of solid tumours of lymphoid cell origin. Three important aspects of lymphocyte development include immunity and inflammation, DNA repair, and programmed cell death. We have used a previously established case-control study of NHL to ask whether genetic variation in genes involved in these three important processes influences risk of this cancer. 118 genes in these three categories were tagged with single nucleotide polymorphisms (SNPs), which were tested for association with NHL and its subtypes. The main analysis used logistic regression (additive model) to estimate odds ratios in European-ancestry cases and controls. 599 SNPs and 1116 samples (569 cases and 547 controls) passed quality control measures and were included in analyses. Following multiple-testing correction, one SNP in MSH3, a mismatch repair gene, showed an association with diffuse large B-cell lymphoma (OR: 1.91; 95% CI: 1.41–2.59; uncorrected p = 0.00003; corrected p = 0.010). This association was not replicated in an independent European-ancestry sample set of 251 diffuse large B-cell lymphoma cases and 737 controls, indicating this result was likely a false positive. It is likely that moderate sample size, inter-subtype and other genetic heterogeneity, and small true effect sizes account for the lack of replicable findings.


Introduction
Non-Hodgkin lymphoma (NHL) is a collection of malignancies of lymphocyte origin. In Western countries, 85% of NHLs have a B-cell origin. NHL subtypes vary in prognosis, treatment options and outcome. Diffuse large B-cell lymphoma (DLBCL) patients with different molecular or genetic abnormalities can have diverse presentation and outcomes. Risk of developing NHL can be influenced by both environmental and genetic factors that affect the survival of lymphocytes.
Lymphocyte development is a complex process, with checkpoints that ensure that cells whose function is to protect the host quickly and effectively from a variety of insults do not also attack the host's own cells. Cell growth and cell death must be balanced so that lymphocytes are numerous enough to fight infections but not so numerous that they become a burden to maintain. Three important aspects of this control are: 1) immunity and inflammation, which respond to stimuli that activate lymphocytes and drive rapid cell division; 2) DNA repair, which counteracts errors arising from cell division or from lymphocyte receptor gene rearrangement; and 3) cell death, which removes lymphocytes that fail cell cycle checkpoints and/or limits autoimmunity.
Collectively, genetic variants in these types of genes are likely to play a role in susceptibility to NHL. To survey for genetic factors associated with NHL in genes involved in immunity and inflammation, DNA repair or cell death, we selected 118 genes (listed in Table S1 in File S1) related to these biological processes, tagged them with SNPs and tested them for association with NHL in 569 cases and 547 controls. In addition, we selected 39 SNPs that had previously been associated with NHL in the literature, and tested them for replication in our study. After correction for multiple testing, we found evidence that a SNP in MSH3, a gene that has never before been implicated in NHL, may affect susceptibility to DLBCL; however, this association did not replicate in an independent NHL population.

Materials and Methods
The samples and genes tested in this study were part of a 1536-SNP Illumina GoldenGate panel that included SNPs from candidate genes related to other pathways and hypotheses [9]. Details of the population, samples and methodology have been previously described [10].

Study Subjects and Samples
All new NHL cases in the Greater Vancouver Regional District and Greater Victoria (Capital Regional District), British Columbia, from March 2000 to February 2004 were invited to participate. Cases aged 20 to 79 were included. Patients with prior transplant or HIV-positivity were excluded. Population controls were frequency matched by age (within 5-year groups), sex and area of residence. Family history of cancer was based on subject-reported data. Of the 821 cases and 848 controls available for this study, 797 cases and 790 controls had sufficient DNA for genotyping. The study was approved by the joint University of British Columbia/British Columbia Cancer Agency Research Ethics Board; all participants gave written informed consent.

Genotyping
The 118 genes selected for this study (Table 1) were chosen on the basis of a review of the biological literature. For each gene, publicly available data from HapMap phase II were imported into Haploview [11] for tagSNP selection using Tagger at an r² threshold of 0.8. TagSNP selection was restricted to SNPs with minor allele frequency (MAF) >5%. In addition, 39 specific SNPs previously reported as associated with NHL, autoimmune disease or cancer were included to test for replication of these associations in our study. These 'replication' SNPs are listed in Table S2 in File S1. Fifty-one ancestry-informative markers (AIMs) selected from Halder et al. [12] were also included in the assay. Genotyping was done using the GoldenGate system (Illumina, San Diego, CA) at The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Canada, as described previously [9].
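As a rough illustration of pairwise tagging (not Haploview's or Tagger's actual code), the sketch below greedily picks SNPs until every SNP above the MAF threshold has r² ≥ 0.8 with at least one chosen tag. The input format (a MAF dictionary and a dictionary of pairwise r² values keyed by SNP pair), the function name `select_tag_snps` and the SNP names in the example are all hypothetical.

```python
def select_tag_snps(maf, r2, maf_min=0.05, r2_min=0.8):
    """Greedy pairwise tag SNP selection (simplified; not Haploview/Tagger code).

    maf: dict mapping SNP id -> minor allele frequency
    r2:  dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2; absent pairs count as 0
    Returns a list of tag SNPs such that every SNP with MAF > maf_min has
    r^2 >= r2_min with at least one tag.
    """
    # Only common SNPs are candidates and need to be captured.
    snps = sorted(s for s, f in maf.items() if f > maf_min)

    def ld(a, b):
        # A SNP trivially tags itself.
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties are broken by the sorted order of SNP ids.
        best = max(snps, key=lambda s: sum(ld(s, t) >= r2_min for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= r2_min}
    return tags


# Hypothetical SNPs: snp4 is dropped by the MAF filter; snp2 tags snp1, snp2 and snp3.
maf = {"snp1": 0.20, "snp2": 0.30, "snp3": 0.25, "snp4": 0.03}
r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.50,
}
print(select_tag_snps(maf, r2))
```

A greedy choice keeps the tag set small without guaranteeing the minimum possible set; real tagging software also handles details (forced-in SNPs, multi-marker tests) that are omitted here.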
Quality control (Q/C) was conducted using GenomeStudio version 2009.1 (Illumina, San Diego, CA) and systems and databases developed in the laboratory of DD [13]. Genotypes derived from whole-genome amplified (WGA) DNA and from genomic DNA were subjected to Q/C separately. In total, 1411 samples (717 cases and 694 controls) passed Q/C (Table 2); of these, 1116 samples (569 cases and 547 controls) were of European ancestry and were included in the statistical analysis [9]. The AIMs analysis for this study has been described previously [9] and supported analysing the European-ancestry samples as one group.
Of the 708 SNPs selected for genotyping of variants in genes related to lymphocyte development, 109 were excluded at the genotype Q/C stage (32 rejected by the genotyping centre upon initial inspection, 14 for low GenTrain scores, 26 as potential copy number variants, 12 for being monoallelic, 8 for a call rate <0.95, 15 for any discordance between duplicate genotypes, and 2 for significant deviation from Hardy-Weinberg equilibrium [HWE]). An additional 160 SNPs failed Q/C only in WGA samples (8 upon initial inspection by the genotyping centre, 49 for low GenTrain score, 64 for call rate <0.95, 38 for discrepant genotypes between WGA samples and matched pre-WGA DNA, and 1 for deviation from HWE), and 4 SNPs failed Q/C only in mouthwash or saliva samples. This left 599 SNPs (85%), listed in Table S3 in File S1, for analysis in all non-WGA samples, and 439 SNPs for analysis in both blood and WGA samples.
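The SNP counts in the Q/C summary can be checked with simple bookkeeping. The sketch below subtracts the reported exclusions from the 708 selected SNPs; the 4 SNPs that failed only in mouthwash or saliva samples are left out of the tally, matching the totals given in the text. The category labels are paraphrases chosen for illustration.

```python
# SNP exclusions reported at the genotype Q/C stage (applied to all samples).
excluded_all_samples = {
    "rejected on initial inspection by genotyping centre": 32,
    "low GenTrain score": 14,
    "potential copy number variant": 26,
    "monoallelic": 12,
    "call rate < 0.95": 8,
    "discordant duplicate genotypes": 15,
    "deviation from HWE": 2,
}
# Additional SNPs that failed Q/C only in WGA samples.
excluded_wga_only = {
    "rejected on initial inspection by genotyping centre": 8,
    "low GenTrain score": 49,
    "call rate < 0.95": 64,
    "WGA vs pre-WGA genotype discrepancy": 38,
    "deviation from HWE": 1,
}

selected = 708
n_excluded = sum(excluded_all_samples.values())           # removed for all samples
passed_non_wga = selected - n_excluded                     # usable in non-WGA samples
passed_with_wga = passed_non_wga - sum(excluded_wga_only.values())  # usable in WGA too
print(n_excluded, passed_non_wga, round(100 * passed_non_wga / selected), passed_with_wga)
```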

Statistical Analysis
Statistical analyses were conducted in SVS Suite 7 (Golden Helix, Bozeman, MT). Logistic regression (additive model) was fit for diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), marginal zone lymphoma (MZL), all B-cell NHLs and all T-cell NHLs. Other NHL subtypes were not tested individually because sample numbers were insufficient. In all subtype analyses, the selected cases were compared with all controls. The main analysis was restricted to European-ancestry samples (148 DLBCL, 165 FL, 55 MZL, 523 B-cell NHL, 45 T-cell NHL and 547 control samples); samples of other ancestries (Asian, Southeast Asian and ''other'') were tested only for SNPs that showed an association in European-ancestry samples. These sample sizes corresponded to minimum detectable odds ratios of 1.54 for DLBCL, 1.51 for FL, 1.88 for MZL, 1.33 for B-cell NHL and 1.99 for T-cell NHL. For each SNP, p-values were calculated by comparing the model containing the SNP of interest with the basic model (which accounted for 5-year age group, sex and region). For SNPs that showed a statistically significant association, we then sought the best-fitting inheritance model by testing dominant and recessive models in genotypic tests using the chi-squared test, and a recessive model by logistic regression with the same adjustments (age group, sex and region). SNPs that showed an association were also tested for interaction with sex by comparing a model including the SNP, age group, sex and region with a model that also included the SNP*sex interaction term. In genes that contained multiple associated SNPs, the associated SNPs were tested for interaction by comparing a model including the two SNPs, age group, sex and region with a model that additionally included the SNP*SNP interaction term.
Table 1. Genes and categories. AICDA, BRD2, CCL5, CD69, CD74, CD81, CTLA4, HFE, IFNAR2, IFNB1, IFNG, IL10RA, IL1RN, IL4, IL6, IL7, IL7R, IRF4, IRF5, ITGAM
In addition, for genes with an association, haplotype analysis was conducted in SVS Suite 7.
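Under the additive model each SNP enters the regression as a count of minor alleles, while dominant and recessive models collapse the three genotypes in different ways. The sketch below shows these codings, a SNP*sex product term, and a 1-df likelihood-ratio test comparing the basic model with the model that adds one SNP term. It assumes that the nested-model comparison is a likelihood-ratio test and that the maximized log-likelihoods come from separately fitted logistic regressions (model fitting is not shown); it is not SVS Suite 7's implementation, and the function names and log-likelihood values are hypothetical.

```python
import math


def encode(minor_allele_count, model):
    """Code a genotype (0, 1 or 2 copies of the minor allele) for one inheritance model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 minor alleles")
    if model == "additive":
        return minor_allele_count              # per-allele (log-additive) effect
    if model == "dominant":
        return int(minor_allele_count >= 1)    # carriers vs non-carriers
    if model == "recessive":
        return int(minor_allele_count == 2)    # minor-allele homozygotes vs the rest
    raise ValueError(f"unknown model: {model}")


def snp_by_sex(minor_allele_count, is_male):
    """Product term added to the model to test a SNP*sex interaction (additive coding)."""
    return encode(minor_allele_count, "additive") * int(is_male)


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """P-value of a likelihood-ratio test between nested models differing by one parameter."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods for the basic model (age group, sex, region)
# and for the same model plus one SNP term.
print([encode(g, "recessive") for g in (0, 1, 2)])
print(lrt_pvalue_1df(loglik_reduced=-700.0, loglik_full=-698.08))
```

The same product-term idea applies to the SNP*SNP interaction test: the two additive codings are multiplied and the interaction model is compared with the two-SNP model.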

To correct for multiple testing, we used a two-tiered approach, as previously described [9]. The Benjamini-Hochberg procedure [14], implemented in R version 2.11.1, was applied to control the false-discovery rate (FDR) for SNPs within each gene, giving a corrected p-value denoted p_G. The smallest adjusted p-value for each gene was taken to represent the gene, and the procedure was applied again across the genes within each of the three hypotheses (i.e. gene categories) tested (cell death, DNA repair, and immunity and inflammation). This second corrected p-value was denoted p_H. Adjusted p-values <0.05 were considered statistically significant. No multiple-testing correction was applied to the few interaction or haplotype tests.
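A minimal sketch of the two-tiered correction, written in Python rather than the R code actually used. The Benjamini-Hochberg adjustment follows the same step-up definition as R's p.adjust(method = "BH"); within one hypothesis, SNP p-values are adjusted within each gene (p_G), the smallest p_G represents the gene, and these gene-level values are adjusted again across genes (p_H). The function names, gene names and p-values in the example are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up; capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_tier_fdr(category):
    """category: dict gene -> list of per-SNP p-values for one hypothesis.

    Returns (p_G per SNP within each gene, p_H per gene).
    """
    p_g = {gene: bh_adjust(ps) for gene, ps in category.items()}
    genes = sorted(p_g)
    # The smallest within-gene adjusted p-value represents each gene.
    representative = [min(p_g[gene]) for gene in genes]
    p_h = dict(zip(genes, bh_adjust(representative)))
    return p_g, p_h


# Hypothetical hypothesis (gene category) with two genes.
p_g, p_h = two_tier_fdr({"GENE_A": [0.001, 0.02, 0.5], "GENE_B": [0.04, 0.3]})
print(p_g)
print(p_h)
```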
Because genes involved in mismatch repair pathways have been shown to be important for colorectal cancer risk, we tested whether rs33003 in MSH3 was associated with a family history of colorectal cancer. Colorectal cancer in one or more first-degree relatives of the NHL cases and controls was coded as a true/false ''family history of colorectal cancer'' variable, which was used in logistic regression analysis of European-ancestry samples, adjusting for sex, region and 5-year age group. Among European-ancestry samples, 28/569 cases and 33/547 controls had a family history of colorectal cancer.

Replication
The association of rs33003 with DLBCL was tested in a previously described independent population from the San Francisco Bay Area [15].
Table S4 in File S1 lists all SNPs with p < 0.05 (before any multiple testing correction). Table 3 lists the 59 SNPs with p_G < 0.05. Of note, none of the 39 SNPs selected to replicate previously reported associations were associated with lymphoma in our population. Only one SNP showed an association that remained significant after multiple testing correction at both the individual-gene and the multi-gene (hypothesis) level: rs33003, located in MSH3, was significantly associated with DLBCL (OR per allele: 1.91 [95% CI: 1.41-2.59]; p_G = 0.0002; p_H = 0.0103). It is a common SNP, with MAF 0.32. The recessive model best fit the inheritance mode of rs33003 (Table S5 in File S1). Many SNPs in the same region had low p-values in the DLBCL analysis (Figure 1). The second most strongly associated SNP in MSH3, rs181747, is in moderate linkage disequilibrium (LD) with rs33003, with r² = 0.55 in HapMap data and r² = 0.65 in our data set. There is evidence for an interaction between these two SNPs (p = 0.0014). However, no haplotype of SNPs in this region was more strongly associated with DLBCL than either of these two SNPs alone. There was no statistically significant association of rs33003 or rs181747 with DLBCL in 21 .50], p_G = 0.4849, respectively); the number of samples in these groups is too small to make a statement about associations in these groups. There was also no evidence for interaction between rs33003 and rs181747 in Asian-ancestry samples (p = 0.1957) or South Asian-ancestry samples (p = 0.9873).
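Assuming the reported interval is a Wald interval (symmetric on the log-odds scale), the standard error, z statistic and two-sided p-value can be recovered from an OR and its 95% CI. For the rs33003-DLBCL estimate this gives a standard error of about 0.155 and z of about 4.2, consistent with the uncorrected p = 0.00003 given in the abstract. Because published values are rounded, such back-calculations are only approximate; the function name `wald_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_ci(odds_ratio, lower, upper):
    """Recover log-OR standard error, Wald z and two-sided p from a reported 95% CI."""
    beta = math.log(odds_ratio)                            # log odds ratio
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # CI width on log scale
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                   # two-sided normal tail area
    return se, z, p


se, z, p = wald_from_ci(1.91, 1.41, 2.59)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```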

Results
Testing rs33003 for association with a family history of colorectal cancer showed an association under the recessive model (OR: 0.20 [95% CI: 0.03-1.43], p = 0.034) but not under the additive or dominant models. The 95% confidence interval includes 1, however, indicating that this result could be a chance finding. Furthermore, adjusting the DLBCL susceptibility analysis for family history of colorectal cancer (in addition to 5-year age group, sex and region) did not change the OR or p-values for the association of rs33003 with DLBCL susceptibility. We therefore find no evidence that family history of colorectal cancer influences the association between rs33003 and susceptibility to DLBCL. The association of rs33003 with DLBCL did not replicate in the San Francisco sample set (OR 1.03 [95% CI: 0.83-1.29], p = 0.774). The minor allele frequency of rs33003 is similar in the original population (MAF = 0.32) and the San Francisco set (MAF = 0.34), as is the r² between rs33003 and rs181747 (0.65 in the original population and 0.67 in the San Francisco population). The failure to replicate is therefore unlikely to be due to population-specific differences in minor allele frequency or LD structure in this region of the genome.
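The LD comparison rests on r². For two biallelic SNPs with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A*p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below assumes phased haplotype frequencies are already available; with unphased genotype data, as in this study, they would typically be estimated first (for example with an EM algorithm), which is not shown. The function name and the frequencies in the example are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies (not taken from the study data).
print(ld_r2(0.40, 0.5, 0.5))  # partial LD
print(ld_r2(0.30, 0.3, 0.3))  # A and B always occur on the same haplotype
print(ld_r2(0.09, 0.3, 0.3))  # no association between the two SNPs
```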

Discussion
After multiple testing correction within genes, there was evidence for associations of NHL subtypes with SNPs in two genes: RELB with MZL and MSH3 with DLBCL. Only the MSH3 association, however, remained significant after the additional correction for multiple testing across genes, and it did not replicate in another North American population [15], indicating that it was likely a type I error.
MSH3 is involved in DNA mismatch repair (MMR), which corrects mismatched or unmatched bases and small insertion/deletion loops that result from DNA replication before cell division or from DNA repair processes [17]. The MMR pathway is an important repair mechanism in normal lymphocyte development as evidenced by mouse models and human patients deficient in this pathway [18]. Studies of MMR deficiency and MMR gene deregulation in lymphomas have also illustrated the potential role of this pathway in NHL [19][20][21][22][23].
Because of the MMR pathway's established role in hereditary non-polyposis colorectal cancer (HNPCC), we tested whether rs33003 was associated with a family history of colorectal cancer in first-degree relatives. Adjusting the DLBCL susceptibility analysis for family history of colorectal cancer, in addition to 5-year age group, sex and region, did not change the results, indicating that family history of colorectal cancer does not confound the association between rs33003 and DLBCL. Furthermore, we did not find that rs33003 was associated with a family history of colorectal cancer. This is not entirely surprising, as colorectal cancer is not associated with lymphoma [24], although mismatch repair cancer syndrome is characterized in part by a combination of colorectal polyposis [25] and early-onset hematologic cancers [26,27].
DLBCL can be subdivided into at least three subgroups using molecular signatures [28]. It is therefore possible that the MSH3 association is confined to patients whose tumours belong to specific DLBCL subgroups. We do not, however, have molecular signature data for the tumours of the DLBCL patients included in this study.
It is also possible that there are true associations with NHL susceptibility that we were not able to detect in this study, owing to low sample sizes for some NHL subtypes or to population-specific effects. This could explain our inability to replicate candidate gene (Table 1) associations of SNPs in IRF4 with FL [8], or our observation of only weak associations (i.e. p < 0.05 but not passing multiple testing correction) of SNPs in BID, APAF1 and CASP10 with NHL [2]. We were also unable to replicate other associations for SNPs in the ''replication'' category, listed in Table S2 in File S1. Furthermore, HapMap coverage may not have been deep enough to represent causal variants present in some of the genes we assayed, making our tagSNP approach vulnerable to false-negative results.
As in most other lymphoma studies [1][2][3][4][5][6][7][8], multiple testing correction was not done for the number of subtypes tested as the subtypes are considered separate disease entities, with different presentation, possible etiology and hypotheses. Finally, any association reported here could be an association with survival as opposed to susceptibility, as patients who have less aggressive disease are more likely to have time to participate in the study and provide a DNA sample. This is not likely, however, given the low percentage of cases who died prior to contact (10.5% in the British Columbia study [10] and 14.2% in the San Francisco set [15]).
In summary, we found no replicated associations in the genes studied related to immunity and inflammation, DNA repair and programmed cell death.

Supporting Information
File S1 Table S1. Candidate genes chosen based on biological interest. Table S2. SNPs tested for replication. Table S3. SNPs that passed quality control. Table S4. Logistic regression analysis results for SNPs with p_G < 0.05 before multiple testing correction.