Confirmation of Childhood Acute Lymphoblastic Leukemia Variants, ARID5B and IKZF1, and Interaction with Parental Environmental Exposures

Genome wide association studies (GWAS) have established association of ARID5B and IKZF1 variants with childhood acute lymphoblastic leukemia (ALL). Epidemiological studies suggest that environmental factors alone appear to make a relatively minor contribution to disease risk. The polygenic nature of childhood ALL predisposition together with the timing of environmental triggers may hold vital clues for disease etiology. This study presents results from an Australian GWAS of childhood ALL cases (n = 358) and population controls (n = 1192). Furthermore, we utilised family trio (n = 204) genotypes to extend our investigation to gene-environment interaction of significant loci with parental exposures before conception, and child’s sex and age. Thirteen SNPs achieved genome wide significance in the population based case/control analysis; ten annotated to ARID5B and three to IKZF1. The most significant SNPs in these regions were ARID5B rs4245595 (OR 1.63, CI 1.38–1.93, P = 2.13×10−9), and IKZF1 rs1110701 (OR 1.69, CI 1.42–2.02, P = 7.26×10−9). There was evidence of gene-environment interaction for risk genotype at IKZF1, whereby an apparently stronger genetic effect was observed if the mother took folic acid or if the father did not smoke prior to pregnancy (respective interaction P-values: 0.04, 0.05). There were no interactions of risk genotypes with age or sex (P-values >0.2). Our results suggest that interaction of genetic variants and environmental exposures may further alter risk of childhood ALL; however, investigation in a larger population is required. If interaction of folic acid supplementation and IKZF1 variants holds, it may be useful to quantify folate levels prior to initiating use of folic acid supplements.


Introduction
Contemporary genome wide association studies (GWAS) have consistently revealed, in the European population, two genetic loci that are associated with childhood acute lymphoblastic leukemia (ALL) risk [1][2][3][4]. The genes implicated at these loci, AT-rich interactive domain 5b (ARID5B) and Ikaros family zinc finger 1 (IKZF1), encode proteins that regulate normal lymphocyte differentiation. Other genes have been identified from some but not all GWASs including CEBPE, CDKN2A and GATA3 [5,6]. A recent report [7] demonstrated a greater sensitivity to detect risk loci by using an ethnically diverse population of childhood ALL patients and identified a new risk locus at 10p12.31 (BMI1-PIP4K2A). Also, a graduated scale of ALL risk was demonstrated for: i) increasing number of risk alleles at four of the loci combined (ARID5B, IKZF1, CEBPE and BMI1-PIP4K2A); and ii) an ARID5B risk allele with odds highest for children younger than five years and lowest for those older than 10 years. This information underscores the polygenic nature of childhood ALL and places some emphasis on the potential for interaction with environmental triggers and the developmental stages at which they occur.
Epidemiological studies have assessed the role of environmental factors in the pathogenesis of childhood ALL; however, these factors alone appear to make a relatively minor contribution to disease risk [8]. Previous investigations from our own Australian ALL case-control study have focussed on parental exposures before and after conception, including alcohol consumption, folic acid use, and smoking. We found that paternal smoking around the time of conception increased the odds of childhood ALL [9]; our meta-analysis for maternal folate demonstrated that supplementation had a protective effect [10]; and while parental alcohol consumption did not alter odds, the quantity of consumption might [11].
Aforementioned GWAS have investigated associations of IKZF1 and ARID5B genotypes with treatment outcomes, and Linabery et al. [12] assessed interaction with age and sex demographics; however, we hypothesise that genetic variants compounded with environmental exposures and the developmental stages at which they occur, may contribute to the background of risk in childhood ALL.
This study presents results from an Australian GWAS of childhood ALL cases and population controls. A French case-control study was used to determine the validity of novel findings [3]. Finally, we investigated gene-environment interaction of key loci identified in our GWAS with parental exposures before conception, and child's sex and age.

Study population
The study population comprised 441 childhood ALL cases from whom blood-derived DNA had been isolated during remission. 324 were participants in The Australian Study of Childhood Acute Lymphoblastic Leukaemia (Aus-ALL) described in Milne et al. [10]. This study involved the collection of environmental exposure data via questionnaire. The remaining 117 cases were from The Tumour Bank at the Children's Hospital Westmead in Sydney NSW, consecutively collected from 1998-2006. The control series comprised 1229 healthy Caucasian adults sourced from the Hunter Community Study (HCS), Newcastle, Australia [13]. Aus-ALL was approved by the human research ethics committees at all participating hospitals and the Hunter Community study was approved by the University of Newcastle Research Ethics Committee and the Hunter New England Health Service Research Ethics Committee.

Genome-wide genotyping
Genome-wide genotyping was undertaken on Illumina (San Diego, USA) BeadChips as follows: 102 cases were genotyped for the pilot study using HumanCNV370-Duo v.1; 191 cases using HumanCNV370-Quad v.3; 148 cases and 1229 controls using Human610-Quad v.1. Genotype validation for rs7089424 and rs4132601 was carried out for the Aus-ALL cases and 400 HCS controls using Applied Biosystems (Foster City, USA) TaqMan Technology, revealing 100% concordance. Genotyping for these single nucleotide polymorphisms (SNPs) was extended to both parents of Aus-ALL cases for whom DNA and exposure data were available (n = 204 trios).

Genome-wide association analysis
PLINK v1.07 [14] was used to apply stringent quality control (QC) first by BeadChip version and again following data merging (Methods S1 in File S1) resulting in 358 cases, 1192 controls and 309 117 SNPs carried forward to genotype imputation (post-QC case demographics for age, gender and ALL subtype are available in Table S1 in File S1). Samples were pre-phased using SHAPEIT, and imputation using the 1000 Genomes phase 1 reference panel was carried out using IMPUTE2 v2.3.0. Case-control association at each SNP was tested under an additive model using a missing data likelihood score test which takes into account genotype uncertainty due to imputation. Models were fitted using SNPTEST v2.4.1, adjusting for the top three principal components defining ancestry. SNPs with a SNPTEST information measure less than 0.4 and a minor allele frequency less than 2% (in cases or controls or cases/controls combined) were filtered out resulting in 7 162 141 SNPs passing QC. Regional association plots for significant loci were produced using LocusZoom [15]. Pritchard Lab and Blood eQTL browsers [16,17] were used to establish whether SNPs are expression quantitative trait loci (eQTLs).
Data from the French ESCALE nationwide registry, comprising 441 cases and 1542 controls [3], were available for replication of a new suggestive association.

Gene-environment interaction analyses
For genotyped SNPs at genome-wide significant loci, the method of Cordell [18] was used to investigate gene-environment interactions for 204 case-parent trios. Interactions with paternal smoking before conception, and maternal use of folic acid prior to pregnancy were investigated because they demonstrated main effects in previous Aus-ALL studies [9,10]. Maternal alcohol consumption before pregnancy was also investigated as adequate numbers in each exposure category were achievable [11]. Interactions with the child's sex and age were also assessed. The Stata function pseudocc [19] coupled with conditional logistic regression was used for analysis. Genotype-based (rather than allele-based) odds ratios were estimated as recommended by Sasieni [20]. This process uses the parents' genotypes to estimate the theoretical genotypes their offspring could have had, and these are used as matched controls for each case. Thus, only genetic main effects and their interactions with 'environmental' (age, sex, folate, etc.) effects can be estimated, as each case/control set has the same values for the environmental variables.
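The construction of matched pseudocontrols described above can be illustrated with a minimal Python sketch. It is not the Stata pseudocc implementation used in the study; the function names `normalise` and `pseudocontrols` are hypothetical, and genotypes are assumed to be two-character strings of alleles (for example "TG"). For each trio, the four genotypes obtainable by taking one allele from each parent are enumerated, one copy of the case's observed genotype is removed, and the remaining three serve as that case's matched controls.

```python
from itertools import product


def normalise(genotype):
    """Return a genotype with its two alleles in sorted order, e.g. 'TG' -> 'GT'."""
    return "".join(sorted(genotype))


def pseudocontrols(mother, father, child):
    """Return the three genotypes the child could have had but did not inherit.

    Illustrative only: one allele is drawn from each parent to give four
    possible offspring genotypes; one copy of the observed case genotype is
    removed and the other three are returned as pseudocontrols.
    """
    possible = [normalise(m + f) for m, f in product(mother, father)]
    case = normalise(child)
    if case not in possible:
        raise ValueError("child genotype is inconsistent with parental genotypes")
    possible.remove(case)  # drops a single copy, leaving three pseudocontrols
    return possible


# Two heterozygous parents and a GG case: the pseudocontrols are TT, GT and GT.
print(pseudocontrols("TG", "TG", "GG"))  # ['TT', 'GT', 'GT']
```

Because the case and its pseudocontrols share the same family, every member of a matched set has identical exposure values, which is why only genetic main effects and gene-environment interaction terms are estimable in the conditional logistic regression.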

Results
Genome-wide association study for childhood acute lymphoblastic leukemia
QQ-plot and genomic inflation factor (λ = 1.0006) indicated an absence of population substructure or other systematic bias between cases and controls (Figure S1 in File S1). Table 1 annotates 13 SNPs at two loci that were significantly associated with childhood ALL (P < 5×10−8), and 259 additional SNPs achieved P < 10−5 (Table S2 in File S1 and Manhattan plot in Figure S2 in File S1). Ten of the 13 significant SNPs map to intron three of ARID5B on chromosome 10 and three flank IKZF1 on chromosome 7 (LocusZoom plots in Figure S3 in File S1).
The most significantly associated SNP was rs4245595 at ARID5B (OR 1.63, CI 1.38-1.93, P = 2.13×10−9) where G is the risk allele, and the association was stronger for B-cell ALL (OR 1.86, CI 1.54-2.25, P = 1.21×10−10). Association at the IKZF1 locus was most significant for rs1110701 (OR 1.69, CI 1.42-2.02, P = 7.26×10−9) where G is the risk allele and this association was also stronger when analysis was confined to the B-cell subtype (OR 1.91, CI 1.57-2.32, P = 8.27×10−11).
For the third most significant genotyped locus, located on chromosome 8 in the vicinity of the ZNF704 gene, borderline significance was achieved (P < 10−5). There were three genotyped SNPs in the region with 10−8 < P < 10−5: rs7000234, rs7018449 and rs6992620. This result, however, was not replicated when interrogated in the French dataset (Table S3 in File S1 and Figure S4 in File S1).
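For readers unfamiliar with the genomic inflation factor, the following Python sketch shows one common way it is computed: each P-value is converted to a 1-degree-of-freedom chi-square statistic, and the median of those statistics is divided by the median expected under the null (about 0.455). This is an illustration under stated assumptions, not the authors' pipeline; the function name `genomic_inflation` is hypothetical and the input is assumed to be a list of two-sided P-values in the interval (0, 1].

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda as median(observed chi-square) / median(expected chi-square).

    Each two-sided P-value is mapped to a 1-df chi-square statistic by squaring
    the corresponding standard normal quantile. Using inv_cdf(p / 2) rather
    than inv_cdf(1 - p / 2) avoids loss of precision for very small P-values.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

A value close to 1, as reported here, is what would be expected when test statistics follow the null distribution across the bulk of the genome.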
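As a rough plausibility check on the reported estimates, an odds ratio and its confidence interval can be converted back to an approximate standard error, Wald z statistic and P-value. The Python sketch below assumes the intervals are 95% intervals that are symmetric on the log-odds scale (the text does not state the level), and the function name `wald_from_ci` is hypothetical. The study P-values come from a SNPTEST score test adjusted for principal components, and the published OR and CI are rounded, so this approximation is expected to agree only in order of magnitude.

```python
from math import log
from statistics import NormalDist


def wald_from_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate the log-OR standard error, Wald z and two-sided P-value.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE), which is
    an assumption made for illustration, not a statement of the study method.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se
    p = 2 * nd.cdf(-abs(z))
    return se, z, p


# ARID5B rs4245595 as reported: OR 1.63, CI 1.38-1.93.
se, z, p = wald_from_ci(1.63, 1.38, 1.93)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```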

Interaction analyses of environmental variables and IKZF1 and ARID5B variants
Main effects of genotyped SNPs at ARID5B and IKZF1 loci (rs7089424 and rs4132601 respectively) and interactions with environmental variables were assessed using a reduced population of 204 case-parent trios for whom environmental exposure data were available. Genetic main effects demonstrated significant risk increases with homozygosity for the minor alleles (GG for both) versus the referent homozygous major alleles (TT) (Table 2 and Table S4 in File S1).
An interaction was observed between the genotyped SNP at the IKZF1 locus (rs4132601) and maternal use of folic acid pre-pregnancy (Table 2). There was a significantly increased risk of ALL among children with the homozygous minor genotype (GG) whose mothers used supplements; the OR was less elevated if they did not (P for interaction = 0.04).
We also observed interaction of rs4132601 genotype and father's smoking prior to conception, whereby risk of ALL associated with the GG genotype was greater among children of non-smoking fathers (P for interaction = 0.05) (Table 2). In addition, there was some evidence that risk associated with the GG genotype at this SNP was elevated only if the mother drank alcohol before pregnancy (OR: 7.34, CI: 2.23-24.2); however, the interaction P-value was 0.18 (Table 2).
No interaction was evident for rs7089424 with parental exposure variables (Table S4 in File S1). No interaction was observed for variables sex and age with either SNP (IKZF1 rs4132601 sex/age Ps for interaction = 0.90/0.31, and ARID5B rs7089424 sex/age Ps = 0.96/0.72) (Table 2 and Table S4 in File S1).

Discussion
Our current study has demonstrated that variants in ARID5B and IKZF1 are associated with childhood ALL in an Australian Caucasian cohort. This is consistent with former GWAS publications in similar cohorts [1][2][3][4]. Also in agreement with these studies is our finding that the strength of association was increased when analysis was confined to B-cell ALL versus controls. Our study was not large enough to reveal any other associations such as those previously reported at CEBPE, CDKN2A, GATA3 and BMI1-PIP4K2A, loci which have not consistently been identified across multiple GWAS but which meta-analyses have validated [5,21,22].
The frequencies of the ZNF704 SNPs in our case population were lower than those reported in the 1000 Genomes phase 1 database, but the associations were not replicated in the French data. This result could be due to population differences, thereby suggesting a real effect; however, considering our cases were from an Anglo-Celtic background, and these SNPs were not identified in other such populations [1], ZNF704 remains to be categorically defined as a risk locus.
The ARID5B transcription factor is important in embryogenesis and B-cell development [23] and ARID5B deletion mutations occur in leukemic cells. None of the significantly associated SNPs we identified at this locus are expression quantitative trait loci (eQTLs) according to Pritchard Lab and Blood eQTL browsers [16,17], nor did they overlap enhancer binding sites annotated by ENCODE regulatory segmentation. Despite the four years since publication of the first childhood ALL GWAS, a dearth of information persists for the mechanisms responsible for ARID5B variants predisposing to childhood ALL.
The Ikaros transcription factor is restricted to the hemolymphopoietic system and is a key regulator of lymphocyte differentiation via chromatin remodelling. Our strongest associated SNP rs1110701, and other IKZF1 SNPs identified in this study (rs10272724 and rs17133807), annotate to enhancer binding sites in the GM12878 B-lymphocyte cell line according to ENCODE regulatory segmentation. These SNPs were identified as eQTLs acting in cis [16,17], consistent with the attenuation of IKZF1 gene expression with variant allele dosage at rs4132601 demonstrated by Papaemmanuil et al. [1] (rs4132601 is in linkage disequilibrium with rs1110701: r² = 0.802, D' = 1; P in the current study = 8.51×10−8). Interestingly, a recent sequence analysis of IKZF1 deletion breakpoints in leukemic cells by Meyer et al. [24] revealed four recombination hotspots, one of which was observed in 20% of participants and localised to the untranslated region of exon 8, ~1300 bp downstream from rs1110701. Thus, single variants or a combination of variants in this region may contribute to differential expression, splicing, or propensity for rearrangement in leukemic transformation. The full extent of these effects remains to be revealed.
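The two linkage disequilibrium measures quoted for rs4132601 and rs1110701 (r² and D') answer slightly different questions, which is why D' can equal 1 while r² is below 1. The Python sketch below computes both from haplotype and allele frequencies using the standard definitions. The function name `ld_stats` and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the frequencies of
    those alleles. All inputs here are illustrative, not study values.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Allele A (frequency 0.25) always occurs with allele B (frequency 0.30),
# so D' is 1, but the unequal allele frequencies keep r^2 below 1.
d_prime, r2 = ld_stats(p_ab=0.25, p_a=0.25, p_b=0.30)
print(f"D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

In this hypothetical case one of the four possible haplotypes is absent, giving D' of 1, whereas r² also reflects how closely the allele frequencies match and therefore how well one SNP predicts the other.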
Previous epidemiological studies have provided links between childhood ALL risk and environmental exposures to parents before and after conception. Additionally, certain genetic features of ALL are predominant in particular age groups, such as MLL rearrangements [25]. In the current study, we investigated the potential for interaction of parental exposures and child's age and sex with child's genotype. The most significant SNPs that were genotyped, rather than imputed, at ARID5B (rs7089424) and IKZF1 (rs4132601) loci were used for the interaction analysis. This was to facilitate genotyping validation of the Illumina array technology and the extension of the study in a single step. Paternal smoking, maternal folate and alcohol use (each before conception) were investigated; however no interaction was apparent for ARID5B variant rs7089424 with any of these exposures. There were no interactions observed for the two variants with child's sex and age, validating the findings of Linabery et al. [12].
A meta-analysis of maternal folic acid supplementation and risk of childhood ALL, conducted by our affiliated Aus-ALL consortium, verified a protective effect for supplementation [10]. When we assessed interaction of maternal folic acid supplementation with IKZF1 SNP rs4132601 (the most significant genotyped SNP at the locus), we observed elevated odds ratios for children with the risk (GG) genotype whose mother took supplements. This observation is contrary to expectations, and biologically plausible explanations for this direction of effect are lacking. Nonetheless, potential interaction between folate levels and genotype at IKZF1 warrants further investigation since: i) transcription of the reduced folate carrier (SLC19A1) can be modulated by the balance of Ikaros activating and dominant-negative isoforms [26] and ii) increased circulating un-metabolised folic acid has demonstrated association with reduced natural killer cell cytotoxicity [27] which is in turn associated with cancer [28]. Additionally, our measure of folic acid exposure (supplementation: yes/no) does not capture dietary or metabolic folate sufficiency, which could have influenced the observations. This finding suggests that folate quantification may be valuable prior to initiating supplementation.
We observed a similarly unexpected interaction whereby risk of childhood ALL was greater for children with rs4132601 risk (GG) genotype and non-smoking fathers (Table 2) despite a previous Aus-ALL epidemiological study [9] demonstrating that paternal smoking was a risk factor for childhood ALL. An interesting trend was observed whereby risk associated with IKZF1 was elevated only for the GG genotype if the mother drank alcohol before pregnancy. While this may be a chance finding, it corroborates health recommendations to avoid alcohol during pregnancy.
Given the limited numbers of cases homozygous for the minor alleles for all interactions, it is possible that our findings for SNP rs4132601 occurred due to chance. Investigation in a larger population is therefore required to elucidate whether any true interactions exist.

Conclusions
We have replicated associations of IKZF1 and ARID5B variants with childhood ALL in an Australian Caucasian population. The IKZF1 variants identified were eQTLs and impacted enhancer sequences; however, this was not the case for ARID5B variants. Furthermore, there was some evidence for gene-environment interaction for an IKZF1 variant, with an apparently stronger genetic effect if the mother took folic acid or drank alcohol, or if the father did not smoke prior to pregnancy, but these may be explained by chance. These findings warrant further investigation in larger samples and suggest that folate quantification may be valuable prior to initiating supplementation.