The authors have declared that no competing interests exist.
Conceived and designed the experiments: KYU APC CM JLW JKW ET PB MT LFB PAB. Performed the experiments: KYU APC HH SM PR KJ AT. Analyzed the data: KYU APC SM PR. Contributed reagents/materials/analysis tools: KYU HH SM PR PT YI. Wrote the manuscript: KYU APC.
The extended major histocompatibility complex (xMHC) is the most gene-dense region of the genome and harbors a disproportionately large number of genes involved in immune function. The postulated role of infection in the causation of childhood B-cell precursor acute lymphoblastic leukemia (BCP-ALL) suggests that the xMHC may make an important contribution to the risk of this disease. We conducted association mapping across an approximately 4 megabase region of the xMHC using a validated panel of single nucleotide polymorphisms (SNPs) in childhood BCP-ALL cases (n=567) enrolled in the Northern California Childhood Leukemia Study (NCCLS) compared with population controls (n=892). Logistic regression analyses of 1,145 SNPs, adjusted for age, sex, and Hispanic ethnicity, indicated potential associations between several SNPs and childhood BCP-ALL. After accounting for multiple comparisons, one of these associations remained statistically significant: an increased risk associated with rs9296068 (OR=1.40, 95% CI=1.19-1.66, corrected p=0.036), located in proximity to
Acute lymphoblastic leukemia (ALL) is a clonal disorder involving the dysregulated proliferation of genetically altered lymphoid progenitor cells that lack the ability to differentiate and mature. In children, B-cell precursor (BCP) ALL is the most common ALL subtype and accounts for about 80% of childhood ALL cases in most economically developed countries. BCP-ALL, which demonstrates a unique age-incidence peak between 2 and 5 years of age, is widely suspected to be caused by environmental exposures, though these have yet to be definitively identified [
The extended major histocompatibility complex (xMHC) region, spanning about 7.6 megabases (Mb) on the short arm of chromosome 6 (6p21.3), is densely populated with genes that are critical to both innate and adaptive immunity in humans [
Evidence of susceptibility associated with xMHC loci has been identified in several autoimmune, malignant and infectious diseases, including asthma, Hodgkin and non-Hodgkin lymphoma, hepatitis B and HIV infection and others [
A previous analysis of MHC SNP data and imputed HLA class I and II alleles derived from a childhood ALL genome-wide association study (GWAS) suggested that MHC genetic variation is unlikely to be a major determinant of BCP-ALL [
The study protocol was approved by the Institutional Review Boards of the University of California, Berkeley and all collaborating institutions (California Department of Public Health; University of California, Davis; University of California, San Francisco; Children’s Hospital of Central California; Lucile Packard Children’s Hospital; Children’s Hospital and Research Center, Oakland; Kaiser Permanente, Roseville; Kaiser Permanente, Santa Clara; Kaiser Permanente, San Francisco; Kaiser Permanente, Oakland), and written informed consent was obtained from the parents or guardians on behalf of the child participants involved in this study. This study was conducted in accordance with the Declaration of Helsinki.
The current study was conducted within the NCCLS, an ongoing case-control study of childhood leukemia. Beginning in 1995, newly diagnosed childhood leukemia cases were ascertained at the time of diagnosis from major pediatric hospitals in a 17-county San Francisco Bay Area study region, expanded in 1999 to 35 counties in Northern and Central California, USA. Comparison with the California Cancer Registry (1997-2003) showed that the NCCLS case ascertainment protocol has captured about 95% of children diagnosed with leukemia in the participating study hospitals. For each eligible case, statewide birth records maintained by the California Office of Vital Records were utilized to generate a list of randomly selected controls that matched the case on child’s date of birth, sex, Hispanic ethnicity (a biological parent who is Hispanic), and maternal race. Information obtained through the birth certificates and commercially available searching tools were used to trace and enroll one or two matched controls for each case.
Cases and controls were considered eligible if they were under 15 years of age at date of diagnosis for cases (or corresponding reference date for controls), residents of the study region, had a biological parent who spoke either English or Spanish, and had no prior history of malignancy. Approximately 85% of eligible cases and 86% of eligible controls consented to participate [
In the current study, non-Hispanic white and Hispanic children with ALL and control children, recruited between 1995 and 2008 (study phases 1-3), were included in the analysis. These are the two largest racial/ethnic groups which together comprise about 85% of enrolled subjects. Other ethnic groups were excluded due to the small number of subjects. Children were classified as Hispanic if at least one biological parent self-identified as Hispanic. Children were assigned to the non-Hispanic white group if both biological parents self-identified as non-Hispanic white. In a previous NCCLS analysis, genetic admixture was assessed using a series of 80 ancestry informative markers for a subset of the cases and controls [
Trained interviewers collected buccal cells from case and control children using cytobrushes as a source of DNA. Cytobrushes were processed within 48 hours of collection by heating in the presence of 0.5N NaOH. Isolated DNA was later re-purified either manually using Gentra Puregene reagents (QIAGEN, USA, Valencia, CA) or with an automated organic DNA extraction protocol (AutoGen, Holliston, MA). Whole genome amplification (WGA) was performed using GenomePlex reagents (Rubicon Genomics, Ann Arbor, MI) according to the manufacturer’s protocol. WGA products were cleaned with a Montage PCR9 filter plate (Millipore, Billerica, MA). When buccal cytobrush DNA was inadequate or not available (26.6% of subjects), DNA was isolated from dried bloodspots collected at birth and archived at −20 °C by the Genetic Disease Screening Program of the California Department of Public Health. After extraction using the QIAamp DNA Mini Kit (QIAGEN, USA, Valencia, CA), DNA samples were whole-genome amplified using REPLI-g reagents (QIAGEN, USA, Valencia, CA). Regardless of source, DNA specimens were quantified using human-specific Alu-PCR to confirm a minimum level of amplifiable human DNA [
Genotyping was conducted using the Illumina MHC Mapping Panel (Illumina Inc., San Diego, CA) which comprises 1,293 SNPs spanning an approximately 4 Mb region of the xMHC bounded by the tripartite motif containing protein 27 (
Genotyping was conducted on 1,550 unique DNA samples (635 cases and 915 controls), in addition to 10 sets of
Quality control metrics applied to the 1,550 samples also resulted in the exclusion of 17 samples (1.1%) with less than 95% overall genotyping success rate and 20 samples (1.3%) that showed questionable concordance between reported gender and gender prediction by the Illumina platform. There was 99.6% concordance of successfully genotyped SNPs in the duplicate series and a 0.2% Mendelian error rate was observed in the CEPH family trios. Application of these quality control criteria and a focus on BCP-ALL (54 T-cell or mixed lineage ALL cases excluded) resulted in the analysis of 1,145 SNPs in a total of 567 BCP-ALL cases and 892 controls (
Characteristic | Cases, n | Cases, % | Controls, n | Controls, % |
---|---|---|---|---|
All subjects | 567 | 100 | 892 | 100 |
Male | 298 | 52.6 | 495 | 55.5 |
Female | 269 | 47.4 | 397 | 44.5 |
0-1 | 51 | 9.0 | 111 | 12.4 |
2-5 | 340 | 60.0 | 460 | 51.6 |
6-10 | 114 | 20.1 | 210 | 23.5 |
11-14 | 62 | 10.9 | 111 | 12.4 |
White, non-Hispanic | 241 | 42.5 | 426 | 47.8 |
Hispanic | 326 | 57.5 | 466 | 52.2 |
cALLᵇ | 309 | 49.8 | NA | NA |
Non-cALL | 258 | 41.5 | NA | NA |
High-hyperdiploid | 178 | 28.7 | NA | NA |
 | 96 | 15.5 | NA | NA |
Normal karyotype | 58 | 9.3 | NA | NA |
Abbreviations: BCP-ALL, B-cell precursor acute lymphoblastic leukemia; cALL, common ALL; NA, not applicable
a Categories of ALL subtypes are not mutually exclusive, so percentages do not sum to 100. Cytogenetic data were available for 87% of patients.
b cALL is defined as CD10+ and CD19+ ALL diagnosed between 2 and 5 years of age.
For a large subset of the BCP-ALL cases (87%), data on hyperdiploidy and
Data analysis followed a two-stage approach. First, we examined the contribution of each of the 1,145 xMHC SNPs individually. We used logistic regression to calculate the odds ratio (OR) and 95% confidence interval (CI) for each SNP, adjusting for child’s age, sex, and race/ethnicity (i.e., non-Hispanic white or Hispanic). Several genetic models of inheritance were considered, including log-additive, dominant, and recessive models, in addition to an evaluation of the dominance deviation from additivity. SNPs showing a nominal p-value of less than 0.01 in any of these analyses were considered potentially associated with childhood ALL and were subject to further analysis. Multivariable logistic regression was used to evaluate whether multiple potentially associated SNPs within a region had independent effects on childhood ALL risk. Stratified analyses by age (0-5 and 6-14 years) and gender were considered in sub-analyses of the data. To account for multiple comparisons in the presence of LD between SNPs, we calculated adjusted p-values based on 10,000 permutations of case-control status on 1,145 SNPs and considered adjusted p-values below a family-wise type I error rate threshold of 0.05 to be statistically significant.
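The single-SNP step can be illustrated with a minimal sketch. It is not the study's analysis code: the function names (`code_genotype`, `fit_logistic`, `odds_ratio_ci`) are hypothetical, genotypes are assumed to be stored as minor-allele counts (0, 1, 2), and the model is fit with a plain Newton-Raphson iteration. Covariates such as age, sex, and race/ethnicity would be supplied as additional columns next to the coded genotype.

```python
import math


def code_genotype(minor_allele_count, model):
    """Map a minor-allele count (0, 1, or 2) to a covariate for one genetic model."""
    if model == "log-additive":
        return float(minor_allele_count)
    if model == "dominant":
        return 1.0 if minor_allele_count >= 1 else 0.0
    if model == "recessive":
        return 1.0 if minor_allele_count == 2 else 0.0
    raise ValueError(f"unknown genetic model: {model}")


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def score_and_information(X, y, beta):
    """Gradient and Fisher information of the logistic log-likelihood at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        for a in range(k):
            grad[a] += (yi - p) * xi[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
    return grad, info


def fit_logistic(rows, y, iterations=25):
    """Fit by Newton-Raphson; an intercept column is prepended to `rows`.

    Returns (coefficients, standard errors), intercept first.
    """
    X = [[1.0] + list(r) for r in rows]
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    k = len(beta)
    se = []
    for a in range(k):
        unit = [1.0 if i == a else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[a]))  # sqrt of diagonal of inverse
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio with a Wald confidence interval (95% when z = 1.96)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

With a single binary covariate and no adjustment, the fitted odds ratio equals the cross-product ratio of the corresponding 2×2 table, which is a convenient sanity check before adding covariates.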
Second, we conducted three-SNP sliding window haplotype analyses across candidate regions selected by the location of SNPs that showed nominal p-values of less than 0.01. This resulted in 3 broad regions (
Presented are -log10(p-values) resulting from the logistic regression analysis assuming log-additive (navy blue) and dominant (light blue) genetic models of inheritance and adjusting for child’s age, sex, and race/ethnicity. Results plotted above the dotted line represent nominal p-values of less than 0.01. Analyses evaluating the recessive and genotypic genetic models were also performed (not plotted) resulting in five additional SNPs with a nominal p-value of less than 0.01 which were also located within one of the three designated regions (Regions A-C). A total of 20 SNPs with a p-value below this threshold were considered in further analyses.
Case and control distributions for sex, age, and race/ethnicity were similar, as expected from the matched design of the NCCLS (
Analysis of 1,145 SNPs in childhood BCP-ALL, assuming a log-additive genetic model of inheritance, showed a quantile-quantile (Q-Q) plot of the expected versus observed –log10 p-value distribution that suggested little evidence of inflation in results caused by systematic error (
SNP | Position | Region of xMHC | Minor allele | Frequencyᵃ, cases | Frequencyᵃ, controls | Single SNP OR | Single SNP 95% CIᵇ | Single SNP p-value | Mutually adjusted OR | Mutually adjusted 95% CIᵇ | Mutually adjusted p-value | Corrected p-valueᶜ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
rs7747023 | 29133659 | Extended class I | G | 0.17 | 0.21 | 0.73 | (0.60-0.89) | 1.7×10⁻³ | 0.72 | (0.59-0.88) | 1.4×10⁻³ | 0.518 |
rs3130785 | 30904717 | Class I | A | 0.14 | 0.11 | 1.45 | (1.16-1.82) | 1.3×10⁻³ | 1.37 | (1.09-1.74) | 7.9×10⁻³ | 0.973 |
rs1632856 | 31079715 | Class I | A | 0.25 | 0.29 | 0.80 | (0.68-0.95) | 9.9×10⁻³ | 0.79 | (0.66-0.94) | 8.6×10⁻³ | 0.898 |
rs2524279 | 31500885 | Class I | G | 0.11 | 0.15 | 0.73 | (0.58-0.92) | 7.9×10⁻³ | 0.70 | (0.55-0.89) | 3.0×10⁻³ | 0.749 |
rs9296068 | 33096673 | Class II | C | 0.42 | 0.36 | 1.37 | (1.17-1.61) | 1.2×10⁻⁴ | 1.40 | (1.19-1.66) | 5.7×10⁻⁵ | 0.036 |
rs213203ᵈ | 33346382 | Extended class II | A | 0.47 | 0.49 | 0.68 | (0.55-0.84) | 3.6×10⁻⁴ | 0.69 | (0.55-0.86) | 7.4×10⁻⁴ | 0.347 |
Abbreviations: CI, confidence interval; FWE, family-wise type I error; OR, odds ratio; SNP, single nucleotide polymorphism; xMHC, extended major histocompatibility complex
a Frequency of the minor allele in case and control subjects
b ORs and 95% CI for each SNP in the single SNP analysis were derived using logistic regression assuming a log-additive genetic model of inheritance and adjusting for child’s age, sex, and race/ethnicity (non-Hispanic white versus Hispanic). The mutually adjusted analysis included additional adjustment for the effects of all other SNPs in the table.
c Adjustment for multiple testing was performed with 10,000 permutations of case-control status on 1,145 SNPs and using FWE rate of 0.05. An adjusted p-value of less than 0.05 was considered statistically significant.
d Evaluation of the genetic model of inheritance indicated a significant deviation from the log-additive model with an effect associated with heterozygotes. ORs and 95% CI were estimated for heterozygous genotypes compared to homozygous genotypes.
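The permutation-based adjustment for multiple testing can be sketched as a single-step maximum-statistic procedure. This is an assumption about the general approach, not a description of the software actually used: the function names are hypothetical, and a simple trend statistic (sample size times the squared genotype-status correlation) stands in for the covariate-adjusted logistic regression test. Because case-control labels are shuffled jointly across all SNPs, the correlation (LD) between SNPs is preserved in every permutation.

```python
import random


def trend_statistic(genotypes, status):
    """Sample size times squared correlation between allele count and status."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ss = sum((s - mean_s) ** 2 for s in status)
    s_gs = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    if s_gg == 0 or s_ss == 0:
        return 0.0  # monomorphic SNP or a single outcome class
    return n * s_gs * s_gs / (s_gg * s_ss)


def max_stat_adjusted_pvalues(snps, status, n_perm=10000, seed=0):
    """Family-wise adjusted p-values from the permutation maximum statistic.

    `snps` is a list of per-SNP genotype lists, all in the same subject order
    as `status` (1 = case, 0 = control).
    """
    rng = random.Random(seed)
    observed = [trend_statistic(g, status) for g in snps]
    exceed = [0] * len(snps)
    permuted = list(status)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break the genotype-status link for all SNPs at once
        max_stat = max(trend_statistic(g, permuted) for g in snps)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add one to numerator and denominator so no adjusted p-value is exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

An adjusted p-value below 0.05 then corresponds to a SNP whose observed statistic is exceeded by the permutation maximum in fewer than 5% of shuffles.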
Further examination of SNP rs9296068 (
Odds ratios (ORs, represented by boxes with the area of each box inversely proportional to the variance of the estimate) and 95% CI (error bars) were derived using logistic regression assuming a log-additive genetic model and adjusting for rs7747023, rs3130785, rs1632856, rs2524279, and rs213203 (other potentially associated SNPs presented in
We performed a 3-SNP sliding window haplotype analysis that included 395 genotyped SNPs across each of the 3 candidate regions (regions A, B, and C) identified by the single SNP analysis. After adjusting for multiple comparisons, a statistically significant association was found for the rs1237485-rs3118361-rs2032502 haplotype (nominal global p=3.2×10⁻⁴; corrected p=0.046) in region A (
Haplotype | Frequency, cases | Frequency, controls | Compared to reference haplotype: OR | Compared to reference haplotype: 95% CIᵃ | Compared to all other haplotypes: OR | Compared to all other haplotypes: 95% CIᵃ | Compared to all other haplotypes: p-value | Corrected p-valueᵇ |
---|---|---|---|---|---|---|---|---|
rs1237485-rs3118361-rs2032502ᶜ | | | | | | | | |
A-A-G | 0.06 | 0.06 | 1.03 | (0.72-1.48) | 1.16 | (0.82-1.62) | 0.406 | |
A-G-A | 0.10 | 0.13 | 0.84 | (0.65-1.08) | 0.74 | (0.58-0.94) | 0.015 | |
A-G-G | 0.37 | 0.33 | 1.22 | (1.02-1.45) | 1.14 | (0.97-1.34) | 0.109 | |
G-A-G | 0.06 | 0.03 | 2.44 | (1.54-3.88) | 2.18 | (1.41-3.38) | 2.9×10⁻⁴ | |
G-G-G | 0.41 | 0.45 | Reference | | 0.89 | (0.76-1.04) | 0.136 | |
Global p-valueᵈ | | | | | | | 3.2×10⁻⁴ | 0.046 |
rs423639-rs7754316-rs9296068ᶜ | | | | | | | | |
A-A-A | 0.07 | 0.08 | 0.82 | (0.61-1.11) | 0.93 | (0.69-1.23) | 0.576 | |
G-A-A | 0.27 | 0.32 | 0.77 | (0.63-0.93) | 0.80 | (0.67-0.95) | 9.1×10⁻³ | |
G-A-C | 0.39 | 0.34 | Reference | | 1.25 | (1.07-1.46) | 6.0×10⁻³ | |
G-G-A | 0.24 | 0.25 | 0.82 | (0.67-1.00) | 0.91 | (0.76-1.10) | 0.331 | |
G-G-C | 0.04 | 0.02 | 1.98 | (1.08-3.63) | 2.47 | (1.38-4.43) | 1.7×10⁻³ | |
Global p-valueᵈ | | | | | | | 9.2×10⁻⁵ | 0.014 |
Abbreviations: CI, confidence interval; FWE, family-wise type I error; OR, odds ratio; SNP, single nucleotide polymorphism
a ORs and 95% CI for each haplotype were derived using logistic regression modeling haplotype probabilities and adjusting for child’s age, sex, and race/ethnicity (non-Hispanic white versus Hispanic).
b Adjustment for multiple testing was performed with 10,000 permutations of case-control status on 393 haplotype windows and using FWE rate of 0.05. An adjusted p-value of less than 0.05 was considered statistically significant.
c The associated region A haplotype is located in the extended class I region and the SNPs are at chromosomal positions 29,002,323 (rs1237485), 29,006,266 (rs3118361), and 29,009,544 (rs2032502). The associated region C haplotype is located in the class II region and the SNPs are at chromosomal positions 33,095,752 (rs423639), 33,095,976 (rs7754316), and 33,096,673 (rs9296068).
d Global p-values were derived based on a likelihood ratio test of association based on the null hypothesis of no effect of any haplotype at that position.
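The sliding-window haplotype step can be sketched in two parts: enumerating overlapping 3-SNP windows, and estimating haplotype frequencies from unphased genotypes. The text states only that haplotype probabilities were modeled, so the expectation-maximization (EM) estimator below is an assumed, commonly used approach rather than the confirmed method; function names are hypothetical and genotypes are assumed to be minor-allele counts (0, 1, 2) with no missing data.

```python
from itertools import product


def sliding_windows(snp_ids, width=3):
    """Overlapping windows of `width` consecutive SNPs, in map order."""
    return [tuple(snp_ids[i:i + width]) for i in range(len(snp_ids) - width + 1)]


def compatible_pairs(genotype):
    """Ordered haplotype pairs whose allele sums match the unphased genotype."""
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (h1, h2)
        for h1 in haps
        for h2 in haps
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def em_haplotype_frequencies(genotypes, iterations=50):
    """EM estimate of haplotype frequencies, assuming random pairing of haplotypes.

    `genotypes` is a list of tuples of minor-allele counts, one tuple per subject.
    """
    haps = list(product((0, 1), repeat=len(genotypes[0])))
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    candidates = [compatible_pairs(g) for g in genotypes]
    for _ in range(iterations):
        counts = {h: 0.0 for h in haps}
        for pairs in candidates:
            # E-step: posterior weight of each phase configuration.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq
```

In an association analysis, the per-subject posterior weights from the E-step would serve as the haplotype probabilities entered into the regression model; the sketch stops at frequency estimation to stay short.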
The -log10 (p-values) for each SNP (y-axis) are plotted against their chromosomal position (x-axis, Mb). The colors of the points indicate the degree of linkage disequilibrium (based on
Finally, haplotype rs423639-rs7754316-rs9296068 (
In this study, we conducted a SNP-association analysis of childhood BCP-ALL compared with controls across a 4 Mb stretch of the xMHC in an attempt to pinpoint regions of potential involvement in susceptibility. The xMHC is a potentially strong candidate region for a role in genetic susceptibility to childhood ALL, a disease whose causation has been attributed to an inappropriate immune response to post-natal infection [
The rs9296068 SNP is located in the 5’ untranslated region of the
Recently, HLA-DOA has been implicated in other disease association studies such as type 1 diabetes and chronic lymphocytic leukemia survival [
Previous examinations of the xMHC in childhood ALL have mostly been candidate gene studies that focused on the classical HLA genes (i.e.
Authors of a recent report using data extracted from a prior GWAS analysis concluded no substantive support for a major role of MHC genetic variation on childhood BCP-ALL risk [
Using a haplotype sliding window analysis, we identified a second independently associated locus that localizes to the extended class I region and is represented by a haplotype composed of SNPs rs1237485, rs3118361, and rs2032502. This haplotype maps to the 5’ untranslated region of the
We did not impute genotypes for additional xMHC SNP loci or classical HLA alleles because certain features of the current study made it suboptimal for imputation, including: 1) uncertainty in the use of currently available reference panels for imputation in a recently admixed population [
The associations reported from these analyses were not identified in the previous GWAS [
Certain characteristics of the association mapping approach, namely the dependence on SNP coverage and the effect of multiple comparisons on statistical power, may have contributed to inconsistencies between results of the current study and associations reported in previous studies based on the candidate gene approach. As reviewed previously [
Any effect of population stratification is likely to be minimal in the NCCLS because of the careful and detailed account of race and ethnicity obtained from the subjects and the statistical adjustment for these factors. As described earlier, this is further supported by our previous report showing estimates of genetic ancestry (percent of European, Amerindian, and African ancestry) to be similar between cases and matched controls [
In this comprehensive examination of genetic variation across the xMHC, we provide evidence localizing potential disease susceptibility loci for childhood BCP-ALL to two regions, the extended class I near
Association results were derived by logistic regression assuming a log-additive genetic model and adjusting for child’s age, sex, and race/ethnicity. The red line represents the line along which the observed distribution of the -log10 (p-value) is the same as the expected distribution given the number of SNPs tested.
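The coordinates of such a Q-Q plot can be computed as in the following minimal sketch. The function name `qq_points` is hypothetical, and the plotting position (rank − 0.5)/n for the expected p-values under the null is an assumption; other conventions, such as rank/(n + 1), are also in use.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Under the null, p-values are uniform on (0, 1); the k-th smallest of n
    # is placed at (k - 0.5) / n.
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(expected, observed))
```

Points falling along the identity line indicate that the observed p-value distribution matches the null expectation; systematic departure across the whole range would suggest inflation.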
The values displayed in the plot are correlation coefficients (
Odds ratios (ORs, represented by boxes with the area of each box inversely proportional to the variance of the estimate) and 95% confidence intervals (CIs, error bars) were derived using logistic regression adjusting for child’s age, sex, and race/ethnicity depending on the stratification variable. The dashed vertical line represents the OR of the SNP in the analysis of BCP-ALL among all subjects and the width of the diamond is the corresponding 95% CI.
We thank the staff of the University of California, Berkeley Genetic Epidemiology and Genomics Laboratory, the Northern California Childhood Leukemia Study, and the Survey Research Center, as well as the participating children and their families, for their important contributions to this study. We also thank the clinical collaborators and participating hospitals: University of California Davis, University of California San Francisco, Children’s Hospital of Central California, Lucile Packard Children’s Hospital, Children’s Hospital and Research Center Oakland, Kaiser Permanente Roseville, Kaiser Permanente Santa Clara, Kaiser Permanente San Francisco, Kaiser Permanente Oakland.