Space-Time Clustering of Childhood Leukemia: Evidence of an Association with ETV6-RUNX1 (TEL-AML1) Fusion

Background: Many studies have observed space-time clustering of childhood leukemia (CL), yet few have attempted to elicit etiological clues from such clustering. We recently reported space-time clustering of CL around birth and now aim to generate etiological hypotheses by comparing clustered and nonclustered cases. We also investigated whether the clustering resulted from many small aggregations of cases or from a few larger clusters.
Methods: We identified cases born and diagnosed between 1985 and 2014 at age 0–15 years from the Swiss Childhood Cancer Registry. We determined the spatial and temporal lags that maximized evidence of clustering based on the Knox test and classified cases born within these lags of another case as clustered. Using logistic regression adjusted for child population density, we determined whether clustering status was associated with age at diagnosis, immunophenotype, cytogenetic subtype, perinatal and socioeconomic characteristics, and pollution sources.
Results: Analyses included 1,282 cases, of which 242 were clustered (born within 1 km and 2 years of another case). Of all investigated characteristics, only the t(12;21)(p13;q22) translocation (resulting in ETV6-RUNX1 fusion) differed significantly in prevalence between clustered and nonclustered cases (40% and 25%, respectively; adjusted OR 2.54 [1.52–4.23]; p = 0.003). Spatio-temporal clustering was driven by an excess of aggregations of two or three children rather than by a few large clusters.
Conclusion: Our findings suggest that ETV6-RUNX1 is associated with space-time clustering of CL and are consistent with an infection interacting with that oncogene in early life and leading to clinical leukemia.



Introduction
The etiology of childhood leukemia remains largely unknown. Multiple factors, including genetic susceptibility, exogenous and endogenous exposures, and chance, are thought to play a role [1]. Except for high doses of ionizing radiation, no environmental risk factors have been established [2,3]. The rarity of the disease and the existence of biologically distinct subtypes with potentially diverse etiologies complicate the search for causal pathways [1]. Several chromosomal abnormalities, which define cytogenetic subtypes, are observed with increased frequency among cases of childhood leukemia [1,4]. They are considered to be disease-initiating events that mostly occur in utero or in early life and are followed by further genetic alterations that lead to overt disease [5]. Environmental triggers might be involved in both the initial and the subsequent mutations.
Several hypotheses regarding a possible infectious etiology of childhood leukemia (CL) have been proposed. Kinlen's population mixing hypothesis [6,7] suggests that CL is an aberrant response to some as yet unidentified infectious agent and that this infection might explain localized increases in incidence following rapid population inflows into rural areas. Smith [8] proposed that the characteristic childhood peak of acute lymphoblastic leukemia (ALL), observed in more affluent societies at ages 2–6 years for precursor B-cell ALL, might be the result of an infection that occurred in utero. The delayed infection hypothesis proposed by Greaves [9] posits a two-hit model: a first hit occurring in utero induces genetic alterations in a precursor B-cell, while the second hit, which precipitates overt ALL, results from an aberrant immune response to infections later in childhood in children who lacked exposure to common infections in early life.
A number of studies have reported space-time clustering in the incidence of CL, providing indirect support for an infectious origin [10–13]. We recently reported evidence of space-time clustering of CL in Switzerland around the time of birth, with a considerable excess of pairs of cases born <1 km and <2 years apart over the number expected by chance [14]. Unlike most previous studies, our analysis relied on precisely geocoded residential locations of cases and adjusted for regional population shifts, which can lead to spurious findings of space-time clustering [15].
If space-time clustering of CL is real, comparing cases that occur close to each other with more dispersed cases might provide hints about the causes of the clustering and of the disease itself. Morris [16] thus analyzed cases of CL occurring in close proximity in space and time and found weak evidence that a history of measles 2–3 years before disease onset was more common among these cases than among their matched controls. Williams et al. [17] compared clustered and nonclustered cases of leukemia and Hodgkin's lymphoma in the context of spatial clustering. They observed a greater degree of concordance of common ALL (CD10+, B-cell precursor ALL) among clustered cases than would be expected by chance. To our knowledge, no other study has compared clustered and nonclustered cases of (childhood) leukemia to elicit clues about etiology, nor analyzed the propensity of incident cases to cluster depending on cytogenetic subtype. This might partly be due to the difficulty of distinguishing true clusters, i.e., cases occurring close to each other because of a shared etiological factor, from cases occurring close to each other purely by chance. Nevertheless, comparing cases occurring in close proximity to each other (which should include the cases from true etiological clusters, if such clusters exist) with other, more dispersed cases should still reveal characteristics of true clusters, even though the observed differences from nonclustered cases will be attenuated.
The aim of this study was to characterize clustered cases of CL in Switzerland in an effort to generate new hypotheses about the origins of clustering. Following the approach adopted by Morris [16], we defined clustering by proximity in space and time alone. Based on our earlier findings of space-time clustering around the time of birth, a case was defined as clustered if the child was born in close spatial and temporal proximity to another case. We investigated whether clustered cases differed from nonclustered cases in terms of diagnostic, demographic, or socioeconomic characteristics or environmental exposures. We also investigated the size of CL clusters to determine whether the clustering resulted from many small aggregations of only two or three children or rather from a few larger clusters.

Materials and Methods
Ethics approval was granted through the Ethics Committee of the Canton of Bern to the SCCR.
A detailed description of Materials and Methods is given in the S1 Appendix.

Population
The study population included all children recorded in the Swiss Childhood Cancer Registry (SCCR) who were born in Switzerland and diagnosed with leukemia at age 0–15 years between 1 January 1985 and 31 December 2014, a sample 22% larger than that of our previous analysis. The SCCR is a population-based registry of all childhood cancers diagnosed in Switzerland, with an estimated coverage of 91% during the study period and of about 95% since 1995 [18]. Geocoded residential addresses of cases at the time of birth were obtained from the SCCR. We inspected all cases living <50 m apart for possible sibling relationships and retained only one case from any identified sibling pair. Clinical data were likewise obtained from the SCCR. Cytogenetic data became available in 1994, when the first karyotype tests were carried out; since 1999, both karyotype and fluorescence in situ hybridization (FISH) analyses have been performed routinely. We also linked SCCR cases with corresponding records in the Swiss National Cohort (SNC) and with birth records extracted from the Vital Statistics of the Swiss Federal Office of Statistics, using probabilistic record linkage. The SNC is a research platform linking the national censuses with national datasets on birth, mortality, and migration [19]. These linkages provided data on perinatal and socioeconomic characteristics of cases that were not available directly from the SCCR.
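
The sibling screen described above amounts to flagging every pair of birth addresses lying less than 50 m apart for manual review. The following is a minimal sketch under stated assumptions, not the registry's actual workflow: geocodes are assumed to be projected coordinates in metres, and the helper name `candidate_sibling_pairs` and its input format are hypothetical.

```python
import math
from itertools import combinations


def candidate_sibling_pairs(cases, max_dist_m=50.0):
    """Flag pairs of cases whose birth addresses lie less than max_dist_m apart.

    `cases` is a list of (case_id, x_m, y_m) tuples with projected,
    metre-based coordinates (hypothetical input format). Flagged pairs still
    need manual review: proximity alone does not establish a sibling link.
    """
    flagged = []
    for (id_a, xa, ya), (id_b, xb, yb) in combinations(cases, 2):
        if math.hypot(xa - xb, ya - yb) < max_dist_m:
            flagged.append((id_a, id_b))
    return flagged


cases = [("A", 0.0, 0.0), ("B", 30.0, 30.0), ("C", 1000.0, 0.0)]
print(candidate_sibling_pairs(cases))  # [('A', 'B')]
```

A brute-force pairwise scan is adequate for roughly 1,300 cases; a spatial index would only matter for much larger registries.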

Characteristics compared between clustered and nonclustered cases
We compared the prevalence of the following characteristics between clustered and nonclustered cases. Clinical characteristics included age at diagnosis and the leukemia subtypes ALL and acute myeloid leukemia (AML). We also distinguished T-precursor ALL and B-precursor ALL and, among the latter, the cytogenetic subtypes ETV6-RUNX1, Philadelphia chromosome, trisomies 4, 10, 17, and high hyperdiploidy (51–65 chromosomes).
Socioeconomic characteristics included education level of the household head (compulsory only, upper secondary, tertiary) and household crowding (number of persons per room in tertiles: ≤0.82, 0.83–1.16, ≥1.17) as recorded in the earliest census (1990 or 2000) to which a child could be linked, degree of urbanization of the municipality of residence at birth (urban vs. rural), and neighborhood-based socioeconomic position (neighborhood index of socioeconomic position, Swiss-SEP, in tertiles low, medium, high) [20].

Statistical analysis
In a first step, we assessed space-time clustering of cases of CL with respect to place and time of birth using the Knox test [26], following the same procedure as in our previous analysis [14]. This test counts the number of pairs of cases that lie in close spatial and temporal proximity to each other, i.e., closer than a prespecified spatial and temporal lag, and assesses whether this number exceeds the number expected by chance. We computed Knox tests for a range of spatial and temporal lags and selected the combination showing the strongest evidence of space-time clustering as the critical lags. Tests were carried out using a Monte Carlo procedure that accounts for uneven shifts of the residential population [15]. We used Baker's max method to calculate a p-value adjusted for multiple testing over the different combinations of spatial and temporal lags [27]. We classified a case of CL as clustered if the child was born within the critical lags of at least one other case, and as nonclustered otherwise.
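
The three ingredients of this first step (Knox counts over a grid of lags, a max-type adjustment over that grid, and the clustered/nonclustered classification) can be sketched as follows. This is a simplified illustration, not the authors' procedure: the null model here permutes birth times across birth locations, which does not adjust for regional population shifts as the study's Monte Carlo procedure [15] does, and standardizing each lag's count by its simulated mean and SD before taking the maximum is an assumed reading of Baker's max method. All function names are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations


def knox_counts(points, times, lags):
    """Knox statistic for each (spatial lag, temporal lag) combination.

    points: (x, y) birth locations in metres; times: birth dates in years.
    Returns {(d_lag, t_lag): number of case pairs closer than both lags}.
    """
    counts = {lag: 0 for lag in lags}
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        dt = abs(times[i] - times[j])
        for d_lag, t_lag in lags:
            if d < d_lag and dt < t_lag:
                counts[(d_lag, t_lag)] += 1
    return counts


def knox_max_test(points, times, lags, n_sim=999, seed=1):
    """Pick the critical lags and return a max-adjusted Monte Carlo p-value.

    Simplified null: birth times are permuted across locations, removing any
    space-time interaction while keeping both margins fixed. It does NOT
    adjust for regional population shifts.
    """
    rng = random.Random(seed)
    observed = knox_counts(points, times, lags)
    sims = []
    for _ in range(n_sim):
        shuffled = list(times)
        rng.shuffle(shuffled)
        sims.append(knox_counts(points, shuffled, lags))

    # Standardize each lag's count so different lag combinations are comparable.
    moments = {}
    for lag in lags:
        values = [s[lag] for s in sims]
        moments[lag] = (statistics.mean(values), statistics.pstdev(values) or 1.0)

    def z(counts, lag):
        mean, sd = moments[lag]
        return (counts[lag] - mean) / sd

    best_lag = max(lags, key=lambda lag: z(observed, lag))
    obs_max = z(observed, best_lag)
    # Compare the observed maximum with the maximum in each simulated data set.
    exceed = sum(1 for s in sims if max(z(s, lag) for lag in lags) >= obs_max)
    return best_lag, observed[best_lag], (1 + exceed) / (1 + n_sim)


def clustered_flags(points, times, d_lag, t_lag):
    """True for cases born within the critical lags of at least one other case."""
    flags = [False] * len(points)
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < d_lag and abs(times[i] - times[j]) < t_lag:
            flags[i] = flags[j] = True
    return flags


points = [(0, 0), (500, 0), (5000, 0)]
times = [2000.0, 2001.0, 2000.5]
print(knox_counts(points, times, [(1000, 2), (10000, 0.6)]))
# {(1000, 2): 1, (10000, 0.6): 2}
print(clustered_flags(points, times, 1000, 2))  # [True, True, False]
```

Because only the time labels are permuted, pairwise distances never change between simulations; with a full registry one would precompute the pairs lying within the largest spatial lag once rather than recomputing all distances in every run.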
In a second step, we ran logistic regressions with clustering status as the dependent variable to assess associations with the characteristics listed above, and tested these associations using likelihood ratio tests. Because we applied the critical lags uniformly across the study area, a case in a densely populated area is more likely to lie close to another case by chance alone, even in the absence of space-time clustering. A characteristic might thus be associated with clustered cases not because it affects the propensity to cluster but because it correlates with population density. In separate regression models, we therefore adjusted for an index of child population density. This index represents the log odds of a case being clustered rather than nonclustered by chance alone, i.e., of another case occurring nearby in space and time. Moreover, for characteristics that are themselves geographically determined (e.g., proximity to a pollution source), inference from logistic regression models is not valid because such characteristics induce dependence between nearby cases [17]. For these variables we therefore calculated p-values through Monte Carlo simulation. Finally, we used Holm's procedure [28] to adjust the p-values for multiple testing of different characteristics.
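
Two pieces of this step lend themselves to a short sketch. Holm's step-down adjustment multiplies the k-th smallest of m p-values by m − k + 1 and enforces monotonicity. For the child density index, the exact construction is given in the S1 Appendix; the second function below is only a hypothetical version that assumes the number of other cases falling within the critical lags by chance is Poisson with mean λ, so that P(clustered by chance) = 1 − e^(−λ) and its log odds simplify to log(e^λ − 1). Both function names are illustrative.

```python
import math


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Monotonicity: an adjusted p-value never falls below earlier ones.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def chance_cluster_log_odds(expected_neighbors):
    """Hypothetical child density index under a Poisson assumption.

    If lambda other cases are expected by chance within the critical lags,
    P(clustered by chance) = 1 - exp(-lambda), and its log odds equal
    log(exp(lambda) - 1).
    """
    if expected_neighbors <= 0:
        raise ValueError("expected number of neighboring cases must be positive")
    return math.log(math.expm1(expected_neighbors))


print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

Under this assumption the index rises with the expected number of chance neighbors, which is what lets it absorb the density-driven part of the association in the adjusted models.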
In the last step, we used graph methods [29] to link all clustered cases of CL into individual local clusters. A cluster is thus defined as a group of cases in which each case lies within the critical spatial and temporal lags of at least one other case of the group. After linking all close cases of CL, we tabulated the clusters thus identified by the number of children they comprised. Then, for each cluster size, we used Monte Carlo simulations to calculate the probability that the number of clusters with this many children or more in the empirical sample would occur by chance, i.e., in the absence of a tendency of cases to cluster.
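
Linking close cases into clusters in this way corresponds to finding the connected components of a graph whose nodes are cases and whose edges join pairs born within the critical lags. The sketch below uses a union-find structure as one standard way to do this; the source does not name the graph algorithm used, and the function name and default lags (1 km, 2 years) are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def link_clusters(points, times, d_lag=1000.0, t_lag=2.0):
    """Group cases into clusters: connected components of the close-pair graph."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        close = (math.dist(points[i], points[j]) < d_lag
                 and abs(times[i] - times[j]) < t_lag)
        if close:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two components

    components = {}
    for i in range(len(points)):
        components.setdefault(find(i), []).append(i)
    # Singletons are nonclustered cases; clusters have two or more members.
    return [members for members in components.values() if len(members) >= 2]


points = [(0, 0), (800, 0), (1600, 0), (10000, 0), (10500, 0), (50000, 0)]
times = [2000.0, 2000.5, 2001.0, 2003.0, 2004.0, 2000.0]
clusters = link_clusters(points, times)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4]]
print(Counter(len(c) for c in clusters))    # cluster-size table
```

This is single linkage: cases 0 and 2 are 1.6 km apart yet share a cluster because each is within 1 km of case 1, which matches the definition that every member need only be close to at least one other member.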

Results
Study population
We identified 1,299 eligible cases of CL in the SCCR. After excluding 12 cases because of missing geocodes and 5 cases due to a sibling relationship with another case, 1,282 cases were included in the analysis (Fig 1). Diagnostic information (age at diagnosis, leukemia subtype) and characteristics related to place of residence (urbanization, neighborhood SEP, environmental exposures) were available for all 1,282 cases. Information on perinatal and socioeconomic characteristics as well as the cytogenetic subtype (collected only from 1994 onward) was available only for subsets of cases due to incomplete data and linkages (Fig 1).

Space-time clustering
Knox tests showed the strongest evidence of space-time clustering for a spatial lag of <1 km and a temporal lag of <2 years.

Comparison of clustered and nonclustered cases
Results of the analyses of the association between clinical and perinatal characteristics of cases of CL and their propensity to cluster in space and time are presented in Table 2. The only characteristic that showed an association with clustering status independent of child population density was the ETV6-RUNX1 subtype. Among the 569 cases that had been cytogenetically tested for this subtype, 37 of 93 (40%) clustered cases were carriers of the translocation, compared with only 118 of 476 (25%) nonclustered cases (unadjusted odds ratio [OR] 2.00, 95% CI 1.26–3.19). This association became stronger in the regression model adjusting for child population density (OR 2.54, 95% CI 1.52–4.23) and remained statistically significant after Holm's correction for multiple testing (P = 0.0027) [28]. By contrast, none of the other cytogenetic subtypes showed any clear evidence of an association with clustering status, nor did any of the clinical and perinatal attributes (Table 2). Likewise, after adjusting for child population density, there was no evidence of an association for any of the investigated socioeconomic and environmental characteristics (S1 Table). The apparent association of clustered cases with degree of urbanization was expected because of the higher child population density in urban areas, and it disappeared after adjusting for density. Similarly, the apparent associations with foreign nationality (significant in unadjusted analyses after correcting for multiple testing, P = 0.019) and with living within 250 m of a petrol station were greatly attenuated or reversed after adjusting for child population density.
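
The unadjusted figures for ETV6-RUNX1 can be checked directly from the reported counts. With a single binary predictor, the odds ratio from an unadjusted logistic regression equals the cross-product ratio of the 2×2 table, and its Wald standard error equals Woolf's formula, so the short sketch below should reproduce the unadjusted OR and CI. The adjusted OR of 2.54 cannot be recomputed this way because it requires the individual child density index values. The helper name is illustrative.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) 95% CI for a 2x2 table.

    a: clustered carriers, b: clustered non-carriers,
    c: nonclustered carriers, d: nonclustered non-carriers.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ETV6-RUNX1 counts reported above: 37 of 93 clustered, 118 of 476 nonclustered.
or_, lo, hi = odds_ratio_wald(37, 93 - 37, 118, 476 - 118)
print(f"prevalence {37 / 93:.0%} vs {118 / 476:.0%}")  # prevalence 40% vs 25%
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")      # OR 2.00, 95% CI 1.26-3.19
```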

Analyses by cluster size
Among the 242 clustered cases of CL, we identified a total of 104 individual clusters. The majority of them were small: 82 clusters consisted of just two cases, 16 clusters of three cases, four clusters of four cases, and one cluster each of five and nine cases of CL. Table 3 reports the mean, minimum, and maximum number of clusters that occurred in the 999 Monte Carlo data sets simulated under the null hypothesis of no clustering. On average, these data sets contained one cluster with five children and 0.08 clusters with nine children. The p-values indicate the probability that the number of clusters of a given size or larger observed in the empirical data could have occurred simply by chance. The p-value was smallest for a cluster size of two (P = 0.013), indicating an excess of small clusters in the observed data compared with the simulated data. By contrast, the observed numbers of leukemia clusters of size ≥4 and ≥9 were not statistically significant (P = 0.38 and P = 0.18, respectively), meaning that clusters of this size or larger were not uncommon in the Monte Carlo data sets. Fig 2 visualizes this observation, showing the frequency of clusters of a given size in the empirical data (red diamonds) compared with the Monte Carlo data sets (box plots). The 82 two-child clusters observed in the empirical data lie near the upper end of the range of the corresponding values in the simulated data sets (45–94 clusters). By contrast, for cluster sizes of four or more children, the number observed in the empirical data is near the mean number in the simulated data.

Note to the Knox test table: Obs., number of observed close pairs; Exp., number of expected close pairs. P-values calculated using a Monte Carlo procedure adjusting for regional population shifts [15].

Note to Table 2: Columns two and three indicate the prevalence of each case characteristic among clustered and nonclustered cases in absolute numbers and as percentages. Results of the logistic regressions unadjusted and adjusted for child population density are presented in columns four and five, respectively. * Cases born within 1 km and 2 years of another case. (a) For a given case, the child density index reflects the probability of another case occurring within 1 km and 2 years by chance alone (see the S1 Appendix for more details). doi:10.1371/journal.pone.0170020.t002
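
The cluster-size counts reported above are internally consistent: the five size classes add up to 104 clusters and 242 clustered cases. The sketch below checks this and shows how a one-sided Monte Carlo p-value for "k or more children" could be computed from simulated cluster-size tables. The (1 + exceedances) / (1 + simulations) form is a common convention assumed here rather than taken from the source, and the two simulated tables in the usage lines are toy values, not study data.

```python
from collections import Counter

# Observed cluster-size table reported above: {cluster size: number of clusters}.
observed = Counter({2: 82, 3: 16, 4: 4, 5: 1, 9: 1})
print(sum(observed.values()))                          # 104 clusters
print(sum(size * n for size, n in observed.items()))   # 242 clustered cases


def n_at_least(sizes, k):
    """Number of clusters with k or more children."""
    return sum(n for size, n in sizes.items() if size >= k)


def mc_pvalue(observed, simulated, k):
    """One-sided Monte Carlo p-value for the count of clusters of size >= k.

    Assumed convention: (1 + number of simulated tables with at least as many
    such clusters) / (1 + number of simulated tables).
    """
    obs = n_at_least(observed, k)
    exceed = sum(1 for sim in simulated if n_at_least(sim, k) >= obs)
    return (1 + exceed) / (1 + len(simulated))


# Toy simulated tables for illustration only (not the study's 999 data sets).
toy_sims = [Counter({2: 70, 3: 12, 4: 3}), Counter({2: 90, 3: 14, 5: 1, 6: 1})]
print(mc_pvalue(observed, toy_sims, 2), mc_pvalue(observed, toy_sims, 4))
```

With 999 simulated data sets, the smallest p-value attainable under this convention is 0.001.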

Discussion
Summary of results
This nationwide study found that the ETV6-RUNX1 gene translocation was more common among CL cases born in close spatial and temporal proximity to each other (defined here as clustered) than among more isolated cases. The association between the ETV6-RUNX1 translocation and space-time clustering thus defined became stronger in the analysis adjusting for child population density and remained significant after correcting for multiple testing. The current study also showed that the space-time clustering of CL cases at birth found in an earlier study with overlapping data [14] was primarily due to numerous small clusters consisting of pairs and triplets of cases rather than to a few larger clusters.

Our study in the context of the literature
Our results are in accord with previous studies, a majority of which found evidence of clustering of cases of CL around the place and time of birth [11,30], diagnosis [10,12,31–35], or both [13,16] (studies up to 2004 were reviewed by McNally and Eden [36] and Little [37]). In these studies, space-time clustering was typically most pronounced for spatial lags of a few kilometers and temporal lags of several months. Despite this accumulating evidence of clustering of childhood leukemia, to our knowledge only two previous studies, both from England, have investigated differences between clustered and nonclustered cases to elicit clues about etiology. Looking at 228 cases occurring in the Midlands between 1953 and 1960, Morris observed no significant association between the tendency of cases to cluster in space and time at birth and four prenatal events (toxemia, anemia, and X-rays of the chest or of the abdomen). However, analyzing the history of six common infectious diseases (measles, pertussis, chicken pox, rubella, mumps, and scarlet fever), Morris observed weak evidence that reports of measles 2–3 years before diagnosis were more common among the 18 clustered cases of CL than among their matched controls, whereas no such difference existed between controls and the 176 nonclustered cases for which data on infectious history were available [16]. In that study, as in ours, being clustered was defined by proximity in space and time alone. In a more recent study, Williams et al. found no evidence of an association between the clustering status of ALL cases and age at diagnosis, immunophenotype, rural vs. urban residence, isolation of the area of residence, distance of residence to a built-up area, or area-level socioeconomic status [17]. They further investigated whether concordance in age at diagnosis, season of diagnosis, and leukemia subtype among cases of the same cluster was higher than would be expected by chance. Evidence of higher concordance was found for common ALL and for diagnosis in summer. However, Williams et al. only considered spatial clustering and, overall, found no evidence of a tendency of cases to cluster. Both studies had considerably smaller samples than ours, and neither had information on cytogenetic subtypes.
The t(12;21)(p13;q22) translocation resulting in the ETV6-RUNX1 fusion gene is the most frequent genetic rearrangement in ALL [38]. In our study, 27.2% of cases with cytogenetic information were carriers. Studies of concordant twins provide convincing evidence that this translocation arises in utero [39,40]. The translocation is thought to be not uncommon in the population at large, with further mutations required for the development of overt leukemia [41]. An early study reported that 1% of cord blood samples from healthy newborns were ETV6-RUNX1 positive [42]. More recent and larger studies suggested that the prevalence is markedly lower [38,43]; however, the validity of these findings has been challenged [44].

Limitations
While the overall sample size of our study was considerable, data on some characteristics were available only for a subset of cases of CL. As a consequence, statistical power was low for some comparisons, particularly for rare gene translocations (such as the Philadelphia chromosome). Also, we could not perform a separate test of whether space-time clustering is associated with the common ALL subtype (CD10+, B-cell precursor ALL) because the SCCR records do not strictly follow a uniform procedure for differentiating the immunophenotypes of B-cell lineage ALL. Some socioeconomic and perinatal characteristics could only be obtained through probabilistic record linkage with other databases and may thus include some misclassification. Furthermore, we classified cases as clustered or nonclustered based solely on proximity, using the threshold values that maximized space-time clustering across the entire study area. This definition is likely to have included many pairs of cases occurring close to each other by chance, particularly in more densely populated areas. We therefore adjusted our analyses for child population density, using a measure based on the probability that another case would occur within the critical distance and time lags of a given case in the absence of clustering. This adjustment should have largely accounted for such chance clustering.

Strengths
The main strength of the study was the availability of precise geocoded residential locations of all cases of CL at birth and for the entire child population in census years. This made it possible to perform space-time clustering analyses at a small spatial scale and to adjust for uneven regional population growth, which may produce spurious evidence of clustering if not controlled for [15]. Exact geocodes of the homes of cases and the child population also allowed us to compute an index of child population density that reflected the probability of another case occurring next to a case simply by chance. Linking the SCCR data with other nationwide routine data sets, we were able to compare clustered and nonclustered cases with regard to a wide array of characteristics that have been hypothesized to play a role in the etiology of childhood leukemia.

Interpretation of findings
The space-time clustering around the place and time of birth observed in the current study was consistent with the results of our own previous analysis with partly overlapping data [14]. In that paper, which included cases born between 1985 and 2010, we concluded that the scale and timing of clustering might be indicative of an infection occurring in utero or shortly after birth. In our previous analysis, stratification of the study sample by age at diagnosis (0-4, 5-15 years) reduced the observed clustering effect, meaning that clustered cases were born close to each other in space and time but diagnosed at widely differing ages [14]. We therefore argued that infections could be involved in early disease initiating events but that the outbreak of overt leukemia required further genetic alterations or environmental stimuli as hypothesized by Greaves [9].
Our current finding that the ETV6-RUNX1 translocation is associated with space-time clustering of CL cases around birth is consistent with the translocation occurring in utero [42]. Excesses of cases of CL occurring in spatial and temporal proximity were observed only for small cluster sizes of two or three children and were not driven by a few large clusters. This space-time pattern is consistent with the spread of common infections, which can involve highly localized mini-epidemics [45]. If an infection is indeed causing the clustering, one might speculate about the role of the ETV6-RUNX1 fusion gene. We can think of two potential explanations. First, the infection could cause this translocation. An etiologic role of in utero infections has previously been hypothesized for childhood ALL [8]. However, evidence for a direct transforming agent from studies screening for viral sequences in leukemia samples or in neonatal blood spots is still lacking [46]. A second explanation is that, in the presence of the putative infection, pre-B cells bearing the ETV6-RUNX1 fusion become more susceptible to additional genetic changes leading to uncontrolled replication [39]. Such a process is in line with Greaves' hypothesis and could explain the tendency of this subtype of ALL to cluster in space and time [9]. This second explanation is more plausible, as it is also compatible with the different additional genetic changes discernible in ETV6-RUNX1 positive ALLs, even in monozygotic twins [40], and offers a wider timespan for the outbreak of overt leukemia.
The two-year time lag maximizing evidence of space-time clustering appears relatively long, considering that a local epidemic might be rather short-lived. However, such a lag is compatible with a brief exposure to an infectious agent combined with an extended age window of susceptibility. The relevant etiological event driving an individual space-time cluster might occur concurrently in calendar time, up to several years after birth, but at a different age in each child. If few children move between birth and diagnosis, this will show up as space-time clustering around birth. In our study sample, 34% of children had changed place of residence between birth and diagnosis; however, among children diagnosed before the age of five, only 25% had relocated.
While the space-time pattern of incident cases of CL found in this study thus favors a causal role of an infectious agent, we cannot exclude other environmental factors that might explain the clustering. In fact, the observed lags of <1 km and <2 years maximizing the space-time clustering effect are also compatible with exposure to environmental pollution from local sources with time-varying emission levels. The excesses observed for small cluster sizes would point to numerous small pollutant sources rather than to few large ones. Finally, we cannot rule out that the observed association between the ETV6-RUNX1 translocation and the clustering of cases of CL is due to chance. We investigated numerous factors, none of which had strong prior evidence of being associated with the etiology of CL or with CL clustering. Although we adjusted for multiple testing, this is an exploratory study and validation in independent samples is necessary.

Conclusions
Our study suggests that the ETV6-RUNX1 translocation is associated with space-time clustering of childhood leukemia around birth. If our findings are confirmed by other studies, future research should investigate a possible link between the ETV6-RUNX1 gene fusion and infections, particularly how infections might induce further genetic alterations in ETV6-RUNX1 positive pre-leukemic clones prompting the outbreak of clinical leukemia.
Supporting Information
S1 Table. Socioeconomic and environmental characteristics. Comparison of prevalence of attributes between clustered and nonclustered cases of CL, both unadjusted and adjusted for local child population density. (DOCX)
S1 Appendix. Detailed description of Materials and Methods. (DOCX)