Environmental epidemiology of Kawasaki disease: Linking disease etiology, pathogenesis and global distribution

Background The pathogenesis of Kawasaki disease (KD) is commonly ascribed to an exaggerated immunologic response to an unidentified environmental or infectious trigger in susceptible children. A comprehensive framework linking epidemiological data and global distribution of KD has not yet been proposed. Methods and findings Patients with KD (n = 81) were enrolled within 6 weeks of diagnosis along with control subjects (n = 87). All completed an extensive epidemiological questionnaire. Geographic localization software characterized the subjects’ neighborhood. KD incidence was compared to atmospheric biological particles counts and winds patterns. These data were used to create a comprehensive risk framework for KD, which we tested against published data on the global distribution. Compared to controls, patients with KD were more likely to be of Asian ancestry and were more likely to live in an environment with low exposure to environmental allergens. Higher atmospheric counts of biological particles other than fungus/spores were associated with a temporal reduction in incidence of KD. Finally, westerly winds were associated with increased fungal particles in the atmosphere and increased incidence of KD over the Greater Toronto Area. Our proposed framework was able to explain approximately 80% of the variation in the global distribution of KD. The main limitations of the study are that the majority of data used in this study are limited to the Canadian context and our proposed disease framework is theoretical and circumstantial rather than the result of a single simulation. Conclusions Our proposed etiologic framework incorporates the 1) proportion of population that are genetically susceptible; 2) modulation of risk, determined by habitual exposure to environmental allergens, seasonal variations of atmospheric biological particles and contact with infectious diseases; and 3) exposure to the putative trigger. Future modelling of individual risk and global distribution will be strengthened by taking into consideration all of these non-traditional elements.


Introduction
Kawasaki disease (KD) is a complex, limited, systemic vasculitic syndrome of unknown etiology that mainly occurs in infants and toddlers, and potentially causes severe coronary artery aneurysms [1]. Since its initial identification by Tomisaku Kawasaki in 1970, epidemiological surveys have been periodically conducted in many countries to monitor the worldwide distribution of KD and to attempt to elucidate its cause [2][3][4]. The high incidence of KD in Asian populations globally is strong evidence of a genetic contribution to disease susceptibility. Significant familial associations have been found in KD with a higher incidence rate being reported in children with family members who have previously had an episode of KD [5][6][7]. Genome wide linkage studies have identified multiple regions that are associated with an increased risk of KD, and genetic polymorphisms for increased risk of KD, further, increased risk of coronary artery abnormalities resulting from KD have been identified [8][9][10][11]. The current consensus is that KD is likely the result of an exaggerated immune response to an environmental or infectious trigger occurring in genetically susceptible children.
The prominent seasonal and temporal/spatial distribution of KD cases has led to many theories suggesting an infectious trigger to the disease [12]. Despite an important body of research, no substantial, reproducible study has been to able to identify a specific infectious trigger(s), even though an important proportion of KD cases are associated with an infectious co-diagnosis at presentation [13]. Previous studies performed in the 1980's and 1990's have uncovered potential associations between environmental factors and risk of KD [14][15][16] but there is a critical lack of contemporary, high quality, environmental epidemiology studies on KD. More recently, studying tropospheric wind patterns and using a flexible particle dispersion model (Flexpart) for backtracking winds over Japan, Rodo et al. identified northeastern China as a potential source region for a trigger, and suggested that, due to the short incubation time, a toxic or microbial antigen agent is most likely [17]. Such an agent might possibly be carried over the Pacific Ocean, and a zonal wind pattern was found to be associated with increased KD [18]. The study of the environmental epidemiology of KD might discover additional clues, either to identify the trigger for KD or to elucidate modulatory mechanisms that may increase or decrease the risk of developing KD when encountering said trigger. In this study, we sought to determine the environmental factors associated with both individual risk of KD, local epidemiological patterns, and to integrate these into a comprehensive model to explain the global distribution.

Environmental epidemiology study
The first component of this study comprised a case-control observational study enrolling patients with a new diagnosis of KD within 6 weeks of hospital discharge after their acute admission. Patients were approached either during a follow-up clinic visit or by mail. Controls subjects were recruited who had an unremarkable recent medical history and who were not taking regular medication. These subjects were recruited using a variety of means (in person; e-mail invitation), but the majority of controls were recruited through the TARGet Kids! Collaborative [19], which enrolls and follows healthy children from birth to 10 year old at the time of their regularly scheduled well-child visits. Enrollment of control subjects was stratified by age and gender to reflect the age and gender distribution of patients with KD. All parents were asked to complete a detailed paper-based questionnaire regarding all aspects of their child's medical history, immediate living environment and important events, clinical signs and symptoms in the weeks prior to the onset of KD symptoms. Details of the questionnaire are provided in Table 1. Many of the questions regarding events or conditions were dependent on close association with the time since the onset of KD symptoms. Therefore, we limited enrollment to 6 weeks after discharge from the acute admission in order to reduce important recall bias. For control subjects, we had to simulate the timing of events to cover a similar exposure time and delay to that of patients with KD. The questionnaire was modified for control subjects. Instead of a date of symptom onset, subjects were provided with a randomly generated number between 2-6 weeks, and asked to select a specific event in their lives that occurred within this assigned number of weeks. Subjects were not asked to disclose what this event was, but were asked to use the event as a placeholder for "symptoms onset" to answer time-related questions. Once completed, all questionnaires were entered in a REDCap electronic capture tool hosted at The Hospital for Sick Children [20]. Upon receipt, an investigator (SO) reviewed all questionnaires to identify missing or incorrect information, and worked with the subjects to make corrections if needed and ensure that control subjects had properly understood the scenario necessary to establish the timing of "symptoms onset". Subjects were asked to provide their postal code and a geolocation software (Google Earth, Googleplex, Mountain View, California, USA) was used to map the immediate environment (~1 km) around the subject's residence. Aerial pictures were taken and the distance to the specific landmarks (lakes/rivers, parks/wooded area and highways), traffic density at the closest major intersections and tree density in a 500m and 1000m radius around the residence were collected. Tree density was graded qualitatively on a scale from 1 to 5 (1 = low density, 5 = high density) independently and in a blinded manner by 3 separate investigators (SO, ML, BM) and the average of the 3 ratings was used.
The study was approved by the Research Ethics Board at the Hospital for Sick Children. All subjects provided written consent for participation in the study; a separate consent was obtained specifically for the geolocation aspect of the study. Given the depth of the questionnaire and the sensitive nature of some of the questions, subjects were allowed to decline to answer any question they did not wish to answer by crossing out the relevant questions; although no subjects availed themselves of this option.
Data from this part of the study are described as means with standard deviations, median with 25 th and 75 th percentiles and frequencies as appropriate. Comparisons between patients and controls were performed using Fisher's exact test and Student's t-test assuming unequal variance between samples. All statistical analyses were performed using SAS v9.4 (SAS Institute, Cary NC).

Canadian epidemiology study
Data from all pediatric (0-18 years old) hospital admissions with a primary or secondary diagnosis of KD (ICD-10-CA standard: M30.3-Mucocutaneous lymph node syndrome [Kawasaki disease]) from March 2004 to March 2012 were obtained for Canada from the Discharge Abstract Database (DAD) maintained by the Canadian Institute for Health Information (CIHI) (except Quebec which does not report data to CIHI). Multiple admissions for a given patient were tracked by the universal provincial health number (unique to each patient), which was provided in an encrypted format. Given that transfers of patients between hospitals during the acute phase of the disease are common, and that patients with KD complicated by coronary artery aneurysms often have multiple admissions, a previously validated algorithm was used to specifically identify acute admissions and exclude all readmissions and transfers between hospitals [21]. The date of hospital admission was used as a surrogate for the date of diagnosis.

Biological and atmospheric data
Atmospheric biology data consisting of daily measurements of major biological atmospheric particles was provided by the Aerobiology Research Laboratories (Ottawa, Canada). Particle counts from February-October, 2011, for the Greater Toronto Area (GTA) were obtained for analysis.

Cross correlations
To compare the number of new KD cases with atmospheric particle counts, the two time series were both smoothed with a moving average with a 5-day window. A cross-correlation was then performed to evaluate the strength and direction of the time-lagged relationship between KD cases and atmospheric particle counts with lags between 0 to 25 days.

Wind composite analysis
Monthly means of daily means of 10 meter zonal (u) and meridional (v) surface wind components were obtained from the European Centre for Medium Range Weather Forecasts (ECMWF) ERA-Interim reanalysis dataset [22]. A composite analysis was performed to determine tropospheric wind conditions associated with occurrences of KD. New KD cases were averaged over Canada, aggregated per month, smoothed with a 3-month-running mean filter and standardized. The number of new KD cases was considered high if it was larger and low if it was lower than one standard deviation above or below the multi-year monthly average number of cases, respectively. The resulting composite consists of 15 months of wind patterns for a high and 14 months for a low number of KD cases for that month. Through the accumulation and averaging of wind fields over several months with anomalous high and low KD cases in a given month, background variability in the climate system (noise) could be reduced and the signal made more easily identifiable.
Association between winds directionality and biological atmospheric particle measurements. Wind data were averaged over longitude 41.25˚N to 45.75˚N and latitude 279˚E to 284.25˚E to obtain daily values around the GTA. The wind component was then standardized, but not centered to keep the original definition for the direction (positive westward or westerly, negative eastward or easterly). Biological atmospheric particle measurements for the GTA were compared to wind directions over the GTA. Atmospheric particle measurements were averaged over days with westerly (n = 252) and days with easterly (n = 113) winds over Toronto.
Model for global distribution of KD. We proposed a framework for the individual-level risk of developing KD that would also help explain the global distribution of KD. We used incidence estimates from as many countries as possible from published reviews [2][3][4] and from the abstracts presented at the last two International Kawasaki Disease Symposium (Kyoto, Japan, February 7-10, 2012 and Honolulu, United-States, February 3-6, 2015). Demographics (population, urbanization, gross domestic product, ethnic composition) were obtained from the United Nations Department of Economic and Social Affairs, and geographic information (annual precipitations, latitude, longitude) from the National Oceanic and Atmospheric Administration (NOAA) and made available through Weatherbase [22]. Linear regression with backward selection of variables was used to create a model for the global distribution of KD. Incidence of KD was transformed using squared root to account for the heavily skewed distribution of KD incidence as a result of the very high incidences in Japan and Korea. Square root transformation was chosen over other modelling methods because it maintained the unique characteristics of Japan and Korea but prevented model overfitting.

Environmental epidemiology: Patient characteristics and potential genetic risk factors (SickKids cohort)
There were 297 newly diagnosed KD patients during the study period. A total of 168 subjects were enrolled in this study (81 patients with KD (27% participation rate), 87 controls). Patients and controls were similar in terms of age (median 4.1 (2.1-6.3) years for KD vs. 3.3 (1.6-7.1) years for controls, p = 0.64) and gender (45 males (56%) for KD vs. 39 (45%) for controls, p = 0.16). As shown in Table 2, patients with KD were more likely to be of Asian origin and generally less likely to be of European ancestry. Patients with KD were found to have an older gestational age at birth (39.2±2.1 vs. 38

Environmental epidemiology: Potential modulatory factors (SickKids cohort)
Patients with KD were living in larger households than controls (4.4±1.2 vs. 3.8±0.9 occupants, p = 0.001). Patients with KD and controls were found to be nearly identical regarding contact with other children, daily activities and food. Patients with KD were generally less exposed to environmental allergens in many aspects of their daily lives (Table 3). They were more likely to primarily drink filtered or bottle water, were exposed less to household pets, were more likely to live in a dwelling constructed in the last 10 years and were less likely to live in an area with dense tree coverage, near a park, body of water or a farm. There was some statistically significant associations between the different exposures reported in Table 3. When considering all factors in the same multivariable regression model (except for tree coverage at 500 and 1000m, which were collinear to self-reported tree coverage), all exposures other than living close to a body of water remained close to or statistically significant indicating independence from each other. Patients with KD were more likely to live in dwellings that had undergone deep carpet cleaning in the month prior to KD symptoms onset (6/81 (7%) vs. 0/83 (0%), p = 0.01). There was no evidence of association between any of the chemical exposures that were collected and the odds of KD. There were some associations between exposures to environmental allergens and patients ethnicity/origins; the associations reported here remained substantially unchanged when accounting for differences in ethnicity/origins.

Environmental epidemiology: Infectious disease history, exposure and vaccination (SickKids cohort)
Compared to controls (data available on 80/87 (92%) controls and for all patients with KD), patients with KD were more likely to report being ill and/or affected by various and non-specific clinical signs and symptoms up to 31 days prior to their diagnosis of KD (Fig 1). However, patients with KD were not found to be at higher odds of reporting diarrhea, vomiting, earaches/otitis, jaundice, irritability or rash in this period. They were more likely to report any illness and/or feeling unwell than controls in the month prior to diagnosis of KD (1-7 days: 65% in KD vs. 29% in controls, p<0.001, 8-31 days: 64% in KD vs. 33% in controls, p<0.001).
Patients with KD were more likely to report an infection confirmed by a medical practitioner in the month prior to KD symptoms onset (33% vs. 15%, p = 0.01). No pattern could be identified regarding the type of infection or timing of the diagnosis (19%, 11%, 26%, 7% at 4 weeks, 3 weeks, 2 weeks and 1 week before onset of KD symptoms respectively and the remaining 37% diagnosed after the onset of KD symptoms). Patients with KD were not found to be at increased odds of having received a recent vaccination (i.e. within 1 month) (11/78 (14%) vs.

Exposures of potential modulatory and triggering agents (Greater Toronto Area data)
We analyzed the number of KD cases from the CIHI database and total biological atmospheric particle counts over the GTA for the year 2011. The total number of cases in the GTA during that year totalled 208, and ranged between zero and three cases per day. We calculated the cross-correlation (the degree to which 2 time series correlate with each other) between biological atmospheric particle counts and number of cases of KD for all possible lag-times between 0 and 25 days (the maximum lag time proposed for KD). Atmospheric particles from pollen, trees (all trees and deciduous tree separately), weeds, grasses and myxomycetes showed a similar qualitative pattern, and a maximum negative cross correlation at lag-time of 14 to 21 days (Fig 2). The variation (difference minimum to peak correlation) in cross-correlation during the 0 to 25 day period was statistically significant for all categories other than grasses. Conversely, particles from fungi (ascomycetes, basidiomycetes, fungi imperfecti) and spores showed the exact opposite pattern with the lowest cross correlations at lag-times of 14 to 21 days (Fig 3). In this case, only the difference between lowest and peak cross-correlation for ascomycetes was statistically significant, but the overall patterns were qualitatively similar.
A time series analysis of the incidence of KD cases across Canada is shown in Fig 4A. Anomalously high numbers of new KD cases seemed to cluster during the winter months (November to April), while lower numbers were found during warmer months (May to October). This seasonal pattern was strongly and negatively associated with the level of biological atmospheric particles in Canada (Figs 2 and 3).

Potential source for triggering agent (Canada-wide data)
In order to assess whether it might be possible that periods of high KD case numbers in Canada were related to an environmental trigger from China and Japan as suggested by Rodo et al. [18] for San Diego, we investigated the dynamics of KD and possible associated wind patterns over the Pacific Ocean. During months with high numbers of KD cases in Canada, westerly winds over the Pacific Ocean were stronger, hence, might have been able to transport a potential trigger for KD from Asia to North America and Canada (Fig 4B-4D). Note that these wind patterns are typical for winter and summer months, with the highest wind speeds observed during winter months in the northern mid-latitudes [23]. The winds over Canada are westerly during both high and low KD seasons (Fig 4B and 4D).
In consideration of the Rodo et al. hypothesis, we also explored the association between wind direction and the count of fungi imperfecti (the category that would include fungal toxins and Candida fungi) in the atmosphere [17]. This analysis showed that in the GTA, westerly winds were associated with an increase in the atmospheric measurement of fungi imperfecti (average of 920.3 P/m 3 during westerlies vs. 476.7 P/m 3 during easterlies, p = 0.002). The magnitude of the increase in fungi imperfecti associated with westerly winds was the highest of all types of atmospheric biological particles investigated in this study.
We used regression analysis to determine whether the various non-environmental and environmental factors were associated with the global distribution of KD. For each of the 41

Fig 2. Time series (5-day moving averages) and cross-correlation of new cases of Kawasaki disease (KD) vs. count of
countries with available estimates of incidence, we analyzed several elements. To model genetic predisposition we included the proportion of the population from Japanese, Korean or Chinese ancestry. The proportion of country urbanization and gross national product were used as surrogates for environmental allergens exposure (increase in either would be associated with lower exposure). Finally, western and southern distance from 100.00˚E/60.00˚N

Fig 3. Time series (5-day moving averages) and cross-correlation of new cases of Kawasaki disease (KD) vs. count of biological atmospheric particles for lagtimes of 0 to 25 days for spores and fungi (ascomycetes, basidiomycetes and fungi imperfecti separately).
Upward triangles indicate maximum and downward triangles minimum cross-correlations. The difference between lowest and peak correlation was statistically significant only for ascomycetes but the overall patterns was conserved for all types.  Table 4; in its current form, the model was found to explain 84% of the variation in the global incidence of KD.  Interaction of longitude and latitude distance from 100.00˚E/60.00˚N 0.039 (0.015) 0.01

Discussion
The search to explain the etiology and pathogenesis of KD continues, despite more than 40 years of investigation. Many associations between the incidence of KD and genetic, environmental or infectious factors have previously been described. However, most associations have only been shown at the local level and have failed to integrate into a framework that might explain the global epidemiology of the disease. Through the various aspects of this study, we investigated the environmental epidemiology of KD and have proposed a new framework that attempts to explain the global distribution of KD. Our proposed framework (Fig 5) includes three separate elements: 1) the proportion of the population that have the highest genetic susceptibility to KD, 2) modulation of risk, determined by habitual exposure to environmental allergens, seasonal presence/absence of atmospheric biological particles and presence of an infectious diseases process, and 3) exposure to the still unidentified disease trigger. Importantly, our data and proposed framework support the novel paradigm proposed by Rodo et al. [24], whereby the trigger for KD is not an infectious agent but another environmental trigger, potentially a fungal toxin. The different elements of this framework are supported by the data presented herein, but will also be contextualized in the current knowledge base available on the topic of KD.
A genetic susceptibility to KD has long been established, as evidenced by multiple genetic studies [8][9][10][11]; the significantly higher incidence in people of East Asian ancestry, and potentially of African ancestry, as seen in this study and others, regardless of their country of residence [3,4,25]; the presence of multi-generational familial clustering [26]; and the elevated risk of recurrence versus risk of first episode in a KD-naïve child [5][6][7]. In our proposed framework, we included the concept of modulation, which encompasses many of the environmental or infectious factors that have previously been hypothesized to be the potential trigger of KD. In this proposed framework, environmental factors would influence, over time, the susceptibility of children to develop KD when encountering the disease trigger, but these factors would not directly act as the trigger. The mechanism for modulation could be either an immunological process, such as maturation, an alteration in the microbiome or both [24]. Based on this study, we have identified 4 potential modulating factors: early development of the immune system, habitual exposure to environmental allergens, temporal exposure to biological atmospheric particles, and temporal experience of a stimulatory or co-stimulatory event, potentially of infectious origin.
Late or post term birth [27] and not being breastfed as an infant [28] are known to be associated with multiple immunological diseases, including allergies, which are themselves associated with an increased risk of KD [29][30][31]. The association between late or post term birth and increased risk of KD was also observed in the present study. Based on our results, habitual exposure to environmental allergens might play a role in the modulation of risk of KD. Patients with KD were found to live in environments generally less exposed to environmental allergens. This effect was limited to allergens of biological origin as opposed to pollutiondriven chemicals; both in this study and in previous ones, these have been shown to not be associated with risk of KD [32,33]. This pattern of decreased exposure to the biological environment in childhood and increased risk of diseases of the immune system is consistent with the hygiene hypothesis. This has been linked to many diseases including, asthma and allergies, the incidence of which have significantly increased since the 1960's, as has the incidence of KD. Further supporting this theory is the observation from recent epidemiological data that the incidence of KD, outside of Japan and Korea, has been more or less stable in developed countries but that it is increasing in countries that are rapidly industrializing [3]. A recent review of the epidemiological information available regarding KD has also noted the consistency between epidemiological risk factors for KD and the hygiene hypothesis [34].
The third element of our modulation framework is the temporal association between exposure to biological atmospheric particles and risk of KD. In this study, we have found that exposures to pollen temporally reduced the risk of KD, regardless of the type of pollen being present in the atmosphere. This association could at least partially explain the seasonal patterns observed for KD. Using data from an international consortium, Burns, et al. showed winter peaks in the Northern Hemisphere and a lack of seasonality in the tropical areas and the Southern Hemisphere [35]. Hypothesizing the presence of pollen in the local atmosphere is protective against the trigger for KD, it would make sense that winter peaks are observed in the Northern Hemisphere, at a time where there is little or no pollen in the atmosphere. In contrast, tropical zones and the Southern Hemisphere have much more stable weather patterns throughout the year, and pollen from various origins is constantly present in the atmosphere.
The final element of our framework is exposure to the disease trigger. Until recently, an infectious trigger had been hypothesized to be responsible for KD due to the seasonal patterns [12,35], non-random case clustering [36][37][38], the common co-occurrence of KD and infectious symptoms or infections [13,39,40] and the common presence of sick household members immediately preceding KD diagnosis [41]. We have found in this study an increased burden of infectious disease and the presence of sick family members prior to the diagnosis of KD, and a high proportion of patients with infectious disease symptoms. Temporal clustering of cases in geographically unrelated areas is not consistent with person-to-person area, which would be typical of an infectious process [37]. No infectious agent is known to have a global distribution pattern similar to KD [24,42], and there is no consistency in the type of infectious disease reported to co-occur with KD. These inconsistencies have led many to conclude that multiple infectious agents could trigger KD [36,39,41,42]. An alternative explanation could be that infections are modulating factors rather than the trigger or cause of the disease. This is what we propose in our global distribution framework.
Environmental epidemiology studies from the 1980-1990s have shown that multiple exposures/events 14-21 days preceding symptoms onset could potentially trigger KD [14][15][16]43]. Recent studies have suggested that epidemiological patterns in Japan, Hawaii and the United-States might be associated with exposure to fungal toxins or other environmental factors originating from Northeastern China [17,18], the distribution of which might be influenced by global climate patterns [44]. A similar observation has been found in Chile, where epidemiological data suggest that the triggering agent could be transported from the Atacama Desert through tropospheric winds [45]. Although not designed to answer specifically this question, the association between wind direction distribution and atmospheric fungus concentrations, as found in this study, is consistent with the theory. Given that some animal models of KD use Candida species to trigger a KD-like syndrome and that Candida infection is known to be associated with asthma episodes, it is conceivable that a fungal toxin could be the trigger of KD. Under this assumption, viral and bacterial infections would act as modulating agents, along with atmospheric biological particles, in the weeks before the exposure of a susceptible child to the trigger and the appearance of KD symptoms.
Taken together, these factors were combined into a global framework, where the risk of KD was determined by the four concurrent processes described above: genetic susceptibility, long and short-term modulation of risk and exposure to the disease trigger. Although the model we propose is theoretical and only uses gross estimation of the disease incidence and risk factors, it nevertheless was able to account for more than 80% of the variation in the global distribution of KD. This model should be refined further as incidence data from more countries become available.
This study must be viewed in light of some limitations. First, the majority of the data used are limited to the Canadian context, and the environmental associations identified in this study should be re-assessed in a global context. Our environmental epidemiology questionnaire might be subject to recall and self-report bias, and we cannot be certain that some important environmental risk factors have not been included. Given the sample size available for the case-control part of the study, it is possible that some comparisons between patients and controls might have been underpowered; particularly in the case of rare features. Our analysis of atmospheric biological particles should be seen as being purely exploratory, given that these data were available only for a single year and for the GTA. While we hypothesize that Candida fungi might have an important role in the etiology of KD, our atmospheric particle data did not individually measure it, but rather included it in a larger group of fungi and we cannot exclude potentially contributing environmental factors which were not measured; additionally, atmospheric data from the west coast of Canada; the region at highest theoretical exposure to Candida fungi were not available to confirm this hypothesis. The associations described are only associations and not cause and effect, with the potential mechanisms logical but only speculative. Finally, both our risk framework and global distribution models may represent an ecologic fallacy, and not the results of a single experimental model.
In conclusion, this study described the environmental epidemiology of KD, both at the individual patient level and in a large geographic area. We used these data to propose a theoretical framework, which we tested against the global distribution of the disease and found that a substantial proportion in the variance of KD incidence between countries was accounted by our framework. Future research should attempt to replicate the associations observed in this study, and the predictive model for KD incidence should be revised as additional countries report their epidemiological data.