Multiple drivers of the COVID-19 spread: role of climate, international mobility, and region-specific conditions

The novel Coronavirus Disease 2019 (COVID-19) has spread quickly across the globe. Here, we evaluated the role of climate (temperature and precipitation), region-specific susceptibility (BCG vaccination, malaria infection, and elderly population) and international traveller population (human mobility) in shaping the geographical patterns of COVID-19 cases across 1,055 countries/regions, and examined the sequential shift of multiple drivers of the accumulated cases from December, 2019 to April 12, 2020. The accumulated numbers of COVID-19 cases (per 1 million population) were well explained by a simple regression model. The explanatory power (R2) of the model increased up to > 70% in April 2020 as the COVID-19 spread progressed. Climate, host mobility, and host susceptibility largely explained the variance of the COVID-19 cases (per 1 million population), and their explanatory power improved as the pandemic progressed; the relative importance of host mobility and host susceptibility have been greater than that of climate. The number of days from outbreak onset showed greater explanatory power in the earlier stages of COVID-19 spread but rapidly lost its influence. Our findings demonstrate that the COVID-19 pandemic is deterministically driven by climate suitability, cross-border human mobility, and region-specific susceptibility. The present distribution of COVID-19 cases has not reached an equilibrium and is changing daily, especially in the Southern Hemisphere. Nevertheless, the present results, based on mapping the spread of COVID-19 and identifying multiple drivers of this outbreak trajectory, may contribute to a better understanding of the COVID-19 disease transmission risk and the measures against long-term epidemic.

outbreaks. This suggests that the COVID -19 pandemic is driven by complicated factors, which may include habitat suitability, region-specific human mobility, and transmission related to susceptibility. Therefore, identifying the driving force of the outbreak pattern is urgently needed for predicting infection risk at the global scale (Ficetola and Rubolini 2020). Additionally, capturing region-specific factors in relation to the outbreak progress is critically important for improving control measures against the long-term epidemic.
Respiratory virus infectious disease is empirically characterized by a seasonal nature (Altizer et al. 2006). Moriyama et al. (2020) illustrated a framework to better understand the mechanisms of virus transmission; air temperature, absolute/relative humidity, and sunlight are jointly associated with virus viability/stability and host defense, and thereby human-to-human transmission is promoted by contact rates along with host susceptibility (or immunity) to the virus. From this viewpoint, several research groups have focused on relevant factors individually and quickly examined the role of climate (Baker et al. 2020;Araújo and Naimi 2020;Sajadi et al. 2020), international mobility linked to human contact (Wells et al. 2020;Coelho et al. 2020), and community-based host susceptibility (Sala and Miyakawa 2020) in the spread of COVID-19. However, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 these analyses were inconclusive, and the relative importance of these factors in promoting the disease expansion of COVID-19 remains unclear.
Evaluating the drivers of the COVID-19 spread is a challenging task at the present phase of disease expansion. There is still a distributional disequilibrium in the prevalence of infections globally; the number of confirmed COVID-19 cases changes daily and the trajectories largely differs among countries or regions. Moreover, the absence of population-wide testing for COVID-19 makes it difficult to investigate the growth dynamics of COVID-19 infection. The case data includes selection bias due to surveillance focusing mainly on symptomatic persons. In particular, the availability of the test using reverse transcription polymerase chain reaction (PCR) to identify COVID-19 cases, e.g. the number of PCR tests conducted per population, varies greatly among countries with different medical/public-health conditions (https://ourworldindata.org/covid-testing). Therefore, the true number of the COVID-19 patients and the dynamics of the disease spread are obscured behind the prevalence of asymptomatic carriers (Mizumoto et al. 2020;Nishiura et al. 2020).
Despite the fact that the number of COVID-19 cases may not be sufficiently reliable for . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 conducting epidemiological analyses, such as modelling growth dynamics of the disease, the available data of the confirmed cases can be still informative for implementing containment and/or suppression measures, because the number of the confirmed cases are directly linked to the consumption of medical resources for combatting against the COVID-19 pandemic. Therefore, this study focused on time-series data regarding the number of the COVID-19 cases confirmed from December 2019 to April 2020, as well as country/region-specific variables, e.g. socio-economic conditions and screening effort (number of PCR tests conducted), that potentially affect the number of cases, and then explored multiple drivers of disease expansion by controlling covariates.
Specifically, we evaluated the relative importance of climate (temperature and precipitation relevant to habitat suitability for the virus), region-specific susceptibility (BCG vaccination, malaria infection and the relative frequency of elderly citizens hypothesized to be linked to host susceptibility) and international travel (human mobility) in shaping the current geographical patterns of the COVID-19 spread across the globe.

Material and Methods
Data . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 We compiled geographic data on the number of reported COVID-19 cases per day from December, 2019 to April 12, 2020. For 1,055 countries/regions, we collected the number of COVID-19 cases from various sources (see List of data sources for the COVID-19 cases in the end of this preprint). We then calculated the time (in days) since the onset of virus spread that was defined by the date of the first confirmed case in each country or region. We also collected the number of PCR tests conducted from World Health Organization (WHO) (https://ourworldindata.org/covid-testing) to examine the influence of sampling effort on the number of confirmed cases of COVID-19.
For each country or region, we compiled several environmental variables. For mapping cases of COVID-19, the longitude and latitude representing the largest city and area for the country or region were extracted from GADM maps and data (https://gadm.org/index.html). Based on geocoordinates of the cities, we collected the climatic data of mean precipitation (mm month -1 ) and temperature (℃) for January, February and March (WorldClim) using WorldClim version 2.1 climate data (https://www.worldclim.org/data/worldclim21.html) at the resolution of 2.5 arc-minutes grid cells that contained a country or region.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 Regarding international travel linked to the disease transmission, we compiled the average number of foreign visitors (per year) for individual countries/regions from the World Tourism Organization (https://www.e-unwto.org/toc/unwtotfb/current). Then, we calculated the relative frequency of foreign visitors per population of each country or region for the analysis.
Regarding region-specific host susceptibility, we collected the following three epidemiological properties: the proportion of people aged over 65 years (elderly population), the number of people infected by malaria (per year), and information regarding bacillus Calmette-Guérin (BCG) vaccination from the WHO (https://www.who.int/malaria/data/en/) and is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020.04.20.20072157 doi: medRxiv preprint started (BCG_year); ii) the present situation of BCG vaccination practice (BCG_type) split into all vaccinated, partly vaccinated, one time vaccinated in the past, or never vaccinated; iii) the relative frequency of after-1980 (last 40 years) BCG vaccination for people less than 1 year old (BCG_rate); iv) the number of BCG vaccinations (MultipleBCG) describing countries as never vaccinated their citizens with BCG, vaccinated with BCG only once, vaccinated with BCG multiple times in the past, or vaccinated with BCG multiple times in the present; and v) tuberculosis cases per 1 million people (TB). These BCG-related variables have strong intercorrelation. Therefore, we reduced the dimensions of these variables (BCG_year, BCG_type, BCG_rate, MultipleBCG, and TB) by extracting the first axis of the PCA analysis: the score of the PCA 1 axis was negatively correlated with the five variables, and thus PCA 1 score multiplied by -1 was defined as the BCG vaccination effect.
We also compiled socioeconomic data for each country or region. The population size, population density (per km 2 ) (Gridded Population of the World GPW, v4.; https://sedac.ciesin.columbia.edu/data/collection/gpw-v4), gross domestic product (GDP in US dollar), and GDP per person were obtained from national census data (World

Development
Indicators; . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Statistical analyses
We investigated the relationship between the environmental variables (climatic, host susceptibility, international human mobility, and socioeconomic factors) and the number of COVID-19 cases (per 1 million population) using the two approaches to assure the results robustness: conventional multiple linear regression and random forest, a machine-learning modeling (Breiman 2001), respectively. We separately modeled the accumulated number of COVID-19 cases (per 1 million population) at successive periods from December, 2019 to April 12, 2020.
In the multiple regression analysis, we set the log-scaled cumulative number of COVID-19 cases at a period as the response variable and the climatic factors (mean temperature, squared mean temperature, and log-scaled monthly precipitation), socioeconomic conditions (log-scaled population density and GDP per person), international human mobility (the relative frequency of foreign visitors per population) and region-specific susceptibility (the percentage of people aged ≥ 65 years, the log-scaled relative frequency of people infected by malaria, and the BCG vaccination . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020 effect) as explanatory variables.
We also included the time of infection (measured in days) since the first case confirmed at each country or region and the number of tests conducted for COVID-19 (as a measure of sampling effort) as covariates to control the country/region-specific observation biases. In addition, we used the trend surface method to take spatial autocorrelation into account as a covariate: we added the first eigenvector of the geo-distance matrix among the countries or regions, which was computed using geocoordinates of the largest city, as a covariate (Diniz-Filho and Bini 2005). The explanatory power of the model was evaluated by the adjusted coefficient of determination (R 2 ). We also calculated the relative importance of each explanatory variable in a regression model according to its partial coefficient of determination and determined the predominant variables that explained the variance in the response variables. The statistical significance of each variable was determined by conducting F-test. All of the explanatory variables were standardized to have a mean of zero and a variance of one before these analyses.
In the random forest model, we used the same set of response and explanatory variables, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020 as well as the same covariates. In each run of the random forest analysis, we generated 1,000 regression trees. The model performance was evaluated by the proportional variance explained. We evaluated the relative importance of each explanatory variable based on the increase in the mean squared error when the variable was permutated.
Before these analyses, we tested collinearity between the explanatory variables by calculating the variance inflation factor (VIF); the largest VIF value was 5.28 throughout the period and VIF at April 12, 2020 was 3.35, indicating the absence of multicollinearity in the regression.
To confirm the testing effort bias on the number of confirmed cases, we conducted an additional analysis that accounted for the number of conducted tests (i.e., sampling efforts) in individual countries/regions, as a covariate in the model. Note that this analysis was applied to the data from 128-700 countries/regions, because testing data for many countries is currently unavailable (https://ourworldindata.org/covid-testing).
All analyses were performed with the R environment for statistical computing (R Development Core Team 2012); 'sf' package was used for graphics artworks (Pebesma . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020 2018) and 'randomForest' package was used for the random forest analysis (Liaw and Wiener 2002). As time progressed, the standardized regression coefficients of the model greatly changed (from non-significance to significance) through December, 2019 to April 12, 2020 (Fig. 4). After February, 2020, mean temperature was negatively correlated with the accumulated numbers of the COVID-19 cases, whereas mean precipitation was positively correlated (Figs. 4A, B and C). After March, 2020, relative frequency of . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020.  (Fig. 4D), and the relative frequency of people ≥ 65 years old was also positively (but temporarily negatively) correlated with the accumulated numbers of the COVID-19 cases (Fig. 4I). In the early stage of COVID-19 spread, the number of days from case onset was strongly positively correlated (Fig. 4J).
The explanatory power, i.e., coefficient of determination (R 2 ), of the model increased to up to > 70% in April, 2020 as the COVID-19 epidemic progressed (Fig. 5). The number of days from case onset had greater explanatory power > 20% in January, 2020, but quickly lost its influence as the epidemic progressed, and instead other variables (related to climate, human mobility, and host susceptibility) showed the increasing explanatory powers (Fig. 5).
The results of the random forest model were generally consistent with those of the linear multiple regression model (Fig. S1). The relative importance of the variables . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. Notably, the explanatory power of these drivers substantially increased as the epidemic progressed from January to April, 2020, indicating a deterministic expansion of the disease across the globe.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 Cross-border human mobility, which has been facilitated by globalization (Smith et al. 2007), clearly accelerated the COVID-19 pandemic. This finding is in line with a report by Coelho et al. (2020), which emphasized the role of the air transportation network in this epidemic. In addition, region-specific susceptibility, which here was surrogated by BCG vaccination, malaria infection (maybe linked to anti-malarial drugs use), and the elderly population aged over 65 years, explained a substantial part of the variance in COVID-19 cases across the globe: this supports the findings by Sala and Miyakawa (2020) that show a significant correlation between BCG vaccination and COVID-19 prevalence. We note that these correlation patterns may change as this epidemic progresses; a correlation with malaria infection was relatively robust, but the regression coefficients of BCG vaccination effect with the COVID-19 cases (per 1 million population) seems to be less influential after April 2020, which is due to the recent spreads in some particular countries with BCG vaccination program (e.g. Japan, Russia, Turkey and Brazil).
Our analysis using the regression model, which comprehensively accounted for climate, international human mobility, region-specific susceptibility, and socio-economic conditions, showed that climate suitability remains an important driver shaping the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 current distribution of COVID-19 cases (Ficetola and Rubolini 2020, Araújo and Naimi 2020). Although human mobility and host susceptibility were main drivers in the spread of COVID-19, the distribution of COVID-19 cases across biome types (Fig. 2 and Supplement Video 2) may suggest biogeographical patterns of the epidemic (Murray et al. 2018). Unless the epidemic progresses through the summer months in the Northern Hemisphere and/or in temperate regions of the Southern Hemisphere, it may be too early to make a conclusion on the relationship between abiotic factors and COVID-19 (Moriyama et al. 2020).
Our model prediction does not take into account variables relevant to local-scale factors that are associated with community infection or containment/suppression measures implemented against the epidemic in individual countries/regions. Therefore, the residual of the model (Fig. 6), i.e., deviation of the observed number of COVID-19 cases reflects the influence of local-scale drivers on epidemic spreading; positive deviation may indicate more serious local-scale cluster infection, e.g., in some prefectures in Japan or in parts of South East Asia, Africa, and South America, than we predicted by macro-scale driver-based model, while negative deviation indicates the influence of distributional disequilibrium of COVID-19 cases (because of the areas, e.g., .
CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020 Africa, that the virus reached recently) or suggest the effectiveness of the present control measures.
The distribution of COVID-19 cases is not presently at an equilibrium and is changing daily. Nevertheless, our findings demonstrate that the COVID-19 pandemic is deterministically driven by climate suitability, cross-border human mobility, and region-specific susceptibility. The present results, based on mapping the spread of COVID-19 and identifying multiple drivers of this outbreak trajectory, may contribute to a better understanding of the disease transmission risk and the measures against long-term epidemic. Altizer S., Dobson A., Hosseini P., Hudson P., Pascual M., and Rohani P. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 Amoroso A. (2020) Temperature, humidity, and latitude analysis to predict potential spread and seasonality for COVID-19. Preprint research paper. Electronic copy available at: https://ssrn.com/abstract=3550308 Sala G. and Miyakawa T. (2020)   . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020.04.20.20072157 doi: medRxiv preprint Fig. 3. Standardized regression coefficients and coefficient of determination (R 2 ) of the model explaining the accumulated number of COVID-19 cases (per 1 million population) from December, 2019 to January 31, 2020 (A), December, 2019 to February 29, 2020 (B), December, 2019 to March 31, 2020 (C), and December, 2019 to April 12, 2020 (D). Temp, mean temperature; Temp 2 , squared mean temperature; Prec, mean monthly precipitation; Pop dens, population density; Visitor, relative frequency of foreign visitors per population; GDP, gross domestic product per person; BCG, BCG vaccination effect defined by the first PCA axis summarizing five variables related to BCG vaccination (see the method section); Malaria, relative frequency of people infected by malaria; Age, relative frequency of people ≥ 65 years old; First cases, number of days from case onset. The regressions were conducted using ordinary least squares analyses. Vertical lines represent the 95% confidence intervals of parameters. Closed symbols indicate the significance of explanatory variables (P < 0.05). A nonlinear modeling analysis was also conducted by the random forest using the same set of response and explanatory variables and the same covariates; the results of this parallel analysis are shown in Fig. S1.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10. 1101/2020  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 24, 2020. . https://doi.org/10.1101/2020