Limited role for meteorological factors on the variability in COVID-19 incidence: A retrospective study of 102 Chinese cities

While many studies have focused on identifying the association between meteorological factors and the activity of COVID-19, we argue that the contribution of meteorological factors to a reduction of the risk of COVID-19 was minimal when the effects of control measures were taken into account. In this study, we assessed how much variability in COVID-19 activity is attributable to city-level socio-demographic characteristics, meteorological factors, and the control measures imposed. We obtained the daily incidence of COVID-19, city-level characteristics, and meteorological data from a total of 102 cities situated in 27 provinces/municipalities outside Hubei province in China from 1 January 2020 to 8 March 2020, a period that covers almost the entire first wave of the epidemic. Generalized linear mixed effect models were employed to examine the variance in the incidence of COVID-19 explained by different combinations of variables. Including the control measure effects in a model substantially raised the explained variance to 45%, an increase of more than 40 percentage points compared with the null model that did not include any covariates. On top of that, including temperature and relative humidity in the model resulted in an increase of less than 1 percentage point in the explained variance, even though the meteorological factors showed a statistically significant association with the incidence rate of COVID-19. In conclusion, we showed that very limited variability of the COVID-19 incidence was attributable to meteorological factors. Instead, the control measures could explain a larger proportion of variance.

Introduction
Pneumonia cases associated with a novel coronavirus were first recognised at the end of December 2019 in Wuhan City, Hubei Province of China. The 2019 coronavirus disease (COVID-19) soon spread to all 34 provinces of China by the end of January. In response to this epidemic, a lockdown was imposed in Wuhan city starting from 23 January 2020, followed by travel restrictions in Hubei. By 29 January 2020, a total of 7,711 confirmed cases and 170 deaths had been reported in China, and 31 provinces/municipalities had launched the highest level (level I) of response for major public health emergencies [1], which aimed to prevent and control the emergency, to guide and standardize emergency-handling strategies, and to minimize the harm caused by such an emergency in a prompt and effective manner [2]. The control measure strategies covered nine main medical, social and political aspects: direct leading from the State Council, definition of risk areas, screening of the floating population, traffic control, social distancing, resource mobilization, information release, public education and maintaining social stability. Despite the restrictive political and societal interventions, the disease soon became a global public health threat. As of 11 March 2020, the World Health Organization (WHO) reported 118,319 confirmed cases and 4,292 deaths in over 100 countries/regions [3] and declared COVID-19 a global pandemic on the same day [4].
Scholars have been discussing the potential effects of meteorological factors on COVID-19 transmission. Previous influenza studies found that in cold and dry weather, respiratory droplets remain airborne longer, the virus is more stable and hosts tend to have weakened immunity, all of which facilitate virus transmission [5]. Existing laboratory data also suggested that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was more stable at a low temperature [6]. Yet, among the population-based studies, the meteorological effects were inconsistent [7][8][9][10][11][12] and none have assessed the extent to which the effect contributed to the variability of COVID-19 incidence. It was argued that the inconsistent findings may be due to the decreasing impacts of meteorological conditions after the imposition of political and societal measures for epidemic control [12].
While many studies have focused on identifying the association between meteorological factors and the activity of COVID-19, we hypothesize that the impact of the meteorological factors on the risk of COVID-19 was minimal when the effects of control measures were taken into account. In this study, we aim to use data from the first wave of the epidemic in China (outside Hubei) to assess how much variability in COVID-19 activity is attributable to city-level socio-demographic characteristics, meteorological factors, and control measures of level I responses. We argue that stringent control measures are necessary to control COVID-19 regardless of the meteorological conditions of an area.

Ethics statement
This is a statistical modelling study using publicly available data. All the data were at an aggregated level without personal information, so no ethical issues were encountered.

Settings and primary data screening
We obtained data from a total of 102 cities situated in 27 provinces/municipalities outside Hubei province in China from 1 January 2020 to 8 March 2020, a period that covers almost the entire first wave of the epidemic. These cities were selected because at least 20 cases of COVID-19 were confirmed in each of them during the study period. While the epidemics in the Chinese cities outside Hubei province were consistently characterised by a mixture of cases imported from Hubei and local cases, Hubei was regarded as an epidemic centre that exhibited completely different spatial dissemination and temporal dynamics of COVID-19 [13], and thus we excluded all cities in Hubei province from the analysis.

Level I responses in different Chinese provinces and municipalities
By 25 January 2020, all 27 provinces in our study had initiated the level I response to major public health emergencies. The schedule of level I responses and the corresponding control measures in each province/municipality are summarized in S1 Table. As a follow-up to the provincial response, many cities launched more specific and multi-dimensional measures. Examples include the closure of museums, tourist areas and religious institutes in Hangzhou, Zhejiang Province; mandatory mask-wearing and body temperature checks in public spaces in Shenzhen, Guangdong Province; and the cancellation of mass gathering activities and screening of individuals with a travel history from Hubei in Quanzhou, Fujian Province.

COVID-19 surveillance data
The daily numbers of confirmed cases of COVID-19 in different Chinese cities from 1 January 2020 to 8 March 2020 were obtained from the webpage of the National Health Commission of the People's Republic of China [14]. We employed daily incidence, defined as the number of cases with illness onset on that day divided by the population size of a city (expressed per million population), to describe the activity of COVID-19.
To adjust for the delay between the date of illness onset and the date of confirmation of COVID-19 diagnosis, we rebuilt the epidemic curves using a recurrence equation in which $n^{\mathrm{onset}}_i(j)$ and $n^{\mathrm{confirm}}_i(j)$ are the number of cases with onset on day $j$ and the number of cases confirmed on day $j$ in city $i$ respectively, and $\phi(u)$ is the discretized probability density function of the delay duration $U$. The delay was assumed to follow a gamma distribution with a mean of 8.8 days and a standard deviation of 4.6 days between 1 and 27 January 2020, and a mean of 5.3 days and a standard deviation of 3.0 days from 28 January 2020 onwards [13].
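The delay adjustment and the incidence definition above can be illustrated with a minimal sketch. It assumes one plausible form of the back-calculation, $n^{\mathrm{onset}}_i(j) = \sum_u n^{\mathrm{confirm}}_i(j+u)\,\phi(u)$, which is not confirmed by the text; the discretization scheme, the function names, the case counts and the population size are likewise hypothetical.

```python
import math

def discretized_gamma(mean, sd, max_delay=30):
    """Probability mass phi(u) for delays u = 0..max_delay days.

    Assumption: the gamma density is evaluated at the midpoint of each day
    and renormalised; the source does not say how it was discretized.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean

    def pdf(x):
        log_pdf = ((shape - 1) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
        return math.exp(log_pdf)

    weights = [pdf(u + 0.5) for u in range(max_delay + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def rebuild_onset_curve(confirmed, phi):
    """Assumed form: onset[j] = sum over u of confirmed[j + u] * phi[u]."""
    n = len(confirmed)
    onset = [0.0] * n
    for j in range(n):
        for u, p in enumerate(phi):
            if j + u < n:
                onset[j] += confirmed[j + u] * p
    return onset

def incidence_per_million(onset, population):
    """Daily incidence: onset cases divided by population, per million."""
    return [x / population * 1e6 for x in onset]

# Delay parameters reported for 28 January 2020 onwards.
phi = discretized_gamma(mean=5.3, sd=3.0)
# Hypothetical daily confirmed counts for one city.
confirmed = [0, 0, 0, 0, 0, 0, 0, 0, 2, 5, 9, 12, 8, 4, 1, 0]
onset = rebuild_onset_curve(confirmed, phi)
incidence = incidence_per_million(onset, population=5_000_000)
```

In this sketch, cases whose reconstructed onset would fall before the first day of the series are dropped, so the onset curve can sum to slightly less than the confirmed curve.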

Meteorological data and other covariates
Daily meteorological data including average ambient temperature and relative humidity in each of the cities were collected from the National Climate Data Center [15] and were averaged over all weather stations in a city. As absolute humidity has been demonstrated to have a stronger association with respiratory diseases [16][17][18][19], we employed vapour pressure determined by the Clausius-Clapeyron equation as a proxy for absolute humidity in our analysis [20][21][22].
In order to account for the variability between cities in our analysis, we also collected city-specific characteristics including population size, population density, sociodemographic status (i.e. gross domestic product (GDP) per capita, proportion of individuals having tertiary education or above, and proportion of elderly population, i.e. aged >64 years) [23], and geographic distance to Wuhan, which served as a proxy for potential accessibility to the epidemic centre.
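As an illustration of how vapour pressure can be derived from temperature and relative humidity, the sketch below uses a common integrated form of the Clausius-Clapeyron equation. The reference pressure, latent heat and gas constant are standard textbook values assumed here; the exact constants and formulation used in the analysis are not given in the text, and the function names are hypothetical.

```python
import math

E_S0 = 6.11   # saturation vapour pressure at T0 in hPa (assumed reference value)
T0 = 273.15   # reference temperature in kelvin
L_V = 2.5e6   # latent heat of vaporisation in J/kg (assumed constant)
R_V = 461.5   # specific gas constant for water vapour in J/(kg K)

def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure (hPa) at a temperature given in degrees Celsius."""
    temp_k = temp_c + 273.15
    return E_S0 * math.exp((L_V / R_V) * (1.0 / T0 - 1.0 / temp_k))

def vapour_pressure(temp_c, rel_humidity_pct):
    """Actual vapour pressure (hPa) from temperature and relative humidity (%)."""
    return rel_humidity_pct / 100.0 * saturation_vapour_pressure(temp_c)

# Example: a mild, humid day (hypothetical values).
vp = vapour_pressure(temp_c=20.0, rel_humidity_pct=75.0)
```

Because vapour pressure is a deterministic function of temperature and relative humidity, it is strongly correlated with both, which is why it is analysed in a separate set of models rather than alongside them.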

Statistical analysis
To account for between-city variation, we employed generalized linear mixed effect models (GLMMs) to examine the variability in the incidence of COVID-19 explained by different combinations of variables. The GLMMs were fitted using the data from the date of the epidemic start (i.e. date of having the first case with illness onset) to the date of epidemic end (i.e. date of the last case) in each of the included cities. Suppose $y_{ij}$, the daily incidence rate on day $j$ in city $i$ (i.e. $n^{\mathrm{onset}}_{ij}$/population size of city $i$), follows a Poisson distribution with mean $\lambda_{ij}$. With a log link, the full model form is as follows:

$$\log(\lambda_{ij}) = \beta_0 + \sum_p \beta_p x^p_i + \sum_q \beta_q x^q_{ij} + \beta_m x^m_{ij} + \beta_t\,\mathrm{time}_i + \alpha_i$$

where $\beta_0$ is the grand intercept, $x^p_i$ is the $p$-th city-specific characteristic variable of city $i$ with regression coefficient $\beta_p$, $x^q_{ij}$ is the $q$-th time-varying meteorological variable of city $i$ on day $j$ with regression coefficient $\beta_q$, and $x^m_{ij}$ is the variable with regression coefficient $\beta_m$ which captures the incremental effect of control measures of level I responses implemented on day $k$. To account for the time trend, we included a variable $\mathrm{time}_i$, which is the number of days since the date of the first case with illness onset in city $i$, with regression coefficient $\beta_t$ in the model. In the GLMM, the city-specific random effect is modelled as $\alpha_i$, which follows a normal distribution with mean 0 and variance $\sigma_\alpha^2$. The random effect is used to capture the city-specific heterogeneity that cannot be accounted for by our data. To account for over-dispersion of the outcome variable, $y_{ij}$ was assumed to follow a negative binomial distribution when the standard Pearson chi-squared statistic divided by its residual degrees of freedom ($\chi^2/\mathrm{df}$) was greater than two.
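The over-dispersion rule described above can be sketched as follows. This is a simplified illustration that treats the fitted means as given and uses the Poisson variance (equal to the mean) in the Pearson statistic; the function names and the example values are hypothetical, and the residual degrees of freedom of a real GLMM would come from the fitting software.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-squared statistic divided by residual degrees of freedom.

    Under a Poisson model the variance equals the mean, so each squared
    residual is scaled by the fitted mean.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df

def choose_family(observed, fitted, n_params, threshold=2.0):
    """Switch to a negative binomial outcome when chi2/df exceeds the threshold."""
    ratio = pearson_dispersion(observed, fitted, n_params)
    return "negative binomial" if ratio > threshold else "poisson"

# Hypothetical counts with far more spread than a Poisson mean of 5 implies.
family = choose_family(observed=[0, 10, 0, 10], fitted=[5, 5, 5, 5], n_params=1)
```

A ratio close to one is what a well-specified Poisson model would produce, so the threshold of two leaves some tolerance before the more flexible distribution is adopted.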
We compared five regression models: (i) model with time trend only (M1, base model), (ii) model with city-specific characteristics and time trend (M2), (iii) model with city-specific characteristics, meteorological factors, and time trend (M3), (iv) model with city-specific characteristics, control measures variable, and time trend (M4), and (v) model with city-specific characteristics, meteorological factors, control measures variable, and time trend (M5, full model). The models were compared using the R-squared ($R^2$) measures proposed by Nakagawa and colleagues [24], so as to determine which variable combination best explains the variance of the activity of COVID-19. We used $R^2_{\mathrm{fixed}}$ to depict the proportion of variance explained by the fixed effects and $R^2_{\mathrm{random}}$ to depict the proportion of variance explained by the random effects of cities' heterogeneity. $\Delta R^2_{\mathrm{fixed}}$ was used to determine the proportion of variance explained by the additional fixed effect terms in each of M2 to M5 compared with M1. To avoid the problem of collinearity, the impact of vapour pressure was studied in another set of models by replacing temperature and relative humidity with vapour pressure. Relative risks (RR) with corresponding 95% confidence intervals (CIs) and p-values (p) were employed to quantify the effects of the variables on the risk of COVID-19.
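The variance partition behind these measures can be sketched in a few lines. The sketch assumes the usual decomposition on the link scale into fixed-effect variance, random-effect variance and a distribution-specific residual variance; the function names and the variance components used below are hypothetical and are not estimates from this study.

```python
def variance_explained(var_fixed, var_random, var_residual):
    """Split total link-scale variance into fixed and random effect shares."""
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "r2_fixed": var_fixed / total,    # share explained by fixed effects
        "r2_random": var_random / total,  # share explained by the city random effect
    }

def delta_r2_fixed(r2_fixed_model, r2_fixed_base):
    """Additional fixed-effect variance explained relative to the base model (M1)."""
    return r2_fixed_model - r2_fixed_base

# Hypothetical variance components for a base model and a richer model.
base = variance_explained(var_fixed=0.1, var_random=1.0, var_residual=0.9)
richer = variance_explained(var_fixed=1.0, var_random=0.5, var_residual=0.5)
gain = delta_r2_fixed(richer["r2_fixed"], base["r2_fixed"])
```

Comparing models through the difference in the fixed-effect share isolates what the added covariates contribute, independently of how much heterogeneity the random effect absorbs.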
A stratified analysis by climate zone was conducted to examine the difference in the proportion of variance explained by factors between temperate and subtropical/tropical cities. Of the 102 Chinese cities included, 45 were located in the temperate zone and 57 were located in the subtropical or tropical zones. In addition, we categorized the control measures into 5 types (social distancing, screening and contact tracing, quarantine of risky populations, hospital-related measures, and other public health measures) in order to examine the robustness of the composite variable of the level I responses in the GLMM and to assess the statistical significance of each type of control measure. A similar model form was used (S1 Text).
In the sensitivity analysis, we tested whether adding an interaction term between meteorological factors and control measures in the model would enhance the explained variance. Since day of week was shown to be associated with the consultation pattern of some non-acute diseases [25,26], we examined the variability of our results when the day-of-week term was included in the model. We also compared the main results from models using different lags for meteorological factors (i.e. 3 and 7 lag days) to assess the robustness of our findings. All analyses were carried out using SAS version 9.4.

Results
Table 1 shows the descriptive statistics of the city-specific characteristics of the selected 102 cities. The population size ranged from around 600 thousand (Sanya, Hainan province) to 34 million (Chongqing municipality), whereas the population density ranged from 66/km² (Wuzhong, Ningxia province) to 6,523/km² (Shenzhen, Guangdong province). The GDP per capita ranged from 22 thousand to 190 thousand Chinese yuan, and the distance to Wuhan ranged from 210 km to 3,270 km. Beijing municipality had the largest proportion of residents with tertiary education (42.3%), while Chongqing had the largest proportion of population aged >64 years (12.9%).
Across all the included cities, the daily ambient temperature and relative humidity ranged from -23.6°C to 29.5°C and 9.4% to 100% respectively (Table 1). The overall median of city-specific mean temperature was 6.9°C (range: -15.0°C to 22.6°C) and the median of city-specific mean temperature increased from 4.8°C (range: -18.5°C to 22.2°C) in January 2020 to 10.0°C (range: -7.4°C to 24.3°C) in March 2020 (Fig 2A). The overall median of city-specific mean relative humidity was 74.4% (range: 44.9% to 89.7%) and the median of city-specific mean relative humidity decreased slightly from 76.3% (range: 51.6% to 90.9%) in January 2020 to 73.8% (range: 30.3% to 97.9%) in March 2020 (Fig 2B).
Before 22 January 2020, most of the cities had a daily incidence rate below 2 per million population (Fig 2C). Before any further upsurge of the epidemic, the Chinese provincial governments declared level I responses between 23 and 25 January 2020. After that, Shenzhen in Guangdong province experienced the highest peak daily incidence among all cities outside Hubei province, at 6.9 per million inhabitants on 28 January 2020. A downward trend was observed in the epidemic curve about a week after the implementation of control measures of level I responses.
Table 2 shows the model comparison results. Compared with M1, which solely included the time trend, including city-specific characteristics in the model (M2) could only explain an additional 3.22% of the variance in the incidence rate. Further inclusion of temperature and relative humidity as time-varying fixed effects in the model (M3) boosted the explained variance to 11.8%. However, having the control measure effects included in the GLMM (M4) substantially raised the explained variance to 45.0%, an increase of more than 40 percentage points compared with the null model. On top of this effect, including the meteorological effects in the model (M5) resulted in an increase of less than 1 percentage point in the explained variance, even though temperature and relative humidity showed a statistically significant association with the incidence rate of COVID-19 (temperature: RR = 0.984, 95% CI: 0.969-0.999, p = 0.040; relative humidity: RR = 0.993, 95% CI: 0.988-0.997, p = 0.001). In the full model, no city-specific characteristics (i.e. distance to Wuhan, population density, GDP per capita, proportion of tertiary education, and proportion of elderly population) were significantly associated with the COVID-19 incidence. When temperature and relative humidity in the models were replaced with vapour pressure, the increases in explained variance were similar (Table 3). Nevertheless, a decrease in vapour pressure was statistically significantly associated with an increased risk of COVID-19 (RR = 0.958, 95% CI: 0.939-0.976, p<0.001).
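To give a sense of scale for the relative risks reported for temperature and relative humidity, the worked arithmetic below assumes that each RR is expressed per one-unit increase (1°C of temperature, 1 percentage point of relative humidity) and that the log-linear effect can be extrapolated over a 10-unit change. Neither assumption is stated explicitly in the text, so the figures are illustrative only.

```latex
\mathrm{RR}_{\text{temperature},\,+10^{\circ}\mathrm{C}} = 0.984^{10} \approx 0.85,
\qquad
\mathrm{RR}_{\text{relative humidity},\,+10\%} = 0.993^{10} \approx 0.93
```

Under these assumptions, a 10°C rise in temperature would correspond to roughly a 15% lower incidence rate. An association of this size can be statistically significant and still add little to the explained variance once the control measures are in the model.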
When the analysis was stratified by climate zone, the additional variances explained by the control measures were similar between the temperate and subtropical/tropical cities when compared with the variance explained in the model without the control measure effects (M3) (Table 4). However, the contribution of meteorological factors to the explained variance in the subtropical/tropical cities was around 3-fold that in the temperate cities (i.e. $\Delta R^2_{\mathrm{fixed}}$ = 14.4% vs $\Delta R^2_{\mathrm{fixed}}$ = 5.04% in M3). Temperature and relative humidity even became statistically insignificant in the full model when it was fitted to the data of temperate cities.
When the control measures were categorized by type, the overall variance explained in the models was reduced by around 8% (Table 5). However, as with the major finding, including the meteorological effects in the model (M5) only resulted in a 2% increase in the explained variance, and the statistical significance of temperature and relative humidity remained unchanged. Among all types of control measure, imposing social distancing (RR = 0.912, 95% CI: 0.892-0.932, p<0.001), screening and contact tracing (RR = 0.945, 95% CI: 0.926-0.965, p<0.001), hospital-related measures (RR = 0.954, 95% CI: 0.941-0.967, p<0.001), and other public health measures (RR = 0.942, 95% CI: 0.927-0.958, p<0.001) were significantly associated with a lower risk of COVID-19. Quarantine of risky populations was not found to be a significant predictor.
As shown in the sensitivity analysis, our results were robust to the delayed effects of meteorological factors (S2 Table). When the lags of temperature and relative humidity were increased in the models, a slight decrease in the explained variance was observed (lag = 3 days: $R^2_{\mathrm{fixed}}$ = 10.4% in M3; lag = 7 days: $R^2_{\mathrm{fixed}}$ = 9.15% in M3) and both temperature and relative humidity tended to be less significant. Compared with $R^2_{\mathrm{fixed}}$ of M4, $R^2_{\mathrm{fixed}}$ of the full model that accounted for lag effects did not change remarkably and remained at around 45%. Adding the interaction terms (S3 Table) or the day-of-week term (S4 Table) to the model did not enhance the proportion of variance explained.

Discussion
Although laboratory findings showed that the stability of SARS-CoV-2 was sensitive to temperature and relative humidity [6,27,28] in controlled environments, the meteorological effect varies greatly at population level when host factors are taken into account. In this study, we employed data from 102 Chinese cities which experienced the first wave of the epidemic to assess how much variability in COVID-19 activity was attributable to city-level socio-demographic characteristics, meteorological factors, and control measures of level I responses. According to our results, although temperature and relative humidity were significantly associated with the risk of COVID-19, we could not attribute a substantial share of the variability of the COVID-19 incidence to meteorological factors once the effect of control measures of level I response was taken into account. Instead, the implementation of control measures was associated with a larger proportion of variance explained with regard to the activity of COVID-19, and the result was robust to variations in climate zones of the cities and lag effects of meteorological factors.
Our findings support the view that control measures have significant effects on COVID-19 incidence while climatic conditions are less important within the limits of this study. This corroborates an investigation by te Beest and colleagues [29] which focused on seasonal influenza, another respiratory disease with likely identical transmission routes (via contact, droplets, and fomites). They [29] showed that the effect of absolute humidity could explain only a very limited proportion of variance in disease transmission intensity; instead, depletion of susceptibles during an epidemic, which might also be achieved by vaccination, contributed one-third of the total variance. Our study suggests a similar perspective: the effect of host factors likely contributes much of the variability in COVID-19 transmission at population level, even though laboratory findings suggested that the spreading ability of the coronavirus was reduced in hot conditions [30]. This, in turn, suggests that if a vaccine is not available, non-pharmaceutical interventions to reduce the frequency of host contacts (e.g. social distancing) are required to induce a decrease in COVID-19 incidence.
Although we could not show a large proportion of variance explained by the meteorological factors, temperature and relative humidity were negatively associated with the risk of COVID-19. Another investigation in China also indicated that temperature was a driver of the COVID-19 outbreak and the incidence decreased with the rise of temperature [31]. Consistent with a study in the United States [7], higher temperature was significantly associated with a linearly decreasing risk of COVID-19. Our study echoes a recent systematic review which found that hot and wet climates were related to a decrease in the spread of COVID-19 [32]. Nevertheless, the association identified in our study contradicts the results of an earlier study showing that high temperature favoured the transmission of COVID-19 in Brazil [33]. Yet, we noted that the Brazil study did not account for the increase in intensity of control measures over time. Such inconsistency in the association between temperature and disease transmission was typically observed in respiratory diseases across different zones and hemispheres [13]. We also showed that relative humidity and absolute humidity were correlated with the activity of COVID-19, and the result was consistent with other studies [7,33]. We speculate that COVID-19 shares similar viral characteristics with influenza, in which a lower humidity level could enhance the survival and transmission of the virus [18].
In our analysis, we employed random effects to capture the city-specific heterogeneity that cannot be accounted for by our data. The random effects term, together with the fixed effects terms, helped to increase the variance explained by the models to around 60% of the total variability. The remaining unexplained variance could be attributed to many other factors. For example, we did not capture the between-province heterogeneity which might arise from variation in compliance with the control measures in level I response (S1 Table). Nevertheless, we conducted an additional analysis by including different types of control measures in the GLMMs and the results were consistent with our major finding, though a decrease in model fit was observed. We also found that the majority of control measures (i.e. social distancing, screening and contact tracing, hospital-related measures, and other public health measures) were significantly associated with a lower risk of COVID-19 activity. Moreover, different levels of reporting rates may contribute to the unexplained variance, especially as public awareness of the newly emerged COVID-19 increased compared with the start of the epidemic, which might only be partly captured by the time effect.
There are several major limitations in our study. First, we did not account for the changes in the number of susceptible individuals by taking it as one of the fixed effects in our statistical models. However, given that COVID-19 is a newly emerged infectious disease, the effect of variation in the number of susceptibles on our results is likely to be comparatively minor [29]. Second, we did not study the impacts of other meteorological variables such as rainfall because the majority of studies have only documented the impacts of temperature and humidity. Ultraviolet radiation was also shown to be insignificantly associated with the transmission of COVID-19 [10]. Moreover, since the pollution level in China was likely to have been reduced during the study period due to the shutdown of business and industrial activities, pollutants were not included in our analysis so as to avoid interpretation of a non-causal relationship [34]. However, we cannot completely rule out a potential effect of pollutants on exacerbating the prognosis of COVID-19, especially in the elderly with chronic conditions, as we observed more cases and deaths in the elderly [35][36][37]. In addition, our study period covered the first wave of the COVID-19 epidemic, which only lasted for three months. Our findings may thus not be generalizable to other seasons, although our study period covered a wide range of the meteorological variation seen in most Chinese cities across a year. Third, we used a single variable to capture the effect of control measures of level I response in the model due to the complexity of differentiating the impacts of each control measure. Further modelling investigation using more information to rank the importance of factors in explaining the reduction of COVID-19 incidence is warranted.
In conclusion, even though meteorological factors were associated with COVID-19, we could not find an apparent impact of them, and only the effect of control measures could explain a large portion of variability in COVID-19 activity. Therefore, we argue that stringent control measures are necessary to control COVID-19 regardless of the meteorological conditions of an area. Given that no vaccine is available to date, our investigation provides additional evidence that, as advocated by the World Meteorological Organization [38], active non-pharmaceutical interventions, rather than reliance on changes in the natural environment for mitigation, are necessary to curb the COVID-19 pandemic.