Advertisement
  • Loading metrics

Spatial analysis of COVID-19 spread in Iran: Insights into geographical and structural transmission determinants at a province level

Spatial analysis of COVID-19 spread in Iran: Insights into geographical and structural transmission determinants at a province level

  • Ricardo Ramírez-Aldana, 
  • Juan Carlos Gomez-Verjan, 
  • Omar Yaxmehen Bello-Chavolla
PLOS
x
?

This is an uncorrected proof.

Abstract

The Islamic Republic of Iran reported its first COVID-19 cases by 19th February 2020, since then it has become one of the most affected countries, with more than 73,000 cases and 4,585 deaths to this date. Spatial modeling could be used to approach an understanding of structural and sociodemographic factors that have impacted COVID-19 spread at a province-level in Iran. Therefore, in the present paper, we developed a spatial statistical approach to describe how COVID-19 cases are spatially distributed and to identify significant spatial clusters of cases and how socioeconomic and climatic features of Iranian provinces might predict the number of cases. The analyses are applied to cumulative cases of the disease from February 19th to March 18th. They correspond to obtaining maps associated with quartiles for rates of COVID-19 cases smoothed through a Bayesian technique and relative risks, the calculation of global (Moran’s I) and local indicators of spatial autocorrelation (LISA), both univariate and bivariate, to derive significant clustering, and the fit of a multivariate spatial lag model considering a set of variables potentially affecting the presence of the disease. We identified a cluster of provinces with significantly higher rates of COVID-19 cases around Tehran (p-value< 0.05), indicating that the COVID-19 spread within Iran was spatially correlated. Urbanized, highly connected provinces with older population structures and higher average temperatures were the most susceptible to present a higher number of COVID-19 cases (p-value < 0.05). Interestingly, literacy is a factor that is associated with a decrease in the number of cases (p-value < 0.05), which might be directly related to health literacy and compliance with public health measures. These features indicate that social distancing, protecting older adults, and vulnerable populations, as well as promoting health literacy, might be useful to reduce SARS-CoV-2 spread in Iran. One limitation of our analysis is that the most updated information we found concerning socioeconomic and climatic features is not for 2020, or even for a same year, so that the obtained associations should be interpreted with caution. Our approach could be applied to model COVID-19 outbreaks in other countries with similar characteristics or in case of an upturn in COVID-19 within Iran.

Author summary

Iran was among the first countries reporting a rapid increase in the number of COVID-19 cases. Spatial epidemiology is useful to study the spatial distribution of a disease and to identify factors associated with the number of cases of such disease. By applying these methods, we aimed to identify whether there are clusters of regions in Iran with high or low number of COVID-19 cases and the association of different factors with these numbers, considering spatial relationships and maps representing these associations. Interestingly, we found regions of high number of cases and that more COVID-19 cases were present in provinces with more urbanization, aging population, number of physicians, efficient communications, and greater average temperatures, whereas less COVID-19 cases were present in provinces with more literacy. This study allowed us to understand the spatial behavior of the disease and the importance of having adequate health policies, literacy campaigns, and disseminating health information to the population.

Introduction

On 11th March 2020, the General Director of the World Health Organization (WHO), Dr. Tedros Adhanom Ghebreyesus, declared the new infectious respiratory disease COVID-19, caused by the infection of novel coronavirus SARS-CoV-2 as a pandemic, due to the rate of growth of new cases, the number of affected people, and the number of deaths [1]. As of the time of this writing (April 15th, 2020), the number of infected cases world-wide corresponded to more than 1 million, being the most affected countries: Italy (16,523 deaths), Spain (13,341 deaths), USA (10,792 deaths), France (8,911 deaths), United Kingdom (5,373 deaths), and Iran (3,739 deaths) [2,3].

Iran was among the first countries outside of China to report a rapid increase in the number of COVID-19 cases and associated deaths; its first confirmed cases were reported on 19th February 2020 in the province of Qom imported from Wuhan, China [4]. Nevertheless, some reports suggest that the outbreak may have happened two or six weeks before the government official announcement [5]. Iran had one of the highest COVID-19 mortality rates early in the pandemic, and its rate of spread has been amongst the highest. However, as with other countries, it may be a sub-estimation of cases, and there may be other cases not officially reported [6].

The large count of COVID-19 cases and mortality in Iran are multifactorial. Iran’s response to the epidemic has been highly affected by several imposed economic sanctions and armed conflicts within the last 20 years. Moreover, its difficult economic situation due to a recession, having inflation rates that are among the highest in the region, has taken a toll on its public health system [7,8]. Although there are approximately 184,000 hospitals and primary health-care staff, limitations in the availability of COVID-19 testing kits, protective equipment, and ventilators are quite important. On the other hand, over the last years, Iran has slowed the rate of mortality associated with infectious and maternal diseases. It is currently undergoing an epidemiological transition where infectious diseases interact with chronic conditions [2]. In this sense, Iran may represent other similar developing world countries with poor health systems and an increased prevalence of chronic diseases.

Spatial statistics have emerged as a useful tool for the analysis of spatial epidemiology, concerning mapping and statistical analyses of spatial and spatial-temporal incidences of different pathogens. The aim of this paper is to perform spatial analyses which allow us to better understand the COVID-19 outbreak in Iran, not only in terms of the strength of its presence and socioeconomic and structural factors which facilitate the disease spread within Iranian provinces, but also in terms of how the disease is spatially distributed and which variables are spatially related with it considering the spatial effect to obtain adequate inferences. Given the role of climate and socio-economic factors in determining the distribution of cases and its impact world-wide, we also aimed to incorporate said factors as predictors of SARS-CoV-2 spread [9,10]. This could aid to understand the burden of COVID-19, its distribution in the country, and its implication on public health within Iran and similar countries [11] and could contribute to public health measures by providing insight to inform the implementation of interventions or to understand socio-demographic factors associated with the SARS-CoV-2 spread and COVID-19 heterogeneity as it has been applied to previous infectious diseases [1216].

Material and methods

Data sources

We obtained province-specific data considering 31 provinces or polygons in Iran (Table 1). From the Statistical Centre of Iran [17], we extracted information concerning: 1) people settled in urban areas in 2016 (%), calculated from the population and household of Iran by province and sub-province information of the census, 2) people aged ≥60 years in 2016 calculated from the population disaggregated by age groups, sex, and province information of the census, 3) population density (people per km2) in 2016, 4) literacy rate of population aged ≥6 years in 2016, obtained from the document of selected results from the 2016 census, 5) the Consumer Price Index percent changes on March 2020 for the national households in contrast to the corresponding month of the previous year (point-to-point inflation), and 6) the average temperature (°C) of provincial capitals and 7) annual precipitation levels (mm) in 2015, both part of the climate and environment information. From the Iran data portal [18], we obtained, from the health section: 1) the number of physicians employed by the ministry of health and medical education in 2006 and 2) the number of beds in operating medical establishments in 2006 and from the government finance section: 3) the province contribution to gross domestic product (GDP) in 2004. We used a Transportation Efficiency Index (TEI) [19], constructed through Data Evelopment Analysis, being an indicator of the extent in which each province efficiently utilize their transportation infrastructure. The TEI has values between zero and one, values near to one indicate provinces better communicated, but we standardized it (values of each province minus its mean divided by the associated standard deviation) to allow interpretations in a better scale in terms of how the increase in a certain number of standard deviations of the TEI is associated with the number of COVID-19 cases. The cumulative number of cases with confirmed COVID-19 by Province from February 19th to March 18th, 2020, was also obtained [20]. It is important to notice that, in order to obtain more accurate rates of cases with COVID-19, population size in 2020 by province was derived by using mathematical projection methods (arithmetic, geometric, exponential, and logistic or saturation methods) using information contained in the population and housing censuses from 2006, 2011, and 2016. Since all methods provided similar results, we show here only those associated with the arithmetic method. Shapefiles were obtained from the Stanford Libraries Earthworks: https://earthworks.stanford.edu/catalog/stanford-dv126wm3595, in which files are freely available for academic use and other non-commercial use [21].

thumbnail
Table 1. Features extracted for spatial analyses disaggregated by Iranian provinces to predict the spread of COVID-19 cases.

Abbreviations: GDP, Gross Domestic Product; TEI, Transportation Efficiency Index

https://doi.org/10.1371/journal.pntd.0008875.t001

COVID-19 rate estimation by Iranian provinces

We obtained quantile maps associated with raw rates of COVID-19 cases, as well as smoothed case rates by province using an empirical Bayes estimator, which is a biased estimator that improves variance instability proper of rates estimated in small-sized spatial units [22] (i.e. provinces with a larger population size have lower variance than provinces with a smaller population size). Since raw and smoothed rates were surprisingly similar, only results of smoothed rates are reported. We also obtained maps concerning excess or relative risk, serving as a comparison of the observed number of cases by the province to a national standard. For the variable concerning the number of people aged ≥60 years by province, raw, and smoothed rates were obtained using empirical Bayes, and the latter were used in all analyses.

Spatial weight estimation and spatial autocorrelation

Since all spatial analyses require spatial weights, we obtained queen contiguity weights [23]. Provinces were considered as neighbors when they share at least a point or vertex in common, obtaining a squared matrix of dimension 31 (31x31 matrix) with all entries equal to zero or one, the latter value indicating that two provinces are neighbors. From these neighbors, weights are calculated by integrating a matrix in a row-standardized form, i.e., equal weights for each neighbor and summing one for each row. Moran's I statistic was obtained as an indicator of global spatial autocorrelation [24], and its significance was assessed through a random permutation inference technique based on randomly permuting the observed values over the spatial units [25]. Local indicators of spatial autocorrelation (LISA) were obtained, being these a decomposition of Moran’s I used to identify the contribution of each province in the statistic [26]. LISA was used to derive significant spatial clustering through four cluster types: High-High, Low-Low, High-Low, and Low-High. For instance, the High-High cluster indicates provinces with high values of a variable that are significantly surrounded by regions with similarly high values. Analogous to Moran's I and LISA, estimates can be calculated to identify the spatial correlation between two variables and to identify bivariate clustering [27]. For instance, to identify provinces with high values in a first variable surrounded by provinces with high values for a second variable (cluster High-High). Bivariate clustering and quartile maps were obtained for each of the significant variables in a linear spatial model, to have a better understanding of the individual spatial effect of each of these variables over the smoothed rates associated with the disease.

Spatial multivariate linear models

Spatial multivariate linear models were fitted to identify variables that significantly impact the number of log-transformed COVID-19 cases [28]. This response variable was chosen since the corresponding model better satisfies all statistical assumptions, the other variables introduced in the Data section were simultaneously introduced as explanatory, first removing from the model all variables generating multicollinearity. Ordinary Least Squares (OLS) estimation was used to identify whether a linear spatial model was necessary by using a Lagrange Multiplier (LM) and a robust LM statistics to compare the non-spatial model with spatial models [29]. Two kinds of spatial models were compared; the spatial-lag model considers the spatially lagged response as an additional explanatory variable, whereas the spatial-error model considers that the error is a linear function of a spatially lagged error plus another error term. Another model was obtained by performing a backward selection process, considering the elimination of the most non-significant variable in each step and the minimization of the Akaike Information Criterion (AIC). This process allowed us to identify whether the associations obtained through this model were similar as those obtained through the model including all variables. For significant variables in the linear spatial models, interpretations in the original scale (i.e., as counts) were derived and we performed bivariate LISA significant clustering between each of these significant variables and the rate of cases with COVID-19, as explained above. All statistical analyses were conducted using GeoDa version 1.14.0. A two-tailed p-value<0.05 was considered as the significance threshold.

Results

Rates description and spatial autocorrelation of COVID-19 case rates between provinces

Maps for quartiles corresponding to the smoothed rates and excess risk of COVID-19 cases are shown in Fig 1. We observed that the highest rates of COVID-19 and excess risk values were located in the Northern region of Iran corresponding to the provinces of Qom, Marzaki, Mazandaran, and Semnan. There were also high rates associated with the provinces of Alborz, Gilan, Qazvin, and Yazd (last quartile). We observed significant spatial autocorrelation (Moran’s I = 0.426, p = 0.002), indicating that COVID-19 rates between provinces are significantly spatially related. From the heat and significance maps corresponding to significant clusters using an empirical Bayes spatial technique, we delimited a High-High cluster in red, indicating a northern zone around Tehran and Alborz with significant high COVID-19 rates surrounded by areas with similarly high rates. Conversely, we delimited a Low-Low cluster in blue indicating southern provinces with small rates surrounded by areas with similarly lower rates, which includes the provinces of Bushehr, Homozgan, Sistan, and Baluschestan. Interestingly, Golestan showed in light purple, has significantly lower COVID-19 rates despite being surrounded by a cluster of provinces with higher rates (Fig 2).

thumbnail
Fig 1. Maps associated with COVID-19 cases by province from February 19th to March 20th, 2020.

A) Quartiles were corresponding to rates smoothed through an empirical Bayes procedure. B) Excess or relative risk. Shapefiles obtained from the Stanford Libraries Earthworks: https://earthworks.stanford.edu/catalog/stanford-dv126wm3595.

https://doi.org/10.1371/journal.pntd.0008875.g001

thumbnail
Fig 2. Spatial clustering associated with rates of COVID-19 cases by province from February 19th to March 20th, considering queen contiguity weights.

A) Significant spatial clustering obtained through Local Indicators of Spatial Autocorrelation (LISA) comparisons. Four types of a cluster are possible: High-High, Low-Low, High-Low, and Low-High. For instance, the High-High cluster (red) indicates provinces with high values of a variable that are significantly surrounded by regions with similarly high values. B) P-values associated with the spatial clustering in A), C) Scatter plot associated with the smoothed rates vs. their corresponding spatially lagged values, including the associated linear regression fitting, whose slope corresponds to the Moran's I statistic, a global spatial autocorrelation measure. Shapefiles obtained from the Stanford Libraries Earthworks: https://earthworks.stanford.edu/catalog/stanford-dv126wm3595.

https://doi.org/10.1371/journal.pntd.0008875.g002

Selection of multivariate linear spatial model for COVID-19 spread

Since the variable hospital beds is strongly associated with variables GDP and number of physicians (Kendall correlation coefficients above 0.55), we eliminated it from all models to avoid multicollinearity (S1 Fig). We confirmed that a spatial model was necessary since the error term from the OLS fitting showed significant spatial autocorrelation (Moran’s I = 0.134, p-value = 0.025). Additionally, the LM and Robust LM statistics indicated that a spatial lag model was required since the spatial parameter (ρ) associated with the spatially lagged response was significant (LM = 10.669, p-value = 0.001; Robust LM = 13.557, p-value < 0.001), which did not occur with the spatial error model since the corresponding spatial parameter was not consistently significant (LM = 1.256, p-value = 0.262; Robust LM = 4.144, p-value = 0.042). Thus, only the spatial lag model was fitted obtaining a significant spatial parameter (ρ = 0.723, p-value<0.001), which indicated that the rate of an area in the linear model is affected by COVID-19 rates in neighboring areas (R2 = 0.877, σ2 = 0.146). Normality and homoscedasticity assumptions were reasonably satisfied.

Predictors of COVID-19 spatial spread in Iranian provinces

The significant variables associated with the model obtained through the selection scheme were the same as those associated with the model including all variables. This simplified model excluded population density, Consumer Price Index, and annual precipitation. The estimated coefficients were similar for both models; however, we analyzed the estimations associated with the model including all variables to consider effects controlled for these three variables. Hence, the variables that significantly impact the log-transformed number of COVID-19 cases include: the percentage of people settled in urban areas (p-value = 0.019), smoothed rate of people aged ≥60 years (p-value < 0.001), literacy rate (p-value = 0.006), average temperature (p-value < 0.001), number of physicians employed (p-value < 0.001), and the TEI (p-value = 0.035) (Table 2). A 10% increase in urban population or a 1% increase in the population aged ≥60 years has a percentage increase of 29.29% (95%CI 26.55–32.10%) and 46.65% (95%CI 26.54% -69.95%), respectively, on the number of COVID-19 cases. Moreover, an increase of 1°C in the temperature levels, an increase of 1 physician, or an increase of one deviation over the standardized TEI also have a percentage increase of 11.98% (95%CI 5.54–18.80%), 0.08% (95%CI 0.06–0.11%), and 16.98% (95%CI 1.14–35.30%), respectively, over the number of COVID-19 cases. Finally, a 1% increase in the literacy rate showed a percentage decrease of 10.44% (95%CI 3.18–17.16%) on the number of cases.

thumbnail
Table 2. Spatial lag models estimated via maximum likelihood to predict log-transformed COVID-19 case distribution between Iranian provinces (model including all variables and model obtained through a selection scheme).

https://doi.org/10.1371/journal.pntd.0008875.t002

Spatial lag predictors and province clusters

Quartile maps for each of the significant variables in the spatial lag model are shown in Fig 3. Finally, concerning bivariate LISA significant clustering (Fig 4), we observed a positive spatial relationship (Moran’s I = 0.341, p-value = 0.002) between urban population and COVID-19 rates; provinces with high urban rates surrounded by areas with high COVID-19 rates are the same as the ones in the High-High cluster for COVID-19, except for Mazandaran, and similarly for the Low-Low cluster, except for Bushehr. There is also a positive spatial relationship (Moran’s I = 0.279, p-value = 0.002) between the population aged ≥60 and COVID-19 rates. Both High-High and Low-Low clusters include similar provinces as the ones in the clusters for COVID-19, except for Qom and Alborz, which have significantly lower rates of people aged ≥60 years but are spatially surrounded by areas with high disease rates. Concerning literacy rates, we also identified a positive spatial relationship between literacy and disease rates (Moran’s I = 0.362, p-value = 0.005). The associated High-High cluster and that obtained for COVID-19 rates are formed by the same provinces, whereas in the south, Hormozgan and Bushehr have high literacy rates but are surrounded by areas with low disease rates.

thumbnail
Fig 3. Quartiles associated with all the explanatory variables significant in the spatial lag model with response variable the logarithm of the number of COVID-19 cases.

A) People settled in urban areas in 2016 (%). B) People aged ≥60 years, rates obtained through empirical Bayes smoothing. C) Literacy of population aged ≥6 years in 2016 (%). D) Average temperature (°C) of provincial capitals in 2015. E) Number of physicians employed by the ministry of health and medical education in 2006. F) Transportation Efficiency Index (TEI). Shapefiles obtained from the Stanford Libraries Earthworks: https://earthworks.stanford.edu/catalog/stanford-dv126wm3595.

https://doi.org/10.1371/journal.pntd.0008875.g003

thumbnail
Fig 4. Bivariate LISA's significant spatial clustering between each of the significant variables in the spatial lag model and the rate of cases with COVID-19 smoothed through the empirical Bayes approach.

The scatter plots associated with the variables vs. the spatially lagged smoothed rate of COVID-19 cases are presented as well, including the associated linear regression fitting, whose slope corresponds to the bivariate Moran's I statistic, a global spatial bivariate autocorrelation measure. A) People settled in urban areas in 2016 (%). B) People aged ≥60 years, rates obtained through empirical Bayes smoothing. C) Literacy of population aged ≥6 years in 2016 (%). D) Average temperature (°C) of provincial capitals in 2015. E) Number of physicians employed by the ministry of health and medical education in 2006. F) Transportation Efficiency Index (TEI). Shapefiles obtained from the Stanford Libraries Earthworks: https://earthworks.stanford.edu/catalog/stanford-dv126wm3595.

https://doi.org/10.1371/journal.pntd.0008875.g004

Concerning average temperature levels, the global spatial autocorrelation is negative (Moran’s I = -0.107, p-value = 0.103). The High-High clusters for temperature and COVID-19 rates are similar, except for Marzaki and Alborz, where there is significantly lower temperature surrounded by areas with high COVID-19 rates. In the south, there is a significantly high temperature with spatially lower disease rates.

There is a positive spatial relationship (Moran’s I = 0.302, p-value = 0.003) between the number of physicians and the COVID-19 rate. There is a High-High cluster in the north with a High-Low zone between formed by Marzaki, Qom, and Semnan, with a significantly lower number of physicians; however, they are spatially surrounded by areas with higher disease rates. Concerning the TEI, the spatial correlation is close to zero (Moran’s I = -0.096, p-value = 0.112), indicating a particular random global spatial relationship between TEI and the disease rate. The High-High cluster is the same as the High-High cluster for the disease, except for Mazandaran and Alborz, which have significantly low TEI but are surrounded by areas with high disease rates. In the south, two provinces, which formed a Low-Low cluster for COVID-19 cases, are now areas with high TEI spatially associated with areas with low disease rates.

Discussion

Here, we demonstrate that the rates of COVID-19 cases within Iranian provinces are spatially correlated. This could be due to the origin of the outbreak, which started on the north of Iran, and can be seen through an important province cluster with the highest number of COVID-19 cases that we found around Tehran and Qom. Several mathematical models have been used to model the COVID-19 outbreak, mostly focused on forecasting the number of cases and assessing the capacity of country-level healthcare systems to manage disease burden [3032]. In the present report, we demonstrate that the spatial relationship and socio-demographic factors associated with the provinces must be considered to model the disease adequately, and this report also highlights structural factors that may lead to inequities in COVID-19 spread. Of relevance, we highlight the role of social determinants of health in sustaining SARS-CoV-2 transmission and provide additional evidence that human mobility or province interconnectedness might be associated in favoring disease spread [33].

Importantly, our approach demonstrates that urbanization, aging population, education, average temperatures, number of physicians, and inter-province communications are associated with the case numbers amongst Iranian provinces. The obtained results do not consider the spatial effect, which is accumulated since the spatially lagged response is part of the explanatory variables, and they consider fixed values for all variables except the one being interpreted. Overall, these variables spatially correlate with the COVID-19 province clustering indicating a consistent association with the observed variables. The greatest increase in the number of COVID-19 cases is associated with people aged ≥60 years, urban population, and how well the provinces are communicated, with age having one of the most important associations, an increase of 1% in the corresponding rate implies a percentage increase of 46.65% over the number of cases. Of relevance, mortality attributable to COVID-19 complications is higher in this age group, and age increases the likelihood of developing the symptomatic disease and increased disease severity [34,35]. Nevertheless, the association with older age could have different meanings depending on the number of comorbidities, with some reports labeling COVID-19 as an age-related disease [36]. Our data demonstrate that the spatial spread of COVID-19 has a relationship with population aging structures, a concept that must be explored in this setting to obtain population-specific estimates and lethality and which could represent a significant structural inequality related to COVID-19 burden [37].

Urbanization rates also are associated with a percentage increase over the number of COVID-19 cases; we observed a similar association regarding province interconnectedness, which goes in line with recent information on human mobility and its effect in decreasing disease spread through social distancing [33]. Urbanization, as a demographic phenomenon, leads to increased interconnectedness and human mobility as well as increased population density; these two factors facilitate disease spread. Emerging zoonotic diseases similar to SARS-CoV-2 have been linked to major structural factors that have been reported in other studies, including population growth, climate change, urbanization, and pollution [38,39]. Thus, communication and the degree of urbanity, and what this implies in terms of pollution, overcrowding, among other factors, seem to be relevant to determine the number of COVID-19 cases and should imply geographical targets for public health interventions to monitor disease spread and disease containment [40].

The only effect associated with a decrease in the number of COVID19 cases in our study was attributed to literacy, which might reflect several factors that ultimately influence disease spread. Data from several countries, including Iran, identified that higher health literacy was associated with a lower number of COVID-19 cases, probably reflecting attitudes towards public health measures including social distancing, early disease detection, and hand hygiene [41,42]. Interestingly, this poses a potential public health intervention given that individuals with reduced health literacy, not only might have higher rates of COVID-19, but also increased likelihood for depression and impaired quality of life in suspected cases. Literacy’s protective effect on disease spread also indicates a strong influence on social inequity and vulnerability as risk factors for COVID-19 spread, particularly on the influence of health equity, which will likely define the long-term impact of COVID-19 in many developing countries [43].

Concerning average temperature levels, we were able to obtain information associated only with the capitals and not the provinces, being a limitation of the analyzed information, obtaining some inconclusive results. On one hand, the global spatial autocorrelation was negative, though not statistically significant, indicating that global areas with higher temperatures are spatially related to areas with lower disease rates. On the other hand, on the spatial linear model, we derived that more temperature is associated with more cases. However; the former result does not contradict the latter since the direct effect in each province of a variable over the response is different from the spatial relationship between two variables. The latter considers one of the variables as spatially lagged (COVID-19), and thus the direct effect between variables in the same province is not included. In fact, this problem occurs in all the bivariate analysis, so care should be taken in all the interpretations. Notably, our results are consistent with previous analyses which have analyzed the impact of climate on SARS-CoV-2 stability and spread [44]. However, these results should be further studied considering the climate data limitations, that we obtained mixed results, and that some studies suggest there is no evidence that spread rates of the disease decline with higher temperatures [45].

Our study had some strengths and limitations. We approached COVID-19 using spatial analysis, which allowed us to identify province-level factors that are associated with the disease spread and which may be shared by other countries with similar socioeconomic or geographic structures by potentially identifying targets for country-wide public health interventions. This approach considers disease spread beyond individual-specific factors and could also be used to monitor areas of a potentially high number of undiagnosed cases that could facilitate disease spread and the surge of delayed waves of COVID-19 after initial mitigation [46,47]. Methodologically, all our analyses consider the spatial nature of the data. We identified significant spatial clustering and in terms of the spatial multivariate linear model, by including a spatial effect, we consider that the number of cases in an area is affected by those in neighboring areas. In this way, a lack of independency between spatial units is considered, being independence assumed in a usual linear model, thus obtaining more precise estimations. Of course, other statistical methods are available for this task, as generalized linear mixed models or geographically weighted regression; however, they do not use spatial weights, making our results more comparable with the Moran’s I or spatial clustering, which are based on such weights. A limitation of our approach is that most of the variables used to explain COVID-19 disease rates were taken from previous years and not updates, given the unavailability of recent estimates. Furthermore, smoothed COVID-19 rates were calculated using a projection of the population in 2020 since the most recent census corresponds to 2016, thus rates could have slightly different values. In this sense, the explanatory variables were not projected since information of previous years was not always available; however, precise projections for each variable were out of the scope of this work; and, it is also possible that some variables have a time lagged effect over the response. However; the time lagged effect we included was unintentional and dependent on the information available and not considering a lagged time effect as defined by experts; for instance, for GDP we used a time lag of 16 years, when perhaps it should have been of fewer years. When obtaining estimates using both projected and population size in 2016, we observed no significant changes in the results, which confirms the robustness of our approach. In fact, with all the mathematical projection methods similar results were obtained. However, the projections by province could be improved by considering a demographic balance equation and probabilistic projection methods as the ones obtained by country by the UN [48]. In this sense, we suspect similar results would still be obtained since our projected values by country are similar to those obtained by the UN. We also observed that the smoothed and raw rates of COVID-19 cases were similar, with an absolute difference between them of at most 0.607 (considering rates for every 1000 individuals), this was probably due to Iran not having provinces with extremely small or large population size. Future work could be focused on evaluating spatio-temporal modeling, which could be useful to monitor disease spread and identify additional factors relating not only to transmission rates but also to transmission dynamics. Since COVID-19 is currently challenging health systems all over the world, science-centered public health decisions could benefit from spatial modeling to investigate larger factors targeted for public health interventions.

In conclusion, COVID-19 spread within Iranian provinces is spatially correlated. The main factors associated with a high number of cases are older age, high degrees of urbanization, province interconnectedness, higher average temperatures, lower literacy rates, and the number of physicians. Structural determinants for the spread of emerging zoonotic diseases, including SARS-CoV-2, must be understood in order to implement evidence-based regional public health policies aimed at improving mitigation policies and diminishing the likelihood of disease re-emergence.

Supporting information

S1 Fig. Kendall correlation coefficients between all potential explanatory variables obtained to identify multicollinearity.

https://doi.org/10.1371/journal.pntd.0008875.s001

(PNG)

References

  1. 1. Sohrabi C, Alsafi Z, O’Neill N, Khan M, Kerwan A, Al-Jabir A, et al. World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19). Int J Surg. 2020 Apr;76:71–76. pmid:32112977
  2. 2. Forouzanfar MH, Sepanlou SG, Shahraz S, Dicker D, Naghavi P, Pourmalek F, et al. Evaluating causes of death and morbidity in Iran, global burden of diseases, injuries, and risk factors study 2010. Arch Iran Med. 2014 May;17(5):304–320. pmid:24784860
  3. 3. Johns Hopkins Coronavirus Resource Center—Johns Hopkins Coronavirus Resource Center [Internet]. [cited 2020 Apr 17]. Available from: https://coronavirus.jhu.edu/map.html.
  4. 4. Eden J-S, Rockett R, Carter I, Rahman H, de Ligt J, Hadfield J, et al. An emergent clade of SARS-CoV-2 linked to returned travellers from Iran. Virus Evol. 2020 Jan;6(1):veaa027. pmid:32296544
  5. 5. How Iran Became a New Epicenter of the Coronavirus Outbreak | The New Yorker [Internet]. [cited 2020 Apr 17]. Available from: https://www.newyorker.com/news/our-columnists/how-iran-became-a-new-epicenter-of-the-coronavirus-outbreak.
  6. 6. Lachmann A. Correcting under-reported COVID-19 case numbers. medRxiv. 2020 Mar 18;
  7. 7. COVID-19: Iran records 4,585 coronavirus deaths as restrictions eased | Mena–Gulf News [Internet]. [cited 2020 Apr 17]. Available from: https://gulfnews.com/world/mena/covid-19-iran-records-4585-coronavirus-deaths-as-restrictions-eased-1.70954074.
  8. 8. GHO | By country | Iran (Islamic Republic of)—statistics summary (2002—present) [Internet]. [cited 2020 Apr 17]. Available from: https://apps.who.int/gho/data/node.country.country-IRN.
  9. 9. Bello-Chavolla OY, González-Díaz A, Antonio-Villa NE, Fermín-Martínez CA, Márquez-Salinas A, Vargas-Vázquez A, et al. Unequal impact of structural health determinants and comorbidity on COVID-19 severity and lethality in older Mexican adults: Considerations beyond chronological aging. J Gerontol A, Biol Sci Med Sci. 2020 Jun 29; pmid:32598450
  10. 10. Armitage R, Nellums LB. Water, climate change, and COVID-19: prioritising those in water-stressed settings. Lancet Planet Health. 2020;4(5):e175. pmid:32442491
  11. 11. Gross B, Zheng Z, Liu S, Chen X, Sela A, Li J, et al. Spatio-temporal propagation of COVID-19 pandemics. medRxiv. 2020 Mar 27;
  12. 12. Chipeta MG, Giorgi E, Mategula D, Macharia PM, Ligomba C, Munyenyembe A, et al. Geostatistical analysis of Malawi’s changing malaria transmission from 2010 to 2017. [version 2; peer review: 2 approved, 1 approved with reservations]. Wellcome Open Res. 2019 Jul 4;4:57. pmid:31372502
  13. 13. Sanna M, Wu J, Zhu Y, Yang Z, Lu J, Hsieh Y-H. Spatial and temporal characteristics of 2014 dengue outbreak in guangdong, china. Sci Rep. 2018 Feb 5;8(1):2344. pmid:29402909
  14. 14. Carmo RF, Nunes BEBR, Machado MF, Armstrong AC, Souza CDF. Expansion of COVID-19 within Brazil: the importance of highways. J Travel Med. 2020 Jun 27; pmid:32597484
  15. 15. Deutsch-Feldman M, Brazeau NF, Parr JB, Thwai KL, Muwonga J, Kashamuka M, et al. Spatial and epidemiological drivers of Plasmodium falciparum malaria among adults in the Democratic Republic of the Congo. BMJ Glob Health. 2020 Jun;5(6). pmid:32601091
  16. 16. Lin D, Cui Z, Chongsuvivatwong V, Palittapongarnpim P, Chaiprasert A, Ruangchai W, et al. The geno-spatio analysis of Mycobacterium tuberculosis complex in hot and cold spots of Guangxi, China. BMC Infect Dis. 2020 Jul 1;20(1):462. pmid:32611396
  17. 17. Presidency of the I.R.I. Statistical Center of Iran [Internet]. Statistical Center of Iran. 2020 [cited 2020 Jul 22]. Available from: https://www.amar.org.ir/english/
  18. 18. Iran Data Portal [Internet]. [cited 2020 Apr 17]. Available from: https://irandataportal.syr.edu/.
  19. 19. Performance Evaluation of the Provinces of Iran Reading To the Measures of Freight and Passenger Transportation [Internet]. [cited 2020 Apr 17]. Available from: http://www.ijte.ir/article_69681.html.
  20. 20. Islamic Republic News Agency. Iran’s coronavirus toll update March 22, 2020 [Internet]. 2020 [cited 2020 Jul 21]. Available from: http://archive.vn/nVmn5.
  21. 21. First-level Administrative Divisions, Iran, 2015 | Stanford Digital Repository [Internet]. [cited 2020 Sep 17]. Available from: http://purl.stanford.edu/dv126wm3595.
  22. 22. Rate Transformations and Smoothing Luc Anselin | Semantic Scholar [Internet]. [cited 2020 Apr 17]. Available from: https://www.semanticscholar.org/paper/Rate-Transformations-and-Smoothing-Luc-Anselin-Lozano-Koschinsky/88d8b02de84f97f556cfe0ef5a91a7df229cf363#paper-header.
  23. 23. Cressie Noel. Statistics For Spatial Data (wiley Classics Library). Revised. Hoboken, NJ: Wiley-interscience; 1993.
  24. 24. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950 Jun;37(1/2):17. pmid:15420245
  25. 25. Wrigley N, Cliff AD, Ord JK. Spatial processes: models and applications. Geogr J. 1982 Nov;148(3):383.
  26. 26. Anselin L. Local Indicators of Spatial Association-LISA. Geogr Anal. 2010 Sep 3;27(2):93–115.
  27. 27. Lopes D, Assunção R. Visualizing Marked Spatial and Origin-Destination Point Patterns With Dynamically Linked Windows. Journal of Computational and Graphical Statistics. 2012 Jan;21(1):134–154.
  28. 28. Ward M, Gleditsch K. Spatial Regression Models. 2455 Teller Road, Thousand Oaks California 91320 United States of America: SAGE Publications, Inc.; 2008.
  29. 29. Silvey SD. The lagrangian multiplier test. Ann Math Statist. 1959 Jun;30(2):389–407.
  30. 30. Sahafizadeh E, Sartoli S. Estimating the reproduction number of COVID-19 in Iran using epidemic modeling. medRxiv. 2020 Mar 23;
  31. 31. He J, Chen G, Jiang Y, Jin R, He M, Shortridge A, et al. Comparative Analysis of COVID-19 Transmission Patterns in Three Chinese Regions vs. South Korea,Italy and Iran. medRxiv. 2020 Apr 14;
  32. 32. Muniz-Rodriguez K, Fung IC-H, Ferdosi SR, Ofori SK, Lee Y, Tariq A, et al. Transmission potential of COVID-19 in Iran. medRxiv. 2020 Mar 10; pmid:32320641
  33. 33. Kraemer MUG, Yang C-H, Gutierrez B, Wu C-H, Klein B, Pigott DM, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020 Mar 25;
  34. 34. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020 Mar 28;395(10229):1054–1062. pmid:32171076
  35. 35. Wu JT, Leung K, Bushman M, Kishore N, Niehus R, de Salazar PM, et al. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat Med. 2020 Apr;26(4):506–510. pmid:32284616
  36. 36. Santesmasses D, Castro JP, Zenin AA, Shindyapina AV, Gerashchenko MV, Zhang B, et al. COVID-19 is an emergent disease of aging. medRxiv. 2020 Apr 15; pmid:33006233
  37. 37. Kashnitsky I, Aburto JM. COVID-19 in unequally ageing European regions. 2020 Mar 18;
  38. 38. Liu Q, Cao L, Zhu X-Q. Major emerging and re-emerging zoonoses in China: a matter of global health and socioeconomic development for 1.3 billion. Int J Infect Dis. 2014 Aug;25:65–72. pmid:24858904
  39. 39. Wu T, Perrings C, Kinzig A, Collins JP, Minteer BA, Daszak P. Economic growth, urbanization, globalization, and the risks of emerging infectious diseases in China: A review. Ambio. 2017 Feb;46(1):18–29. pmid:27492678
  40. 40. Kost GJ. Geospatial Hotspots Need Point-of-Care Strategies to Stop Highly Infectious Outbreaks: Ebola and Coronavirus. Arch Pathol Lab Med. 2020 Apr 16;
  41. 41. Lin Y-H, Liu C-H, Chiu Y-C. Google searches for the keywords of “wash hands” predict the speed of national spread of COVID-19 outbreak among 21 countries. Brain Behav Immun. 2020 Apr 10; pmid:32283286
  42. 42. Chung RY-N, Dong D, Li MM. Socioeconomic gradient in health and the covid-19 outbreak. BMJ. 2020 Apr 1;369:m1329. pmid:32238351
  43. 43. Wang Z, Tang K. Combating COVID-19: health equity matters. Nat Med. 2020 Apr;26(4):458. pmid:32284617
  44. 44. Aboubakr HA, Sharafeldin TA, Goyal SM. Stability of SARS-CoV-2 and other coronaviruses in the environment and on common touch surfaces and the influence of climatic conditions: a review. Transbound Emerg Dis. 2020 Jun 30; pmid:32603505
  45. 45. Jamil T, Alam IS, Gojobori T, Duarte C. No Evidence for Temperature-Dependence of the COVID-19 Epidemic. medRxiv. 2020 Mar 31; pmid:32984240
  46. 46. Kissler SM, Tedijanto C, Goldstein E, Grad YH, Lipsitch M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science. 2020 Apr 14; pmid:32291278
  47. 47. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science. 2020 Mar 16;
  48. 48. World Population Prospects—Population Division—United Nations [Internet]. [cited 2020 Jul 23]. Available from: https://population.un.org/wpp/Download/Probabilistic/Population/.