Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Spatial epidemiological study of the distribution, clustering, and risk factors associated with early COVID-19 mortality in Mexico

Spatial epidemiological study of the distribution, clustering, and risk factors associated with early COVID-19 mortality in Mexico

  • Ricardo Ramírez-Aldana, 
  • Juan Carlos Gomez-Verjan, 
  • Omar Yaxmehen Bello-Chavolla, 
  • Carmen García-Peña


COVID-19 is a respiratory disease caused by SARS-CoV-2, which has significantly impacted economic and public healthcare systems worldwide. SARS-CoV-2 is highly lethal in older adults (>65 years old) and in cases with underlying medical conditions, including chronic respiratory diseases, immunosuppression, and cardio-metabolic diseases, including severe obesity, diabetes, and hypertension. The course of the COVID-19 pandemic in Mexico has led to many fatal cases in younger patients attributable to cardio-metabolic conditions. Thus, in the present study, we aimed to perform an early spatial epidemiological analysis for the COVID-19 outbreak in Mexico. Firstly, to evaluate how mortality risk from COVID-19 among tested individuals (MRt) is geographically distributed and secondly, to analyze the association of spatial predictors of MRt across different states in Mexico, controlling for the severity of the disease. Among health-related variables, diabetes and obesity were positively associated with COVID-19 fatality. When analyzing Mexico as a whole, we identified that both the percentages of external and internal migration had positive associations with early COVID-19 mortality risk with external migration having the second-highest positive association. As an indirect measure of urbanicity, population density, and overcrowding in households, the physicians-to-population ratio has the highest positive association with MRt. In contrast, the percentage of individuals in the age group between 10 to 39 years had a negative association with MRt. Geographically, Quintana Roo, Baja California, Chihuahua, and Tabasco (until April 2020) had higher MRt and standardized mortality ratios, suggesting that risks in these states were above what was nationally expected. Additionally, the strength of the association between some spatial predictors and the COVID-19 fatality risk varied by zone.


COVID-19 is a respiratory disease caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), which has caused almost twenty million cases worldwide and caused 790,000 deaths as of August 20th, 2020 [1]. SARS-CoV-2 is a highly contagious RNA virus from the Coronaviridae family, with a small genome of about 30,000 nucleotides closely related to the bat coronavirus (RaTG13) [2]. Given its global spread, there is an urgent need for scientific and health systems around the world to understand the epidemiology, pathogenicity, and mechanisms of immunological defences to develop possible therapeutic and public health alternatives to fight against one of the most outstanding threats to public health since Spanish Influenza over a hundred years ago [3].

According to the World Health Organization (WHO), the groups most susceptible to acquire infection and develop adverse outcomes are people with underlying medical conditions and older adults (>65 years old), particularly those living in nursing homes. Medical conditions which have been associated with increased susceptibility for adverse outcomes related to SARS-CoV-2 infection include chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), cardiovascular diseases, liver diseases, moderate asthma, immunosuppression (HIV/AIDS, bone marrow transplantation, cancer treatment, and genetic immune deficiencies), and particularly, severe obesity, diabetes, and hypertension [4]. These associations may be related to the strong link between pro-inflammatory cytokines in response to infection and the pathogenesis of SARS-CoV-2, which can be seen in pneumonia patients with severe COVID-19 disease exhibiting systemic hyper-inflammation known as cytokine storm or as a secondary hemophagocytic lymphohistiocytosis [5].

According to the Organisation for Economic Co-operation and Development (OECD), in the last ten years, the countries with the highest prevalence of cardio-metabolic conditions that are now linked to increased risk of severe COVID-19, including obesity, type 2 diabetes, and hypertension, are Mexico and the United States of America (USA), when considering adults from 15 to 74 years old [6]. Thus, considering that the population pyramid is becoming more rectangular through time, indicating an aging population structure, the high rates of diabetes and obesity in Mexico are likely to lead to higher mortality rates attributable to COVID-19 even in younger populations [7].

In response to SARS-CoV-2 spread worldwide, mobility restrictions have been imposed to reduce community-level transmission; nevertheless, the early influence of human mobility, particularly internal and external migration, has primarily driven the spread of COVID-19 [8]. When considering the high transmissibility of SARS-CoV-2, human mobility gains relevance in the early stages of spread, in which people travelling from other countries can drive increasing transmission rates [9]. SARS-CoV-2 spread related to human mobility is relevant, particularly considering that the risk of transmission and adverse outcomes are related to inequalities and suboptimal socio-economic conditions [10]. In this sense, the risk of becoming infected and dying might increase in areas without optimum socio-economic conditions, such as spatial units in which people live in overcrowded households or without access to potable water or drainage.

Spatial analyses allow us to understand how mortality risks are distributed throughout a territory, detect spatial clusters, and measure how the effects of variables associated with this risk vary within any given territory. In this sense, several examples of geo-epidemiological studies in autoinflammatory diseases, specific syndromes, infectious diseases, and other conditions, have helped develop public health policies and knowledge of disease spread [1113]. In the present work, we performed spatial analyses to measure spatial relationships corresponding to the geographical phenomenon of the COVID-19 outbreak in Mexico and its associated fatality risks. We used statistical methods, including spatial clustering through local indicators of spatial autocorrelation and generalized geographically weighted regression to accomplish these objectives. Additionally, we aimed to identify spatial units or regions in Mexico that should be examined in more detail to better understand the propagation of the disease and its associated fatality risk in the early stages of COVID-19 spread.

Material and methods

Data sources

We obtained state-level variables concerning the 32 states of Mexico (Tables 1 and 2). Individual-level variables were obtained from Mexico’s epidemiological surveillance entity (Direccion General de Vigilancia Epidemiologica, Secretaria de Salud), the data set corresponding to observations until April 21st 2020 [14]. These variables concern health and very general socio-economic information associated with people who were suspected of having COVID-19 in Mexico (people who by their concerns were suspected of having the disease, presenting symptoms, or who were in contact with someone with COVID-19) and underwent real-time RT-PCR for SARS-CoV-2 confirmation. Available variables include the presence of diabetes, obesity, chronic renal problems (CKD), chronic obstructive pulmonary disease (COPD), pregnancy, hypertension, immunosuppression, cardiovascular disease, pneumonia, as well as age, and whether the patient was hospitalized, admitted to an intensive care unit (ICU), or required intubation. We grouped age into groups, as explained in the model selection process below, and we finally used three age groups for subsequent analyses: 10–39, 40–69, and 70 years old and over. We also computed the risk of death due to the disease on individuals who were tested for SARS-CoV-2, or mortality risk from COVID-19 among tested individuals (MRt), by considering as a death that which was recorded after having a positive test (the information concerning positive tests and death is available in the epidemiological data set), a method consistent with the official numbers. We aggregated all variables at a state level as counts and used them as relative frequencies (risks for the health variables and proportion of individuals for each age group) in all analyses.

Table 1. Features extracted for all the analyses by state used to predict the mortality risk from COVID-19 among tested individuals in Mexico.

Additional variables concerning socio-demographic, economic, mobility, and climatic features were obtained at a state-level. In terms of socio-demographic variables, from the National Institute of Geography and Statistics (INEGI), we extracted information concerning population density in various contexts: density as the number of people per km2 in 2015 and the proportion of people in a household living in a crowded place in 2017; literacy rate of population aged ≥15 years in 2015; people settled in rural areas in 2010 (%) (a location was considered as rural when there were <2,500 habitants); and the number of physicians available for every 1000 people in 2015, which was obtained from the National Institute of Public Health (INSP). In terms of economic variables, we obtained from INEGI the state contribution to gross domestic product (GDP) in 2018; however, we used the multiplicative effect between a dummy variable concerning the states with the biggest cities in Mexico and GDP, in which the linearity assumption with the transformed response was improved. We also obtained information concerning people living in poverty in 2018 (%), as it is defined and calculated by CONEVAL according to a multidimensional index obtained from per capita income and an index of social deprivation [15]. In terms of mobility, we extracted from INEGI information concerning internal migration, as the rate of people aged five years and over living in another state five years before 2014, and external migration, as the rate of people aged ≥5 years living in another country five years before 2014, both proxies of internal and external mobility, respectively. Concerning mobility, we also used the number of flights in 2019 by state, which we calculated from information associated with the number of flights by the airport in Mexico (Ministry of Communication and Transport) [16]. Finally, information concerning average temperature (°C) in March 2020 was obtained from the National Council of Water (CONAGUA).

The risk of mortality in tested individuals (MRt) was chosen as the dependent variable based on relevant indicators of COVID-19 epidemiology in Mexico. The number of tests is associated with detection rates. Given the likely under-detection of mild SARS-CoV-2 cases, particularly at the beginning of the outbreak, standardizing deaths by tested cases considers the extent of detection, which could similarly be influenced by structural factors [17]. The remaining variables obtained and calculated from the different data sources were treated as explanatory, except for hospitalization, ICU, and intubation, which we considered as control variables, being an approximate measure of the presence of severe COVID-19 cases and possibly access to services attending COVID-19 in a region.

COVID-19 MRt estimation by state

To determine the mortality distribution throughout the country, we obtained quantile maps associated with raw and smoothed MRts of COVID-19 cases, and, to consider the different age and sex structures in each state the risks were adjusted for these variables using the average of the 32 states population age and sex compositions as the standard [18]. The risks by state adjusted for sex and age were smoothed using an empirical Bayes estimator, a biased estimator that improves variance instability proper of risks estimated in small-sized spatial units [19]. However, the analyses were performed with both raw and smoothed risks to compare results. We also obtained maps concerning the standardized mortality ratio (SMR) adjusted for sex and age, understanding an SMR, as a comparison of the observed number of events by state to a national standard, the latter using the expected number of events considering as if risks in a state were the same as those at a national level. In all standardization processes, the age structure corresponded to the age groups: 0–9,10–39, 40–69, and 70+, which were derived from the process described in the model selection step.

Spatial weight estimation and spatial autocorrelation

To determine how much the MRt is spatially associated, first, we obtained queen contiguity weights [20], which consider as neighbours those states sharing at least a point in common, obtaining a squared matrix of dimension 32 with all entries equal to zero or one, where one indicates that two states are neighbours. From these neighbours, weights are calculated by integrating a matrix in a row-standardized form. Moran’s I statistic [21] was obtained as a measure of global spatial autocorrelation, and its significance was assessed through a random permutation inference technique based on simulations. To determine regions with similar mortality behaviours local indicators of spatial autocorrelation (LISA) adjusted for sex and age were obtained [22] and used to derive significant spatial clustering through four cluster types: High-High, Low-Low, High-Low, and Low-High. For instance, the Low-Low cluster indicates states with low values of a variable that are significantly surrounded by regions with similarly low values.

Spatial multivariable linear model

To determine which variables are associated with the MRt and the direction of the association, considering at the same time the spatial nature of the data, a multivariable Generalized Geographically Weighted Regression (GGWR) was fitted [23]. In this model, a dependent variable is measured for each spatial unit. Independent variables (inputs) are simultaneously considered as well, such that the corresponding parameters depend on the coordinates in which the state is spatially located (centroids in the case of polygons); therefore, a parameter is associated with each state and independent variable. To estimate such a model a weighting diagonal matrix is considered; we used Gaussian spatial weighting to generate it. These weights determine the relationship from any state to another in terms of the distance (Euclidean) between the centroids of states and a bandwidth. The bandwidth determines which spatial units are similar under the GGWR and can be selected using automatized methods; we used a cross-validation (CV) method with an adaptive scheme, i.e. a different bandwidth was used for each unit. Since the response variable is a count (number of deaths), a GGWR with a Poisson distribution and logarithm link function was used, including as offset term the number of people tested with COVID-19 in a logarithmic scale, thus modelling the MRt instead of just the number of deaths.

A global multivariable model for all states, a Poisson multivariable linear model (generalized linear model or GLM) with offset and a logarithmic link function, was also fitted, and significant variables were identified [24]. To obtain the best possible model, including variables with the greatest effect on the risks and satisfying as much as possible all statistical assumptions, we used the following selection process: 1) We fitted univariable Poisson models with offset and logarithmic link function, identified significant effects, and ordered them in absolute value from highest to lowest; 2) We identified variables with good and acceptable linear association with the log-transformed mortality risk by obtaining scatter plots between variables and the transformed risk, including a smoothed LOESS (locally estimated scatterplot smoothing) curve; and, when possible, variables were transformed to improve this assumption, as for GDP as explained above; 3) VIF was used to assess multicollinearity; thus, we fitted a model including variables with acceptable and good linear association, and eliminated any variable with a VIF>10; 4) We added to the resulting model one by one significant variables in the univariable models. First, we added the variable with the highest effect and fitted the associated model identifying whether VIF>10. If not, we added the variable, and if VIF>10, the model was not modified. We proceeded with the resulting model repeating the same process with the second-highest effect; and so on; 5) From this process, we ended with a model consisting of all variables with the greatest univariable effects (0.08 and above in absolute value), better linear behaviour, and without multicollinearity (VIF< = 10); 6) Associations between inputs were observed, mainly, between some explanatory variables and the confounders, but we wanted to obtain the pure effect associated with each explanatory variable after controlling for the confounders. Hence, we used residuals associated with three Poisson models in which ICU, intubated, and hospitalization were used as outputs, including a subset of the variables contained in the fatality risk model chosen in 5) as inputs. This process allowed us to obtain the effects each confounder has over the mortality risk eliminating the effects that explanatory variables have over them. These were confounders as in econometrics or experimental design analyses [25]. Thus, the interpretation of the estimated parameters is not of interest, but they are included to avoid spurious effects associated with the other variables we are interested in once controlling for the severity of the disease, 7) We evaluated goodness-of-fit and validated all model assumptions. For age, we obtained age groups: 0–1, 2–9, 10–19, …, 80–89, and 90 years old and over, fitted the corresponding univariable Poisson models with offset, and identified significant age groups and the direction of the association to obtain a new set of age groups: 10–39, 40–69, and 70 and more; and proceeded with these variables as with the others. Moran’s I and its significance were obtained for each explanatory variable and confounder in the model to determine if each variable’s values were spatially correlated.

The GGWR with the same variables as in the GLM was fitted, and multiplicative effects over the MRts (i.e. the exponentiated estimated parameters) associated with each variable were calculated. Maps associated with these effects were obtained, presenting only those associated with the explanatory variables significant in the global model. To compare the GGWR and GLM models, the squared sum of residuals (SSR), the coefficient of determination R2, and the Akaike Information Criterion (AIC) were obtained. All statistical analyses were conducted using R version 3.6.2 through the spdep, rgdal, and spgwr packages for spatial analyses and car package for the correlation analysis and GeoDa 1.14.0 were also used for some spatial analyses. The significance level for all analyses was 5% (i.e., alpha = 0.05).


MRt description and spatial autocorrelation of COVID-19 MRt between states

Maps for quartiles corresponding to the raw and smoothed MRts are similar, except for six states, Durango, Nayarit, and Zacatecas, which from a category using the raw risks move to the next higher quartile in the smoothed risks, the opposite occurring for Coahuila, Mexico, and Veracruz. Only the map concerning the smoothed values is shown (Fig 1A). We observed the largest MRts (raw and smoothed risks above 3.5% and 2.5%, respectively) and the greatest SMR (2.00–4.00) in Baja California, Chihuahua, Quintana Roo, and Tabasco (Fig 1B). Globally, there is a non-significant spatial autocorrelation (Moran’s I = -0.054, p = 0.390); however, there is a noticeable Low-Low cluster in the Northeast around San Luis Potosi, two Low-High clusters around Yucatan and Sonora, and one High-Low cluster around Colima (Fig 2). These Low-High clusters make sense since the associated states, especially Sonora, are surrounded by the states with the highest MRts in all the country. We verified that the spatial autocorrelation value and clustering were precisely the same using both the raw values or those obtained with the Bayes spatial technique.

Fig 1. Maps associated with COVID-19 deaths in Mexico by state until April 21st, 2020, adjusted for sex and age.

A) Quartiles corresponding to mortality risks among tested individuals smoothed through an empirical Bayes procedure. B) Standardized mortality ratio. Shapefile from under a CC BY license, with permission from Carlos Efraín Porto Tapiquén, original copyright 2015. Note: The quantities in brackets beside the categories correspond to the number of states in each category.

Fig 2. Spatial clustering associated with mortality risk from COVID-19 among tested individuals adjusted for sex and age in Mexico by state until April 21st, 2020, considering queen contiguity.

A) Significant spatial clustering obtained through Local Indicators of Spatial Autocorrelation (LISA) comparisons. There are four types of clusters: High-High, Low-Low, High-Low, and Low-High, e.g. a Low-Low cluster (blue) indicates states with low values of a variable significantly surrounded by regions with similarly low values. B) P-values associated with the spatial clustering in A), C) Scatter plot associated with the smoothed risks vs their corresponding spatially lagged values, including the associated linear regression fitting, whose slope is the Moran’s I statistic. Shapefile from under a CC BY license, with permission from Carlos Efraín Porto Tapiquén, original copyright 2015. Note: The quantities in brackets beside the categories correspond to the number of states in each category.

Fit of multivariable generalized global and linear geographical models for mortality risk from COVID-19 among tested individuals

We found the presence of serious multicollinearity problems through a preliminary analysis obtained by fitting a Poisson multivariable linear model with offset and including all variables since we obtained Variance Inflation Factors (VIF) with values above 50 for some variables. Additionally, a correlation analysis between all input variables was performed (Fig 3A), identifying very correlated variables. Thus, we followed the model selection process, as explained above. Our final global model included as inputs: diabetes (%), obesity (%), GDP, internal and external migration (%), age group of 10 to 39 years (%), physicians-to-population ratio, cardiovascular disease (%), ICU (%), hospitalization (%), and intubated (%). The latter three variables are confounders and, as discussed in the Methods section, to eliminate effects of other variables on them, they are used as residuals associated with appropriate Poisson models with independent variables: diabetes (%), obesity (%), age (%), physicians-to-population ratio, cardiovascular disease (%), plus ICU (%) for the model associated with the intubated variable. Inputs with a significant spatial autocorrelation were the percentages associated with external migration (I = 0.214, p-value = 0.027), age group of 10 to 39 years (I = 0.2, p-value = 0.037), intubated (I = 0.245, p-value = 0.014), and cardiovascular disease (I = 0.359, p-value < 0.001). We found that all variables were jointly significant (LR = 472.19, p-value<0.001). Additionally, through a PP-plot associated with the standardized Pearson residuals and associated Anderson-Darling test (A = 0.260, p-value = 0.689) and scatter plots between the fitted values and standardized residuals, and a similar plot using the root of the residuals instead, we determined that the link function and the way the explanatory variables are related with the response seems correct and there is a good fit. However, we found some overdispersion according to the Chi-squared statistic divided by its degrees of freedom of 1.972, but no significant overdispersion according to an analysis fitting a negative binomial model (p-value = 0.5). There were not any significant pairwise interaction effects between explanatory variables: those between diabetes and all health-related variables, physicians-to-population ratio, and age; those between obesity and all health-related variables; those between GDP and all non-health-related variables; those between migration (internal and external) and non-health-related variables; those between age and all variables and similarly for the physicians-to-population ratio.

Fig 3. Figures associated with the selection and goodness-of-fit of a multivariable generalized geographically weighted model (GGWR), with a poison distribution, offset, and a logarithm link function explaining the mortality risk from COVID-19 among tested individuals.

A) Correlation plot including the raw risks and B) Representation of Pearson residuals by state. Shapefile from under a CC BY license, with permission from Carlos Efraín Porto Tapiquén, original copyright 2015. Notes: (1) Offset: Log of the total number of people tested in a state; (2) Multiplicative changes in mortality risks (MRt) from the effects of different risk factors by state.

A multivariable GGWR with a Poisson distribution, adaptive kernel, and the same input variables was also fitted. The R2 was greater in the GGWR (0.871 vs 0.860) than in the GLM, and the SSR and AIC were both lower (3.35 vs 3.50 in the logarithmic scale and 287.15 vs 308.42, respectively). In Table 3, we summarise the multiplicative effects over the risks, exponentiated parameters, i.e. minimum, quartiles, and maximum, associated with each variable for all states in the GGWR and the global values corresponding to the GLM. Pearson residuals associated with the GGWR are shown in Fig 3B, showing that the worst fit corresponded to two states: Veracruz and Yucatan.

Table 3. Statistics by variable (minimum, maximum, and quartiles) corresponding to the effects by state* over the mortality risk from COVID-19 among tested individuals (MRt) under the GGWR and similar effects and p-values associated with a global model (all models consider a Poisson distribution, offset term**, and logarithmic link function).

Predictors of COVID-19 spatial mortality in Mexico

The exponentiated estimated parameters under the GGWR for each state were obtained for each variable (not shown), and maps were obtained based on the.shp file provided in [26]. Nevertheless, we only present the maps for those explanatory variables significantly associated with the log-transformed MRt in the global model (GLM) (Fig 4). These significant variables were diabetes (%), obesity (%), GDP, internal and external migration (%), age group of 10 to 39 years (%), and physicians-to-population ratio. However, care should be taken when the map for GDP is interpreted considering this variable corresponds to GDP on states not having the biggest cities, as an interaction term between GDP and a binary variable, thus having a fixed value of zero in four states, which are represented as blank spaces in the map. All estimated terms were interpreted by considering fixed values for all variables except the one being interpreted.

Fig 4. Multiplicative estimated effects over the tested mortality risk due to COVID-19 among tested individuals (MRt) under a GGWR for those variables that are significantly associated with the response under a global model (Poisson models with offset and a logarithmic link function).

Shapefile from under a CC BY license, with permission from Carlos Efraín Porto Tapiquén, original copyright 2015. Note: (1) Offset: Log of the total number of people tested in a state; (2) Multiplicative changes in mortality risks (MRt) from the effects of different risk factors by state.

Age and metabolic predictors of MRt in Mexico

Prevalence of obesity has a global (for all states) significant positive association (multiplicative effect) with the COVID-19 MRt (1.122; 95%CI 1.081–1.166). Locally the effect is between 1.120 in Quintana Roo to 1.125 in Sinaloa, having a similar effect on all Mexico, though slightly larger on the North and Centre. For diabetes (%), there is also a significant positive association with the MRt (1.152; 95%CI 1.079–1.230) with local effects between 1.147 in Colima to 1.154 in Yucatan, having a positive effect in all Mexico; but especially in the Centre, South, and Yucatan peninsula. On the other hand, the proportion of individuals between 10 and 39 years old has a significant negative association with the COVID-19 MRts (0.907; 95%CI 0.873–0.944), locally the effect is between 0.907 in Chiapas to 0.910 in Chihuahua, thus having a similar association in all the country.

Mobility and socio-economic predictors of MRt in Mexico

The percentage of internal migration in the spatial unit a patient comes from has a significant positive association with the COVID-19 MRts (1.267; 95%CI 1.183–1.356). Locally the effect is between 1.250 in Durango to 1.276 in Quintana Roo, having more effect in the South, centre, and Yucatan peninsula. The percentage of external migration in the spatial unit a patient comes from is also significantly positively associated with the COVID-19 MRt (1.349; 95%CI 1.052–1.728). Locally the effect is between 1.322 in Morelos to 1.361 in Baja California. This effect exists throughout Mexico, but it is more robust on the North and Baja California peninsula. The physicians-to-population ratio is also significantly positively associated with the MRt (1.866; 95%CI 1.625–2.142), with local effects between 1.845 in Colima to 1.865 in Yucatan, having a slightly larger effect on the North and Yucatan peninsula. Finally, GDP excluding the states having the biggest cities in the country (Nuevo León, México, Ciudad de México, and Jalisco) is also significantly positively associated with the MRt (1.304; 95%CI 1.198–1.429), locally the effects are between 1.298 in Oaxaca and 1.307 in Baja California.


In the present study, we demonstrate the relevance of spatial statistical analyses to understand how MRts are distributed throughout Mexico, the presence of spatial clusters related to COVID-19 MRts, the type of associations (negative or positive) with some independent variables and how these associations varied among states. Through our analysis, we were able to identify spatial units or regions in which care could have been considered in terms of mortality due to SARS-CoV-2. Identifying such areas could allow us to understand how these risks were distributed in the early stages of the outbreak.

Considering the global results, variables significantly associated with an increase in the COVID-19 MRt include variables associated with diabetes, obesity, external and internal migration, physicians-to-population ratio, and GDP in states that do not include one of the four largest cities in Mexico. Meanwhile, the proportion of individuals between 10 to 39 years old is significantly associated with a decrease in the MRt of COVID-19, which agrees with previous results analyzed using Mexican data. Cardio-metabolic diseases are associated with adverse COVID-19 outcomes and worst prognosis, probably since they are linked to chronic inflammation, which may synergize with the cytokine storm [27].

Regarding internal and external migration (%), these variables have a solid association with COVID-19 mortality among tested individuals, being external migration the variable with the second-highest positive association. This finding could be related to infectious diseases incidence given that greater movements of people transport pathogens from a geographical region to another, such as for Mycobacterium tuberculosis and/or HIV outbreaks [2830]. Another important variable is the physicians-to-population ratio, which is related to urbanization, population density, and poverty. In this context, from the correlation plot, it can be seen that there is a large positive association between physicians-to-population ratio, population density, and overcrowded households (%), and a large negative association between this variable and both the rural and poverty proportions by state.

On the other hand, the positive association between GDP in states that do not include one of the four largest cities and the MRt might be related to economic activities of these states that prevent adequate self-isolation measures and, unlike the states with the largest cities, do not have the health infrastructure to prevent death among clinical cases.

The greatest MRts, both raw and smoothed, corresponded to the states of Chihuahua, Quintana Roo, Tabasco, and Baja California (raw values of 4.84%, 4.55%, 4.27%, and 3.96%, respectively, observations until April 21st, 2020). It is noticeable that SMRs for such states are between 2 and 3, suggesting that the risks are above what is nationally expected in these states. We observed a spatial cluster concerning states with low risks; however, at least until the period of our study, there was no significant clustering of states with high risks using the LISA technique.

The fact that Quintana Roo is an important touristic centre might explain the higher number of COVID-19 cases. However, according to our models, the elevated MRt in this state is most strongly associated with the levels of diabetes, internal migration, and the physicians-to-population ratio, whose possible interpretation was discussed above. In Chihuahua and Baja California, the variables with a particularly strong positive association with MRt were external migration (%), obesity (%), and GDP. Meanwhile, in Tabasco, these variables corresponded to diabetes (%) and internal migration (%).

In terms of local effects, the physicians-to-population ratio (heavily associated with urbanicity, overcrowding households, population density, and less poverty) is the variable with the highest positive association with the MRt, but it has a relatively similar strength of association across all of Mexico. External migration (%) has the second-highest association with the MRts, particularly in those states on the north side of the country, in which the risks were the highest. In contrast, internal migration has the fourth-highest association, particularly in the Centre, South, and Yucatan peninsula. The association of the percentages of obesity and diabetes with MRts is similar for both; for obesity, there is relatively a similar association in all the country. Nevertheless, for diabetes, there is a slightly higher association in the Centre, South, and Yucatan peninsula. The associations have also been reported in other studies in other geographical regions [3133], and although it is still under study, they may be associated with inflammatory stages [34]. Finally, the proportion of individuals between 10 and 39 years old has a relatively similar effect throughout Mexico.

Economic effects, such as poverty, might be better studied in a disaggregated model, including it as a measure at the individual level. Unfortunately, such information is unavailable in the epidemiological data set, and at most, information concerning poverty at a municipality level could be attached to each individual, since the state is the spatial unit containing municipalities. However, we would still be using aggregated values, and in the methodology used to calculate poverty in Mexico, the state values are estimators obtained from a representative sample, whereas the municipality values are estimated through small area estimation techniques; thus, making the former more reliable. In this sense, all analyses were performed at a state instead of at a municipal level. The reason behind this decision is that there were many municipalities with zero values in the early stages of the pandemic, which is the focus of our analysis, and this would have resulted in many issues concerning model fit and convergence considering the probability distributions available for GGWR and other spatial linear models. To obtain similar results, we would have required different tools, such as zero-inflated geographical models, that are not currently available. Future studies could focus on the characteristics of municipalities with zero cases and mortalities, and focus on spatio-temporal analyses of these COVID-19 data.

Our results are robust in terms of the models fitted since most of the statistical assumptions were satisfied; and, though numerically there could be some overdispersion, after fitting a quasi-Poisson and a negative binomial linear model, we obtained similar results and no significant overdispersion according to the latter model. It is essential to notice that we are studying fatality risks associated with those individuals tested for the disease (MRt); thus, care should be taken if results want to be extrapolated. However, using the projected population in 2020 [35], instead of the tested individuals to model the mortality risks, we obtained similar results. We think that an analysis over the population is somewhat inaccurate because all health-related variables correspond to prevalence in individuals in the data set, which do not necessarily agree with those in the population, the same issue was present for age distribution. If we used population values by state, we would waste all the epidemiological data, except for the number of deaths. Additionally, in the analysis, the number of infected and/or deaths is even more poorly estimated when analyzing the early spread of the disease since, as in many countries, there was a highly selected nature of early tests.

It is essential to mention that a limitation in our results is that the associations should not be extrapolated to lower aggregation levels or individuals due to the potential for ecological bias [36]. Despite these limitations, we were able to identify some spatial predictors of mortality risks associated with COVID-19 at an early stage of the pandemic.

In conclusion, metabolic diseases (%), internal and external migration (%), physicians-to-population ratio, GDP per capita in states without the biggest cities, and the proportion of individuals in the age group between 10 to 39 years old were significantly associated with early COVID-19 mortality risks in Mexico. These predictors likely influence the growth of the pandemic moving forward, but variables, such as the prevalence of metabolic diseases, cannot be easily modified in the short-term. However, identifying variables in Mexico associated with the risks and in specific geographical areas could help in the identification of public policies that could limit the impact of future epidemics. Even though our study focused on the early stages of SARS-CoV-2 spread, its results allow us to understand how the pandemic evolved within Mexico and the possible measures that should be taken to control additional waves of COVID-19 or similar diseases in Mexico and specific zones of the country.


  1. 1. Coronavirus Update (Live): 22,787,319 Cases and 795,367 Deaths from COVID-19 Virus Pandemic—Worldometer [Internet]. [cited 20 Aug 2020].
  2. 2. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579: 270–273. pmid:32015507
  3. 3. Javelle E, Raoult D. COVID-19 pandemic more than a century after the Spanish flu. Lancet Infect Dis. 2020; pmid:32795410
  4. 4. Bello-Chavolla OY, Bahena-López JP, Antonio-Villa NE, Vargas-Vázquez A, González-Díaz A, Márquez-Salinas A, et al. Predicting mortality due to SARS-CoV-2: A mechanistic score relating obesity and diabetes to COVID-19 outcomes in Mexico. J Clin Endocrinol Metab. 2020;105. pmid:32474598
  5. 5. McGonagle D, Sharif K, O’Regan A, Bridgewood C. The role of cytokines including Interleukin-6 in COVID-19 induced pneumonia and macrophage activation syndrome-like disease. Autoimmun Rev. 2020;19: 102537. pmid:32251717
  6. 6. Bello-Chavolla OY, Rojas-Martinez R, Aguilar-Salinas CA, Hernández-Avila M. Epidemiology of diabetes mellitus in Mexico. Nutr Rev. 2017;75: 4–12. pmid:28049745
  7. 7. Warning For The U.S.: Diabetes And Coronavirus Cocktail Killing Young People In Mexico [Internet]. [cited August 20th 2020].
  8. 8. Zhou Y, Xu R, Hu D, Yue Y, Li Q, Xia J. Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: a modelling study using mobile phone data. The Lancet Digital Health. 2020;2: e417–e424. pmid:32835199
  9. 9. Cruz-Pacheco G, Bustamante-Castañeda JF, Caputo JG, Jiménez-Corona ME, Ponce-de-León-Rosales S. Dispersion of a new coronavirus SARS-CoV-2 by airlines in 2020: temporal estimates of the outbreak in Mexico. Rev Invest Clin. 2020;72: 138–143. pmid:32584328
  10. 10. Bello-Chavolla OY, González-Díaz A, Antonio-Villa NE, Fermín-Martínez CA, Márquez-Salinas A, Vargas-Vázquez A, et al. Unequal impact of structural health determinants and comorbidity on COVID-19 severity and lethality in older Mexican adults: Considerations beyond chronological aging. J Gerontol A, Biol Sci Med Sci. 2020; pmid:32598450
  11. 11. Moroni L, Bianchi I, Lleo A. Geoepidemiology, gender and autoimmune disease. Autoimmun Rev. 2012;11: A386–92. pmid:22142547
  12. 12. Restrepo-Jiménez P, Molano-González N, Anaya J-M. Geoepidemiology of Sjögren’s syndrome in Latin America. Joint Bone Spine. 2019;86: 620–626. pmid:30776490
  13. 13. Chen WM, Zhou QR, Wang XM, Mao HS, Wang P, Zhou L, et al. [Integration of spatial epidemiology and molecular epidemiology used for study on tuberculosis]. Zhonghua Liu Xing Bing Xue Za Zhi. 2016;37: 1683–1686. pmid:27998421
  14. 14. Datos Abiertos—Dirección General de Epidemiología | Secretaría de Salud | Gobierno | [Internet]. [cited 18 Apr 2020].
  15. 15. Metodología para la medición de pobreza en México | CONEVAL [Internet]. [cited 20 Aug 2020].,y%20medici%C3%B3n%20de%20la%20pobreza.
  16. 16. Secretaria de Comunicaciones y Transportes: 5.5 Estadística Operacional de Aeropuertos / Statistics by Airport [Internet]. [cited 18 Nov 2020].
  17. 17. Bello-Chavolla OY, Antonio-Villa NE, Vargas-Vázquez A, Fermín-Martínez CA, Márquez-Salinas A, Bahena-López JP. Profiling cases with non-respiratory symptoms and asymptomatic SARS-CoV-2 infections in Mexico City. Clin Infect Dis. 2020; pmid:32857829
  18. 18. Preston S, Heuveline P, Guillot M. Demography: Measuring and Modeling Population Processes. 1st ed. Malden, MA: Wiley-Blackwell; 2000. p. 308.
  19. 19. Anselin L., Lozano-Gracia N., Koschinky, J. Rate Transformations and Smoothing. Techical report. Urbana, IL: Spatial Analysis Laboratory, Department of Geography, University of Illinois; 2006.
  20. 20. Cressie Noel. Statistics For Spatial Data (wiley Classics Library). Revised. Hoboken, NJ: Wiley-interscience; 1993.
  21. 21. Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950;37: 17. pmid:15420245
  22. 22. Anselin L. Local indicators of spatial association-LISA. Geogr Anal. 2010; 27: 93–115.
  23. 23. Fotheringham AS, Brunsdon C, Charlton M. Geographically Weighted Regression: The Analysis Of Spatially Varying Relationships. 1st ed. Chichester, England: Wiley; 2002. p. 284.
  24. 24. Agresti A. Categorical Data Analysis. 3rd ed. Hoboken, NJ: Wiley; 2012. p. 752.
  25. 25. Kutner M, Nachtsheim C, Neter J, Li W. Applied Linear Statistical Models. 5th ed. Mcgraw-hill/irwin; 2004. p. 1396.
  26. 26. Porto Tapiquén CE, Orogénesis Soluciones Geográficas. Estados de México [Internet]. 2015 [cited 19 Feb 2021].
  27. 27. Bornstein SR, Dalan R, Hopkins D, Mingrone G, Boehm BO. Endocrine and metabolic link to coronavirus infection. Nat Rev Endocrinol. 2020;16: 297–298. pmid:32242089
  28. 28. Pareek M, Greenaway C, Noori T, Munoz J, Zenner D. The impact of migration on tuberculosis epidemiology and control in high-income countries: a review. BMC Med. 2016;14: 48. pmid:27004556
  29. 29. Weine SM, Kashuba AB. Labor migration and HIV risk: a systematic review of the literature. AIDS Behav. 2012;16: 1605–1621. pmid:22481273
  30. 30. Gross B, Zheng Z, Liu S, Chen X, Sela A, Li J, et al. Spatio-temporal propagation of COVID-19 pandemics. EPL. 2020; 131 58003.
  31. 31. McGurnaghan SJ, Weir A, Bishop J, Kennedy S, Blackbourn LAK, McAllister DA, et al. Risks of and risk factors for COVID-19 disease in people with diabetes: a cohort study of the total population of Scotland. Lancet Diabetes Endocrinol. 9: 82–93. pmid:33357491
  32. 32. Di Fusco M, Shea KM, Lin J, Nguyen JL, Angulo FJ, Benigno M, et al. Health outcomes and economic burden of hospitalized COVID-19 patients in the United States. J Med Econ. 2021; 1. pmid:33555956
  33. 33. Zhang N, Wang C, Zhu F, Mao H, Bai P, Chen L-L, et al. Risk factors for poor outcomes of diabetes patients with COVID-19: A single-center, retrospective study in early outbreak in China. Front Endocrinol (Lausanne). 2020;11: 571037. pmid:33071977
  34. 34. Schwarz B, Sharma L, Roberts L, Peng X, Bermejo S, Leighton I, et al. Cutting edge: Severe SARS-CoV-2 infection in humans is defined by a shift in the serum lipidome, resulting in dysregulation of Eicosanoid Immune Mediators. J Immunol. 2021;206: 329–334. pmid:33277388
  35. 35. Datos Abiertos de México—Proyecciones de la Población de México y de las Entidades Federativas, 2016–2050—Proyecciones de la Población de los Municipios de México, 2015–2030 (base 2) [Internet]. [cited 19 Nov 2020].
  36. 36. Pearce N. The ecological fallacy strikes back. J Epidemiol Community Health 2000; 54; 5: 326–327. pmid:10814650