COVID-19 incidence and mortality in the Metropolitan Region, Chile: Time, space, and structural factors

Demographic, health, and socioeconomic factors significantly inform COVID-19 outcomes. This article analyzes the association between these factors and outcomes in Chile during the first five months of the pandemic. Using the Metropolitan Region’s municipalities as the unit of analysis, the study looks at the role of time dynamics, space, and place in cases and deaths over a 100-day period between March and July 2020. The results show that common and idiosyncratic elements explain the prevalence and dynamics of infections and mortality. Social determinants of health, particularly the multidimensional poverty index and the use of public transportation, play an important role in explaining differences in outcomes. The article contributes to the understanding of the determinants of COVID-19, highlighting the need to consider time-space dynamics and social determinants as key in the analysis. Structural factors are important to identify at-risk populations and to select policy strategies to prevent and mitigate the effects of COVID-19. The results are especially relevant for similar research in unequal settings.


Introduction
The novel coronavirus, known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and first described in China at the end of 2019, causes the new coronavirus disease (COVID-19), declared a pandemic by the World Health Organization (WHO) on January 30, 2020 [1,2]. By July 31, 2020, this pandemic had caused over seventeen million confirmed cases and 668,910 deaths worldwide [3].
The Americas currently face the heaviest burden of the pandemic, and Chile has been one of the countries most affected by this new virus [3]. Despite implementing testing, contact tracing, isolation practices, health messaging, and lockdown efforts [4,5], Chile reached 395,261 cases by July 30, 2020 and had one of the highest mortality rates globally [6,7].

countries [42–46]. In Chile, age is highly correlated with COVID-19-related hospitalizations and deaths [13]. Although the evidence is currently unclear, the percentage of children could help explain viral spread, as studies have found that children carry higher viral loads [47,48]. They could also face COVID-19-related health complications [49,50]. Additionally, other variables, such as the proportion of migrants, population density, and rurality, were included, as they help explain both infection rates and mortality [51–56].
Health-related indicators, including the health system's features and health outcomes, have been used to identify contagion and mortality patterns, primarily through pre-existing health conditions. We first included the population covered by the public health insurance, a self-reported variable that captures barriers to healthcare, and a dummy variable that identifies people who live far from a health center (2.5 kilometers or more). We expected these variables to explain mainly COVID-19-related health outcomes (deaths), as they are proxies for healthcare access and quality, and also reflect broader inequalities across the population. We then constructed a dummy variable to capture whether individuals report having at least one COVID-19-related health condition, as defined by the Centers for Disease Control and Prevention (CDC) and the Chilean Ministry of Health [13,57,58].
Finally, several variables that capture the socioeconomic level were considered. Water availability can be seen as both a socioeconomic and a health indicator. Given the nature of the virus, access to water is expected to impact people's preventive behaviors, particularly as hand-washing is a key COVID-19 prevention strategy [59,60]. Poverty as an indicator captures several risk factors associated with higher rates of both infection and death. Poverty is associated with people's vulnerability to COVID-19 and impacts the variables of interest through several channels: poor sanitary conditions, limited access to information, inability to follow prevention strategies (such as hand-washing, the use of face masks, and social distancing), and lower access to healthcare, among others [37,61–64]. The CASEN survey includes a variable called multidimensional poverty, an index of socioeconomic vulnerability that summarizes several aspects related to social determinants and COVID-19 (see S2 Table) [65]. Using this index has advantages and disadvantages. While it simplifies the estimation by capturing several socioeconomic factors in one variable, as done in other studies, it hinders the understanding of the underlying channels through which social determinants impact health outcomes [66,67]. Consequently, we estimate models including this index and others in which the different dimensions (income poverty, overcrowding, education, health insurance coverage, and job status) are considered independently. All these variables can explain infections and deaths by capturing a household's structural inability to follow the preventive measures (hand-washing, use of face masks, physical distancing, and quarantine) and to seek healthcare (e.g. health literacy, financial feasibility, etc.) [68–73]. Finally, the use of public transportation and the availability of green spaces indicate people's ability to comply with social distancing strategies [56].
These indicators potentially affect infection rates and could also influence COVID-19 deaths, since they reflect mobility challenges within the city, a factor that could be relevant in health emergencies [74]. Maps and spatial information were obtained from the website of the Inter-Ministerial Committee on Geographic Information [75].
All dummy variables were transformed to capture the percentage of people reporting each condition in each municipality. Continuous variables represent the municipalities' average value. The survey's expansion factor at the municipal level was used to expand the sample's values to a population estimate.
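The aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout `(municipality, dummy, expansion_factor)`, the function name `municipal_share`, and the sample values are hypothetical and are not taken from the CASEN files.

```python
from collections import defaultdict

def municipal_share(records):
    """Return {municipality: weighted % of people reporting the condition}.

    Each record is (municipality, dummy, expansion_factor). The dummy is 0 or 1,
    and the expansion factor is the number of people the respondent represents.
    """
    weighted_yes = defaultdict(float)
    weighted_all = defaultdict(float)
    for municipality, dummy, weight in records:
        weighted_yes[municipality] += dummy * weight
        weighted_all[municipality] += weight
    return {m: 100.0 * weighted_yes[m] / weighted_all[m] for m in weighted_all}

# Hypothetical respondents: (municipality, reports the condition?, expansion factor)
sample = [
    ("A", 1, 120.0), ("A", 0, 80.0), ("A", 1, 200.0),
    ("B", 0, 150.0), ("B", 1, 50.0),
]
shares = municipal_share(sample)
print(shares)
```

Weighting by the expansion factor means that each respondent counts in proportion to the population they represent, so the resulting share is a population estimate rather than a raw sample proportion.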
A summary of the variables used and sources of information is presented in Table 2. For the dependent variables, data include information collected from March 3 to July 30, 2020.

Table 2 (excerpt)
Variable | Variable description | Source
Multidimensional poverty | Percentage of people classified as poor using the index of multidimensional poverty in the administrative unit (municipality). The index measures deficiencies per household in education (schooling (7.5%), attendance (7.5%), and backwardness (7.5%)); health (insurance affiliation (7.5%), malnutrition (7.5%), and access to care (7.5%)); work and social security (occupation (7.5%), social security (7.5%), and pensions (7.5%)); housing and environment (basic services (7.5%), housing status and overcrowding (7.5%), and environment (7.5%)); and networks and social cohesion (perceived social support and participation (3.33%), equal treatment (3.33%), and security (3.33%)) | CASEN 2017
Income poverty | Percentage of people who cannot meet their basic needs, estimated from a basic food basket, in the administrative unit (municipality), using the CASEN definition | CASEN 2017
No overcrowding | Percentage of people who declare living in a household with fewer than 2.5 persons per exclusive-use bedroom in the administrative unit (municipality) | CASEN 2017
Critical overcrowding | Percentage of people who declare living in a household with 5 or more persons per exclusive-use bedroom, or in a household without exclusive-use bedrooms, in the administrative unit (municipality) | CASEN 2017
Years of education | Average number of years of schooling of the population aged 15 and over in the administrative unit (municipality) | CASEN 2017
Self-employed worker | Percentage of people who declare their job status as self-employed worker in the administrative unit (municipality) | CASEN 2017

The selected variables cover an ample spectrum of dimensions (demographic, health, and socioeconomic factors) related to the impact (infections and deaths) of COVID-19. In terms of the spatial analysis, they include variables to explain geographical variations based on compositional issues (i.e. differences in the kind of people who live in each place) as well as contextual explanations (i.e. differences between the places), understanding that both are relevant in the relationship between health and place [76].
Data analysis. To acknowledge the temporal dynamics of the disease, the study uses a standardized 100-day period starting when the first case was reported in each municipality. As for the role of space and place in explaining COVID-19 cases and deaths, we used a spatial analysis approach to look at the data. To identify the determinants of infection and mortality due to COVID-19 in the Metropolitan Region in Chile, we carried out multivariable regressions to explain the set of dependent variables, using the three groups of explanatory variables described above. As previously stated, one of the most salient features when performing municipality-level analysis in the Metropolitan Region in Chile is the presence of geographic clusters, particularly when looking at socioeconomic indicators. Considering the way in which COVID-19 is transmitted and the potential differences in access to and quality of healthcare, we expect space and place to play an important role in determining the values of both the independent and dependent variables, as well as their interactions, justifying the use of spatial analysis.
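As a rough sketch of the standardized window described above, the following code aligns a municipality's daily case series to the date of its first reported case and derives three outcome measures (cumulative incidence, peak, and days to the peak). The function names, the per-100,000 scaling, and the sample series are assumptions made for illustration, not the study's actual procedure.

```python
from datetime import date, timedelta

def window_since_first_case(daily_cases, days=100):
    """Keep the first `days` days starting at the first date with a reported case.

    daily_cases maps date -> new cases for one municipality.
    Returns a list of length `days` (missing dates count as zero cases),
    or an empty list if the municipality never reported a case.
    """
    case_dates = [d for d, n in daily_cases.items() if n > 0]
    if not case_dates:
        return []
    start = min(case_dates)
    return [daily_cases.get(start + timedelta(days=k), 0) for k in range(days)]

def summarize(window, population):
    """Cumulative incidence per 100,000 (assumed scale), peak, and days to the peak."""
    cumulative = 100_000 * sum(window) / population
    peak = max(window)
    days_to_peak = window.index(peak)  # days elapsed since the first case
    return cumulative, peak, days_to_peak

# Hypothetical series; a 5-day window keeps the example short.
series = {
    date(2020, 3, 3): 0,
    date(2020, 3, 5): 2,
    date(2020, 3, 6): 5,
    date(2020, 3, 8): 3,
}
window = window_since_first_case(series, days=5)
summary = summarize(window, population=10_000)
print(window, summary)
```

Aligning on the first reported case, rather than on calendar dates, makes municipalities comparable even when the outbreak reached them at different times.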
The first step in the spatial analysis is the definition of neighbors. In this case, we used a first-order queen contiguity matrix, i.e. we defined neighbors as all municipalities that share a border. Given the nature of the data (the existence of clusters in different areas of the region and the high heterogeneity in the size of the municipalities), a distance-based approach was discarded [77]. The analysis was also carried out using the k-nearest neighbors criterion (with 5 neighbors, the average number of contiguous neighbors), and the main results hold.
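To make the neighbor definition concrete, here is a minimal sketch of a queen contiguity matrix. It assumes that each unit is given as a list of boundary vertices and that touching units share at least one identical vertex, which real municipal shapefiles do not guarantee; the row standardization and the function name `queen_weights` are also assumptions for the example.

```python
def queen_weights(polygons):
    """Build a row-standardized queen contiguity matrix.

    polygons maps a unit name to the list of its boundary vertices.
    Two units are treated as neighbors when they share at least one vertex,
    which covers both shared edges and shared corners (the queen criterion).
    """
    names = sorted(polygons)
    vertex_sets = {name: set(polygons[name]) for name in names}
    W = []
    for a in names:
        row = [
            1.0 if a != b and vertex_sets[a] & vertex_sets[b] else 0.0
            for b in names
        ]
        total = sum(row)
        # Row standardization: each unit's neighbor weights sum to one.
        W.append([v / total if total else 0.0 for v in row])
    return names, W

# Four hypothetical unit squares arranged in a 2 x 2 grid.
squares = {
    "SW": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "SE": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "NW": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "NE": [(1, 1), (2, 1), (2, 2), (1, 2)],
}
names, W = queen_weights(squares)
for name, row in zip(names, W):
    print(name, [round(v, 3) for v in row])
```

In this grid the diagonal pairs (for example NE and SW) touch only at the central corner, so they count as neighbors under the queen criterion but would not under a rook (edge-only) criterion.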
Table 2 (continued)
Variable | Variable description | Source
Green spaces | Ratio of green area (m²) per person living in the administrative unit (municipality) | System of Urban Development Indicators and Standards (SIEDU) of the National Institute of Statistics
Use public transportation | Percentage of people in each administrative unit (municipality) that declare using public transportation regularly | CASEN 2017
Water inside the house | Percentage of people that declare having public water with an in-home tap in the administrative unit (municipality) | CASEN 2017
Public health insurance | Percentage of the population that is covered by health insurance (public, private or armed forces) in the administrative unit (municipality) | CASEN 2017
Difficulty getting healthcare | Percentage of people that claim to have had a problem obtaining care (any of the following reasons: had no time, had no money, it is too expensive, asked but did not get an appointment with the physician, or got an appointment but further in time) in the last three months in the administrative unit (municipality) | CASEN 2017
Distance to health center | Percentage of people in each administrative unit (municipality) that declare living at less than 2.5 kilometers from a health center |

To understand the impact of different variables on the incidence of and mortality due to COVID-19 in the region, we use ordinary least squares (OLS) regressions. We estimate several models using an exploratory approach to the data. If space and place are relevant in determining the variables in our analysis, then the errors in the OLS regressions are spatially correlated and, consequently, the results are biased [78,79]. To test for the presence of spatial correlation in the OLS regressions, the global Moran's I test, which indicates both the existence and the degree of spatial autocorrelation [79], is applied to the regression residuals, although there are several ways to perform this test [79–81]. The statistic is used to test the null hypothesis of spatial randomness, showing whether the residuals are randomly distributed [82,83]. If the null hypothesis is rejected, the OLS analysis needs to be adjusted to consider the spatial effects.
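The residual check described above can be illustrated with a small, self-contained computation of the global Moran's I and a permutation-based pseudo p-value. This is a sketch under stated assumptions: the residuals and the chain-shaped neighbor structure are hypothetical, the helper names are invented for the example, and the permutation approach treats the residuals as ordinary data rather than applying the analytical moments derived specifically for regression residuals.

```python
import random

def morans_i(values, W):
    """Global Moran's I for a list of values and a square weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(v * v for v in z))

def permutation_p_value(values, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, W)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)

def chain_weights(n):
    """Binary weights for n units in a row; each unit neighbors the adjacent ones."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

W_chain = chain_weights(6)
clustered = [3, 2, 1, -1, -2, -3]      # hypothetical residuals, similar values adjacent
alternating = [1, -1, 1, -1, 1, -1]    # hypothetical residuals, opposite values adjacent

i_clustered, p_clustered = permutation_p_value(clustered, W_chain)
i_alternating = morans_i(alternating, W_chain)
print(round(i_clustered, 3), p_clustered, round(i_alternating, 3))
```

In this toy setting the clustered residuals give I = 9/14 (about 0.64) and the alternating residuals give I = -1, which shows how the sign of the statistic separates clustering of similar values from a checkerboard-like pattern.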
There are two main strategies to estimate spatial autoregressive models: spatial lag and spatial error. The spatial lag model (also known as the contagion model) incorporates space as a right-hand-side variable, estimating a coefficient for the spatial effect; the error model does not incorporate space as a covariate, but includes it in the structure of the residuals [84–86]. Conceptually, spatial lag models seem more appropriate to adjust for spatial autocorrelation in the case of infections, since the number of COVID-19 cases in one municipality is expected to affect the cases in the neighboring areas (the contagion effect). However, this is not necessarily true for mortality, particularly once controlling for the case incidence rate. In this case, a spatial error model appears more suitable, since unobserved spatial effects are expected to drive the spatial autocorrelation in the residuals. Consequently, spatially correlated regressions of cases are adjusted using a spatial lag model, and an error model is used for the death regressions.
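For reference, the two specifications discussed above are commonly written as follows. The notation is an assumption introduced here, not taken from the article: y is the vector of outcomes across municipalities, X the matrix of explanatory variables, W the spatial weights matrix built from the neighbor definition, and ε an independent error term; ρ and λ are the spatial lag and spatial error coefficients, respectively.

```latex
\begin{align}
  \text{Spatial lag:}   \quad & y = \rho W y + X\beta + \varepsilon, \\
  \text{Spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon .
\end{align}
```

In the lag form, the neighbors' outcomes Wy enter the equation directly, which matches the contagion argument for cases; in the error form, space enters only through the unobserved component u, which matches the argument made for deaths.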
Incidence and mortality data were collected and calculated using Microsoft Excel. Descriptive statistics and multivariable regressions were estimated using STATA, and spatial analysis and visualizations were conducted in GeoDa.

Data and spatial tests
Table 3 shows the descriptive statistics of the sample. First, most variables, both dependent and independent, show large heterogeneity between municipalities. As stated before, this reflects the different realities within the Metropolitan Region, as well as the differences in terms of COVID-19 outcomes. Second, spatial autocorrelation is statistically significant and positive for all the variables of interest (different measures of infection and mortality), as well as for many of the independent variables, particularly those related to social determinants of health, showing that several of the variables of interest tend to cluster spatially. This result gives a first warning about the potential effect of space on the relationship between COVID-19 outcomes and municipal-level features. Spatial autocorrelation measures can be broadly classified into global and local measures [87]. Moran's I is a global test that indicates the presence of spatial correlation; a local test can be used to answer where this correlation is. In this case, the Gi* statistic, which indicates the extent to which a location is surrounded by a cluster of high or low values, is used to detect hot and cold spots of selected variables with respect to the global average [87,88]. Fig 2 shows the results of the Gi* tests for selected dependent and independent variables. A common issue in the use of local measures of autocorrelation is multiple and dependent testing: because the same hypothesis is tested several times (and using similar data), some statistically significant results will be found just by chance (false discoveries). In this case, the figure shows values without a correction for this issue, which can lead to overidentification of clusters [89]. The figure exhibits clusters around the Santiago downtown area versus peripheral municipalities, as well as an east-west pattern, particularly for socioeconomic variables.
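The local test and the multiple-testing concern discussed above can be sketched together. The code below computes Gi* z-scores using the standard Getis-Ord form (each unit counted as its own neighbor), converts them to two-sided p-values with a normal approximation, and then applies a Benjamini-Hochberg step-up rule as one possible false discovery rate correction. The data, the chain-shaped neighbor structure, the 0.05 level, and the choice of Benjamini-Hochberg are all assumptions for illustration; the article does not state which correction would be used.

```python
import math

def gi_star(values, W):
    """Getis-Ord Gi* z-score per unit; W must count each unit as its own neighbor."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w_sum = sum(W[i])
        w_sq = sum(w * w for w in W[i])
        local = sum(W[i][j] * values[j] for j in range(n))
        scale = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append((local - mean * w_sum) / scale)
    return scores

def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests survive the step-up FDR rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    cutoff_rank = 0
    for rank, k in enumerate(order, start=1):
        if p_values[k] <= alpha * rank / m:
            cutoff_rank = rank
    keep = [False] * m
    for rank, k in enumerate(order, start=1):
        keep[k] = rank <= cutoff_rank
    return keep

# Eight hypothetical units in a row; each unit neighbors itself and adjacent units.
chain = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(8)] for i in range(8)]
values = [10, 9, 8, 1, 1, 1, 1, 1]

z_scores = gi_star(values, chain)
p_values = [two_sided_p(z) for z in z_scores]
naive = [p <= 0.05 for p in p_values]          # no multiple-testing correction
adjusted = benjamini_hochberg(p_values, 0.05)  # with FDR control
print([round(z, 2) for z in z_scores], sum(naive), sum(adjusted))
```

In this toy example two units are flagged as hot spots before correction and none after it, which illustrates why uncorrected maps can overidentify clusters. With so few units, the normal approximation is itself rough, so the numbers should be read only as an illustration of the mechanics.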

Infections.
To identify how different variables relate to COVID-19 infections and deaths, several multivariable regressions were estimated. Table 4 presents the results for the infection-related dependent variables. As shown in Table 1, each outcome is measured in three different ways: cumulative incidence rate (columns 1 and 2), peak of cases (columns 3 and 4), and days to the peak (columns 5 and 6). Each model is estimated using multidimensional poverty (even columns) and a set of socioeconomic variables (odd columns).
First, the determinants of the level of and change in infections differ. As expected, the use of public transportation shows a significant and positive association with the cumulative incidence rate; results for the change-type regressions (columns 5 and 6) are less consistent, with the share of people 65+, rurality, self-employment, green spaces, and difficulty getting healthcare showing significant coefficients. Second, multidimensional poverty appears important in explaining the number of cases (columns 1 and 3), while the effect vanishes when looking at the impact of a set of socioeconomic variables instead of the index of socioeconomic vulnerability (columns 2 and 4). The effect, as expected, is also positive, showing that multidimensional poverty is a risk factor for COVID-19 contagion at the municipality level. Third, the model's overall fit is better in the level-type regressions, explaining 60% of the variation in the dependent variables. In the case of estimations 5 and 6, count models using Poisson regression were also estimated; the results, in terms of significance and sign of the coefficients, hold when using this alternative. Jarque-Bera tests fail to reject the null hypothesis of normality of errors. Finally, Moran's I tests for spatial autocorrelation of the residuals show that the hypothesis that OLS residuals are distributed randomly in space cannot be rejected for all models.

Table note: All variables are expressed as the share of the municipality's total population, except for "Population density", "Green spaces", "Years of education" and "Minutes in public transportation", which report the municipality's average. Min and Max refer to the minimum and maximum values at the municipality, not individual, level. Significance level: *** p<0.01, ** p<0.05.
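The normality check on the regression errors mentioned in this subsection can be reproduced by hand. The sketch below computes the Jarque-Bera statistic from the sample skewness and kurtosis of a set of residuals and obtains the p-value from the chi-square distribution with two degrees of freedom. The residuals are hypothetical, and the asymptotic approximation is known to be rough in small samples such as the one used in this study.

```python
import math

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical residuals, symmetric around zero.
jb_stat, jb_p = jarque_bera([-2, -1, 0, 1, 2])
print(round(jb_stat, 4), round(jb_p, 4))
```

A large p-value, as in this example, means that normality cannot be rejected, which is the outcome the text reports for the OLS residuals.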
Deaths. Table 5 presents the same set of estimations for the mortality variables. In this case, four different models are estimated for each of the dependent variables: cumulative mortality rate (columns 1 to 4), peak of deaths (columns 5 to 8), and days to the peak of deaths (columns 9 to 12). As before, each model is estimated using either the multidimensional poverty index or a set of socioeconomic factors; additionally, models are estimated including and excluding the cumulative incidence rate of cases as an explanatory variable (even and odd columns, respectively).

PLOS ONE
The first result is that the determinants of infections and deaths are not the same. However, as expected, some variables seem to explain variation in both types of outcomes. As in the case of infections, level-type and change-type estimations present common and specific determinants. For deaths, in the level-type regressions (columns 1 to 8), the share of people over 65 years old, population density, multidimensional poverty, and the prevalence of cases have significant and positive coefficients; overcrowding and distance to a health center also help explain whether a municipality reaches the peak of cases faster or slower. Just as in the infection models, multidimensional poverty captures an effect that is not explained by a broad set of socioeconomic factors. The model also does a better job explaining cumulative rates than peaks and days to the peak, and the overall fit (R²) is larger than for the infection regressions in Table 4. As before, using Poisson regressions for days to the peak does not change the main results. Jarque-Bera tests fail to reject the null hypothesis of normality of errors. Unlike the infection regressions, in this case the hypothesis of a random spatial distribution of the residuals is rejected in 8 out of 12 cases, highlighting the need to use spatial regression models to account for the presence of spatial autocorrelation.

Simplified models and spatial regressions
Based on the results from Tables 4 and 5, a simplified set of regressions is estimated. Some potential problems with inference are related to the degrees of freedom, given the large number of explanatory variables and the relatively small sample (n = 52), as well as the existence of multicollinearity between the variables. As with the previous estimations, these results should be interpreted with a conservative criterion. Table 6 shows these reduced models, based on the previous results (infection and death OLS regressions). As emphasized above, both outcomes (cases and deaths) share some explanatory variables and differ in others. In this case, it is also observed that level-type and change-type models have different determinants. Notably, the overall fit of the model using a reduced set of independent variables is similar to the one

Finally, considering the presence of spatial autocorrelation in the OLS residuals, spatial regressions are used to control for these effects. As discussed above, a spatial lag regression is used for the infection model (column 1), while spatial error regressions are estimated for the death models (columns 2 to 10). In both cases, the same specification used in Tables 3-6 was utilized. First, adding a spatial dimension removes the spatial correlation in the residuals in six cases where OLS residuals show spatial autocorrelation: days to peak of cases (column 1), peak of deaths (columns 2 to 5), and days to peak of deaths (columns 6 to 10). However, overall the results improve, reflecting the addition of a previously omitted significant variable. Not only does the overall fit (R²) increase, but the results, in terms of individual coefficients, become more consistent. As Table 7 shows, the percentage of children and rurality reduce the magnitude of the peak of deaths, but also reduce the days to the peak. The opposite occurs for multidimensional poverty, again a risk factor in explaining the level and velocity of deaths.
The speeding effect is also observed for the percentage of people 65+, migrants, overcrowding, and distance to health centers, while the years of education increase the number of days to reach the peak. As before, population density shows positive and significant coefficients for the peak of deaths regressions.

Discussion
Using municipality-level data on 52 administrative units, the article explored the effects of different sets of variables, acknowledging the need to consider a broad set of variables, as well as time dynamics and spatial effects, in the analysis. Several conclusions are drawn from the results. First, there are common and idiosyncratic elements that explain the prevalence and dynamics of infections and mortality. It is necessary to recognize these different approaches when discussing the "impact" of COVID-19. The proposed models explain variation in the levels of infections and deaths better than changes (measured as days to the peak), due to both conceptual and measurement issues. Among the conceptual issues, different variables affect different outcomes; research on the outcomes of COVID-19 needs to incorporate varied perspectives (measures of impact and determinants) to shed light on the problem. As for measurement, due to data restrictions, the incidence variables are precise in capturing the underlying concept (scale of impact), while incidence at the peak and days to the peak have a less clear interpretation. Better data could help improve these estimations by defining, for example, the share of cases within a given period, or the growth rate of cases. Second, while different models are required to tell different stories (34 estimations in this case), a significant part of the municipal variation in infections and deaths can be explained by a rather small number of variables (as in Table 6). The results highlight the role of social determinants of health in explaining the dissimilar impacts of COVID-19 in the Metropolitan Region, in line with findings in other countries on the unequal distribution of COVID-19 [37,90–92].
In our study, the multidimensional poverty index informs COVID-19 infections and deaths, capturing the complex nature of the problem, highlighting the role played by structural determinants (particularly poverty and vulnerability), and reflecting outcomes similar to those of other studies [93–96]. This result has more relevance considering that COVID-19 also has an impact on socioeconomic factors, generating a vicious circle between both problems [97–99]. Elucidating social determinants as an indicator of impact can inform policy response and future prevention efforts. Third, the results identify different types of variables that explain the COVID-19 outcomes: demographic, health-related, and socioeconomic. However, these determinants can also be grouped according to their degree of changeability. Some of the relevant indicators can be seen, at least in the short run, as fixed (such as poverty and age distribution), while others are more flexible (like the use of public transportation and the availability of a health center). From a public policy perspective, this classification is useful: the first group of indicators can be treated in the short term as "explanatory" but be used for long-term planning, while the second group of indicators are "policy tools" and can be used as control knobs for short-term responses to the pandemic.

Limitations
The study's limitations must be taken into account when interpreting the results. First, there are time differences between dependent and independent variables: while dependent variables reflect information between March and July 2020, independent variables primarily originate from a survey carried out three years earlier. Unfortunately, there is no other municipality-level representative source of data with more updated information. However, most of the variables are expected to be similar today, as aggregated data generally change more slowly than individual-level data and most variables reflect structural factors (such as demographic and socioeconomic features) that are not expected to change significantly in a three-year period. "Policy tools" variables (such as the patterns of use of public transportation) are similarly expected to change little in the absence of a policy shock.
Second, choosing an ideal methodology to measure the relevant outcomes can present a challenge, as different specifications can lead to different results. Our analysis confronted this challenge by using several perspectives (infections and deaths, and levels and change). Moreover, although we use publicly available official reports [39–41] and secondary databases previously utilized in other studies in Chile [5,29], the data could still be subject to some bias, as shown by the recent discussion around cases and deaths in the country [100]. In addition, we made methodological choices when selecting the unit of analysis, focusing on municipalities and using percentages of individuals instead of, for example, households, to calculate the independent variables. Each choice has ramifications for interpretation. The small number of observations limits the precision of the inference. We reported several models to show different perspectives and test different hypotheses. Also, the analysis does not delve into the scale of the reported coefficients; although several estimations allow comparisons between models, the coefficients in each regression need to be interpreted carefully, particularly considering the presence of spillover effects in the spatial lag models, which depend on the degree of spatial correlation between the observations [86]. Consequently, interpretation of the results and their significance should follow a conservative criterion.
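The caution about spillover effects in spatial lag models can be made explicit with a short derivation. The notation is assumed for illustration (it does not appear in the article): y is the outcome vector, W the spatial weights matrix, ρ the spatial lag coefficient, and β_k the coefficient on explanatory variable x_k.

```latex
\begin{align}
  y &= \rho W y + X\beta + \varepsilon
  \;\Longrightarrow\;
  y = (I - \rho W)^{-1}(X\beta + \varepsilon), \\
  \frac{\partial y}{\partial x_k^{\top}} &= (I - \rho W)^{-1}\beta_k
  = \left(I + \rho W + \rho^2 W^2 + \cdots\right)\beta_k, \\
  \text{average total effect} &= \frac{\beta_k}{1 - \rho}
  \quad \text{if } W \text{ is row-standardized and } |\rho| < 1 .
\end{align}
```

Under these assumptions, a change in x_k in one municipality propagates to its neighbors, to their neighbors, and so on, so the reported β_k understates the total effect whenever ρ is positive; for example, ρ = 0.5 would double it. This is why coefficients from lag models are not directly comparable with OLS or spatial error coefficients.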
Finally, the definition of neighbors is a crucial one, since it could drive the results, although in this case the selection of contiguous units is justified by the features of the data (the existence of units of different sizes and the spatial patterns of the variables of interest). The Metropolitan Region is a good case for spatial analysis, but other geographical areas could also be of interest to understand the COVID-19 dynamics and policy responses (e.g. for establishing regional or international sanitary customs).

Conclusions
This multi-perspective analysis of the COVID-19 impact in the Metropolitan Region of Chile highlights patterns and dynamics of the disease and the need to investigate social determinants of health and spatiotemporal dynamics in analyzing COVID-19 [90]. The study also prompts further research questions, including the comprehensive effects once the pandemic ends. We used multiple indicators (for example, a 100-day period to understand evolving factors) but recognize that the analysis, including our measure of the speed of change, is a snapshot of an ongoing pandemic. We based our estimation on municipalities' structural features; the understanding of COVID-19 outcomes can also improve by adding real-time variables related to people's behavior, such as the percentage of people using face masks, or better data on mobility within and between geographical areas. Overall, while improved data availability and quality can expound on changes in COVID-19 outcomes, we believe the analysis contributes to the worldwide effort to understand the social determinants and effects of COVID-19. Additionally, other spatial methods could be used to explore the data, as well as a deeper analysis of the scale of the effects presented.
We expect this study will be useful for policymakers in Chile, particularly in assessing future prevention and management strategies. In this line, the next generation of COVID-19 policies should, for example, take these results into account when designing quarantines and mobility restrictions (considering the spatial dynamics of the disease), defining at-risk populations (considering the relevance of multidimensional poverty), and implementing short- and long-term strategies (considering the role of public transportation). Additionally, we hope the results can encourage the implementation of evidence-based solutions and tailored interventions to help minimize the negative effects of the pandemic by tackling social inequalities in other contexts.
Supporting information S1