Spatial analysis of COVID-19 incidence and the sociodemographic context in Brazil

Background: Identified in December 2019 in the city of Wuhan, China, COVID-19 spread throughout the world, and its impacts differ across populations. Countries with high levels of social and economic inequality, such as Brazil, are especially relevant for understanding the vulnerability factors associated with the disease. In the absence of a vaccine or a safe and effective antiviral treatment for COVID-19, nonpharmacological measures are essential for prevention and control of the disease. However, many of these measures are not feasible for millions of individuals who live in territories with increased social vulnerability. The study aims to analyze the spatial distribution of COVID-19 incidence in Brazil's municipalities (counties) and investigate its association with sociodemographic determinants to better understand the social context and the epidemic's spread in the country.
Methods: This is an analytical ecological study using data from various sources. The study period was February 25 to September 26, 2020. Data analysis used three global regression models, namely ordinary least squares (OLS), the spatial autoregressive model (SAR), and the conditional autoregressive model (CAR), and one local regression model, multiscale geographically weighted regression (MGWR).
Findings: The higher the GINI index, the higher the incidence of the disease at the municipal level. Likewise, the higher the nurse ratio per 1,000 inhabitants in the municipalities, the higher the COVID-19 incidence. Meanwhile, the proportional mortality ratio was inversely associated with incidence of the disease.
Discussion: Social inequality increased the risk of COVID-19 in the municipalities. Better social development of the municipalities was associated with lower risk of the disease. Greater access to health services improved the diagnosis and notification of the disease and was associated with more cases in the municipalities. Despite universal susceptibility to COVID-19, populations with increased social vulnerability were more exposed to risk of the illness.

COVID-19, the disease caused by the coronavirus SARS-CoV-2, can run an asymptomatic course, present as a flu-like syndrome, or evolve to acute respiratory distress syndrome (ARDS). On January 30, 2020, the World Health Organization (WHO) declared the disease a public health emergency of international concern. On March 11, 2020, the WHO declared COVID-19 a pandemic [1-4]. The first case in Brazil was reported on February 25, 2020. On March 20, the Brazilian Ministry of Health confirmed community transmission of SARS-CoV-2 throughout the country's territory and has since adopted mitigation measures to control the pandemic [5].
As of September 15, 2020, the WHO had recorded 29,155,581 confirmed cases of COVID-19, with 926,544 deaths [6]. Brazil is currently the world's third leading country in number of COVID-19 cases, with 4,345,610, and second in the number of deaths, with 132,006, for a case-fatality rate of 3.0% [7].
Since the vaccines are still in the experimental phase and there is no scientific evidence that corroborates the efficacy and safety of antiviral drugs in COVID-19, nonpharmacological interventions are priorities for reducing the number of cases, thereby avoiding overload of health services. Promotion of social distancing, restricting circulation of persons, wearing masks, and spreading information on measures in personal hygiene and prevention have been identified as the main strategies for fighting the disease [8]. Still, many of these strategies are not feasible for millions of individuals who live in irregular housing settlements, in territories characterized by increased social vulnerability, with precarious housing and sanitation.
Recently published Brazilian and international studies have found relations between sociodemographic, environmental, and healthcare factors and COVID-19 incidence, where structural conditions contribute to exposure to risk and to the community's capacity to recover from the pandemic [9-20]. A study conducted in Ceará state (Brazil) investigated the correlation between the municipal human development index and the incidence of COVID-19 [13]. A study carried out in the United States used the Gini index as a measure of income inequality [19]. Variables such as mean income, health care facilities, education level and race have also been included as potential risk factors for the disease [10,11,14-16,20]. Nevertheless, the association of these factors with COVID-19 remains to be better understood.
Brazil is a country of continental proportions and major social and economic heterogeneity, and like many other emerging countries, it presents great potential for spread of the disease.
Knowledge of the disease's spatial dynamics and their relations with social determinants is essential for the identification of areas with increased potential for spread of the infection, prioritization of prevention and control measures in these areas, implementation of more restrictive social distancing, and the health system's preparation for treating cases.
In this sense, tools for understanding the socioeconomic factors and levels of inequality associated with the development of the disease are relevant. One such tool is the Gini coefficient, which is commonly used to measure income inequality and takes values between zero (perfect equality) and one (maximum inequality) [21,22].
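To illustrate how the Gini coefficient behaves at its extremes, the following minimal Python sketch computes it from a list of individual incomes using the rank-weighted form of the mean absolute difference. The helper name `gini` and the sample incomes are assumptions made for illustration only; they are not the study's data or the procedure used to produce the municipal GINI covariable.

```python
import numpy as np

def gini(incomes):
    """Gini coefficient of a 1-D collection of non-negative incomes (hypothetical helper)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    # Rank-weighted form, equivalent to the mean absolute difference
    # between all pairs divided by twice the mean income.
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n)

# Illustrative incomes only
equal = gini([10, 10, 10, 10])        # everyone earns the same
concentrated = gini([0, 0, 0, 100])   # one person holds all income
```

With this finite-sample form, perfect equality gives 0 and full concentration in one of n individuals gives (n - 1)/n, which approaches one as n grows.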
Geographic information systems (GIS) have been used to assess the spatial distribution of infectious diseases. In Brazil, the COVID-19 Panel presents updated data described in graphs, tables and maps. These mapped data can aid in analyzing the spread of COVID-19 and improving the quality of care. The characterization of risk areas could contribute to stakeholders' decision-making during the pandemic. Thus, statistical techniques for spatial analysis can help determine the relationship between several explanatory variables and the disease outbreak.
This study aimed to analyze the spatial distribution of COVID-19 incidence in Brazilian municipalities and to investigate the association between incidence of the disease and sociodemographic determinants to better understand the social context and the epidemic's spread in the country.

Design
This analytical ecological study evaluated the association between demographic, socioeconomic, and healthcare covariables and COVID-19 incidence. The analytical units were Brazil's 5,570 municipalities. Brazil consists of 26 states and the Federal District. The states are grouped administratively into five major geographic regions (Fig 1).

Data collection
The numbers of confirmed COVID-19 cases by municipality were obtained from the Coronavirus Panel, updated daily by the Ministry of Health (https://covid.saude.gov.br/). The study period was February 25 to September 26, 2020. COVID-19 incidence by municipality (outcome) was calculated as the ratio between the absolute number of cases and the resident population in the municipality, multiplied by 10^4 (that is, cases per 10,000 inhabitants). Data on the resident population by municipality correspond to the estimates by the Federal Accounts Court (TCU) for the year 2019 (http://tabnet.datasus.gov.br/cgi/deftohtm.exe?popsvs/cnv/popbr.def), based on census data for 2010 by the Brazilian Institute of Geography and Statistics (IBGE).
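The incidence calculation described above can be sketched in a few lines of Python. The function name `incidence_per_10k` and the example counts are hypothetical and serve only to show the arithmetic; they are not values from the Coronavirus Panel.

```python
def incidence_per_10k(cases, population):
    """Confirmed cases per 10,000 residents (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 1e4 / population

# Illustrative municipality: 250 confirmed cases among 50,000 residents
rate = incidence_per_10k(250, 50_000)
```

In this made-up example the municipality has 50 cases per 10,000 inhabitants.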
The study used demographic, socioeconomic, and healthcare covariables to investigate possible associations with the outcome. These variables were selected due to their public availability and relevance as social determinants of health. In the particular case of health determinants, it is important to highlight the relationship between the availability of health professionals (such as doctors and nurses) and socioeconomic development, since more developed locations generally also have a more developed job market, which, in turn, attracts such professionals. In this sense, the availability of these professionals is a proxy for social inequality [23]. Table 1 shows the covariables.

Data analysis
Local empirical Bayesian smoothing was used to reduce the effect of the instability in COVID-19 incidence in Brazilian municipalities, weighting the incidence of the disease in a municipality with the incidence in the neighboring municipalities. In addition, since the distribution of the incidence rates was not Gaussian, a log transformation was applied to bring it closer to a normal distribution. The study's outcome was thus the log COVID-19 incidence with Bayesian smoothing (LOGCOV), hereinafter "incidence of the disease" or "COVID-19 incidence". Due to the large number of covariables, a selection was performed according to statistically significant correlation and epidemiological relevance. Correlation between the outcome and the covariables was analyzed by Spearman's correlation coefficient. A correlation matrix was constructed to identify collinearity between the covariables. In this stage, covariables that presented significant correlation with incidence of the disease at the 5% level (p < 0.05) were selected for the modeling. In cases of correlations greater than 0.5 between covariables, only the covariable that added the most to the linear model was included. The outcome's spatial dependence was measured with the global Moran's index (GMI). Clusters of spatial dependence were identified by calculating the Local Index of Spatial Association (LISA). We then constructed the LISA cluster map (LISA Map) and the significance map.
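As a rough illustration of the global Moran's index used here, the sketch below computes it directly from its definition with NumPy. The function name `morans_i`, the four-area toy layout, and the binary contiguity weights are assumptions for illustration; the text does not specify the neighborhood matrix or the significance test used in the study.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for n areas, given an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the overall mean
    s0 = w.sum()                     # sum of all spatial weights
    return float(x.size / s0 * (z @ w @ z) / (z @ z))

# Toy layout (assumed): four areas in a row, neighbors share a border
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = morans_i([1.0, 1.0, 5.0, 5.0], w)    # similar values are adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], w)  # neighbors always differ
```

Positive values indicate that similar values tend to be neighbors, and negative values indicate that neighbors tend to differ. In this toy layout the clustered pattern yields 1/3 and the alternating pattern yields -1.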

Global and local models
Global and local regression models were used to identify the best fit for COVID-19 incidence in Brazilian municipalities. The global regression models were ordinary least squares (OLS), the spatial autoregressive model (SAR), and the conditional autoregressive model (CAR). OLS is the traditional linear regression approach, which assumes independence of observations. Collinearity between the covariables selected for the OLS model was assessed with the variance inflation factor (VIF), with VIF values less than 10 taken as absence of collinearity [24]. Since the global Moran's index detected spatial dependence of COVID-19 incidence, it was necessary to control this dependence using the SAR and CAR spatial models. The SAR model incorporates the spatial dependence of the outcome (incidence of the disease). The CAR model includes the spatial effect in the model's random component (error).
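The VIF check described above can be illustrated with a minimal NumPy sketch: each covariable is regressed on the others, and VIF is 1 / (1 - R²) of that auxiliary regression. The function name `vif` and the simulated columns are assumptions for illustration, not the study's covariables or its R code.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n observations x p covariables)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Auxiliary regression of covariable j on an intercept and the other covariables
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data (assumed): c is nearly a copy of a, b is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
```

Under the threshold cited in the text, the columns `a` and `c` would be flagged as collinear (VIF well above 10), while `b` would not.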
Among the local models, we opted to use multiscale geographically weighted regression (MGWR) to fit a regression model to each areal unit, considering a bandwidth (neighborhood). The kernel was an adaptive bi-square function, which removes the effect of the analytical units outside the neighborhood area [25]. MGWR also employs the corrected Akaike information criterion (AICc) to indicate the best bandwidth size for each covariable. The global Moran's index was used to identify spatial dependence in the residuals of the three global models.
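The idea of an adaptive bi-square kernel can be sketched as follows: for a focal unit, the bandwidth is the distance to its k-th nearest neighbor, weights decay smoothly with distance, and units at or beyond the bandwidth receive zero weight. The function name `adaptive_bisquare_weights`, the coordinates, and the choice k = 3 are illustrative assumptions; this is not the internal implementation of the mgwr package.

```python
import numpy as np

def adaptive_bisquare_weights(coords, i, k):
    """Bi-square weights of all units relative to unit i, with the bandwidth set
    to the distance from unit i to its k-th nearest neighbor."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[i], axis=1)
    bandwidth = np.sort(d)[k]        # position 0 is unit i itself (distance 0)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    w[d >= bandwidth] = 0.0          # units outside the neighborhood have no effect
    return w

# Illustrative centroids (assumed): four nearby units and one distant unit
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
w0 = adaptive_bisquare_weights(coords, i=0, k=3)
```

Because the bandwidth is defined by a neighbor count rather than a fixed distance, the neighborhood widens where units are sparse and narrows where they are dense, which suits municipalities of very different sizes.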
The criteria for comparing the final fit of the OLS, SAR, and CAR models were the coefficient of determination (R²) and the Akaike information criterion (AIC); the former assesses the degree to which incidence of the disease can be explained by the covariables, and the latter considers the maximum likelihood and the number of explanatory variables used.
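To make the two comparison criteria concrete, the sketch below computes R² and AIC for a Gaussian linear model fitted by least squares. The function name `ols_fit_stats` and the simulated data are assumptions for illustration. The parameter count used for AIC (regression coefficients plus the error variance) is one common convention and is assumed here, not taken from the study.

```python
import numpy as np

def ols_fit_stats(X, y):
    """R^2 and AIC of a Gaussian linear model with intercept, fitted by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    # Maximized Gaussian log-likelihood, with the error variance estimated as rss / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    k = design.shape[1] + 1          # coefficients plus the error variance (assumed convention)
    aic = 2.0 * k - 2.0 * loglik
    return r2, aic

# Simulated data (assumed): y depends on x but not on the unrelated covariable
rng = np.random.default_rng(0)
x = rng.normal(size=200)
unrelated = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

r2_good, aic_good = ols_fit_stats(x, y)
r2_poor, aic_poor = ols_fit_stats(unrelated, y)
```

A better-fitting model shows a higher R² and a lower AIC, which is the direction of comparison used for the OLS, SAR, and CAR models.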
Preparation of the database to unify the variables, correlation analyses, and the OLS, SAR, and CAR regression models were performed in the R statistical program, version 3.6.1 [26]. The MGWR regression model's fit was assessed with the mgwr package in the Python programming environment [27].

Results
A total of 4,698,163 COVID-19 cases were reported from February 25 to September 26, 2020. Spearman's correlation matrix (Table 2) shows that the covariables ELDER and PMR presented significant inverse correlation with COVID-19 incidence. The covariables BLKBRN, POP_DENS, GINI and EDUC9 showed significant positive correlation with incidence of the disease. Considering that the covariables ELDER, PPLAN, LOGINCOM, MHDI, and BLKBRN were highly correlated with each other, they were excluded from the modeling.
According to Fig 2, the highest crude incidence rates of the disease occurred in the North of Brazil and on the coastline, especially in the Northeast and Southeast. Fig 3 shows that after Bayesian smoothing, few differences occurred in the distribution of COVID-19 incidence.
Spatial autocorrelation also corroborated this small disparity between the two distributions (crude and smoothed). Moran's index for crude incidence was 0.484 (p = 0.001), and for smoothed incidence it was 0.503 (p = 0.001). These results suggest that COVID-19 incidence presented spatial dependence between the municipalities. Fig 4 presents the LISA Map for smoothed COVID-19 incidence, which identifies municipalities with high incidence surrounded by neighboring municipalities with high incidence of the disease.
Table 2. Spearman's correlation matrix.
Table 3 shows the global models' results. The higher the GINI index, the higher the incidence of the disease at the municipal level. Likewise, the higher the nurse ratio per 1,000 inhabitants in the municipalities (NURS), the higher the COVID-19 incidence. Meanwhile, the proportional mortality ratio (PMR) was inversely associated with incidence of the disease. The R² and AIC of the SAR model were the best of the three models, indicating that the three covariables explained 48.9% of the variability in COVID-19 incidence in this model. After adjustment for spatial dependence, the residuals of the spatial models showed no remaining spatial autocorrelation, with GMI close to zero.

As shown in Table 4, the three covariables (GINI, NURS, and PMR) presented VIF < 10, indicating low multicollinearity in the OLS model. Table 5 shows the summary results of the MGWR model. The fit improved when compared to the global regression models: R² increased to 0.699, AICc remained at 9690, and the residuals showed no spatial dependence (GMI = -0.055, p = 0.999).
The maps of the MGWR coefficients showed associations in the same direction as those found in the global regression models. The GINI index showed a significant positive association with COVID-19 incidence in half of Brazil's territory: the entire North region, except the west of Acre state; parts of Maranhão, Piauí and Bahia states (Northeast region); the north of Mato Grosso state (Central-West region); and, in the Southeast region, all of Espírito Santo and Rio de Janeiro states and part of Minas Gerais (Fig 6). PMR coefficients showed a significant negative association with COVID-19 incidence in all major geographic regions of Brazil, except for a major part of the Northeast region, the south of Mato Grosso do Sul state, the west of São Paulo, the north of Paraná and some municipalities of the states of Amapá, Pará and Tocantins (Fig 7).

Discussion
COVID-19 incidence was high in the North of Brazil and part of the Northeast region, including its coastline, extending to a large part of the Southeast region. Importantly, Brazil's coastline concentrates many of the country's state capitals. Multivariate analyses identified three covariables as predictors of COVID-19 incidence in Brazilian municipalities: GINI, nurse ratio per 1,000 inhabitants, and proportional mortality ratio (PMR).
Considering the covariable PMR, lower values for this indicator suggest worse levels of quality of life and economic development [28]. In the current study, proportional mortality ratio was inversely associated with COVID-19 incidence in most Brazilian municipalities (the higher the incidence, the lower the PMR), possibly because this covariable is a proxy for social development.
In the same direction, Baqui et al. [29] found higher COVID-19 mortality in the North and Northeast regions of Brazil, related to higher prevalence of comorbidities and lower socioeconomic development in these regions. Andrade et al. [11] identified high-risk clusters in six municipalities in the central-south region of Sergipe state in Northeast Brazil, including the state capital Aracaju, which has the state's highest population density and the lowest socioeconomic level. Maciel, Castro-Silva and Farias [13] found spatial dependence of COVID-19 incidence among municipalities of Ceará state in Northeast Brazil and moderate direct correlation with the municipal human development index.
High PMR values are related to low levels of social vulnerability, a pattern that also appears in comparisons between ethnic groups. Baqui et al. [29] found higher mortality in the brown and black populations, highlighting brown ethnicity as the second leading risk factor for death, next to age. Barbosa et al. [30] observed that the incidence and mortality rates showed significant positive correlations with the proportion of black (Afro-Brazilian) and brown (mixed race) skinned people and with the income ratio, respectively. Mollalo, Vahedi and Rivera [15], who evaluated global spatial models for municipalities in the United States, found a direct association between COVID-19 incidence and social inequality, median household income, percentage of black female population and percentage of nurse practitioners. In the current study, the GINI index, which measures social inequality, was also directly associated with COVID-19 incidence. This association coincides with the results obtained by other studies on COVID-19 cases and deaths [18,19] and corroborates the role of inequality as an important social determinant of health.
Cordes and Castro [14] found an inverse association between COVID-19 and higher levels of education and income in New York. However, Rafael et al. [20], in the city of Rio de Janeiro, found higher incidence in regions with high income, which suggests that access to testing is unequal in Brazil, with the wealthiest population having greater access to testing.
The covariable nurse ratio per 1,000 inhabitants, considered an indicator of healthcare capacity, was associated with higher COVID-19 incidence [15] and lower incidence of deaths from the disease [17]. The availability of human resources in health, represented in this study by nursing professionals, influences the capacity for detecting and reporting the disease and thus tends to increase the observed incidence.
In addition to human resources, according to a study in the United States, COVID-19 cases are underestimated, with high proportions of asymptomatic cases associated with a general lack of access to testing in a context of widespread underreporting, which can boost the search for healthcare services and thus explain a positive association between nursing professionals and detection of the virus [14].
The current study presents some limitations related to variations in the notification of cases between municipalities and over time, potentially introducing information bias due to underreporting and missing data, a frequent limitation in ecological studies, especially in epidemic periods. However, the inclusion of possible confounders and the use of local models allowed the analysis and identification of relevant predictors at the population level. There is also the possibility of information bias in the covariables, since many of them were obtained from the 2010 Population Census by the IBGE, while the outcome was calculated from the number of cases reported in 2020. Some municipalities may have undergone changes in their sociodemographic characteristics, which could influence the results at the local level. However, the study assumed that no significant changes occurred in the municipalities' sociodemographic profile over the course of ten years.
Although the entire population was susceptible to COVID-19 at the beginning of the pandemic, at the municipal level the disease has shown distinct repercussions across socioeconomic groups. As observed elsewhere, populations with increased social vulnerability are more exposed to the risk of infection. These populations experience limitations in adhering to social distancing measures, due to their largely informal work situation [14] and to precarious housing and sanitation conditions, factors that hinder maintenance of the hygiene needed to prevent and control transmission of the disease [13]. Effective actions to control the spread of COVID-19 and reduce the mortality from the disease should take these factors into account. The findings of this research, reinforced by the methodology employed, contribute to global efforts to understand the spatial dynamics of COVID-19 and offer tools at a specific geographical level for targeted interventions.