Identifying hotspots of cardiometabolic outcomes based on a Bayesian approach: The example of Chile

Background There is a need to identify priority zones for cardiometabolic prevention. Disease mapping is challenging in countries where the population is very unevenly distributed geographically. Our goal was to map cardiometabolic health and identify hotspots of disease using data from a national health survey. Methods Using Chile as a case study, we applied Bayesian hierarchical modelling in a cross-sectional analysis of the 2009–2010 Chilean Health Survey. Outcomes were diabetes (all types), obesity, hypertension, and high LDL cholesterol. To estimate prevalence, we used individual data and data aggregated by province. Hotspots were defined as provinces with a prevalence significantly greater than the national prevalence. Models were adjusted for age, sex, their interaction, and sampling weight. We imputed missing data. We applied a joint outcome modelling approach to capture the association between the four outcomes. Results We analysed data from 4,780 participants (mean age (SD) 46 (19) years; 60% women). The national prevalence (percentage (95% credible interval)) of diabetes, obesity, hypertension and high LDL cholesterol was 10.9 (4.5, 19.2), 30.0 (17.7, 45.3), 36.4 (16.4, 57.6), and 13.7 (3.4, 32.2), respectively. Prevalence of diabetes was lower in the far south. Prevalence of obesity and hypertension increased from north to far south. Prevalence of high LDL cholesterol was higher in the north and south. A hotspot for diabetes was located in the centre. Hotspots for obesity were mainly situated in the south and far south, for hypertension in the centre, south and far south, and for high LDL cholesterol in the far south. Conclusions The distribution of cardiometabolic risk factors in Chile has a characteristic pattern with a general north-south gradient. Our approach is reproducible and demonstrates that the Bayesian approach enables the accurate identification of hotspots and mapping of disease, allowing the identification of areas for cardiometabolic prevention.

Introduction Cardiovascular disease is the main cause of death worldwide and also an important cause of disability [1]. Cardiovascular disease and its risk factors have a greater impact on vulnerable populations [2]. Many middle-income countries are undergoing a nutritional transition from a traditional towards a more industrialized diet along with a decline in physical activity [3].
Between 1970 and 1980, Chile experienced a decrease in undernutrition and general mortality and, after 1980, this coincided with an increase in obesity [4]. Data from health surveys in Chile show a growing prevalence of diabetes (4.2%, 9.4%, and 12.3%) and obesity (23. Chile has an unequal population distribution, with a high population density in the central region and sparsely populated remote areas. The dispersed population presents a challenge to the national public health organization: public hospitals are mostly situated in the central region, remote regions have few hospitals, and inhabitants of these areas have difficulty accessing services [8].
Chile is also characterised by heterogeneity in socioeconomic factors such as income and education [9][10][11]. The northern region has the highest gross domestic product per capita and the south the lowest [8]. In addition, Chile has a mixed private-public health system. Most people with more financial resources are affiliated with the private health system, and people with fewer resources are mainly affiliated with the public system. Health care costs for affiliates of the private system are four times those of affiliates of the public health system. Moreover, in the northern region, there are also private hospitals that can be used by people affiliated with the private health system [12].
With limited resources, decision-makers have to prioritize. In order to reduce health disparities, public health policy makers require detailed regional and local knowledge about the distribution of chronic diseases and their risk factors to better allocate resources to meet population need [13]. However, in many countries, only national or regional health data are available and it is necessary to analyse the data in a smaller geographic unit to have more refined data.
In addition to traditional determinants of health for chronic diseases, such as demographic characteristics and behavioural factors, neighbourhood can influence health [14]. De Groot (2019) showed that populations in urban areas had higher levels of low-density lipoprotein (LDL) cholesterol and triglycerides than those in rural areas [15]. In addition, differences between neighbourhoods in terms of food accessibility and walkability may be associated with higher cardiovascular mortality and premature death [16].
Moreover, national prevalence statistics may not reveal differences between regions or geographic inequalities [17]. Traditional analytic approaches, such as those used in population-based surveys designed for national-level inferences, often lack statistical power to explore sparsely populated geographical areas, unless these are deliberately oversampled [18]. Bayesian analyses can be used to improve prevalence estimates in sparsely sampled areas by inferring information from surrounding and similar areas.
The main cause of mortality in the Chilean adult population is circulatory system disease [19]. Moreover, Chile has a high prevalence of diabetes, obesity and hypertension. Therefore, in this study, we aim to identify hotspots of cardiometabolic risk factors to detect key areas where public intervention is needed. In addition, we aim to examine the geographical variation in the prevalence of cardiometabolic health using Bayesian hierarchical modelling. Our hypothesis is that we will observe areas with a high prevalence of cardiometabolic conditions and a geographic trend in the distribution of health conditions.

Study population/design
The study population was composed of participants in the Chilean Health Survey. This survey is a population-based study representative of the Chilean adult population that collects data on demographic, behavioural, physical and mental health. It has a clinical examination with blood samples, and individual data about place of residence (region, province and commune). The survey has a cross-sectional design and data collection has been carried out in three waves to date (2003, 2009-2010 and 2016-2017) [5][6][7]. It applies a methodology comparable with other health surveys in the Americas [20].
We analysed the second wave of the Chilean Health Survey 2009-2010 (CHS-2), because this was the first time that all regions of the country were sampled and that residential location data were collected and available for research at the time of the analysis. The sample was random, with households as units. The national, regional, urban and rural levels were represented in the design. The sample was complex, obtained through a stratified, multistage sampling process with non-proportional distribution of surveys by stratum [6]. The target population was people aged 15 years and older. According to a projection of population census data from 2002 to January 2010, the total population aged 15 years and over was 13,177,032 inhabitants. The survey was answered by 85% of the eligible population and 5,434 people were finally interviewed. The Research Ethics Committee of the School of Medicine of the Pontificia Universidad Católica de Chile gave ethical approval, and all participants signed an informed consent form.
We included people who participated in all CHS-2 visits: survey, clinical examination with blood samples and provided geographic data.
For organizational purposes, the Chilean territory is divided into three hierarchical units: regions, provinces and municipalities. The CHS-2 was the first wave of the Chilean Health Survey to sample participants in all regions of the country. To obtain information that is both detailed and interpretable, provinces were chosen as the unit of analysis. Additionally, for descriptive purposes, we grouped provinces into larger geographic units called great regions, dividing the country into north (latitude: -18° to -31°), centre (latitude: -32° to -37°), south (latitude: -38° to -43°) and far south (latitude: -44° to -53°). To create the posterior mean prevalence maps, we used a basic map of Chile which is freely available at: http://labgeo.ufro.cl/catalogos/chile.html [21]

Statistical analyses
We calculated descriptive statistics using the unweighted sample population, comparing characteristics by sex and geographic area.
Outcomes. Diabetes was diagnosed with a fasting glucose ≥ 126 mg/dl or with self-reported medical diagnosis (excluding diabetes during pregnancy) [22]. Obesity was diagnosed with a BMI ≥ 30 kg/m², calculated with measured weight and height [23]. Hypertension was diagnosed with either a measured systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 90 mmHg or self-reported antihypertensive treatment [24]. High LDL was diagnosed with a value > 160 mg/dl [25].
Descriptive variables. Continuous variables are described as mean (SD) or median (IQR) according to the observed distribution and calculated from the non-weighted sample population. Urban regions were defined as groups of concentrated dwellings with more than 2,000 inhabitants. Education and income were used as proxies of socioeconomic status [9,11]. Education level was categorized as low (< 8 years), intermediate (8 to 12 years) and high (> 12 years). Income was categorized into tertiles: low (< 254 €/month), intermediate (254-491 €/month) and high (> 491 €/month). Physical activity was defined as self-reported frequency of at least once a week of mild, moderate or vigorous activity. Underweight was defined as BMI < 18.5 kg/m²; normal weight as 18.5 ≤ BMI < 25 kg/m²; overweight as 25 ≤ BMI < 30 kg/m²; and obesity as BMI ≥ 30 kg/m². Central obesity was defined as waist circumference > 102 cm in men or > 88 cm in women. Consumption of alcohol was classified as usual consumption of 3 or more drinks a day, 2 drinks a day, 1 drink a day, no drink a day or never.
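The outcome definitions above can be sketched as simple classification rules. The following is an illustrative sketch only, not the code used in the study: the record type `Measurements` and all field and function names are hypothetical, and returning `None` when a measurement is missing (so that the outcome can later be imputed) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurements:
    """Hypothetical record for one participant; None marks a missing value."""
    fasting_glucose_mgdl: Optional[float] = None
    self_reported_diabetes: bool = False  # excludes diabetes during pregnancy
    weight_kg: Optional[float] = None
    height_m: Optional[float] = None
    systolic_mmhg: Optional[float] = None
    diastolic_mmhg: Optional[float] = None
    on_antihypertensives: bool = False
    ldl_mgdl: Optional[float] = None


def diabetes(m: Measurements) -> Optional[bool]:
    # Self-reported medical diagnosis or fasting glucose >= 126 mg/dl.
    if m.self_reported_diabetes:
        return True
    if m.fasting_glucose_mgdl is None:
        return None  # assumption: left missing, to be imputed later
    return m.fasting_glucose_mgdl >= 126


def obesity(m: Measurements) -> Optional[bool]:
    # BMI >= 30 kg/m^2 from measured weight and height.
    if m.weight_kg is None or m.height_m is None:
        return None
    return m.weight_kg / m.height_m ** 2 >= 30


def hypertension(m: Measurements) -> Optional[bool]:
    # Systolic >= 140 mmHg, diastolic >= 90 mmHg, or reported treatment.
    if m.on_antihypertensives:
        return True
    if m.systolic_mmhg is not None and m.systolic_mmhg >= 140:
        return True
    if m.diastolic_mmhg is not None and m.diastolic_mmhg >= 90:
        return True
    if m.systolic_mmhg is None or m.diastolic_mmhg is None:
        return None
    return False


def high_ldl(m: Measurements) -> Optional[bool]:
    # LDL cholesterol strictly above 160 mg/dl.
    if m.ldl_mgdl is None:
        return None
    return m.ldl_mgdl > 160
```

Note that the LDL threshold is strict (> 160 mg/dl) whereas the other thresholds are inclusive, as stated in the definitions.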
Bayesian estimation for handling missing data. To deal with missing data we applied Bayesian imputation. All imputation in Bayesian models was done within Markov chain Monte Carlo. We assumed a missing at random mechanism. We observed missing data in determinants and outcomes, therefore we applied a Bayesian paradigm (BUGS software) for imputation of missing data in outcomes [26]. In addition, we specified priors for missing data in determinants. In the case of missing sampling for some provinces, we assumed restricted prior distributions using the global mean sampling weight.
Bayesian hierarchical modelling for estimating probabilities. The statistical methods of this study are explained in detail in Lawson et al [27]. To study the geographic distribution of outcomes and their interrelations, we chose a flexible Bayesian hierarchical modelling approach, which included extra georeferenced confounding [28]. Models took into account the individual and aggregated (province) dimensions.
At the individual level, our models included fixed and random effects. The random effects consisted of uncorrelated and correlated spatial effects. The model included a sampling weight for each individual, as well as age, sex and age-sex interaction as fixed effects [29,30].
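For orientation, an individual-level model of this kind is often written in the generic convolution form below. This is a sketch under stated assumptions, not the equations of this study (those are given in [32] and may differ, for example in how the sampling weight enters or how the outcomes are linked). All symbols are introduced here for illustration: $i$ indexes individuals, $k$ outcomes, $j(i)$ the province of residence of individual $i$, $w_i$ the sampling weight, $v$ the uncorrelated and $u$ the spatially correlated random effect, $\partial j$ the set of provinces neighbouring province $j$, and $n_j$ their number.

```latex
\begin{align*}
y_{ik} &\sim \mathrm{Bernoulli}(p_{ik}), \\
\operatorname{logit}(p_{ik}) &= \beta_{0k} + \beta_{1k}\,\mathrm{age}_i + \beta_{2k}\,\mathrm{sex}_i
  + \beta_{3k}\,\mathrm{age}_i\,\mathrm{sex}_i + \beta_{4k}\,w_i + v_{j(i)k} + u_{j(i)k}, \\
v_{jk} &\sim \mathcal{N}\!\left(0,\ \sigma_{v,k}^{2}\right), \\
u_{jk} \mid u_{-j,k} &\sim \mathcal{N}\!\left(\frac{1}{n_j}\sum_{l \in \partial j} u_{lk},\ \frac{\sigma_{u,k}^{2}}{n_j}\right).
\end{align*}
```

In this form, the uncorrelated term $v$ absorbs province-specific deviations, while the conditional autoregressive term $u$ pulls each province towards the average of its neighbours.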
Additionally, we fitted a model at an aggregated level (by province). We divided the sum of cases by condition and province by the number of samples per province. The fixed effects were the mean sampling weight per province, the mean age per province, the percentage of males per province and the age-sex interaction. We used the aggregated model for calculating the posterior mean prevalence. We report posterior mean prevalence with 95% credible intervals (95% CrI), which correspond to the 2.5 and 97.5 percentiles of the posterior mean distribution. We included spatial effects in the aggregated model with the objective of representing the province within which the individual resides. These spatial effects included an uncorrelated effect, which captures the clustering tendency of the outcome and deals with small number of sampling in some provinces [29].
We applied a joint model approach assuming a correlation between the outcomes (diabetes, obesity, hypertension and high LDL cholesterol) within the same individual [31]. To estimate overall posterior mean probabilities, we calculated the average of the global simulated parameters of all provinces, with 95% CrI based on quantile probability intervals. The Bayesian models described in this analysis (the imputation model and the individual and aggregated-by-province joint outcome models) are different, but they were run in the same Markov chain Monte Carlo iterations. The equations and terms of the statistical analysis model have been published by us elsewhere [32].
To roughly compare Bayesian hierarchical modelling with a frequentist approach to estimating the prevalence of disease, we fitted generalized linear mixed-effects models using penalized quasi-likelihood, separately for each outcome. Fixed effects included age, sex, age-sex interaction and sampling weights. Provinces were included as random effects. The results were parameter estimates and their 95% confidence intervals.
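The posterior summaries described in this section (a posterior mean with a 95% CrI taken from the 2.5 and 97.5 percentiles, and a national figure obtained by averaging over provinces) can be illustrated with a short sketch. The draws below are simulated rather than taken from the study, the function names are hypothetical, and the unweighted average across provinces within each iteration is an assumption about how the overall estimate is formed.

```python
import random
from statistics import mean


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def summarise(draws):
    """Posterior mean and 95% CrI (2.5 and 97.5 percentiles) of MCMC draws."""
    s = sorted(draws)
    return mean(s), quantile(s, 0.025), quantile(s, 0.975)


def national_draws(province_draws):
    """Average the province prevalences within each MCMC iteration
    (assumption: unweighted average across provinces)."""
    n_iter = len(next(iter(province_draws.values())))
    return [mean(d[t] for d in province_draws.values()) for t in range(n_iter)]


# Simulated posterior draws of prevalence for three made-up provinces.
random.seed(0)
province_draws = {
    "A": [random.betavariate(11, 89) for _ in range(2000)],
    "B": [random.betavariate(30, 70) for _ in range(2000)],
    "C": [random.betavariate(20, 80) for _ in range(2000)],
}

for name, draws in province_draws.items():
    post_mean, lower, upper = summarise(draws)
    print(f"{name}: {100 * post_mean:.1f} ({100 * lower:.1f}, {100 * upper:.1f})")

nat_mean, nat_lower, nat_upper = summarise(national_draws(province_draws))
print(f"national: {100 * nat_mean:.1f} ({100 * nat_lower:.1f}, {100 * nat_upper:.1f})")
```

Summarising the per-iteration averages, rather than averaging the province summaries, keeps the interval for the national figure consistent with the joint posterior of the province estimates.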
Hotspots of cardiometabolic health. To detect provinces with exceptionally high prevalence of cardiovascular risk factors and diabetes, we used posterior exceedance probability estimates that were greater than a chosen threshold [33]. The chosen thresholds were the estimated median values of the posterior prevalence for each outcome.
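A minimal sketch of hotspot detection by exceedance probability follows. It is illustrative only: the function names and toy draws are hypothetical, the threshold is taken here as the median of the province posterior means (one possible reading of the rule above), and the 0.95 probability cutoff is an assumption, since the text does not state the value used.

```python
from statistics import mean, median


def exceedance_probability(draws, threshold):
    """Share of posterior draws in which prevalence exceeds the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


def find_hotspots(province_draws, cutoff=0.95):
    """Return the threshold, exceedance probabilities and flagged provinces.

    Assumptions: the threshold is the median of the province posterior
    means, and a province is flagged when its exceedance probability is
    at least `cutoff` (hypothetical value).
    """
    threshold = median(mean(d) for d in province_draws.values())
    probs = {
        name: exceedance_probability(d, threshold)
        for name, d in province_draws.items()
    }
    hotspots = [name for name, p in probs.items() if p >= cutoff]
    return threshold, probs, hotspots


# Toy posterior draws of prevalence for three made-up provinces.
toy_draws = {
    "A": [0.30, 0.32, 0.31, 0.29],
    "B": [0.10, 0.12, 0.11, 0.09],
    "C": [0.20, 0.22, 0.18, 0.21],
}
threshold, probs, hotspots = find_hotspots(toy_draws)
print(probs, hotspots)
```

Because the decision uses the whole posterior distribution rather than a point estimate, a province with a high but very uncertain prevalence is not automatically flagged.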

Sensitivity analysis
We adjusted the basic model for income and education at individual level in order to assess the impact of these confounders.
We used WinBUGS [34] to fit joint models, R2WinBUGS and rube packages [35] to call a BUGS model, tmap package [36] for creating maps and package MASS [37] to fit generalized linear mixed models via penalized quasi-likelihood.

Descriptive statistics
Among the 5,434 participants included in the CHS-2, 5,293 (97.4%) also had residential location data. Of those, 4,780 participated in the physical examination/blood sampling and were included in this analysis. Sixty per cent of participants were women and the mean age (SD) was 46 years (18.5). Table 1 shows demographic and lifestyle characteristics of the study population stratified by sex and geographic area (great regions). The last column shows the percentage of missing data for each variable. All variables had less than 5% missing data except LDL cholesterol, which had 44%. The percentage of urban people was significantly higher in the north and lower in the south compared to the centre. In addition, significantly lower levels of education, income and employment were observed in the south compared to the centre. In contrast, the percentages of people with high to moderate physical activity and of non-smokers were higher in the south compared to the centre. A higher income level was observed in the north compared to the centre.

Joint modelling outcome: Posterior mean prevalence (Bayesian prevalence)
The nationwide posterior mean prevalence (% (95% CrI)) of diabetes, obesity, hypertension and high LDL cholesterol was 10.9 (4.5, 19.2), 30.0 (17.7, 45.3), 36.4 (16.4, 57.6) and 13.7 (3.4, 32.2), respectively. Posterior mean prevalence of diabetes was lower in the far south (9.0%) compared to the central region (11%). Posterior mean prevalence of obesity was higher in the south (34%) and far south (37%) compared to the centre (28%). Posterior mean prevalence of hypertension was higher in the south (42%) compared to the centre (39%). Posterior mean prevalence of high LDL cholesterol was higher in the north (15%), south (15%) and far south (19%) compared to the centre (11%) (Figs 1-4).
In the sensitivity analysis, where the basic models were further adjusted for income and education, there were no significant changes in the prevalence of the outcomes (S2 Table). Regarding estimation, we did not observe any significant associations of education or income at the aggregate level. At the individual level, however, we observed a significant association of education with obesity (mean estimate (95% CrI) 0.31 (0.08, 0.54)) and of income with hypertension (mean estimate (95% CrI) -0.019 (-0.032, -0.006)).

Joint modelling outcome: Hotspots
For diabetes, we detected only one hotspot: San Felipe de Aconcagua (pn 19), situated in the centre great region (Fig 1B1-1B2).

Discussion
The results of this study revealed cardiometabolic health hotspots mainly in the centre, south and far south great regions. We also observed a characteristic pattern of chronic diseases in the Chilean territory, with increasing prevalence from north to south and hotspots mainly in the central and southern regions. Our results showed that Chile is one of the countries with the highest prevalence of diabetes and obesity in the Americas [38,39]. The prevalence of hypertension in Chile was higher than the global age-standardized prevalence in Latin America [40]. In contrast, the prevalence of high LDL cholesterol was lower than the prevalence in the US in 2010 [41].
Hotspots were often located in more deprived areas. For example, a diabetes hotspot was located in San Felipe de Aconcagua, which was in the 4th percentile of the Socio-Economic Development Index (SEDI, which includes income per capita, education and housing), and two obesity hotspots were located in Biobío and Cautín, which were at the 1st percentile of SEDI, the lowest in the country [42]. These findings could be attributed to environmental factors present in clusters, such as a more accelerated nutritional transition [43], as described in the provinces of the Maule region in the centre (provinces 29 to 32). In addition, the socio-economic level of the central area was very heterogeneous. The highest prevalence of diabetes and obesity was found in Cauquenes (pn 29), in the region of Maule, which has one of the lowest gross domestic products per capita in the country [8]. In addition, we found that some socioeconomic factors such as education or income were potential confounders (S2 Table). Differences in the prevalence of chronic diseases among regions may reflect geographic inequities, such as unequal access to adequate and timely care in health facilities in remote areas. Research in Russia has shown that rapid access to percutaneous coronary intervention is highly dependent on the region and more difficult in rural areas [13]. In Chile, access to health facilities could be particularly difficult in rural areas in the south and far south.
Our finding, that diabetes was more prevalent in the central and southern region, could be explained by certain socio-economic characteristics such as low income/education, as is demonstrated in other surveys [44]. However, the high prevalence of diabetes in the central region could also be attributed to a combination of factors other than income and education, such as high prevalence of obesity, high blood pressure and lower physical activity levels [45]. In addition, environmental factors can be a possible cause for higher diabetes prevalence such as higher rates of urbanization [46].
Obesity was more prevalent in the south and far south of Chile. We found a significant association of low compared to high income. In addition, the south is a region with lower socio-economic resources, which could partly explain the high prevalence of obesity through food choice [47]. In the far south, the climate and geographic remoteness with low availability of healthy food could influence the high prevalence of obesity. Similar geographic conditions with the same problems of food availability can be observed in the far north of Canada [48].
Hypertension was less prevalent in the north and more prevalent in the south. We observed a significant association of income level on hypertension. Therefore, differences in income, education and nutrition habits can explain, at least in part, our results. The north and south have the highest and lowest income levels respectively and at the same time, the lowest and highest hypertension prevalence. Our results are in agreement with a meta-analysis including 54 studies that found that lower socioeconomic status, and especially lower education was associated with higher blood pressure [49]. High LDL cholesterol was less prevalent in the centre of Chile. These results suggest the possible role of the high concentration of health facilities in the centre of Chile.
Traditional generalized models that assume independence of observations are not the most appropriate method to analyse spatial data, because they ignore the spatial autocorrelation between people living in the same area [18]. Rather than calculating the prevalence of a condition in a region using only data from that region, Bayesian hierarchical modelling uses all available data, along with the geographical structure, to obtain the best possible estimate of disease prevalence for a given area. These models infer information from surrounding and similar areas to improve estimates for areas that were not sampled or were poorly sampled [50].
The observed results correspond to a country undergoing an epidemiological transition in which the prevalence of diabetes and obesity evolves over time, increasing first in the high-income and then in the low-income groups [51]. An epidemiological transition due to economic changes and urbanization was also observed in the South Asian population. This translates into lifestyle changes towards a diet high in refined calories and saturated fat, and less physical activity [52].
This study has several strengths. The CHS-2 had a high response rate (80%) among the eligible population and the sampling method was adequate to make this study representative of the core population. The survey applied robust quality control of fieldwork, laboratory measurements and analyses, in accordance with updated international standards. In addition, the analysis was based on both individual and aggregate levels in a joint model approach. Moreover, we imputed missing data, making the results less likely to be biased due to data missing at random. In addition, Bayesian hierarchical modelling was able to estimate prevalence (posterior probabilities) in provinces that were not sampled, taking information from neighbouring provinces.
This study has several limitations. First, the analysis was cross-sectional and we did not adjust for all possible confounders. Also, we assessed diabetes using self-reported medical diagnosis in addition to measured fasting glucose in order to increase the sensitivity of the diagnosis (some participants with diagnosed diabetes may have normal blood sugar on the day of the exam). However, self-reported diagnosis may be subject to recall bias. For LDL, the percentage of missing data was high and, therefore, the imputed estimates from the Bayesian results are less precise. Furthermore, we cannot exclude selection bias in the sampling of some provinces. However, hierarchical Bayesian modelling provides precise results, in particular for areas with low sampling and missing data. This is due to the smoothing effect of neighbouring provinces on the estimates.
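The idea of borrowing information can be illustrated with a deliberately simplified, non-spatial sketch. It is not the model used in this study: it replaces the spatial structure with a single Beta prior centred on an overall prevalence, and the function name, the prior strength of 20 pseudo-observations and the example counts are all hypothetical.

```python
def shrunk_prevalence(cases, n, prior_mean, prior_strength):
    """Posterior mean of a Beta-Binomial model.

    The prior Beta(a, b) is centred on `prior_mean` and carries the weight
    of `prior_strength` pseudo-observations (both are assumptions).
    """
    a = prior_mean * prior_strength
    b = (1 - prior_mean) * prior_strength
    return (cases + a) / (n + a + b)


overall = 0.30  # hypothetical overall prevalence used to centre the prior

# A poorly sampled province: 3 cases among 5 participants (crude 60%).
print(round(shrunk_prevalence(3, 5, overall, 20), 3))      # 0.36

# A well sampled province: 300 cases among 500 participants (crude 60%).
print(round(shrunk_prevalence(300, 500, overall, 20), 3))  # 0.588

# A province with no sampled participants falls back on the prior mean.
print(round(shrunk_prevalence(0, 0, overall, 20), 3))      # 0.3
```

The estimate for the small sample is pulled strongly towards the overall value, while the large sample is barely affected. In a spatial model, the prior for each province is instead informed by its neighbours.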

Conclusions
The joint model analysis presented in this study gives a good approximation of the reality for the identification of hotspots in cardiometabolic outcomes and addresses the issues of small numbers and missing data. Hotspots were mostly located in certain provinces in the centre and south/far south great regions. This is an important piece of information for local public health authorities. In addition, the results of this study show evidence of the utility of Bayesian hierarchical modelling to monitor the general population in countries with issues of access to remote areas and heterogeneous distribution of the population. The methods used in this study, which are reproducible and scalable, allow the identification of affected provinces in order to inform priority actions for cardiometabolic prevention.
Our approach is innovative compared to other spatial analytical methods as the joint model analysis makes it possible to link the individual results while taking into account the correlation between areas (aggregated level). In addition, this methodology can also deal with missing data generating predicted estimates derived from the correlation between the results.
Supporting information S1 Table. Age and sex adjusted frequentist prevalence of cardiometabolic outcomes (diabetes, obesity, hypertension and high LDL cholesterol).