Spatial pattern and determinants of anaemia in Ethiopia

Anaemia is a condition in which the haemoglobin concentration falls below an established cut-off value because of a decrease in the number and size of red blood cells. The current study aimed (i) to assess the spatial pattern of anaemia and (ii) to identify its determinants using the third Ethiopian Demographic and Health Survey. To achieve these objectives, the study took into account the sampling weights and the clustered nature of the data; multilevel modeling was therefore used in the statistical analysis. The analysis included complete cases from 15,909 females and 13,903 males. Among all subjects who agreed to a haemoglobin test, 5.22% of males and 16.60% of females were anaemic. In both the binary and ordinal outcome modeling approaches, educational level, age, wealth index, BMI and HIV status were found to be significant predictors of anaemia. Furthermore, this study applied spatial methods to generate regional-level maps, which could help policy makers identify where efforts to reduce the prevalence of anaemia should be concentrated. Moran's I test revealed significant spatial autocorrelation across clusters. The risk of anaemia varied across regions, with higher prevalence observed in the Somali and Affar regions.


Introduction
Anaemia is a condition in which the number and size of red blood cells, or the haemoglobin concentration, falls below an established cut-off value [1,2]. It is an indicator of both poor nutrition and poor health status. According to the 2011 World Health Organization (WHO) anaemia prevalence estimate, anaemia affects around 800 million children and women worldwide [2]. In low-income countries, the prevalence of anaemia remains high and is an area of priority [3]. Reducing the incidence of anaemia is recognized as an important component of the health of women and children, and the second global nutrition target for 2025 calls for a 50% reduction of anaemia among women of reproductive age [4].
Anaemia remains one of the biggest public health problems in Africa, which is a malaria-endemic continent [5]. In Africa, the prevalence of anaemia among women of reproductive age (15-49 years) is 37.6% [2]. Crawley [6] illustrates that populations in areas of stable malaria transmission were found to be more anaemic than those in malaria-free geographical areas. Community-based estimates of anaemia prevalence among children in areas where malaria is endemic range from 49 to 76% [7]. In Ethiopia, the prevalence of anaemia is 19% and 23% for non-pregnant and pregnant women, respectively [2]. The prevalence of anaemia varies by place of residence (urban or rural); a higher proportion of women in rural areas are anaemic than in urban areas [8]. Despite the magnitude of anaemia, the geographical variability of its prevalence and its risk factors are not well studied in Ethiopia. To our knowledge, no spatial studies have reported the burden of anaemia for subjects aged 15 years and older in Ethiopia.
Studies of anaemia mainly focus on pregnant women or children, among whom the burden is greatest [9][10][11][12][13]. In this study, we describe the spatial pattern across regions and attempt to identify risk factors of anaemia among males 15-59 years of age and females 15-49 years of age in Ethiopia. To date, studies on anaemia prevalence in Ethiopia have not assessed the spatial pattern or geographical heterogeneity [11,14,15]. Ignoring such geographical heterogeneity in data analysis leads to inefficient and inconsistent parameter estimates.
Spatial dependency in health-related data indicates that health outcomes in nearby neighborhoods are more similar to each other than those in distant neighborhoods. Geographical heterogeneity in health outcomes is typically modeled using multilevel or spatial mixed models [9,10,13]. Non-spatial multilevel modeling [16,17] cannot address spatial dependency because it typically assumes that neighborhoods (i.e., spatial units) are statistically independent of each other; multilevel models have therefore been criticized as non-spatial and unrealistic [18][19][20].
The present study aims to examine spatial patterns and identify determinants of anaemia in Ethiopia using the 2011 Ethiopian Demographic and Health Survey (EDHS 2011) data. First, the study examines whether there is significant global spatial autocorrelation in the prevalence of anaemia. If global spatial dependency is confirmed, the subsequent objective is to explore local spatial autocorrelation and to map the spatial distribution of anaemia prevalence by region. The resulting anaemia prevalence map would have important implications for targeting policy and interventions, and would help to identify variables that might account for the observed spatial patterns.
Thus, the main contribution of this study would be mapping the spatial distribution of anaemia prevalence by the survey cluster and regions of the country. Further, the study would be the first ever to map anaemia for both males and females aged 15 years and older in Ethiopia. Moreover, to identify predictors of anaemia, a multilevel analysis will be done by taking into account the sampling weight and clustered nature of the dataset.

Data
The data for this study were taken from the 2011 Ethiopian Demographic and Health Survey (EDHS 2011). The 2011 EDHS is the third comprehensive and nationally representative survey conducted in Ethiopia as part of the worldwide Demographic and Health Surveys project. The main objective of the 2011 EDHS was to provide timely and reliable data on health and demographic outcomes at both national and regional levels [8]. The EDHS 2011 data were downloaded from the DHS website (http://dhsprogram.com) after permission was granted. More detailed information on the DHS survey design and anaemia testing data is summarized in [8]. The 2011 EDHS samples were selected using a stratified, two-stage cluster sampling design. In the first stage, 624 clusters of census enumeration areas (EAs), 187 in urban and 437 in rural areas, were included in the survey. Among the 17,817 selected representative households in the 2011 EDHS, only complete cases from 15,909 females aged 15-49 years and 13,903 males aged 15-59 years were used. For the analysis, potential predictor variables such as wealth index, educational level, HIV status, BMI, age, residence (urban vs rural), region where the respondent resided, and other variables (e.g. pregnancy status) were extracted from the dataset.
Further, since malaria is one of the risk factors of anaemia [21][22][23][24], the Plasmodium falciparum parasite rate (PfPR) for the considered DHS clusters was extracted from the Malaria Atlas Project (www.map.ox.ac.uk). The Malaria Atlas Project (MAP) is one of the largest and most contemporary spatial databases for the Plasmodium falciparum parasite rate [25]. The details of the malaria data and the statistical procedures for mapping it have been described elsewhere [26]. MAP presents PfPR age-standardized to 2-10 years, which provides a valid estimate of the transmission intensity of malaria in a cluster [25]. To incorporate PfPR in our analysis, we first extracted geo-referenced clusters with non-zero values of PfPR from MAP, and then assigned those non-zero PfPR values to each individual located in nearby geo-referenced clusters.
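The assignment of PfPR values to nearby clusters can be pictured with a small sketch. The text does not state how "nearby" was defined, so the nearest-point rule, the great-circle (haversine) distance, the 50 km cut-off and all function and variable names below are assumptions made purely for illustration, not the procedure used in this study.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def assign_pfpr(dhs_clusters, map_points, max_km=50.0):
    """Give each DHS cluster the PfPR of the closest MAP point within max_km.

    dhs_clusters: dict {cluster_id: (lat, lon)}            (hypothetical layout)
    map_points:   list of (lat, lon, pfpr) with pfpr > 0    (hypothetical layout)
    max_km:       assumed search radius; not taken from the paper
    Returns a dict {cluster_id: pfpr or None}.
    """
    assigned = {}
    for cluster_id, (lat, lon) in dhs_clusters.items():
        best = None  # (distance, pfpr) of the closest point found so far
        for mlat, mlon, pfpr in map_points:
            dist = haversine_km(lat, lon, mlat, mlon)
            if dist <= max_km and (best is None or dist < best[0]):
                best = (dist, pfpr)
        assigned[cluster_id] = best[1] if best else None
    return assigned
```

Every individual in a cluster would then inherit the cluster's assigned value; clusters with no MAP point inside the radius stay unassigned (None) rather than receiving a guessed value.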
The response variable in this study is haemoglobin level in the blood, a key indicator for anaemia. The raw measured values of haemoglobin were obtained using the HemoCue instrument and adjusted for altitude and smoking status [27]. Different cut-off points of haemoglobin level for different age groups were used to classify an individual as anaemic [28]. WHO recommends specific haemoglobin levels below which an individual is classified as having anaemia, namely mild (10.0 to 11.9 g/dL), moderate (7.0 to 9.9 g/dL), and severe (<7.0 g/dL), with "any anaemia" corresponding to <12.0 g/dL [1,8]. Further, to see the effect of covariates on anaemia, the outcome variable for the $i$th individual in the $j$th cluster ($Y_{ij}$) is dichotomized as follows:

$$Y_{ij} = \begin{cases} 1 & \text{if haemoglobin level} < 12.0\ \text{g/dL (anaemic)} \\ 0 & \text{if haemoglobin level} \geq 12.0\ \text{g/dL (non-anaemic)} \end{cases}$$

Since the analysis based on the above dichotomization cannot provide information about the severity of anaemia, a further classification that takes the anaemia level into account was made based on the following categorization [1,28]:

$$Y_{ij} = \begin{cases} \text{non-anaemic} & \text{if haemoglobin level} \geq 12.0\ \text{g/dL} \\ \text{mild} & \text{if } 10.0 \leq \text{haemoglobin level} \leq 11.9\ \text{g/dL} \\ \text{moderate} & \text{if } 7.0 \leq \text{haemoglobin level} \leq 9.9\ \text{g/dL} \\ \text{severe} & \text{if haemoglobin level} < 7.0\ \text{g/dL} \end{cases}$$

The number of female respondents per cluster ranged from 4 to 63 with an average of 27, and the number of male respondents from 3 to 78 with an average of 23. The number of anaemic females per cluster ranged from 0 to 20 with an average of 5, and the number of anaemic males from 0 to 10 with an average of 2.
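The two codings of the outcome can be sketched in a few lines. The function names are illustrative, and only the cut-offs quoted above are used; the age-specific cut-offs mentioned in the text are not reproduced here. Because haemoglobin is recorded to one decimal place, the boundaries are written as strict inequalities on the next category's lower limit.

```python
def anaemia_level(hb_g_dl):
    """Classify an adjusted haemoglobin value (g/dL) with the cut-offs quoted in the text."""
    if hb_g_dl < 7.0:
        return "severe"
    if hb_g_dl < 10.0:
        return "moderate"
    if hb_g_dl < 12.0:
        return "mild"
    return "non-anaemic"


def is_anaemic(hb_g_dl):
    """Binary outcome: 1 for any anaemia (< 12.0 g/dL), 0 otherwise."""
    return int(hb_g_dl < 12.0)
```

The binary outcome is simply the ordinal outcome collapsed over the three anaemic categories, which is why the two models are expected to tell a broadly similar story.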

Spatial autocorrelation
Spatial autocorrelation measures offer additional insight into the interdependence of spatial data. These measures quantify the correlation of a spatial random field Y(s) with itself at different locations [29]. They can be applied to measurements taken at exact locations (point-referenced data) or to measurements that characterize areas (areal data). Different statistics have been developed to test for the presence and magnitude of spatial association among areal units [30]. These include global distance-based measures such as Moran's I and Geary's C (see [30][31][32] for discussion). In this study, the presence of spatial dependence is tested using Moran's I statistic [33]:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(Y_i - \bar{Y})(Y_j - \bar{Y})}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \qquad (1)$$

where $Y_i$ is the observation at location $i$ ($i = 1, \dots, n$), $\bar{Y}$ is the mean of the $n$ observations, and $w_{ij}$ are elements of a spatial weight matrix. Values of Moran's I are assessed by a test statistic (the Moran's I standard deviate), which indicates the statistical significance of spatial autocorrelation in model residuals. In this study, Moran's I is calculated after aggregating the number of anaemic subjects by survey cluster.
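As a worked illustration of the global Moran's I statistic, the sketch below evaluates the standard formula for a toy example. The study itself used ArcGIS; the pure-Python function, the four-location layout and the binary adjacency weights here are assumptions for illustration only.

```python
def morans_i(values, weights):
    """Global Moran's I for observations `values` and a square weight matrix `weights`.

    values:  list of n observations (e.g. anaemia counts per cluster)
    weights: n x n nested list, weights[i][j] = spatial weight between i and j
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Toy example: four locations on a line, neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], W)        # neighbours alike: positive I
alternating = morans_i([1, -1, 1, -1], W)  # neighbours unlike: negative I
```

Values near zero indicate no spatial pattern, positive values indicate that similar values cluster together, and negative values indicate that neighbours tend to differ.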

Multilevel analysis
Typically, different surveys contain multiple levels of nesting. When analyzing such datasets, a multilevel model is generally more appropriate than an ordinary single-level regression model because it enables one to deal with the hierarchical structure of variables [16,17].
Most demographic and health survey (DHS) samples, including the one considered in this study, are representative samples randomly selected from the target population. Each interviewed unit represents a certain number of similar units in the target population. Thus, to draw valid inference from such surveys, the representativeness of the sample must be taken into account [34][35][36]. When estimating multilevel models based on such surveys, sampling weights are incorporated into the likelihood [36]. In this analysis, sampling weights were taken into account in both the binary and ordinal modeling approaches.
The multilevel model assumes that individual-level (i.e., lower hierarchy) observations belonging to a particular cluster (i.e., higher hierarchy) are not independent of each other because they share similar characteristics of that cluster [16,17]. Multilevel analysis is needed because the nested structure of the data requires simultaneous examination of cluster-level and individual-level variables [37]. The multilevel approach produces reliable standard errors and parameter estimates when outcomes for individuals within clusters are correlated [17]. Multilevel models consist of two sets of equations: one explaining variation at the individual level, and the other explaining variation at the cluster level.
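The idea of incorporating sampling weights into the likelihood can be illustrated with a deliberately simplified single-level sketch; weighted multilevel estimation involves level-specific weights and is considerably more involved. The data, weights and function names below are made up for illustration.

```python
import math


def weighted_prevalence(y, w):
    """Weighted proportion of positive outcomes (y in {0, 1}, w = sampling weights)."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def weighted_loglik(p, y, w):
    """Weighted (pseudo) Bernoulli log-likelihood for a common probability p."""
    return sum(wi * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
               for yi, wi in zip(y, w))


# Hypothetical sample: anaemic respondents carry small weights, others large ones.
y = [1, 0, 0, 1]
w = [1.0, 3.0, 3.0, 1.0]
unweighted = sum(y) / len(y)
weighted = weighted_prevalence(y, w)
```

In this toy sample the unweighted prevalence overstates the population prevalence, and the weighted log-likelihood is larger at the weighted estimate than at the unweighted one, which is the sense in which the weights steer estimation towards the target population.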
In this study two different multilevel models were fitted: (1) multilevel logistic regression with a dichotomous dependent variable (anaemic versus not anaemic); and (2) multilevel ordinal logistic regression (severe, moderate, mild, not anaemic).

Multilevel Generalized Linear Models for Binary Outcome
The multilevel logistic regression model is a very popular choice for the analysis of dichotomous data. Because the probability of having anaemia may vary between clusters, a cluster-level random intercept is introduced in the generalized linear mixed model. Let $y_{ij}$ denote the binary outcome for subject $i$ in cluster $j$, and assume that $y_{ij}$ follows a Bernoulli distribution with probability of success (in our case, being anaemic) $p_{ij}$. Then, using the usual logit link function, the binary outcome can be associated with a linear predictor as follows:

$$\operatorname{logit}(p_{ij}) = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j \qquad (2)$$

where $\mathbf{x}_{ij}$ is the vector of individual-level predictors, $\beta_0$ is an intercept, $\boldsymbol{\beta}$ is the vector of unknown parameters for the individual-level predictors, and $u_j$ are mutually independent Gaussian random effects used to capture within-cluster correlation. In standard multilevel models, $u_j$ is usually assumed to be a normally distributed random intercept with mean 0 and variance $\sigma_u^2$ [17].
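To make the role of the random intercept concrete, the sketch below converts a linear predictor with a cluster effect into a probability through the inverse logit. The function name, coefficient values and covariate vector are illustrative assumptions, not estimates from this study.

```python
import math


def anaemia_probability(beta0, betas, x, u_j=0.0):
    """P(anaemic) for one subject under a random-intercept logistic model.

    beta0: intercept
    betas: coefficients for the individual-level predictors
    x:     predictor values for the subject (same length as betas)
    u_j:   random intercept of the subject's cluster
    """
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + u_j
    return 1.0 / (1.0 + math.exp(-eta))


# Same hypothetical subject placed in a high-risk and a low-risk cluster.
p_high = anaemia_probability(-1.5, [0.4], [1.0], u_j=0.5)
p_low = anaemia_probability(-1.5, [0.4], [1.0], u_j=-0.5)
```

Two subjects with identical covariates therefore have different predicted probabilities when they live in clusters with different random intercepts.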
To test whether the variance of the random intercept is significant ($H_0: d = d_0 = 0$ against $H_a: d > 0$, where $d$ denotes the random-intercept variance), the likelihood ratio test was applied, in which under $H_0$

$$-2\,ll(d_0) + 2\,ll(d) \sim \tfrac{1}{2}\chi^2_{(0)} + \tfrac{1}{2}\chi^2_{(1)}.$$

In this expression, $ll(d_0)$ is the value of the log-likelihood function of the model under $H_0$, whereas $ll(d)$ is the value of the log-likelihood function for the GLMM under consideration. When the logistic model is applied, the level-one residuals are assumed to follow the standard logistic distribution, with mean 0 and variance $\pi^2/3 \approx 3.29$. This variance represents the within-group variance for the intraclass correlation (ICC) for dichotomous data; the ICC can be defined similarly for ordinal outcomes [38]. For Model 1, the intraclass correlation is:

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}$$
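The mixture reference distribution used in the likelihood ratio test can be evaluated directly. In the sketch below, the function name and the log-likelihood values are hypothetical; it relies on the fact that a chi-square variable with 0 degrees of freedom is a point mass at zero and that, for 1 degree of freedom, P(X > x) = erfc(sqrt(x/2)).

```python
import math


def lrt_variance_test(ll_null, ll_alt):
    """Likelihood ratio test of a single variance component on the boundary.

    ll_null: log-likelihood of the model without the random intercept
    ll_alt:  log-likelihood of the model with the random intercept
    Returns (test statistic, p-value under the 50:50 chi2_0 / chi2_1 mixture).
    """
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    if stat == 0.0:
        return stat, 1.0
    # chi2_0 contributes nothing for stat > 0; chi2_1 tail via the error function.
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods chosen so the statistic is about 3.84.
stat, p_value = lrt_variance_test(-100.0, -100.0 + 1.920729410347062)
```

The mixture p-value is half of the naive chi-square(1) p-value, so ignoring the boundary problem would make the test for the cluster variance conservative.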

Multilevel Generalized Linear Models for Ordinal Outcome
Using ordered outcomes yields more parsimoniously parameterized models. A common tool for analyzing regression data with ordinal responses is the cumulative threshold model [39]. The model assumes that the response variable $Y_{ij}$, here the anaemia status for subject $i$ in cluster $j$, is a categorized version of a latent continuous variable, in this study the individual haemoglobin level. The model, Eq (3), includes a cluster-level random intercept $u_j \sim N(0, \sigma_u^2)$ as in Eq (2). The weighted multilevel analysis was done using Stata [40]. Spatial maps of anaemia prevalence by cluster and region, and spatial autocorrelation tests, were produced using ArcGIS 10.5 [41].
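For orientation, one common way of writing a random-intercept cumulative threshold (proportional odds) model is sketched below. The sign convention, the threshold symbols $\theta_k$ and the covariate vector $\mathbf{x}_{ij}$ are assumptions made here for illustration; the exact parametrization used by the authors is not recoverable from the text.

```latex
\operatorname{logit}\,\Pr(Y_{ij} \le k \mid u_j)
  = \theta_k - \left(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j\right),
  \qquad k = 1, \dots, K-1,
  \qquad u_j \sim N(0, \sigma_u^2),
  \qquad \theta_1 < \theta_2 < \dots < \theta_{K-1}
```

Under this form the same coefficient vector applies at every threshold $k$, which is the proportional odds assumption, and the model reduces to the binary random-intercept logit when $K = 2$.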

Results
In this section, the data introduced earlier are analyzed, and the results based on multilevel and spatial data analysis techniques are presented. Recall that the aims of the study are to assess the spatial pattern and identify determinants of anaemia in Ethiopia for females and males aged 15 years and older. Table 1 shows the prevalence of anaemia for males and females by region. The results reveal that in all regions anaemia prevalence is higher among females than males. Higher prevalence of anaemia was observed in the Somali and Affar regions for both genders. The lowest prevalence was observed in Addis Ababa and the SNNP region (Table 1). Table 2 provides a summary of the percentage of anaemia by gender under different categorical covariates, including educational level, age, residence, HIV status, BMI, PfPR and wealth index. For both genders, higher prevalence of anaemia was observed among individuals living in rural areas than among those living in urban areas. Furthermore, anaemia prevalence decreases as wealth index and educational level increase for both genders. For females, the prevalence of anaemia is higher when age is above 18, while for males the prevalence is higher among younger males. To see whether these differences are statistically significant, a multilevel model that includes all potential predictors simultaneously was fitted. The results are presented in Tables 3 and 4. Fig 1 presents the prevalence of anaemia at the various levels of anaemia by gender. The dotted lines represent 95% confidence intervals for the estimated prevalence of anaemia at the different levels. The result reveals that females are more anaemic than males.

Spatial data analysis
To get a general insight into the spatial clustering of anaemia, a global spatial statistic was estimated using Moran's I statistic (Eq 1). This was done after establishing the number of anaemia cases in each of the clusters. The test result showed significant global positive spatial autocorrelation for the prevalence of anaemia (for males I = 0.12, P-value < 0.0001, and for females I = 0.15, P-value < 0.0001). The global Moran's I result suggests that there is local clustering in the distribution of anaemia prevalence that needs to be further explained using local spatial statistics. Fig 4 presents the prevalence of anaemia by region of the country. The map reveals that the eastern part of the country had higher anaemia prevalence than the south-west region. As presented in Table 1 above, the prevalence was highest in Somali (44.16% (95% CI:40.37,48.01) for females, 7.5% for males), followed by the Affar region (34.48% (95% CI:31.59.37,48.01) for females, 6.8% for males), and lowest in Addis Ababa (9.37% (95% CI:8.04,10.90) for females, 1.3% for males) and the SNNP region (11.26% (95% CI:9.92,12.75) for females, 3.2% for males). Fig 5 presents the risk map of districts by annual malaria parasite incidence in Ethiopia. The figure reveals that the central part of the country is malaria free. The overall malaria prevalence in Ethiopia is currently very low [42]. Malaria parasite prevalence in areas below 2,000 m was 0.5 percent by microscopy blood-slide examination for all ages and 0.6 percent among children under 5 years [42]. Among many other factors, malaria plays a major causative role in anaemia globally. The malaria-attributable fraction of anaemia may, however, differ between settings. In this study, the spatial patterns of anaemia and malaria (Figs 3 and 4) are not similar.

Multilevel mixed-effects binary logistic regression model
As mentioned in the previous section, multilevel logistic regression models that account for sampling weights can be employed to identify potential predictors of anaemia. The results are interpreted here in terms of adjusted odds ratios, as shown in Table 3. The results reveal that educational level, wealth index, BMI, HIV status, and age had a significant effect on the prevalence of anaemia. As demonstrated in [14,44,45,46], being pregnant significantly raises the likelihood of being anaemic. The current study shows that pregnant women were 1.43 times as likely (43% more likely) to be anaemic as non-pregnant women. The effect of age on the prevalence of anaemia differs between males and females. Females above 18 years of age were 1.33 times as likely (33% more likely) to be anaemic as those below 18 years of age, whereas males older than 18 years were 35% less likely to be anaemic than those younger than 18 years. Place of residence (urban or rural) had no statistically significant effect on the prevalence of anaemia. HIV positive females were 2.09 times more likely to be anaemic than HIV negative females, and HIV positive males were 3.56 times more likely to be anaemic than HIV negative males. Females with BMI between 18.5 and 24.9 were 17% less likely to be anaemic than underweight females (BMI < 18.5). Further, overweight females (BMI > 24.9) were 40% less likely to be anaemic than underweight females. There was no statistically significant difference between underweight and overweight males in the likelihood of being anaemic, but males with BMI 18.5-24.9 were 40% less likely to be anaemic than underweight (BMI < 18.5) males.
In the binary outcome model, males living in areas with a Plasmodium falciparum parasite rate between 0.015 and 0.025 (0.015 ≤ PfPR < 0.025) had higher estimated odds of being anaemic than males living in areas with a low Plasmodium falciparum parasite rate (PfPR < 0.005) [estimated odds ratio (OR) 2.02, 95% confidence interval (CI) 1.29-3.15]. Owing to the small number of PfPR values extracted from MAP, no statistically significant association between anaemia and residence in areas with apparent Plasmodium falciparum parasite presence was observed for females (Table 3).
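The percentages quoted alongside the odds ratios follow from simple arithmetic on the odds scale, as the sketch below illustrates. The function names and the 20% baseline probability are assumptions for illustration; only the odds ratios 1.43, 2.09 and the 35% reduction come from the text.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio (1.43 -> +43%)."""
    return (odds_ratio - 1.0) * 100.0


def probability_after_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


pregnancy_change = percent_change_in_odds(1.43)   # pregnant vs non-pregnant women
older_male_change = percent_change_in_odds(0.65)  # an OR of 0.65 is a 35% reduction
# With a hypothetical 20% baseline prevalence, OR = 2.09 does not double the probability.
p_hiv_positive = probability_after_odds_ratio(0.20, 2.09)
```

Because an odds ratio acts on odds rather than probabilities, "2.09 times more likely" is best read as a statement about odds; on the probability scale the increase depends on the baseline prevalence.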
Under the binary multilevel logistic model (Eq 2), the intraclass correlation for males was equal to $0.40/(0.40 + 3.29) = 0.11$, implying that the anaemia outcomes of two males located in the same cluster had a correlation of 0.11.
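The arithmetic behind this intraclass correlation can be checked with a short sketch. The function name is illustrative; the cluster-level variance of 0.40 is the value quoted above, and the level-one variance is that of the standard logistic distribution.

```python
import math


def icc_logistic(cluster_variance):
    """Intraclass correlation for a random-intercept logistic model.

    The level-one residual variance is fixed at pi^2 / 3 (about 3.29).
    """
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3.0)


icc_males = icc_logistic(0.40)
```

A larger cluster-level variance pushes the ICC towards 1, meaning that more of the total variation in the latent anaemia propensity lies between clusters rather than between individuals within a cluster.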

Multilevel mixed-effects ordinal logistic regression model
To quantify the effect of each determinant while taking into account the ordinal nature of the outcome variable, we considered Eq (3) together with the cluster-level random effects. The assumption of proportional odds was tested using gllamm, a user-contributed Stata command for multilevel and latent variable modeling [40,47]. The results reveal that the assumption of proportional odds is tenable at the 5% level of significance for all covariates considered in the model. Similar to the binary logistic regression results above, educational level, wealth index, HIV status, BMI, and pregnancy status were found to be significant determinants of anaemia. Parameter estimates and their standard errors, together with the corresponding adjusted odds ratios, are given in Table 4. Pregnant women were 1.61 times as likely (61% more likely) to be anaemic as non-pregnant women. As in the binary outcome model, the effect of age on the prevalence of anaemia differs between males and females; place of residence (urban or rural) had no significant effect on the prevalence of anaemia for males, and a borderline significant result was observed for females. Also as in the binary outcome model, HIV positive females were 2.09 times more likely to be anaemic than HIV negative females, and HIV positive males were 3.56 times more likely to be anaemic than HIV negative males.
Males with BMI between 18.5 and 24.9 were 40% less likely to be anaemic than underweight males (BMI < 18.5). Further, overweight males (BMI > 24.9) were 38% less likely to be anaemic than underweight males. After adjusting for important covariates and considering the ordinal nature of the outcome variable, we found that the PfPR of the area of residence is not statistically associated with anaemia prevalence for females (Table 4). This may be due to the fact that malaria prevalence is very low (less than 0.5%) in Ethiopia [42].
The estimated variances of the random effects are significant at the 0.05 level, indicating that there are substantial differences between clusters.

Discussion
The prevalence of anaemia was found to vary geographically: it was higher in the eastern part of the country and lower in the south-west (Fig 4). The observed spatial variation of anaemia across regions could be due to regional differences in dietary preferences, infectious disease risk, access to health care centers, or other factors. The prevalence of anaemia was found to be low in the SNNP region, which is dominated by maize-mixed agriculture. Messina and his colleagues [13] conclude that living in a community dominated by maize-mixed agriculture was significantly associated with a lower chance of being anaemic (74% less likely) as well as with higher haemoglobin levels.
Although the descriptive statistics (Table 2) and the report by Central Statistical Agency of Ethiopia [8] suggest that prevalence of anaemia is associated with place of residence (urban and rural), lack of statistical significance in our model (Tables 3 and 4) indicates that other factors are more likely to account for anaemia prevalence. Urban residence was associated with a low burden of anaemia among women [13,48].
In this study, at the individual level, wealth, BMI, HIV status, educational level, pregnancy and age were identified as significant determinants of anaemia in both the multilevel mixed-effects binary and ordinal logistic regression models at the 5% level of significance (Tables 3 and 4). Owing to the low prevalence of HIV (1.3%) in their study, Messina and colleagues [13] did not show any association between HIV and anaemia. In this study, by contrast, even though the prevalence of HIV was 1.4% for males and 2.3% for females, a significant association between anaemia and HIV was demonstrated in both the binary and ordinal logistic regression models (Tables 3 and 4). The likelihood of being anaemic was two times higher for HIV positive females than for HIV negative females, and 3.56 times higher for HIV positive males than for HIV negative males (Table 3). When the ordinal nature of the outcome variable was taken into account, results similar to those of the binary outcome model (Table 3) were observed in the likelihood of being anaemic for both HIV positive males and females (Table 4). The estimated variances of the cluster-level random intercepts from the weighted analysis are significantly different from zero, indicating considerable heterogeneity in anaemia prevalence between sampling clusters that is unaccounted for by the predictor variables (Tables 3 and 4).
Unlike the findings of other studies [5,21,22,23,24], our results suggest that malaria is unlikely to be a risk factor for anaemia among individuals living in areas with high malaria prevalence. This may be because our work uses Plasmodium falciparum endemicity from a limited number of geographical PfPR estimates at locations where demographic and health surveys were conducted, and because malaria prevalence is very low (less than 0.5%) in Ethiopia [42].
Carneiro and colleagues [5] analysed how the prevalence of anaemia depends on that of Plasmodium falciparum malaria by developing models of the excess risk of anaemia caused by malaria at the population level in 24 villages in northeastern Tanzania. Their results reveal that the prevalence of a haemoglobin level < 8 g/dL attributable to malaria was 4.6% in infants, 4.1% in children one year of age, 2.7% in children two years of age, and 3.3% in women of childbearing age [5].
There was no statistically significant difference between underweight and overweight males in the likelihood of being anaemic, but males with BMI 18.5-24.9 were 40% less likely to be anaemic than underweight (BMI < 18.5) males. Further, when sampling weights were considered in the analysis, wealth index was not found to have a statistically significant effect on the likelihood of being anaemic for females. For males, in contrast, the likelihood of being anaemic was lower among those in the middle and richer wealth index categories than among those in the poorest, poorer and richest categories.
The results of analysis in this study showed that the findings of the weighted analysis generally agree with the un-weighted analysis. The main difference that was observed between the weighted and un-weighted analysis was in the confidence intervals of parameter estimates. The confidence intervals were narrow for the un-weighted analysis but due to larger standard errors the confidence intervals become wider when we take into account the sampling weight in the analysis. Further, the estimated regression coefficient and standard errors from the weighted analysis diverged slightly from the un-weighted analysis (i.e. parameter estimates from the un-weighted analysis are slightly lower).
In contrast with the findings of our study and others [10,49], the findings of Messina [13] do not show any significant association of anaemia with body mass index and wealth. Similar to our study results, [10,49] show a significant association between wealth and anaemia.
Supporting our findings, [50][51][52] found an increased risk of anaemia among less educated individuals. Contrary to our study results, [13,53,54] found no association between anaemia and educational level. The study by Messina [13] did not account for potential between-cluster correlation in its analysis; it applied standard multilevel models to identify predictors of anaemia in Congolese women. The spatial variation of anaemia risk among adults has not yet been well studied [55]. The study by Soares [55] showed that malnutrition and parasitological risk factors contributed substantially to the spatial variation in individual-level anaemia.

Conclusion
In summary, our study shows that the prevalence of anaemia is associated with wealth, educational level, BMI, HIV status, age and pregnancy status. Spatial variability of anaemia prevalence across survey clusters and regions was observed; prevalence was higher in the eastern part of the country.
The main limitation of this study was the inability to incorporate other potential correlates of anaemia, such as hookworm infection [56] and dietary information, which are associated with anaemia [54,57].