Lifestyle and socio-economic inequalities in diabetes prevalence in South Africa: A decomposition analysis

Background: Inequalities in diabetes are widespread and are exacerbated by differences in lifestyle. Many studies that have estimated inequalities in diabetes rely on self-reported diabetes, which is often biased by differences in access to health care and in diabetes awareness. This study adds to this literature by using a more objective, standardised measure of diabetes in South Africa. It estimates socio-economic inequalities in undiagnosed diabetes, diagnosed (self-reported) diabetes and total diabetes (undiagnosed diabetics + diagnosed diabetics), and examines the contribution of lifestyle factors to diabetes inequalities in South Africa.
Methods: This cross-sectional study uses data from the 2012 South African National Health and Nutrition Examination Survey (SANHANES-1) and applies the Erreygers corrected concentration index (CI) to assess socio-economic inequalities in diabetes. The contributions of lifestyle factors to inequalities in diabetes are assessed using a decomposition method.
Results: Self-reported diabetes and total diabetes (undiagnosed diabetics + diagnosed diabetics) were significantly concentrated amongst the rich (CI = 0.0746, p < 0.05; and CI = 0.0859, p < 0.05, respectively). The CI for undiagnosed diabetes was pro-poor but not statistically significant. The decomposition showed that lifestyle factors contributed 22% and 35% to socio-economic inequalities in self-reported and total diabetes, respectively.
Conclusion: Diabetes in South Africa is more concentrated amongst higher socio-economic groups whether it is measured using self-reported diabetes or clinical data. Our findings also show that inequality is larger for total diabetes (undiagnosed diabetics + diagnosed diabetics) than for self-reported diabetes. Although the contribution of lifestyle factors was modest compared with other determinants, these contributions are important for the development of policies that address socio-economic inequalities in the prevalence of diabetes.

million people with diabetes are not diagnosed [5]. The magnitude of the unmet need for diabetes care in South Africa has also been analysed by Stokes et al. [18], who, using the 2012 South African National Health and Nutrition Examination Survey, find that close to half of individuals with diabetes were undiagnosed [18]. Poorer and less educated people tend to have worse access to medical care for diabetes diagnosis than wealthier and more educated individuals [16]. As a result, excluding undiagnosed diabetics may bias estimates of diabetes prevalence and of inequality in diabetes.
Whilst the causes of type 1 diabetes are unknown, the risk of type 2 diabetes is determined by an interplay of factors such as ethnicity, age, socio-economic status and various lifestyle factors [2]. Lifestyle factors such as unhealthy diets, smoking, alcohol consumption and physical inactivity are particularly important for the prevention of type 2 diabetes [2,19,20], which is the more common type globally [4,5]. The role of modifiable risk factors in explaining inequalities in diabetes has been investigated previously [15,21,22]. Health behaviours such as smoking and alcohol consumption explain between 33% and 45% of inequalities in the incidence of type 2 diabetes in the United Kingdom [21] and a third of socio-economic inequalities in type 2 diabetes in a Swedish study [22]. Using data from the South African National Income Dynamics Study, Mukong et al. find that smoking and alcohol consumption account for -2.4% and 2.2%, respectively, of inequality in self-reported diabetes in 2014-2015 [15]. The importance of addressing these risk factors is highlighted in these studies and is also entrenched in the World Health Organisation (WHO) Global Status Report on non-communicable diseases [23].
Estimating inequalities in diabetes and determining the contributions of avoidable diabetes risk factors to these inequalities can help South Africa work towards meeting Sustainable Development Goal 3 (SDG 3) by 2030, which targets a reduction in premature deaths due to NCDs (including diabetes). This study therefore aims to (1) describe the prevalence, treatment and control of diabetes among South Africans across socio-economic groups; (2) determine socio-economic inequalities in the prevalence of diabetes using the CI; and (3) examine the contribution of dietary, lifestyle and metabolic risk factors to socio-economic inequalities in diabetes prevalence by conducting a decomposition analysis. Our study makes important contributions to the literature on inequalities in diabetes. To the best of our knowledge, this is the first South African study to use clinical outcomes in addition to self-reported data to estimate socio-economic inequalities in diabetes, and the first to analyse in depth the contribution of a range of lifestyle factors to these inequalities.

Data
Data are taken from the 2012 South African National Health and Nutrition Examination Survey (SANHANES-1) [24]. SANHANES-1 is a nationally representative survey conducted from April to November 2012 to assess the health and nutrition status of the South African population, and it is the first comprehensive national survey on NCDs in South Africa. The survey received clearance from the Research Ethics Committee (REC) of the Human Sciences Research Council (REC 6/16/11/11), and informed consent was obtained from all study participants. Households were sampled using a stratified, multi-stage cluster design. The 2001 population census was used to select a master sample of 1 000 enumeration areas (EAs) from a database of 86 000 EAs, stratified by province and locality; in formal urban areas, the selection of EAs was further stratified by race. From this master sample, 500 EAs were selected to reflect the sociodemographic profile of South Africa. Twenty dwellings were then randomly selected from each EA, yielding a sample of 10 000 dwellings (so-called visiting points). Of the 8 168 valid, occupied households (the remaining 1 832 dwellings were vacant or could not be located), 6 306 households agreed to be interviewed (response rate = 77.2%). The dataset includes 26 806 individuals.
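The sample counts reported above are internally consistent. The short Python check below simply re-derives the number of visiting points, the number of valid households and the response rate from the figures in this paragraph; the variable names are illustrative only.

```python
# Re-derive the SANHANES-1 sample sizes and response rate from the reported counts.
eas_selected = 500
dwellings_per_ea = 20
visiting_points = eas_selected * dwellings_per_ea           # 10 000 dwellings

vacant_or_not_located = 1_832
valid_households = visiting_points - vacant_or_not_located  # 8 168 valid, occupied households

interviewed_households = 6_306
response_rate = 100 * interviewed_households / valid_households

print(f"Visiting points: {visiting_points}")
print(f"Valid households: {valid_households}")
print(f"Response rate: {response_rate:.1f}%")
```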
The SANHANES-1 survey comprised a questionnaire component and a clinical examination. Three questionnaires were administered during the survey: a household questionnaire, an adult questionnaire and a child questionnaire. In this study, the household questionnaire is the source of data for the wealth index, and information on self-reported diabetes and lifestyle factors is drawn from the adult questionnaire. The adult and child questionnaires were administered in respondents' households. SANHANES-1 did not distinguish between type 1 and type 2 diabetes.
Blood samples were collected during the clinical examinations, which were conducted at various facilities such as school halls, church halls, primary healthcare facilities, community centres and city halls. Blood samples were collected from individuals aged 6 years and older and used for biomarker analysis. The clinical examinations were conducted on consenting individuals by experienced medical doctors and nurses. The blood samples were stored in cooler boxes and delivered to a laboratory within 24 hours. No deviations from established quality control measures were reported.
Our analysis is restricted to individuals above the age of 15 who had a blood sample taken and had non-missing information on HbA1c; 17.8% had missing data on the wealth index and were also excluded. The final analytical sample used in this study is 3 438 individuals. Details of our exclusion criteria are shown in

Measuring inequality
To measure inequalities in diabetes, this study uses the CI, whose calculation requires a measure of socio-economic status. In this study, a wealth index is used for this purpose, constructed using Multiple Correspondence Analysis (MCA). The household and living conditions considered in the wealth index are housing type, water and sanitation services, and a set of thirteen household assets: ownership of a fridge, television, stove, mobile phone, radio, DVD (digital video disc) player, washing machine, computer, DSTV (digital satellite television), motorcar, vacuum cleaner and telephone (landline), and internet access. Item non-response is handled by imputation using iterative binomial and multinomial logistic regressions, implemented with Stata's mi commands. Asset ownership is imputed as a function of ownership of the twelve other assets, whereas housing type is imputed from information on the wall and roof materials of the dwelling. The first dimension explains approximately 90% of the inertia.
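The exact MCA settings are not reported here, so the following is only a minimal NumPy sketch of how first-dimension MCA scores can serve as a wealth index; it is not a reimplementation of Stata's `mca` command. It assumes the inputs are integer-coded categorical items (for example 0/1 asset ownership) already imputed and coded so that higher values mean better living conditions; `mca_wealth_index` is a hypothetical helper. It reports the unadjusted inertia share of the indicator matrix, which tends to understate the first dimension compared with adjusted inertia measures such as Greenacre's, so it should not be expected to reproduce the 90% figure above.

```python
import numpy as np

def mca_wealth_index(codes: np.ndarray):
    """First-dimension MCA scores for an (n, q) array of integer-coded items.

    Returns (scores, inertia_share), where inertia_share is the unadjusted share
    of total indicator-matrix inertia captured by the first dimension.
    """
    codes = np.asarray(codes)
    # Complete disjunctive (indicator) matrix: one 0/1 column per observed category.
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, [j]] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    # Correspondence analysis of the indicator matrix.
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses (equal for every household)
    c = P.sum(axis=0)  # column (category) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)

    inertia = sigma ** 2
    scores = U[:, 0] * sigma[0] / np.sqrt(r)  # row principal coordinates, dimension 1

    # The sign of an SVD axis is arbitrary: orient it so that households with
    # higher (more favourable) codes receive higher wealth scores.
    if np.corrcoef(scores, codes.sum(axis=1))[0, 1] < 0:
        scores = -scores
    return scores, inertia[0] / inertia.sum()
```

Households can then be ranked on `scores` when computing fractional ranks for the CI.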
The CI is derived from the concentration curve (CC), which plots the cumulative percentage of the health variable against the cumulative percentage of the population ranked by the living standards measure [12]; the CI is twice the area between the CC and the 45-degree line [12]. The CI takes a value of zero when there is no socio-economic-related health inequality, which means that the health measure (in this case diabetes) is equally distributed across the population. It takes a positive value when the health measure is more concentrated amongst the richer population and a negative value when it is more concentrated amongst the poorer population [12]. The CI ranges from -1 to +1, and its magnitude indicates how disproportionately the health measure is concentrated among the poor or the rich. The CI can be computed as twice the covariance between the health variable h and the fractional rank r of individuals in the living standards distribution, divided by the mean of the health measure (μ):
$$CI = \frac{2\,\operatorname{cov}(h, r)}{\mu} \qquad (1)$$
Since all the health variables in our study are binary, the bounds of the standard CI depend on the mean of the health variable, so a normalisation is required to measure inequality. This study uses the Erreygers corrected concentration index, which is expressed algebraically as [25]:
$$E(h) = \frac{4\mu}{b - a}\, CI \qquad (2)$$
Where μ is the mean of the health variable, CI is the standard CI, b is the maximum value of the health variable (in this case 1) and a is the minimum value of the health variable (in this case 0). As in previous studies [26,27], we used the conindex command in Stata to estimate inequalities.
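The study relies on Stata's `conindex`; purely as an illustration, the NumPy sketch below computes the standard CI (twice the covariance between the health variable and the fractional wealth rank, divided by the mean) and the Erreygers index, 4μ/(b − a) × CI. It assumes optional sampling weights, uses weighted fractional ranks, and breaks ties in wealth by input order, which may differ from how `conindex` treats ties. The function names are hypothetical.

```python
import numpy as np

def fractional_rank(wealth, weights):
    """Weighted fractional rank in the wealth distribution (0 = poorest, 1 = richest)."""
    order = np.argsort(wealth, kind="stable")  # ties broken by input order
    share = weights[order] / weights.sum()     # each person's population share
    mid = np.cumsum(share) - 0.5 * share       # mid-point of each person's share
    ranks = np.empty_like(mid)
    ranks[order] = mid
    return ranks

def concentration_index(h, wealth, weights=None):
    """Standard CI = 2 * cov(h, r) / mean(h); returns (CI, mean of h)."""
    h = np.asarray(h, dtype=float)
    wealth = np.asarray(wealth, dtype=float)
    weights = np.ones_like(h) if weights is None else np.asarray(weights, dtype=float)
    r = fractional_rank(wealth, weights)
    w = weights / weights.sum()
    mu = np.sum(w * h)
    cov = np.sum(w * (h - mu) * (r - np.sum(w * r)))
    return 2.0 * cov / mu, mu

def erreygers_index(h, wealth, weights=None, a=0.0, b=1.0):
    """Erreygers corrected CI for a variable bounded in [a, b]: 4 * mu / (b - a) * CI."""
    ci, mu = concentration_index(h, wealth, weights)
    return 4.0 * mu / (b - a) * ci
```

For example, a binary outcome `[0, 0, 1, 1]` with wealth `[1, 2, 3, 4]` (all cases among the two richest people) gives CI = 0.5 and an Erreygers index of 1.0; reversing the outcome gives -0.5 and -1.0.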

Decomposing socio-economic inequality
The CI can be decomposed into the factors that contribute to the measured inequality [28]. A review of the literature showed that there have been various developments in the methods applied in regression-based decompositions of bivariate inequality indices [28-31]. Whilst the Wagstaff decomposition technique [28] has been the dominant approach, it is one-dimensional only [29,31]: rather than accounting for the correlation between health and the socio-economic variable, it focuses on the variation in one variable only [29,31]. Alternative methods have been suggested in the literature [29-31]. For example, Erreygers and Kessels propose a two-dimensional decomposition method that analyses the two variables (health and income) simultaneously [29]. Building on this, Kessels and Erreygers introduced a structural equation modelling (SEM) approach which uses different sets of variables to explain the health and socio-economic status variables. Heckley et al. [31] use the recentered influence function (RIF) regression approach developed by Firpo et al. [32] to decompose inequalities into their underlying determinants whilst addressing the limitations of the Wagstaff decomposition method. This approach, however, relies on a suitable identification strategy. Although such approaches respect the bivariate nature of rank-dependent inequality indices, they have been described as data-demanding [31]. We therefore adopt the dominant Wagstaff decomposition, which also allows comparability with other studies in the literature that have used this method.
Following Wagstaff et al. [28], our health variable $h_i$ (diabetes) is linked to a set of explanatory variables $x_{ij}$ by the following linear model:
$$h_i = \alpha + \sum_j \beta_j x_{ij} + \varepsilon_i \qquad (3)$$
Given a linear model such as Eq (3), Wagstaff et al. show that the concentration index for $h_i$ can be written as [28]:
$$CI(h) = \sum_j \frac{\beta_j \bar{x}_j}{\mu_h}\, CI(x_j) + \frac{GC_\varepsilon}{\mu_h} \qquad (4)$$
In Eq (4), $CI(h)$ is the CI for the health variable h (diabetes), $\bar{x}_j$ is the mean of $x_j$, $\mu_h$ is the mean of the health variable, $CI(x_j)$ is the CI for $x_j$ and $GC_\varepsilon$ is the generalised CI for the error term. The first term is a weighted sum of the CIs of the regressors $x_j$, where the weight of each regressor is the elasticity of h with respect to $x_j$, $\beta_j \bar{x}_j / \mu_h$. The second term is the residual socio-economic inequality in health that cannot be explained by the regressors. Since we applied the Erreygers normalisation to the CI for socio-economic inequalities in diabetes, the corrected CI for the health variable is formulated as:
$$E(h) = \frac{4}{b - a}\left[\sum_j \beta_j \bar{x}_j\, CI(x_j) + GC_\varepsilon\right] \qquad (5)$$
Eq (5) can now be used to decompose socio-economic inequalities in diabetes, showing the contribution of each factor. If the contribution of a variable $x_j$ is positive, inequality in the health variable would decrease if $x_j$ were equally distributed across socio-economic groups, ceteris paribus. Conversely, if a contribution is negative, the absence of inequality in that variable would result in an increase in inequality, ceteris paribus.
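A minimal NumPy sketch of the Erreygers-normalised Wagstaff decomposition follows. For simplicity it fits the linear model by ordinary least squares (a linear probability model) and ignores sampling weights, whereas the study uses a GLM with binomial family and identity link. Because the model is linear in the regressors, the decomposition identity holds for any coefficient estimates: the contributions of the regressors plus the residual term add up exactly to the Erreygers index of the outcome. Function and variable names are illustrative.

```python
import numpy as np

def fractional_rank(x):
    """Unweighted fractional rank (i + 0.5) / n; ties broken by input order."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = (np.arange(len(x)) + 0.5) / len(x)
    return ranks

def decompose_erreygers(h, X, wealth, a=0.0, b=1.0):
    """Return (E(h), contribution of each column of X, residual contribution)."""
    h = np.asarray(h, dtype=float)
    X = np.asarray(X, dtype=float)
    r = fractional_rank(np.asarray(wealth))
    rc = r - r.mean()  # centred rank

    # Linear model h = alpha + X beta + eps, fitted here by OLS (an assumption).
    design = np.column_stack([np.ones(len(h)), X])
    coef, *_ = np.linalg.lstsq(design, h, rcond=None)
    beta, eps = coef[1:], h - design @ coef

    xbar = X.mean(axis=0)
    ci_x = 2.0 * (X - xbar).T @ rc / len(h) / xbar  # CI of each regressor
    gc_eps = 2.0 * np.mean(eps * rc)                 # generalised CI of the error term

    scale = 4.0 / (b - a)
    contributions = scale * beta * xbar * ci_x      # absolute contributions
    residual = scale * gc_eps

    mu = h.mean()
    ci_h = 2.0 * np.mean((h - mu) * rc) / mu
    return scale * mu * ci_h, contributions, residual
```

Percentage contributions, as reported in decomposition tables, are then `100 * contributions / E`.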
The absolute contribution a variable makes to socio-economic inequality is therefore determined by the product of the elasticity term ($\beta_j \bar{x}_j$) of diabetes with respect to that variable and the variable's CI. To estimate the contributions, we first need to estimate the coefficients of the explanatory variables via a regression. Ordinary Least Squares (OLS), probit and Generalised Linear Models (GLM) are the three most common regression methods used for decomposing inequalities [33]. Yiengprugsawan et al. compare these three approaches and show that a GLM (with binomial family and identity link) is the best choice when decomposing inequality in a binary variable [33]. Since our outcome variable is binary, and following Yiengprugsawan et al. [33] and other studies [34,35], this study uses the GLM for the decomposition of the Erreygers CI.
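As an illustration only (the study's estimates come from Stata), the sketch below fits a binomial GLM with an identity link by iteratively reweighted least squares (IRLS) in NumPy. It assumes OLS starting values, clips fitted probabilities into (0, 1) so that the binomial variance stays positive, and ignores sampling weights. With an identity link the IRLS weights are 1/[μ(1 − μ)], and the working response equals the outcome itself whenever fitted values lie inside (0, 1). The returned slope coefficients play the role of the β_j in the decomposition.

```python
import numpy as np

def glm_binomial_identity(y, X, max_iter=100, tol=1e-10, eps=1e-6):
    """Fit P(y = 1) = alpha + X beta by IRLS; returns [alpha, beta_1, ..., beta_k]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS starting values

    for _ in range(max_iter):
        eta = design @ beta                # linear predictor (= mean under identity link)
        mu = np.clip(eta, eps, 1.0 - eps)  # keep the binomial variance positive
        w = 1.0 / (mu * (1.0 - mu))        # IRLS weights: 1 / Var(y), since d(mu)/d(eta) = 1
        z = eta + (y - mu)                 # working response (equals y when unclipped)
        WX = design * w[:, None]
        new_beta = np.linalg.solve(design.T @ WX, WX.T @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

At convergence the estimates satisfy the binomial score equations X'W(y − μ) = 0; unlike a probit, the identity link keeps the coefficients on the probability scale, which is what the linear decomposition requires.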
As there is no analytical expression for the standard errors of the contributions generated from Eq (4), and since the Stata bootstrap prefix command does not work for this purpose [12,36], a bootstrapping technique was used to generate standard errors for the absolute contributions. Taking the data's sampling structure into account, we applied the bootstrapping method described by Efron et al. and Efron [37,38] and applied in Ataguba et al. [36]. Bootstrapping allows us to assess sampling variability and to draw statistical inference on the decomposition results [39]. A total of 500 replications were used to estimate the standard errors.
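The exact resampling scheme used in the study is not spelled out here. As a hedged illustration of one common design-respecting scheme, the sketch below implements a stratified cluster bootstrap: within each stratum it resamples clusters (primary sampling units) with replacement, recomputes a statistic on the pooled rows, and reports the standard deviation across replicates as the standard error. It ignores sampling weights and finite-population corrections, and `statistic` is a placeholder for any function of row indices, for example one returning the vector of absolute contributions.

```python
import numpy as np

def cluster_bootstrap_se(statistic, strata, clusters, n_reps=500, seed=0):
    """Bootstrap standard error(s) of statistic(row_indices) under a stratified cluster design."""
    rng = np.random.default_rng(seed)
    # Group row indices by (stratum, cluster), then group clusters by stratum.
    rows_by_cluster = {}
    for i, key in enumerate(zip(strata, clusters)):
        rows_by_cluster.setdefault(key, []).append(i)
    clusters_by_stratum = {}
    for key in rows_by_cluster:
        clusters_by_stratum.setdefault(key[0], []).append(key)

    replicates = []
    for _ in range(n_reps):
        rows = []
        for keys in clusters_by_stratum.values():
            # Draw as many clusters as the stratum contains, with replacement.
            for k in rng.integers(0, len(keys), size=len(keys)):
                rows.extend(rows_by_cluster[keys[k]])
        replicates.append(statistic(np.array(rows)))
    # Standard deviation of the replicate statistics = bootstrap standard error.
    return np.std(np.array(replicates), axis=0, ddof=1)
```

Resampling whole clusters, rather than individuals, preserves the within-cluster correlation that the multi-stage design induces.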
Data analysis was conducted in Stata 13, and post-stratification sample weights were used in all analyses to adjust for unequal probabilities of selection and non-response.

Diabetes indicators
From the analytical sample of 3 438, we identified five main diabetes health indicators: total, undiagnosed, diagnosed, treated and controlled diabetes.
1. Total diabetes. Total diabetes includes individuals who self-reported being diabetic or had undiagnosed diabetes. The self-reported and undiagnosed diabetes outcomes are explained in more detail below. A total diabetes binary variable was then created to estimate socio-economic inequalities in total diabetes (see Fig 2), taking the values 0 = individual did not have undiagnosed or self-reported diabetes, 1 = individual self-reported being diabetic or had undiagnosed diabetes.
2. Undiagnosed diabetes. Among the sub-sample of total diabetics, we calculated the proportion of individuals with undiagnosed diabetes. According to the World Health Organisation, the diagnostic criterion for diabetes is an HbA1c level greater than or equal to 6.5% [2]. Consistent with other studies, we defined diabetes as undiagnosed when an individual did not self-report a prior diabetes diagnosis by a physician, did not report currently taking any diabetes medication, and had an HbA1c result greater than or equal to 6.5% [40]. A binary variable was created to estimate inequalities in undiagnosed diabetes within the total diabetic sample (see Fig 2), taking the values 0 = individual self-reported being diabetic, 1 = individual had undiagnosed diabetes.
3. Diagnosed diabetes. Based on the SANHANES adult questionnaire, individuals were regarded as diabetic if they answered yes when asked whether a medical doctor or other healthcare professional had told them that they had high blood sugar, or if they answered yes when asked whether they were currently taking insulin or tablets to lower their blood sugar. A binary variable was then created to estimate socio-economic inequalities in diagnosed diabetes (see Fig 2), taking the values 0 = individual did not self-report being diabetic, 1 = individual self-reported being diabetic or taking diabetes medication.
4. Treated diabetes. Among the diagnosed sample (self-reported diabetics), we calculated the proportion of diabetics who reported being on diabetes treatment. Diabetic individuals were considered to be on treatment if they reported currently taking insulin or tablets to lower their blood glucose levels. A binary variable was created taking the values 0 = a self-reported diabetic individual reported not taking insulin or tablets to lower blood glucose levels, 1 = a self-reported diabetic individual reported taking insulin or tablets to lower blood glucose levels (see Fig 2).
5. Controlled diabetes. Among the sample on diabetes treatment, we calculated the proportion of individuals with controlled diabetes. Diabetes was defined as controlled if the respondent reported taking diabetes treatment (insulin or tablets) and had an HbA1c result below 6.5%. A binary variable was then created for diabetes control amongst the treated sample, taking the values 0 = individual was on diabetes treatment and had an HbA1c result of 6.5% or higher, 1 = individual was on diabetes treatment and had an HbA1c result below 6.5% (see Fig 2).
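The five indicators are nested: treated diabetics are a subset of diagnosed diabetics, who in turn are a subset of total diabetics. The hypothetical Python helper below encodes the definitions above for a single respondent. It assumes three inputs: whether a health professional reported high blood sugar, whether the respondent currently takes insulin or tablets, and the HbA1c result in percent. Indicators that are undefined for a respondent, because the respondent falls outside the relevant sub-sample, are returned as None.

```python
from typing import Dict, Optional

HBA1C_CUTOFF = 6.5  # percent; WHO diagnostic threshold used in the text

def diabetes_indicators(told_high_blood_sugar: bool, takes_medication: bool,
                        hba1c: float) -> Dict[str, Optional[int]]:
    """Binary diabetes outcomes for one respondent; None = outside the relevant sub-sample."""
    diagnosed = told_high_blood_sugar or takes_medication
    undiagnosed = (not diagnosed) and hba1c >= HBA1C_CUTOFF
    total = diagnosed or undiagnosed
    treated = diagnosed and takes_medication
    return {
        "total": int(total),                                           # whole analytical sample
        "undiagnosed": int(undiagnosed) if total else None,            # among total diabetics
        "diagnosed": int(diagnosed),                                   # whole analytical sample
        "treated": int(takes_medication) if diagnosed else None,       # among self-reported diabetics
        "controlled": int(hba1c < HBA1C_CUTOFF) if treated else None,  # among treated diabetics
    }
```

For example, a respondent with no self-reported diagnosis, no medication and an HbA1c of 7.0% counts towards total and undiagnosed diabetes but is excluded from the treated and controlled sub-samples.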

Explanatory variables-dietary, lifestyle and metabolic risk factors
A systematic review and meta-analysis by Aune et al. shows that all types of physical activity are beneficial in reducing the risk of type 2 diabetes [41]. Physical activity data were taken from the Global Physical Activity Questionnaire (GPAQ) within the SANHANES survey. To calculate the intensity of physical exercise, we multiplied weekly minutes of walking, moderate-intensity activity and vigorous-intensity activity by Metabolic Equivalent (MET) values of 3.3, 4.0 and 8.0, respectively [42]. The resulting intensity variable (MET-minutes) was then used to create a categorical variable. The WHO recommends a minimum weekly exercise equivalent to 600 MET-minutes [43]. We categorised the physical activity variable as follows: 0 = 0 MET-minutes, 1 = more than 0 and less than 600 MET-minutes, 2 = at least 600 and less than 2 000 MET-minutes, and 3 = 2 000 MET-minutes or more. For unhealthy diet, two measures are used: consumption of fruits and consumption of vegetables. Low fruit and vegetable consumption is defined as an intake of fewer than five portions a day [44,45]. Fruit and vegetable consumption are each included as categorical variables taking the values 0 = none, 1 = less than four times a day, 2 = more than four times a day. Evidence also suggests that smoking is associated with diabetes, although the increase in diabetes risk varies with smoking intensity [19]. In SANHANES-1, respondents were also asked how many manufactured cigarettes they smoke per week; this was included as a continuous variable. Because alcohol is reported to have both beneficial and harmful effects, the association between alcohol and the risk of type 2 diabetes depends on drinking frequency [20]. Alcohol consumption is therefore included as a categorical variable taking the values 0 = never, 1 = occasionally and 2 = regularly.
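A minimal Python sketch of the physical activity variable follows. It assumes that weekly minutes for each activity type have already been derived from the GPAQ responses (for example, days per week multiplied by minutes per day); the function names are hypothetical.

```python
MET_WALKING, MET_MODERATE, MET_VIGOROUS = 3.3, 4.0, 8.0

def met_minutes_per_week(walking_min: float, moderate_min: float, vigorous_min: float) -> float:
    """Weight weekly minutes of each activity type by its MET value and sum."""
    return MET_WALKING * walking_min + MET_MODERATE * moderate_min + MET_VIGOROUS * vigorous_min

def activity_category(met_minutes: float) -> int:
    """0 = none, 1 = >0 and <600, 2 = >=600 and <2 000, 3 = >=2 000 MET-minutes per week."""
    if met_minutes <= 0:
        return 0
    if met_minutes < 600:
        return 1
    if met_minutes < 2000:
        return 2
    return 3
```

For instance, 150 minutes of walking a week gives about 495 MET-minutes (category 1), whereas 150 minutes of moderate activity gives 600 MET-minutes and meets the WHO minimum (category 2).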
Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared and included as a categorical variable taking the values 0 = underweight (BMI < 18.5), 1 = normal weight (18.5 ≤ BMI < 25), 2 = overweight (25 ≤ BMI < 30) and 3 = obese (BMI ≥ 30).
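The BMI categories can be sketched as follows, assuming weight in kilograms and height in metres; the helper names are hypothetical. Note that each lower cut-off belongs to the higher category (for example, a BMI of exactly 25 is classed as overweight).

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> int:
    """0 = underweight, 1 = normal, 2 = overweight, 3 = obese (cut-offs 18.5, 25, 30)."""
    if value < 18.5:
        return 0
    if value < 25:
        return 1
    if value < 30:
        return 2
    return 3
```

For example, a person weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 and falls in the normal-weight category.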

Other explanatory variables
Apart from the lifestyle factors and the wealth index, we also included a range of other variables which past literature has shown to influence health [46-48]. These variables include gender, residence, age, race, employment status, family history of diabetes, insurance and obesity. Gender was included as a binary variable (1 = male, 2 = female). Residence was included as a binary variable (0 = urban, 1 = rural). Age was measured in years and included as a categorical variable (15-35 years, 36-60 years and 61+ years). Race was included as a binary variable (0 = African, 1 = non-African, i.e. white, coloured and Indian). Employment was included as a binary variable (0 = unemployed, 1 = employed). Family history of diabetes and insurance were both included as binary variables (0 = no, 1 = yes).

Descriptive statistics
Table 1 shows survey-weighted descriptive statistics for the study sample according to diabetes outcome. According to the data, the total prevalence of diabetes (total diabetes) was 11%, and 38% of total diabetics were undiagnosed. The prevalence of self-reported diabetes was 7%. Of the self-reported diabetics, 61% were on treatment, and 31% of those on treatment had controlled diabetes.
In each diabetes outcome category, the sample is predominantly female, lives in urban areas, is unemployed, has no health insurance and is overweight or obese. With the exception of controlled diabetes, the sample is predominantly aged 36 to 60. Approximately 63% of the undiagnosed sample was African. The group that self-reported diabetes was predominantly non-African; however, the majority of respondents on treatment and with controlled diabetes were African. In terms of lifestyle factors, most of the sample consumed fruit or vegetable portions less than four times a day and did not consume alcohol. A majority of the individuals who self-reported diabetes did not drink alcohol and conducted weekly exercise equivalent to 2 000 MET-minutes or more. With the exception of diagnosed diabetes, a majority of the respondents in each health outcome were obese. Fig 3 shows the distribution of diabetes categories by wealth quintile: none of the diabetes categories is evenly distributed across wealth index quintiles. Controlled diabetes was highest in the fourth quintile, and all other outcomes were highest in the fifth quintile. All diabetes outcomes were lowest in the first quintile. The number of individuals with undiagnosed diabetes appeared to increase with wealth. Table 2 shows that lifestyle factors are also unevenly distributed across wealth index quintiles. The majority of respondents who drink alcohol are in the highest wealth quintile. The consumption of fruits and vegetables more than four times a day appears to increase with wealth: the majority of individuals who do not consume any fruits or vegetables are in the poorest quintile, whilst the majority of those who consume fruits and vegetables more than four times a day are in the richest quintile.
Our results also show that across all wealth quintiles the majority of respondents conducted weekly exercise equivalent to 2 000 MET-minutes or more. Cigarette consumption intensity was highest in the highest wealth quintile.
Table 3 shows the Erreygers corrected CIs for each of the three diabetes outcomes. Due to the small sample sizes and the resulting loss of statistical power, we do not present the CIs for treated and controlled diabetes. The results show that socio-economic inequality is statistically significant for self-reported and total diabetes. Both statistically significant CIs are pro-rich, indicating a greater burden of self-reported and total diabetes amongst the higher socio-economic groups. Inequality is larger for total diabetes (CI = 0.0859) than for self-reported diabetes (CI = 0.0746). The CI for undiagnosed diabetes, which was calculated from a different sub-sample of the data, was pro-poor but not statistically significantly different from zero.

Decomposition of socio-economic inequality in diabetes
To better understand the lifestyle factors that contribute to inequalities in the diabetes outcomes, we conducted a decomposition analysis. The decomposition was conducted only for the measured inequalities in self-reported and total diabetes; we did not decompose inequality in undiagnosed diabetes because the measured inequality in this outcome was not statistically significant. Table 4 displays the contribution of lifestyle factors to inequalities in self-reported and total diabetes. The table shows the marginal effects, the elasticity (the product of the coefficient and the mean of each explanatory variable), the CI of each explanatory variable, and the absolute, percentage and total contributions of lifestyle factors, whilst adjusting for other demographic and socio-economic variables. The table also presents standard errors for the absolute contributions, obtained by bootstrapping with 500 replications.
As shown in Table 4, the demographic and socio-economic variables significantly associated with self-reported diabetes were living in a rural area (p ≤ 0.05), increasing age (p ≤ 0.01), wealth (p ≤ 0.05) and family history of diabetes (p ≤ 0.01). Factors significantly associated with total diabetes were increasing age (p ≤ 0.01), being non-African (p ≤ 0.1), being in wealth quintile 2 (p ≤ 0.1) and a family history of diabetes (p ≤ 0.01). The factors that contributed most to socio-economic inequalities in both self-reported and total diabetes were residence (urban or rural), the wealth index (socio-economic status) and age. Residence explains -34.9% of the inequality in self-reported diabetes and -17.3% of the inequality in total diabetes. Thus, if inequalities in diabetes were determined by residence alone, diabetes would be concentrated among the poor, favouring the better off. According to Table 4, the wealth index is also a significant contributor to inequality in self-reported diabetes (65.77%) and total diabetes (27.10%). The contribution of the wealth index is higher for self-reported diabetes than for total diabetes because of the different elasticity values. Age category was another large contributor, accounting for 19.2% of inequality in self-reported diabetes and 22.1% of inequality in total diabetes. Race and family history of diabetes also make notable contributions to inequality. The marginal effects in Table 4 show that the lifestyle factors significantly associated with self-reported diabetes were obesity (p ≤ 0.1), regular alcohol consumption (p ≤ 0.01) and vegetable consumption less than four times a day (p ≤ 0.1). The lifestyle factors significantly associated with total diabetes were obesity (p ≤ 0.05), smoking (p ≤ 0.1), alcohol consumption (p ≤ 0.01) and vegetable consumption more than four times a day (p ≤ 0.01).
Results from the decomposition show that lifestyle factors contributed a total of 22.2% and 34.7% to inequalities in self-reported and total diabetes, respectively. Among the lifestyle factors, obesity, alcohol consumption and vegetable consumption made the largest contributions to diabetes inequalities (see Table 4). Obesity contributed approximately 24.8% to inequality in self-reported diabetes and 35.5% to inequality in total diabetes; in the absence of inequality in obesity, inequalities in diabetes would therefore decrease. Vegetable consumption is another important contributor, accounting for 5.3% of inequality in self-reported diabetes and 12.7% of inequality in total diabetes. These positive contributions indicate that if vegetable consumption were equally distributed across the wealth index, inequality would decrease by 5.3% for self-reported diabetes and by 12.7% for total diabetes. Alcohol consumption contributed -9.6% to inequality in self-reported diabetes and -11.7% to inequality in total diabetes, meaning that if alcohol consumption were equally distributed across the population, inequalities in self-reported and total diabetes would increase. Smoking intensity, fruit consumption and physical activity made marginal contributions to diabetes inequalities. The residuals in Table 4 represent the unexplained sources of inequality.

Discussion
Our paper provides evidence on socio-economic inequalities in various diabetes outcomes using the CI, and identifies the contribution of lifestyle factors to socio-economic inequalities in diabetes prevalence through a decomposition analysis. To the best of our knowledge, this is the first paper to incorporate biomarker analysis in the measurement of diabetes inequalities in South Africa and the first to measure the contribution of various lifestyle factors to socio-economic-related inequalities in diabetes. Consistent with the study by Stokes et al., our study documents high levels of undiagnosed diabetes in South Africa [18]: the total prevalence of diabetes was 11%, of which 38% was undiagnosed. The poor rates of diagnosis are largely a result of insufficient access to health care and weak health systems [17]. The prevalence of self-reported diabetes was 6.86%; 61% of self-reported diabetics were on treatment, and 31% of those on treatment had controlled diabetes. Poor rates of treatment and control have also been reported by Stokes et al. [18] and have been attributed to poor diabetes education and medication adherence [17,18]. Our findings corroborate related literature demonstrating the existence of socio-economic inequalities in diabetes [13,15]. Furthermore, consistent with previous studies that estimated inequalities in self-reported diabetes in South Africa, we find that the prevalence of self-reported diabetes is pro-rich [13,15]. Although the inequality in undiagnosed diabetes was not statistically significantly different from zero, the measured inequality increased when undiagnosed diabetics were added to diagnosed diabetics (from CI = 0.0746 for self-reported diabetes to CI = 0.0859 for total diabetes). This finding is relevant to studies that seek to produce more robust estimates of inequality in NCD prevalence in South Africa.
The finding also contributes to the international literature that has used the concentration index to compare self-reported diagnoses with standardised measures of diagnosis in estimating socio-economic inequalities in health [16,49]. Although these studies did not consider diabetes, the direction of inequality in self-reported versus standardised measures of chronic disease showed a mixed picture that varied by disease type and country [16,49].
Decomposition has become an important tool in inequality studies for informing policy, because it provides information on the sources of the observed inequalities. Whilst the largest contributions to the inequalities in diabetes in this study came from residence and socio-economic status, lifestyle factors further exacerbate these inequalities and are the focus of this analysis. Various studies have attempted to estimate the contribution of lifestyle factors to health in general [15,50-52] and to diabetes specifically [21,22,53]. A study by Borg and Kristensen showed that lifestyle factors and work environment account for approximately two thirds of the social gradient in self-reported health [50]. Our study shows that lifestyle factors contributed a total of 22.2% and 34.7% to inequalities in self-reported and total diabetes, respectively. Previous studies suggest that these factors explain between 33% and 45% of inequalities in the incidence of type 2 diabetes in the United Kingdom [21], a third of socio-economic inequalities in type 2 diabetes in a Swedish study [22] and 27% when estimated using the Australian Diabetes, Obesity and Lifestyle Study [53].
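The percentage contributions discussed here come from a regression-based decomposition. In the Wagstaff approach, if the outcome is modelled as h = α + Σ_k β_k x_k + ε, the concentration index can be written as C = Σ_k (β_k x̄_k / μ) C_k + GC_ε / μ, where β_k x̄_k / μ is the elasticity of h with respect to x_k, C_k is the concentration index of x_k and GC_ε is the generalised concentration index of the residual. The sketch below applies that identity to purely hypothetical inputs; it assumes marginal effects (for example from a probit or linear probability model) stand in for β_k, and the class and function names are illustrative, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Determinant:
    name: str
    marginal_effect: float  # beta_k, e.g. a probit marginal effect (assumption)
    mean: float             # sample mean of x_k
    conc_index: float       # concentration index C_k of x_k


def decompose(determinants, mean_outcome, total_ci):
    """Wagstaff-style decomposition of a concentration index.

    Returns (name, elasticity, absolute contribution, % of total CI) per
    determinant, plus the unexplained residual component.
    """
    rows = []
    explained = 0.0
    for d in determinants:
        elasticity = d.marginal_effect * d.mean / mean_outcome
        contribution = elasticity * d.conc_index
        explained += contribution
        rows.append((d.name, elasticity, contribution, 100 * contribution / total_ci))
    residual = total_ci - explained
    return rows, residual


# Hypothetical inputs chosen only to illustrate the mechanics.
example = [
    Determinant("factor_a", marginal_effect=0.05, mean=0.4, conc_index=0.2),
    Determinant("factor_b", marginal_effect=-0.02, mean=0.5, conc_index=0.1),
]
rows, residual = decompose(example, mean_outcome=0.1, total_ci=0.05)
for name, elasticity, contribution, pct in rows:
    print(f"{name}: elasticity={elasticity:.3f}, contribution={contribution:.4f}, {pct:.1f}%")
print(f"residual: {residual:.4f}")
```

Two features of the method follow directly from the identity. First, a determinant with a tiny elasticity contributes little regardless of how unequally it is distributed, and vice versa. Second, contributions carry signs, so one factor's percentage share can exceed the combined share of the group it belongs to when other factors in that group offset it. For a binary outcome the Erreygers index is E = 4μC, so the percentage contributions are the same whether C or E is decomposed.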
Amongst the lifestyle factors in our study, obesity makes the largest contribution to socio-economic inequalities in both the self-reported and total diabetes outcomes. Stringhini et al., using data from the London Whitehall II Study, also found that, amongst the health behaviours studied, obesity was the most important contributor to the relationship between socio-economic status and diabetes [21]. Obesity is widely regarded as a risk factor for ill health [23] and type 2 diabetes [2,54]. Consistent with a study by Alaba and Chola using the South African National Income Dynamics Study (NIDS), we observe a pro-rich distribution of obesity [55]. Our findings show a much larger contribution of obesity to inequalities in self-reported and total diabetes (24.8% and 35.5%) than the contribution of obesity to social inequalities in health reported in the London Whitehall II study (18% to 20%) [51]. South Africa is reported to be undergoing an epidemic of overweight and obesity that is closely linked to the nutrition transition [56], with severe consequences for health outcomes.
Although evidence suggests that diets rich in fruits and vegetables are associated with a reduced risk of type 2 diabetes [57-59], the mechanisms through which fruit and vegetable consumption influences diabetes risk are not well established [59]. Whilst some studies showed that the consumption of fresh fruit [58] was associated with a reduced risk of type 2 diabetes, other literature shows that the reduction in risk depends on the fruit or vegetable sub-types consumed [60]. In particular, dietary fibre is reported to help regulate insulin, which reduces diabetes risk [61], and consumption of green leafy vegetables is inversely associated with diabetes [59,62]. These inconsistent findings are likely to reflect the use of food frequency questionnaires rather than biomarkers (such as vitamin C) [59]. In our study, vegetable consumption was associated with diabetes prevalence and contributed more to socio-economic inequalities in diabetes than fruit consumption did. Consistent with previous literature, we find that lower consumption of fruits and vegetables is concentrated amongst those in low socio-economic groups [63,64]. In South Africa, factors such as urban migration and globalisation are reported to have driven a nutrition transition towards the consumption of energy-dense foods and sugary beverages [56].
Findings in the literature on the association between alcohol and diabetes have not been consistent [20]. Studies have reported a protective effect at moderate consumption levels [20,65], an increased risk at high consumption [65] and a protective effect even at high consumption [66], while others find that the risk of diabetes in heavy drinkers is the same as in abstainers [65]. Our study finds that regular and occasional alcohol consumption was negatively correlated with diabetes. Overall, rates of regular or occasional alcohol consumption in our sample were low (25%), and diabetics in our sample were less likely to drink alcohol. Alcohol consumption is reported to have both beneficial and harmful effects on health, so its net impact depends on the duration, volume, pattern and type of alcohol consumed. In our study, alcohol made one of the largest contributions to inequalities in diabetes amongst the lifestyle factors. In contrast to other studies [15,21], we find much larger contributions of alcohol to inequalities in diabetes.
In our study, smoking and physical activity make the smallest contributions to inequalities in diabetes, reflecting their very small marginal effects and elasticities. Similar to the existing literature [15], we found no evidence that smoking contributes significantly to inequalities in diabetes; however, contrary to previous findings [22], we did not find any evidence that physical activity explains a substantive proportion of inequalities in diabetes. These differences are most probably attributable to differences in behavioural and situational contexts across countries and settings.

Study strengths and limitations
A major strength of this study is its use of an HbA1c test, an objective measure of diabetes, which allowed us to estimate the prevalence of undiagnosed and total diabetes. The study also has some limitations that must be acknowledged. Whilst several regression-based decomposition methods exist in the literature, our study uses the Wagstaff method, and results may differ depending on the decomposition method applied [31]. The American Diabetes Association states that, although the risk of developing diabetes increases with age, there is no exact age of onset for type 1 or type 2 diabetes [67]; we were therefore unable to separate type 2 from type 1 diabetics. Despite this, lifestyle factors such as alcohol, physical activity and fruit consumption, which are common risk factors for type 2 diabetes, were included in our analysis as explanatory variables. Another limitation of the study is the low number of individuals who went to the testing centres and provided a blood sample. Our analytical sample may be subject to self-selection, both of individuals who went to have blood samples taken and of those who completed the adult and household questionnaires. We therefore compared our final analytical sample with the 2011 South African census by sex, age, race and province. Compared with the 2011 census, our analytical sample contained a larger share of non-Africans (27% versus 23%) and a smaller share of Africans (72% versus 77%). Our sample also contained fewer individuals aged 15 to 35 years (51% versus 55%). The analytical sample employed in this study is therefore not nationally representative, and caution is necessary in generalising from the empirical results. It is also possible that our self-reported data on lifestyle factors suffered from social desirability bias. For example, under-reporting of smoking or alcohol consumption could influence the contributions made by these factors to diabetes inequalities.

Conclusion
This paper provides an analysis of socio-economic inequalities in the prevalence of diabetes and identifies the sources of these inequalities, with a focus on modifiable lifestyle factors. The paper contributes to the literature on diabetes by using a more objective measure of diabetes and by highlighting the magnitude of undiagnosed diabetes in South Africa. The study provides evidence that self-reported and total diabetes are concentrated among the rich. Inequality estimates based on self-reported data alone differ in magnitude from estimates based on self-reported and clinical data combined. The measured inequalities are mostly explained by residence and wealth. The contributions made by lifestyle factors to inequalities in diabetes are smaller than the overall contributions of the other factors in our model. Although modest, these contributions provide important information for planning interventions to reduce the burden of diabetes. Our study shows that, among the lifestyle factors, obesity, alcohol consumption and vegetable consumption make the largest contributions to inequalities in diabetes. These findings can inform policy makers in designing effective strategies and policies that encourage healthy lifestyles. Future national health surveillance surveys that capture larger numbers of individuals who provide blood samples would be an ideal vehicle for monitoring diabetes and tracking socio-economic inequalities in the prevalence, diagnosis and treatment of diabetes.