Associations between socio-economic factors and alcohol consumption: A population survey of adults in England

Aim To gain a better understanding of the complex relationships between different measures of social position, educational level and income, and alcohol consumption in England.
Method Between March 2014 and April 2018, data were collected on n = 57,807 alcohol drinkers in England taking part in the Alcohol Toolkit Study (ATS). Respondents completed the AUDIT-C, which measures frequency of alcohol consumption, amount consumed on a typical day and binge drinking frequency. The first two questions were used to derive a secondary measure of quantity: average weekly unit consumption. The socio-economic factors measured were social-grade (based on occupation), employment status, educational qualifications, home and car ownership, and income. Models were constructed using ridge regression to assess the contribution of each predictor while taking account of high collinearity. Models were adjusted for age, gender and ethnicity.
Results The strongest predictor of frequency of alcohol consumption was social-grade. Those in the two lowest occupational categories of social-grade (e.g. semi-skilled and unskilled manual workers; and the unemployed, pensioners and casual workers) had fewer drinking occasions than those in professional-managerial occupations (β = -0.29, 95%CI -0.34 to -0.25; β = -0.31, 95%CI -0.33 to -0.29). The strongest predictor of consumed volume and binge drinking frequency was lower educational attainment: those whose highest qualification was an A-level (i.e. a college/high school qualification) drank substantially more on a typical day (β = 0.28, 95%CI 0.25 to 0.31) and had a higher weekly unit intake (β = 3.55, 95%CI 3.04 to 4.05) than those with a university qualification. They also reported a higher frequency of binge drinking (β = 0.11, 95%CI 0.09 to 0.14). Housing tenure was a strong predictor of all drinking outcomes, while employment status and car ownership were the weakest predictors of most outcomes.
Conclusion Social-grade and educational attainment appear to be the strongest socioeconomic predictors of alcohol consumption indices in England, followed closely by housing tenure. Employment status and car ownership have the lowest predictive power.



Introduction
In England, approximately 17% of adults drink at hazardous levels and around 1% can be classed as dependent [1]. However, there are substantial regional variations and a strong relationship with demographic characteristics, in particular socio-economic status [2]. Numerous studies have indicated that people with higher socio-economic status tend to consume similar or greater amounts of alcohol than those of lower socio-economic status, although the latter group appears to bear a disproportionate burden of negative alcohol-related consequences [3,4]. This phenomenon is known as the Alcohol Harm Paradox [5][6][7]. The complex relationship between socio-economic status and alcohol consumption may be partially driven by variations in drinking patterns [7], but it also appears to depend on the specific measure of socio-economic status used. To our knowledge, this is the first paper to examine how far different measures of socio-economic status are associated with different measures of alcohol consumption.
Using population-level data, we previously reported that whereas social-grade (an occupation-based measure) has a U-shaped relationship with consumption, education has an inverse U-shaped relationship [8]. Other studies have also reported that higher educational attainment is associated with higher alcohol consumption [9] and that alcohol-related harm is disproportionately experienced by the most deprived, in the lowest social-grade categories [10,11]. There no longer appears to be an association with car ownership [8], which has been attributed to the increased affordability of cars [12]. Studies have also failed to find associations with economic activity as measured by employment status [13][14][15]. However, a strong association remains with another material indicator: housing tenure. Higher consumption but a lower rate of binge drinking is generally found among those who own their own property [16]. The association with income and wealth is more complex. Although more severe debt is associated with problem drinking [17], comparable consumption has been found across levels of household income [4,14]. While these differences offer some insight into what drives harmful alcohol consumption, they may also reflect associations between socio-economic measures and other demographic characteristics (e.g. age, gender and ethnicity) [18].
The assessment of socio-economic status is a long-standing debate in the addictions field, given its multifaceted nature, comprising economic, social, educational and occupational factors [19]. Each measure has strengths and limitations. For example, income typically suffers from high non-response rates, reporting biases and monthly fluctuations, and it does not capture retained wealth [20]. The treatment of those still in full-time study and of those of retirement age is problematic when looking at working status. Car ownership was once a strong predictor of health inequalities, but it no longer discriminates well between socio-economic groups [21]. The use of different measures across studies hinders comparisons and can result in conflicting conclusions. A socio-economic measure that is highly predictive of one behaviour may also not be applicable to another. Many have argued for the use of composite scores, but these can involve increased cost and logistical constraints for survey designers as a result of increased survey length [22]. They can also be difficult to interpret, which creates difficulties for policy development.
One problem in finding optimum measures for a given purpose is that different measures are typically highly correlated. This multicollinearity makes it difficult to identify which variables produce the largest associations with outcomes of interest using traditional statistical approaches. Including collinear variables in the same model inflates the variance of the coefficient estimates. One statistical technique that can overcome this is ridge regression, a penalised regression approach widely used in machine learning [23,24]. It allows the contribution of each independent variable to be assessed while taking account of high collinearity.
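To make the idea concrete, the following is a minimal sketch in Python (the study itself used R). It assumes exactly two centred predictors, so that the ridge estimate (X'X + λI)^-1 X'y can be solved in closed form. The function names and toy data are hypothetical and are not taken from the study. With λ = 0 the estimate is ordinary least squares; as λ increases, the coefficients are shrunk towards zero.

```python
def centre(values):
    """Subtract the mean so the intercept drops out of the penalised fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients (b1, b2) for two predictors.

    Solves (X'X + lam*I) b = X'y on centred data by Cramer's rule;
    lam = 0 reduces to ordinary least squares.
    """
    x1, x2, y = centre(x1), centre(x2), centre(y)
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


# Two nearly collinear predictors (hypothetical toy data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.05, 3.95, 5.1, 5.9]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

for lam in (0.0, 0.1, 1.0, 10.0):
    b1, b2 = ridge_two_predictors(x1, x2, y, lam)
    print(f"lambda={lam:>5}: b1={b1:7.3f}, b2={b2:7.3f}")
```

Centring the variables leaves the intercept unpenalised, which is the usual convention. With more than two predictors, the same penalised system would be solved with a general linear-algebra routine.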
This paper therefore applies ridge regression to assess the association between multiple measures of socio-economic status and alcohol consumption. The socio-economic measures are income, home ownership, car ownership, education, employment status and social-grade, plus a composite of these. The consumption measures are the three AUDIT-C items (frequency of alcohol consumption, amount consumed on a typical drinking day and frequency of binge drinking) and an estimate of mean weekly consumption derived from the AUDIT-QF [25]. Data are taken from a large representative survey of adults aged 16+ and adjusted for gender, age and ethnicity. We are unaware of any previous study that has applied ridge regression to the problem of multicollinearity among socio-economic measures.

Ethical approval
Approval for the study was granted by the UCL Ethics Committee (ID 0498/001). The data are collected by Ipsos Mori on behalf of UCL and are anonymised before being received by UCL. Explicit verbal agreement to take part and to answer questions voluntarily is recorded electronically by Ipsos Mori. This is standard protocol and was agreed by the UCL Ethics Committee. Participants are also given a printed information sheet.

Design
Data were used from the ATS (www.alcoholinengland.info) between March 2014 and April 2018. The ATS involves monthly cross-sectional, computer-assisted household interviews of approximately 1,700 adults aged 16 and over in England, conducted by Ipsos Mori [26]. The baseline survey uses a form of random location sampling, a hybrid between random probability and simple quota sampling. England is first split into 171,356 'Output Areas', each comprising approximately 300 households. These areas are then stratified by ACORN characteristics and geographic region. ACORN is a socio-economic profiling tool developed by CACI (http://www.caci.co.uk/acorn/). The areas are then randomly allocated to interviewers, who travel to their selected areas and conduct the electronic interviews with one member of each household. Interviews are conducted until quotas are fulfilled. These quotas are based on factors influencing the probability of being at home and are tailored to local-area census data. Morning interviews are avoided to maximise participant availability. STROBE reporting guidelines are followed in this paper [27].
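The stratify-then-randomise step can be sketched as follows. This is an illustration only. The text does not say how many areas are drawn from each stratum, so `per_stratum`, the function name and the toy area frame are all hypothetical. The subsequent quota-based interviewing is not modelled.

```python
import random
from collections import defaultdict


def draw_areas_by_stratum(areas, per_stratum, seed=0):
    """Randomly draw output areas within each (ACORN group, region) stratum.

    areas: iterable of (area_id, acorn_group, region) tuples.
    per_stratum: hypothetical number of areas to draw from each stratum.
    Returns {(acorn_group, region): [area_id, ...]}.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for area_id, acorn_group, region in areas:
        strata[(acorn_group, region)].append(area_id)
    # Sample without replacement inside each stratum.
    return {
        key: rng.sample(ids, min(per_stratum, len(ids)))
        for key, ids in sorted(strata.items())
    }


# Toy frame of 12 areas across 2 ACORN groups x 2 regions (hypothetical).
frame = [(f"OA{i:02d}", "A" if i % 2 else "B", "North" if i < 6 else "South")
         for i in range(12)]
allocation = draw_areas_by_stratum(frame, per_stratum=2, seed=42)
for stratum, ids in allocation.items():
    print(stratum, ids)
```

Stratifying before drawing guarantees that every combination of area type and region is represented in each wave, which a single simple random draw would not.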

Measures
Data were collected between March 2014 and April 2018 on participants' age, gender, ethnicity, socio-economic status (SES) and drinking behaviour. Six measures of SES were collected, as outlined below.
1) Social-grade was measured using the British National Readership Survey (NRS) Social-Grade Classification Tool [28]: AB (Higher managerial, administrative or professional), C1 (Supervisory or clerical and junior managerial, administrative or professional), C2 (Skilled manual workers), D (Semi-skilled and unskilled manual workers) and E (Casual or lowest grade workers, pensioners, and others who depend on the welfare state for their income).
3) Educational level in 8 categories (GCSE/O-level/CSE-high school sophomore; vocational qualification-high school senior; A-level or equivalent-high school senior; Bachelor degree or equivalent-university undergraduate; Masters/PhD or equivalent-university post-graduate; other; no formal qualifications-no post-16 qualifications; still studying).
4) Car ownership (owns a car; does not own a car).
5) Working status in 7 categories (have a paid job (full time); have a paid job (part time, over or under 8 hours per week); self-employed; full-time student; still at school; retired; not in paid work (long-term illness, housewife or other reason)).
6) Housing tenure in six categories (mortgage, owned outright, rented from local authority, rented from private landlord, belongs to housing association, and other).
Because of violations of the assumption of linearity, and to improve interpretation, all variables except social-grade were dichotomised or categorised as follows. All variables were coded so that higher scores reflect lower SES or greater social disadvantage. 1) Income: four quartiles; 2) Education: university education, A-level and equivalent, GCSE/vocational, other/still studying, and none; 3) Working status: full-time job versus no full-time job; and 4) Housing tenure: owner-occupied (owned outright or being bought with a mortgage) versus other. These thresholds are based on previous research [8,29–31].
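A minimal sketch of this recoding in Python follows. It assumes hypothetical string labels for the survey responses: the category names paraphrase those listed above, and the dictionary keys are not the ATS variable codes. Income quartiles are omitted for brevity. University education is assumed to be the reference category, matching the comparisons reported in the abstract.

```python
# Collapse the original education categories into the five analysis groups.
EDUCATION_GROUP = {
    "GCSE/O-level/CSE": "GCSE/vocational",
    "vocational qualification": "GCSE/vocational",
    "A-level or equivalent": "A-level",
    "Bachelor degree or equivalent": "university",
    "Masters/PhD or equivalent": "university",
    "other": "other/still studying",
    "still studying": "other/still studying",
    "no formal qualifications": "none",
}
# University is the reference category, so it gets no dummy of its own.
EDUCATION_DUMMIES = ["A-level", "GCSE/vocational", "other/still studying", "none"]
OWNER_OCCUPIED = {"mortgage", "owned outright"}


def recode(respondent):
    """Return SES indicators coded so that 1 means greater disadvantage."""
    group = EDUCATION_GROUP[respondent["education"]]
    coded = {f"edu: {g}": int(group == g) for g in EDUCATION_DUMMIES}
    coded["no full-time job"] = int(respondent["working"] != "paid job (full time)")
    coded["not owner-occupied"] = int(respondent["tenure"] not in OWNER_OCCUPIED)
    coded["no car"] = int(not respondent["owns_car"])
    return coded


example = {"education": "A-level or equivalent", "working": "self-employed",
           "tenure": "rented from private landlord", "owns_car": True}
print(recode(example))
```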
A composite score was also derived to assess whether it added predictive value over any single measure [19,22]. The composite score was coded so that a higher score reflected greater social disadvantage. The derived composite score had good internal consistency (standardised Cronbach's alpha = 0.73).
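Standardised Cronbach's alpha depends only on the number of items, k, and their mean inter-item correlation, r_bar: alpha = k * r_bar / (1 + (k - 1) * r_bar). The Python sketch below computes it from raw item scores; the function names are hypothetical. As an arithmetic illustration only (not a value reported in the study), six items with a mean inter-item correlation of 0.31 would give alpha = 1.86 / 2.55, or about 0.73.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def alpha_from_mean_r(k, r_bar):
    """Standardised alpha for k items with mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


def standardised_alpha(items):
    """items: list of k score columns (e.g. one list per SES indicator)."""
    pairs = list(combinations(items, 2))
    r_bar = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return alpha_from_mean_r(len(items), r_bar)


print(round(alpha_from_mean_r(6, 0.31), 2))
print(standardised_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```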
Participants completed the AUDIT-C [32], which measures frequency of alcohol consumption, quantity consumed on a typical day and frequency of binge drinking (i.e. single-occasion, high-intensity consumption). It has been shown to be a sensitive and coherent measure of alcohol consumption [32,33]. An estimate of mean weekly unit consumption was derived from the AUDIT-QF [25]. One unit of alcohol is defined as 10 millilitres (8 grams) of pure alcohol and is a commonly used measure in the UK. The AUDIT-QF comprises the first two questions of the AUDIT-C, which measure frequency and quantity of alcohol consumption. The estimate was calculated by scoring each item at the midpoint of the range in its response options (e.g. 2-3 drinking occasions per week was scored as 2.5) and combining the two item scores. This AUDIT-QF-derived weekly unit consumption measure has been used previously [34]. It is in line with alternative measures not derived from AUDIT scores, including those used by the Sheffield Alcohol Policy Model (SAPM) [35]. The UK alcohol guidelines are also based on unit intake per week rather than per typical drinking day [36].
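The sketch below shows one way to turn the two AUDIT-QF responses into a weekly unit estimate. It assumes that weekly units = drinking occasions per week x units on a typical drinking day, with each response scored at the midpoint of its range. Only the "2-3 times a week = 2.5" scoring comes from the text. The response labels and the values chosen for the open-ended categories are assumptions, not the ATS coding.

```python
WEEKS_PER_MONTH = 52 / 12

# Occasions per week for each AUDIT-C frequency option.
# Only "2-3 times a week" -> 2.5 comes from the text; every other value,
# including those for the open-ended categories, is an assumption.
OCCASIONS_PER_WEEK = {
    "never": 0.0,
    "monthly or less": 1 / WEEKS_PER_MONTH,   # assumed: once a month
    "2-4 times a month": 3 / WEEKS_PER_MONTH,  # midpoint of 2-4 per month
    "2-3 times a week": 2.5,
    "4 or more times a week": 5.0,             # assumed value
}
# Units on a typical drinking day, at assumed midpoints.
UNITS_PER_OCCASION = {
    "1-2": 1.5,
    "3-4": 3.5,
    "5-6": 5.5,
    "7-9": 8.0,
    "10 or more": 10.0,  # assumed value for the open-ended top category
}


def weekly_units(frequency, quantity):
    """Assumed rule: weekly units = occasions per week x units per occasion."""
    return OCCASIONS_PER_WEEK[frequency] * UNITS_PER_OCCASION[quantity]


print(weekly_units("2-3 times a week", "3-4"))
```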

Analysis
The analysis plan was pre-registered on the Open Science Framework (https://osf.io/jub3q/). An amendment was made to extend data collection from March 2017 to April 2018. In the original protocol, only the three AUDIT-C questions were considered. After discussion among the co-authors, it was decided that an estimate of weekly alcohol consumption, derived from the first two questions of the AUDIT (AUDIT-QF), should also be included. Finally, it was decided to run two sensitivity analyses of the linear regressions for income: one restricted to complete cases and one using a missing-data indicator. These were added because of the large amount of missing income data. Using the other SES variables to impute income may have artificially strengthened the relationships between the variables and reduced the predictive power of income in the models.
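The two sensitivity analyses differ only in how respondents without income data are handled. The Python sketch below assumes hypothetical income codes "Q1" (highest quartile, used as the reference) to "Q4" (lowest), with `None` marking a missing response. The function names are illustrative.

```python
# Hypothetical income codes: "Q1" (highest quartile, reference) to "Q4" (lowest);
# None marks a respondent who did not report income.
INCOME_DUMMIES = ["Q2", "Q3", "Q4"]


def complete_cases(records):
    """Sensitivity analysis 1: keep only respondents who reported income."""
    return [r for r in records if r["income"] is not None]


def with_missing_indicator(records):
    """Sensitivity analysis 2: keep everyone and add a 'missing' dummy.

    Missing income becomes its own category and, like the other income
    dummies, is compared against the reference (highest) quartile.
    """
    coded = []
    for r in records:
        row = {f"income {q}": int(r["income"] == q) for q in INCOME_DUMMIES}
        row["income missing"] = int(r["income"] is None)
        coded.append(row)
    return coded


sample = [{"income": "Q1"}, {"income": "Q4"}, {"income": None}]
print(len(complete_cases(sample)))
print(with_missing_indicator(sample))
```

The missing-indicator version keeps the full sample and lets the model estimate a separate coefficient for non-responders. The complete-case version discards them.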
All analyses were conducted in R version 3.4.4. Prevalence of high-risk drinking was weighted using a rim (marginal) weighting technique. This involves an iterative sequence of weighting adjustments against separate nationally representative target profiles (for age, social grade, region, tenure, ethnicity, and working status within sex). The adjustments are repeated until all variables match the specified targets. Missing data were imputed by multiple imputation using the Amelia II package [37]. The number of imputed data sets was based on previous recommendations (i.e. n = 20) [38], and results were combined using Rubin's Rules [39]. The extent of missing data among the sample of drinkers was as follows: n = 14 (0.02%) for gender, n = 292 (0.51%) for age, n = 214 (0.37%) for ethnicity, n = 246 (0.42%) for car ownership, n = 390 (0.67%) for home ownership and n = 19,173 (33.2%) for income. An SES composite score, based on all six measures of SES, was derived from Multiple Correspondence Analysis (MCA) using the FactoMineR package [40]. Weights for the composite score comprised those for the first three components, on the assumption that the variation explained by these is sufficient to represent the original values adequately [41]. The composite score was normalised to a range of 0 to 1 to allow easier comparison with the dummy variables.
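Rim weighting is also known as raking or iterative proportional fitting. Rubin's Rules pool a coefficient across m imputed data sets by adding the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The Python sketch below illustrates both steps on toy data. The function names, variables and target proportions are hypothetical. This is not the Ipsos Mori weighting procedure or the Amelia II implementation.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Rim weighting (raking / iterative proportional fitting).

    rows: list of dicts, e.g. {"sex": "f", "age": "young"}.
    targets: {variable: {category: target proportion}}; each sums to 1.
    Weights are rescaled one variable at a time until every weighted
    margin matches its target.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, target in targets.items():
            total = sum(weights)
            margin = {}
            for w, row in zip(weights, rows):
                margin[row[var]] = margin.get(row[var], 0.0) + w
            for i, row in enumerate(rows):
                factor = target[row[var]] * total / margin[row[var]]
                largest_change = max(largest_change, abs(factor - 1.0))
                weights[i] *= factor
        if largest_change < tol:
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]  # normalise to a mean weight of 1


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's Rules).

    Returns the pooled estimate and its total variance:
    within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    return q_bar, within + (1 + 1 / m) * between


# Toy sample that under-represents younger people (hypothetical data).
rows = ([{"sex": "m", "age": "young"}] * 30 + [{"sex": "m", "age": "old"}] * 30
        + [{"sex": "f", "age": "young"}] * 10 + [{"sex": "f", "age": "old"}] * 50)
targets = {"sex": {"m": 0.5, "f": 0.5}, "age": {"young": 0.4, "old": 0.6}}
w = rake(rows, targets)
total = sum(w)
share_m = sum(wi for wi, r in zip(w, rows) if r["sex"] == "m") / total
share_young = sum(wi for wi, r in zip(w, rows) if r["age"] == "young") / total
print(round(share_m, 4), round(share_young, 4))
print(pool_rubin([0.30, 0.27, 0.33], [0.0004, 0.0005, 0.0004]))
```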
The analysis then proceeded as follows.
Association with individual socio-economic status measures. Separate linear models, specifying the Gaussian distribution family, were run to assess the associations between each socio-economic status measure and the four outcome measures of interest. Each model was reported both unadjusted and adjusted only for age, gender and ethnicity.
Determining the best socio-economic status predictor. Model fit was compared using adjusted R-squared, AIC and BIC; higher R-squared values and lower AIC and BIC values indicate a better model fit. Ten-fold cross-validation was also performed to assess the predictive validity of each model [42]. Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalise to an independent data set. Ten-fold cross-validation works by dividing the dataset into k = 10 subsets. Each time, one of the k subsets is used as the test set and the other k-1 subsets are combined to form a training set. The model fitted to the training set is then used to make predictions, which are compared with the actual values in the test set. This comparison yields the root-mean-square error (RMSE), the square root of the mean squared difference between the actual response values and the predictions. Lower values therefore indicate a better prediction model.
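A minimal k-fold cross-validation sketch in Python, using a single-predictor linear model for brevity, is given below. The function names and simulated data are hypothetical. The sketch averages the fold RMSEs to give one summary value; this is a common convention but is not stated in the text.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def cv_rmse(xs, ys, k=10, seed=1):
    """Mean test-set RMSE over k folds (each point is tested exactly once)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        rmses.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(rmses) / k


rng = random.Random(7)
xs = [float(i) for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
unrelated = [rng.uniform(0.0, 49.0) for _ in xs]  # a predictor with no signal
print(cv_rmse(xs, ys), cv_rmse(unrelated, ys))
```

Because every RMSE is computed on data the model did not see, an uninformative predictor cannot look good simply by overfitting the training set.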
To assess the predictive ability of each socio-economic variable while adjusting for all others, ridge regression was performed. This was necessary because the independent variables were too collinear to be included together in a standard multiple linear regression model. Multicollinearity occurs when highly correlated variables are simultaneously added to a regression model [43]. It inflates the standard errors of the coefficients and makes the coefficient estimates and p-values unstable [43][44][45].
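The variance inflation factor (VIF) is one standard way of quantifying this problem. It is the factor by which the variance of a coefficient estimate is inflated by correlation with the other predictors, and values above 5 or 10 are commonly treated as a warning sign. For the special case of exactly two predictors, VIF = 1/(1 - r^2), where r is their correlation. A minimal Python sketch with hypothetical data:

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF when a model contains just these two predictors.

    Regressing one predictor on the other gives R^2 = r^2,
    so both predictors share VIF = 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.05, 3.95, 5.1, 5.9]  # nearly a copy of x1
print(round(vif_two_predictors(x1, x2), 1))
```

The standard error of each coefficient is inflated by the square root of the VIF. Highly correlated SES measures therefore produce wide, unstable estimates when they are entered together by ordinary least squares.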
Ridge regression works by shrinking coefficients, with unimportant terms driven towards zero. The degree of penalisation, λ, is known as the ridge factor and must be chosen before the final model is fitted. To choose λ, a cross-validation approach was used: models were fitted to the training set with a range of values of λ, and the predictive accuracy of each model was then determined. The most regularised (simplest) model was chosen, namely the one with the largest λ whose cross-validated error was within one standard error of the minimum cross-validated error. This leads to coefficients that are slightly biased towards zero. The trade-off is much smaller standard errors and therefore a large improvement in the precision of the regression coefficients [24].
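The "one standard error" rule for choosing λ can be sketched as below. The cross-validation summaries are made-up numbers for illustration, and the function name is hypothetical. R's glmnet package reports the analogous quantities as `lambda.min` and `lambda.1se` from `cv.glmnet`, although the text does not say which software was used for this step.

```python
def select_lambda(lambdas, cv_errors, cv_ses):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimises the mean cross-validated error; lambda_1se is the
    largest (most regularised) lambda whose error is within one standard
    error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    threshold = cv_errors[best] + cv_ses[best]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_errors) if err <= threshold)
    return lambdas[best], lambda_1se


# Hypothetical cross-validation summaries for five candidate penalties.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = [1.00, 0.95, 0.97, 1.05, 1.30]
cv_ses = [0.04, 0.04, 0.04, 0.05, 0.06]
print(select_lambda(lambdas, cv_errors, cv_ses))
```

With these inputs the minimum error occurs at λ = 0.1. However, λ = 1 is selected because its error (0.97) is still within one standard error of that minimum (0.95 + 0.04).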

Results
Between March 2014 and April 2018, data were collected on n = 57,807 drinkers (prevalence: 68.3%, 95% CI 68.0 to 68.6) in England taking part in the Alcohol Toolkit Study (ATS). Descriptive statistics for the sample are given in Table 1. Table 2 shows the results of the linear regression analyses assessing the association between socio-economic measures and the four outcome measures of interest before and after adjustment for age, gender and ethnicity. In general, those at greater social disadvantage reported consuming alcohol less frequently, but when they did drink they consumed larger amounts and were more likely to report 'binge drinking'. There were a few exceptions. Those not in full-time work on average had a higher frequency of consumption than those in full-time work. Those with GCSEs/vocational qualifications or no qualifications were less likely to report binge drinking than those with a university education. Those in social-grades C2 to E also reported less frequent binge drinking than those in social-grade AB. Table A in S1 File reports these results unadjusted for age, gender and ethnicity. Table 3 and Table B in S1 File give the fit indices and RMSE from the 10-fold cross-validation for the models reported in Table 2 and Table A in S1 File. These suggest that social-grade is the best predictor of frequency of consumption. The composite score is the best predictor of quantity of alcohol consumed, frequency of binge drinking and mean weekly unit consumption. Educational qualifications appeared to be the next best individual predictor across the outcome measures, and housing tenure also performed well. Table 4 reports the results from the best ridge regression models adjusted for gender, age, ethnicity and all measures of SES. File A and Figures A and B in S1 File describe the ridge regression models at different values of λ. The strongest predictor of frequency of alcohol consumption was social-grade.
The strongest predictor of quantity of consumption, binge drinking frequency and weekly unit consumption, was educational attainment. Housing tenure was also a consistently strong predictor across all outcome measures. Educational qualification also acted as a good predictor of frequency of alcohol consumption and social-grade as a good predictor of quantity of alcohol consumption and weekly unit intake. Car ownership and employment status were generally the poorest predictors, while income had some predictive value particularly in the comparison of the highest and lowest earners. Table C in S1 File reports the results from the best ridge models adjusted for all measures of SES but with no adjustment for gender, age and ethnicity.

Sensitivity analysis: complete case analysis for income
The results of the sensitivity analyses are presented in Tables 2 and 3. They show that income generally remained a poorer predictor of the outcome measures of interest than social-grade, tenure and educational attainment. This held both when the analysis was restricted to complete cases and when a missing-data indicator was included. Of interest, those with missing income data self-reported a lower frequency and quantity of alcohol consumption, and less frequent binge drinking, than those earning in the upper quartile.

Discussion
In the linear regression analysis, the composite score outperformed all six individual SES measures except for frequency of consumption, where social-grade appeared to offer the best predictive power (Table 3). In the ridge regression analysis, the strongest predictor of frequency of alcohol consumption was social-grade. The strongest predictor of quantity of consumption and binge drinking frequency was educational attainment. Housing tenure was also a consistently strong predictor, while car ownership and employment status were poor predictors of most outcomes. Income offered some predictive value. It is unsurprising that the composite measure outperformed the individual measures of SES [21] in the linear regression analysis, but it has disadvantages. A composite of individual-level variables may obscure the underlying mechanisms, as evidenced by the differing associations reported here. It may also prevent understanding of how different aspects of SES contribute to alcohol use. Composite scores also come at greater cost, both financially and logistically, in terms of respondent time and number of survey items. Thus, they are not always suitable for survey-based studies. There was one exception to the composite score's superior performance: frequency of consumption was better predicted by social-grade, an occupation-based classification system. The social-grade A:E measure has several advantages, including its wide use across surveys in both the UK and Europe, which allows easy comparison. However, it can itself be time-consuming to administer [46].

Table 3. Model fit statistics (R-squared, AIC and BIC) and mean squared prediction error from 10-fold cross validation for the regression models presented in Table 2.
The ridge regression analysis allowed assessment of the specific contribution of each SES predictor while taking account of high collinearity between these predictors. Educational qualification emerged as the best predictor of consumption on a typical drinking day, weekly unit consumption and binge drinking frequency. The strongest predictor of frequency of alcohol consumption remained social-grade, but this was closely followed by educational attainment. Previous studies have also reported that higher educational attainment is associated with higher alcohol consumption [8,9]. They have also shown strong associations between level of education and alcohol abuse and dependence in later life [47,48]. Several explanations for this association are possible. The 'human capital' approach would argue that education increases individuals' ability to synthesise information on the health implications of alcohol use. Alternatively, it would argue that those with greater educational qualifications allocate their resources in a more health-orientated way [49]. There may also be no causal association: future-orientated individuals may both invest more in their health and become more educated [49]. More educated individuals may also prefer healthy habits and avoid unhealthy ones, and education is a key component of health literacy [50,51]. Finally, more educated individuals may have more material resources. These can help buffer the adverse effects of drinking through better nutrition or living in places with less social harm [5,6]. It will be important to disentangle what drives the association, as this could have significant policy implications, perhaps including targeting interventions at those without post-16 qualifications. If the association is causal, this would also strengthen the economic case for providing free, high-quality post-16 education to everyone.
It is also of interest that housing tenure was a consistently strong predictor across all outcome measures, while car ownership and employment status were poorer predictors. Previous studies have similarly found housing tenure to be strongly related to heavy intake and problem drinking [52]. There are several possible explanations for this, including differences in the local environment and culture of owned homes relative to rented and social housing [53]. Those in social housing also often experience higher levels of depression and poor mental health, which are themselves associated with heavy drinking patterns [54]. Car ownership was previously thought to indicate affluence because of the costs of purchase and maintenance. However, questions have been raised as to whether it is still an appropriate measure [53], with 75% of households having access to a car and 42.6% having multiple vehicles [12].
Household income offered some predictive power. Unlike education and occupation measures, it gives a good indication of the standard of living and life chances of a household. However, questions about personal income are often met with hostility, as evidenced by the large amount of missing income data in the current study. Household members may also not have equal access to the household's income, which blurs the association with alcohol use [55]. This may explain why previous studies have found the association between alcohol consumption and individual wealth to be complex [4,14,17].
These findings have several implications. First, income was not the best predictor of alcohol consumption, but it was still significantly associated with it. This supports previous arguments that those of lower socio-economic status have more to gain from the most effective public health alcohol policies, namely increasing taxation and setting a minimum unit price [56]. Secondly, the findings provide guidance as to which measures to use when identifying the individuals most at risk from harmful alcohol intake. This can help to tailor interventions and supports the concept of personalised medicine [57]. Thirdly, the findings suggest that multiple measures of socio-economic status should ideally be used in population surveys. However, they also offer some guidance as to which measures to choose when there are financial or logistical constraints and the goal is to assess associations with alcohol frequency and quantity. Finally, the differing associations with frequency of consumption and amount consumed may help to partially explain the Alcohol Harm Paradox (AHP) [6]. Those of a higher social-grade consume alcohol more frequently, but those with lower educational attainment drink larger quantities. This may drive the higher rates of alcohol-related harm that lower SES groups experience. Previous studies have shown that lower SES groups are more likely to drink at extreme levels [7].
This study has several strengths, including the use of data from a large household survey of adults in England and the widely validated AUDIT questionnaire [32]. However, it also has several limitations. First, as with all cross-sectional surveys, caution should be taken when assigning cause and effect. SES may have a direct influence on drinking behaviour, but drinking behaviour may also affect some of the SES measures; for example, those who experience greater alcohol problems may be more likely to become unemployed. Self-report measures are also susceptible to recall bias. Secondly, this paper assessed a wide range of SES measures reflecting those used previously, but the measures did not address the social capital aspect of SES. This may require further consideration, as family and friend networks are associated with health outcomes [58]. Thirdly, ridge regression is recommended for multicollinearity problems [59], but some have raised concerns about using biased regression methods to assign relative importance to independent variables in the presence of multicollinearity [13,60]. Such concerns should be noted when drawing conclusions. However, the consistency between the results of the ridge regression and linear regression models lends some validity to the conclusions drawn here. Fourthly, although we adjusted for several demographic characteristics, some of these findings may be accounted for by other factors correlated with SES, including area-level deprivation and marital status. These will be important factors to consider in future research. Fifthly, the sample was designed to be representative, but there is a risk of bias in the characteristics of those who agree to participate. There is also a risk that respondents may underestimate or fail to report their drinking.
As with all population-level surveys, interviewer effects are also possible, whereby answers are affected by the interviewer administering the survey. Finally, this study assessed how socio-economic measures are associated with alcohol consumption, but not why. Additional qualitative and longitudinal research is needed to address this question. Part of the explanation relates to the fact that the socio-economic measures assess somewhat different (albeit related) constructs, rather than simply being better or worse assessments of socio-economic position.
In conclusion, educational attainment appears to be the best predictor of alcohol use, across measures of both frequency and amount consumed, followed closely by social-grade and housing tenure. Employment status and car ownership have less predictive power.
Supporting information S1 File. Table A gives the results of the unadjusted linear regressions assessing the association between individual measures of socio-economic status and frequency, quantity and binge drinking. Table B gives the model fit statistics and mean squared error from 10-fold cross validation for the regression models presented in Table A. Table C gives the results of the ridge regression at optimal values of lambda, unadjusted for sex, age and ethnicity. File A gives information on how the best ridge regression model was chosen. Figure A gives the results of the ridge regression at different values of Log(Lambda) for predicting a) frequency, b) quantity and c) frequency of binge drinking (unadjusted). Figure B gives the results of the ridge regression at different values of Log(Lambda) for predicting a) frequency, b) quantity and c) frequency of binge drinking (adjusted). (DOCX)