Pathways between Socioeconomic Disadvantage and Childhood Growth in the Scottish Longitudinal Study, 1991–2001

Socioeconomically disadvantaged children are more likely to be of shorter stature and overweight, leading to greater risk of obesity in adulthood. Disentangling the mediatory pathways between socioeconomic disadvantage and childhood size may help in the development of appropriate policies aimed at reducing these health inequalities. We aimed to elucidate the putative mediatory role of birth weight using a representative sample of the Scottish population born 1991–2001 (n = 16,628). Estimated height and overweight/obesity at age 4.5 years were related to three measures of socioeconomic disadvantage (mother’s education, Scottish Index of Multiple Deprivation, synthetic weekly income). Mediation was examined using two approaches: a ‘traditional’ mediation analysis and a counterfactual-based mediation analysis. Both analyses identified a negative effect of each measure of socioeconomic disadvantage on height, mediated to some extent by birth weight, and a positive ‘direct effect’ of mother’s education and Scottish Index of Multiple Deprivation on overweight/obesity, which was partly counterbalanced by a negative ‘indirect effect’. The extent of mediation estimated when adopting the traditional approach was greater than when adopting the counterfactual-based approach because of inappropriate handling of intermediate confounding in the former. Our findings suggest that higher birth weight in more disadvantaged groups is associated with reduced social inequalities in height but also with increased inequalities in overweight/obesity.


Introduction
The existence of social inequalities in health is well established, with poor health disproportionately burdening those of lower socioeconomic status [1]. There is evidence to suggest that the early years of development play a critical role in the creation of socioeconomic health inequalities which are maintained into adulthood [2,3]. Socioeconomic differences in childhood growth are therefore an important area of research. Such socioeconomic differences in childhood growth may ultimately be tackled by a reduction in socioeconomic inequality itself, but a better understanding of the mediatory pathways through which the effects of deprivation act might give rise to alternative, possibly more achievable, interventions.
The association between lower birth weight and socioeconomic disadvantage has been extensively documented [10,30,31]. Greater weight at birth has been shown to be predictive of increased height [5,32] and body mass index (BMI) [24,33,34] in childhood. Although there is some debate about the causal nature of these relationships, particularly since the associations were drawn from observational studies, several studies have found strong associations between birth weight and height or obesity that were robust to adjustment for several putative confounders [24,32,33,34]. Birth weight is thus a plausible mediator in the associations observed between socioeconomic disadvantage and childhood height and overweight. However, few studies have explicitly examined this [35].
The aim of the current study was to elucidate the putative mediatory role of birth weight in the relationship between socioeconomic disadvantage and childhood growth in terms of height and overweight using a large-scale representative sample of the Scottish population. Furthermore, we aimed to provide a comparison of two different methods for examining mediation.

Materials and Methods

Participants
The Scottish Longitudinal Study (SLS) is an anonymised record linkage study covering a 5.3% sample of the Scottish population, selected using 20 birth dates [36]. It contains linked Census and vital registration data from 1991 onwards and, with appropriate permissions, can be linked to health data.
The sample considered for the present analysis comprises the 43,286 SLS members born from 1991 to 2001. Many analysis variables were from data sources other than the Census: birth registrations, Scottish Morbidity Records, maternity records and Child Health Systems Programme (CHSP) Pre-School data.
The SLS contains no identifiable individual-level data, and data are derived from linkages that are anonymised prior to analysis by the research team. Ethical approval for the study was granted by the London School of Hygiene and Tropical Medicine Research Ethics Committee (reference 6418) and by the National Health Service National Services Scotland Privacy Advisory Committee for approval of the health data linkage.
Three measures of socioeconomic disadvantage were considered: mother's education, the Scottish Index of Multiple Deprivation (SIMD) and synthetic weekly income. Categorical measures of each socioeconomic disadvantage variable were used in the analyses to allow for potential non-linear effects.
Mother's education was derived from 2001 Census data and categorised as: 'no qualifications', 'GCSE or equivalent' (exams usually taken at age 15/16 years), 'A-level or equivalent' (exams usually taken at age 17/18 years), 'degree or equivalent'.
The SIMD defines small area concentrations of multiple deprivation within Scotland across several different domains [37]. SIMD values were derived using the postcode recorded on the birth record and reflected the SIMD level for that area in 2001. SIMD was analysed in quarters of the observed distribution.
Income data are often missing or poorly measured in observational studies due to their inherent complexity and sensitivity. One solution, recently proposed by Clemens and Dibben [38], is to derive a synthetic measure of weekly income using observed data on occupation. We used the occupation of the SLS member's mother and father reported on the birth record of the SLS member to derive synthetic weekly income using this approach [38]. An estimate of income was made for parents not in paid employment on the basis of typical social security payments. Household income was calculated as the sum of the mother's and father's estimated incomes, and an income equalisation multiplier of 1.6 was applied for single mothers. The final synthetic weekly household income was analysed in quarters of the observed distribution.
Anthropometric measures. The CHSP Pre-School is a programme of pre-school child health reviews carried out by nurses and health visitors in Scotland at a series of designated ages. The CHSP Pre-School was established in 1991 but had a phased implementation across the 15 Health Boards in Scotland. Ten Health Boards (Ayrshire & Arran, Borders, Argyll & Clyde, Fife, Greater Glasgow, Lanarkshire, Lothian, Tayside, Forth Valley and Dumfries & Galloway) had implemented the CHSP Pre-School by 2000 [39] and were included in the present analysis.
Length/height, weight and age data were extracted from the CHSP Pre-School records at the following approximate ages: 6-8 weeks, 8-9 weeks, 21-24 months, 39-42 months and 48 months. It should be noted that in those Health Boards where the implementation of the CHSP Pre-School was relatively late (e.g. Dumfries and Galloway in 2000), some or all of the anthropometric measurements were not available for SLS members born early in the period being considered.
Potential mediator. The mediator of interest was birth weight (<2.50 kg, 2.50-2.99 kg, 3.00-3.49 kg, 3.50+ kg; derived from maternity records). Birth weight was considered as a categorical variable in the analyses to allow for potential non-linear effects.
Intermediate confounders are common causes of the mediator and outcome that are themselves causally affected by the exposure. We considered maternal age at the birth of the study child (<20, 20-24, 25-29, 30-34, 35+ years) and parity prior to the birth of the study child (0, 1, 2+; derived from maternity records) as intermediate confounders. The hypothesised causal relationships between the analysis variables are shown in

Statistical methods
A two-stage modelling approach was used to first define the outcomes and then perform mediation analysis. In the first stage the repeated anthropometric measurements were modelled to predict height and weight at the age of 4.5 years. This age was chosen because it is the approximate mean age at the final CHSP Pre-School measurement. In the second stage, mediation of the effect of the different measures of socioeconomic disadvantage on height and overweight at age 4.5 years by birth weight was assessed using two different approaches: i) 'traditional' mediation analysis and ii) counterfactual-based mediation analysis.
Stage 1: Growth modelling. Full details of the growth modelling are given in S1 Appendix. Briefly, the repeated measurements of height and weight were modelled separately in males and females using mixed effects Berkey-Reed models [40,41]. All subjects with at least one valid growth measurement were included in the modelling, assuming missingness was at random [42]. The fitted models were used to predict subject-specific height (cm) and weight (kg) at age 4.5 years, with predicted BMI at age 4.5 years derived from the predicted height and weight values (weight (kg)/height (m)²). The age- and sex-specific international overweight cut-offs of Cole et al [43] (17.47 kg/m² for males and 17.19 kg/m² for females) were then used to define overweight at age 4.5 years (a binary variable) from the predicted BMI. Predicted height and predicted overweight status were the outcomes of interest to be used in the second stage.
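The derivation of the binary overweight outcome from predicted height and weight can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the input values are invented rather than taken from the study data, and treating a BMI exactly equal to the cut-off as overweight is an assumption.

```python
# Cut-offs at age 4.5 years (kg/m^2) as quoted in the text from Cole et al [43].
OVERWEIGHT_CUTOFF = {"male": 17.47, "female": 17.19}


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def is_overweight(weight_kg: float, height_cm: float, sex: str) -> bool:
    """Binary overweight indicator (assumption: BMI at or above the cut-off counts)."""
    return bmi(weight_kg, height_cm) >= OVERWEIGHT_CUTOFF[sex]


# Hypothetical predicted values for one child (not study data).
print(round(bmi(19.0, 105.0), 2))
print(is_overweight(19.0, 105.0, "male"), is_overweight(19.0, 105.0, "female"))
```

With these invented values the BMI is about 17.23 kg/m², which falls below the male cut-off but above the female cut-off, showing why the classification has to be sex-specific.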
Stage 2: Mediation analysis. This step was achieved using two different approaches: i) 'traditional' mediation analysis and ii) counterfactual-based mediation analysis.
A frequently used approach to mediation analysis, referred to here as a 'traditional' mediation analysis and elsewhere as the 'difference method' [44], is that popularised in a 1986 paper by Baron and Kenny [45]. In this approach two separate regression models are fitted: the first relating the outcome to the exposure (and any confounders) and the second relating the outcome to the exposure and the mediator (and any confounders). In the first model the estimated parameter for the exposure is interpreted as the 'total effect' of the exposure on the outcome; in the second model it is interpreted as the effect of the exposure on the outcome not mediated by the mediator (the 'direct effect'). The difference between the 'total effect' and the 'direct effect' can then be calculated to give the effect of the exposure on the outcome mediated by the mediator (the 'indirect effect'). Estimation is carried out under the standard assumptions of linear regression, i.e. that the error terms in the regression models are uncorrelated with the explanatory variables and have conditional means of zero.
However, this approach is only valid for linear models which do not include exposure-mediator interactions or intermediate confounding [46,47,48]. We adopt it here (despite the presence of intermediate confounding and using a generalised linear model for the binary outcome) as a comparator to the more general counterfactual-based approach (see below). Because the presence of intermediate confounding means we do not expect to obtain unbiased estimates of the direct and indirect effects using the traditional approach, we refer to the 'direct effect' and 'indirect effect' (in inverted commas) when discussing results of this analysis.
For the traditional approach, each derived growth outcome was related to the different indicators of socioeconomic disadvantage using linear or logistic regression (for height and overweight respectively). For each socioeconomic disadvantage measure the following two models were fitted using maximum likelihood estimation:
• Adjusted for background confounders (sex, year of birth, Health Board and ethnicity);
• Additionally adjusted for the mediator, birth weight.
The regression coefficient for socioeconomic disadvantage in the first fitted model thus gives the estimated 'total effect', and the same parameter in the second fitted model gives the estimated 'direct effect', with the difference between them giving the estimated 'indirect effect'. Model specifications were investigated by separately testing the significance of the interaction between the exposure of interest and each covariate (confounders and birth weight) using Wald tests. The 'proportion mediated' was then calculated (on the log-odds ratio (OR) scale for overweight models).
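The two-model difference method described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions rather than the analysis performed in the study: the exposure is binary, the mediator and the single confounder are continuous, no weighting or robust standard errors are used, and all names (`ols_coefs`, `difference_method`, `simulate`) and parameter values are hypothetical.

```python
import numpy as np


def ols_coefs(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y regressed on the columns of `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def difference_method(x, m, y, c):
    """Return ('total', 'direct', 'indirect') for exposure x from two linear models."""
    ones = np.ones(len(y))
    # Model 1: outcome on exposure and background confounder.
    total = ols_coefs(np.column_stack([ones, x, c]), y)[1]
    # Model 2: additionally adjusted for the mediator.
    direct = ols_coefs(np.column_stack([ones, x, c, m]), y)[1]
    return total, direct, total - direct


def simulate(n=20_000, seed=1):
    """Simulated data with true direct effect -0.7 and true indirect effect -0.2."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                           # background confounder
    x = rng.binomial(1, 0.5, size=n).astype(float)   # binary exposure
    m = -0.5 * x + 0.3 * c + rng.normal(size=n)      # mediator
    y = -0.7 * x + 0.4 * m + 0.2 * c + rng.normal(size=n)
    return x, m, y, c


x, m, y, c = simulate()
total, direct, indirect = difference_method(x, m, y, c)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

Because the simulated data contain no intermediate confounder and the models are linear without interactions, the difference method recovers the simulated effects here; that would not be guaranteed in the setting described in the text.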
According to the literature, we would expect both a negative direct effect of socioeconomic disadvantage on height and a negative indirect effect of socioeconomic disadvantage on height, resulting in 'consistent mediation' [49]. On the other hand for overweight we would expect a positive direct effect and negative indirect effect, i.e. 'inconsistent mediation' [49]. The proportion mediated was thus calculated as the estimated 'indirect effect' divided by the estimated total effect for consistent mediation, and as the absolute estimated 'indirect effect' divided by the sum of the absolute estimated 'indirect effect' and the absolute estimated 'direct effect' for inconsistent mediation [50].
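The two rules for the proportion mediated can be written as a single function. The sketch below assumes effects are supplied on an additive scale (mean difference, or log-OR for overweight); the function name and the example values are hypothetical.

```python
def proportion_mediated(direct: float, indirect: float) -> float:
    """Proportion mediated under the two rules described in the text.

    Effects must be on an additive scale (mean difference or log-OR).
    """
    if direct * indirect >= 0:
        # Consistent mediation: indirect effect divided by total effect.
        return indirect / (direct + indirect)
    # Inconsistent mediation: effects of opposite sign, so use absolute values.
    return abs(indirect) / (abs(indirect) + abs(direct))


# Hypothetical values: both effects negative (consistent mediation).
print(proportion_mediated(direct=-0.6, indirect=-0.4))
# Hypothetical values: positive direct, negative indirect (inconsistent mediation).
print(proportion_mediated(direct=0.4, indirect=-0.1))
```

Using absolute values in the inconsistent case keeps the proportion between 0 and 1, whereas dividing by a total effect that is close to zero would give unstable or uninterpretable values.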
To account for between-subject differences in the precision of the estimated growth outcomes the analyses were weighted by the inverse of the approximate variance of the predicted growth outcome. A robust standard error estimator was used.
As noted above, traditional mediation analysis has several limitations. In addition to its restriction to linear models, of particular concern in the present study is that intermediate confounding cannot be appropriately handled.
An alternative approach that can appropriately handle non-linear models and intermediate confounding has been proposed in the causal inference literature. It uses counterfactual definitions of direct and indirect effects, allowing for generality and greater formality than the traditional approach [51]. Among several possible definitions proposed in this framework we consider here the 'natural direct effect' (NDE) and 'natural indirect effect' (NIE) [46,52]. Concerns regarding the relevance of natural effect estimates have been expressed, particularly with regard to the 'cross-world' assumption often invoked to identify these estimands [53]. We acknowledge that this assumption could not be verified even in a hypothetical experiment. However, our focus is on partitioning the effect of an exposure into separate pathways; controlled effects, which are often suggested as alternative mediation estimands, cannot achieve this. The interpretation of the NDE and NIE (what would happen if we could disable the pathway from the exposure to the mediator, which allows us to study the strength of the mechanism that involves birth weight) is more appropriate for this type of exploratory mediation analysis.
Let $X_i$ be socioeconomic disadvantage, $Y_i$ be height at age 4.5 years, $M_i$ be birth weight, $C_i$ be the set of background confounders (sex, year of birth, Health Board and ethnicity) and $L_i$ be the set of intermediate confounders (maternal age and parity) for child $i$. Let $Y_i(x)$ be the value that $Y_i$ would take if $X_i$ had been set (possibly counter to fact) to the value $x$, $Y_i(x,m)$ be the value that $Y_i$ would take if $X_i$ and $M_i$ had been set to the values $x$ and $m$, and $M_i(x)$ be the value that $M_i$ would take if $X_i$ had been set to the value $x$.
The 'total causal effect' (TCE) of X on Y, conditional on C = c, can be expressed as a mean difference. The TCE, NDE and NIE can be analogously defined on the OR scale, as shown in S2 Appendix.
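For orientation, the mean-difference definitions commonly given in the counterfactual mediation literature [46,52] can be written in the notation introduced above. The reference exposure level $x^{*}$ and the particular decomposition shown are assumptions made for this illustration, not a transcription of the definitions used in the study.

```latex
\begin{align}
\mathrm{TCE}(x \mid c) &= E\{\,Y(x) - Y(x^{*}) \mid C = c\,\},\\
\mathrm{NDE}(x \mid c) &= E\{\,Y(x, M(x^{*})) - Y(x^{*}, M(x^{*})) \mid C = c\,\},\\
\mathrm{NIE}(x \mid c) &= E\{\,Y(x, M(x)) - Y(x, M(x^{*})) \mid C = c\,\}.
\end{align}
```

Under the usual composition assumption that $Y(x) = Y(x, M(x))$, these definitions give $\mathrm{TCE} = \mathrm{NDE} + \mathrm{NIE}$ on the mean-difference scale.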
Identification of these effects requires certain assumptions. Those usually invoked are noninterference, consistency, and no unmeasured confounding of the exposure-mediator, exposure-outcome and mediator-outcome relationships [52,54]. Additionally, because of the presence of intermediate confounders, a version of the cross-world assumption is required. In our applications this took the form of no non-linearities in the effect of the intermediate confounders on the outcome [48].
Estimation was performed via parametric G-computation using Monte Carlo simulation under the additional assumption of correct model specification [54,55,56]. The G-computation procedure is described in S3 Appendix. Using this approach the TCE of each socioeconomic disadvantage variable on height and overweight at age 4.5 years was decomposed into a NDE (not via birth weight) and a NIE (via birth weight) on the mean difference (height) or OR (overweight) scale.
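The general idea of parametric G-computation by Monte Carlo simulation can be sketched as below. This is an illustrative toy under strong assumptions, not the procedure in S3 Appendix or the Stata 'gformula' command: the exposure is binary, the intermediate confounder and mediator are single continuous variables with linear models, background confounders are omitted, and the cross-world term is formed by pairing the intermediate confounder drawn under exposure with an independent draw of the mediator under no exposure. All function names and parameter values are hypothetical.

```python
import numpy as np


def fit_linear(columns, target):
    """Fit target ~ 1 + columns by least squares; return (coefficients, residual SD)."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid_sd = np.std(target - design @ beta)
    return beta, resid_sd


def gcomp_effects(x, l, m, y, n_sim=200_000, seed=0):
    """Monte Carlo g-computation of (TCE, NDE, NIE) for a binary exposure."""
    rng = np.random.default_rng(seed)
    bl, sl = fit_linear([x], l)        # intermediate confounder given exposure
    bm, sm = fit_linear([x, l], m)     # mediator given exposure and confounder
    by, _ = fit_linear([x, l, m], y)   # outcome given exposure, confounder, mediator

    def draw_l(xv):
        return bl[0] + bl[1] * xv + rng.normal(0.0, sl, n_sim)

    def draw_m(xv, lv):
        return bm[0] + bm[1] * xv + bm[2] * lv + rng.normal(0.0, sm, n_sim)

    def mean_y(xv, lv, mv):
        return float(np.mean(by[0] + by[1] * xv + by[2] * lv + by[3] * mv))

    l1, l0 = draw_l(1.0), draw_l(0.0)
    y_x1_m1 = mean_y(1.0, l1, draw_m(1.0, l1))   # E[Y(1, M(1))]
    y_x0_m0 = mean_y(0.0, l0, draw_m(0.0, l0))   # E[Y(0, M(0))]
    # Cross-world term: L follows exposure 1; M is an independent draw under exposure 0.
    y_x1_m0 = mean_y(1.0, l1, draw_m(0.0, draw_l(0.0)))
    tce = y_x1_m1 - y_x0_m0
    nde = y_x1_m0 - y_x0_m0
    nie = y_x1_m1 - y_x1_m0
    return tce, nde, nie


def simulate(n=20_000, seed=1):
    """Simulated data in which the exposure affects L, and L affects both M and Y."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)
    l = 0.5 * x + rng.normal(size=n)
    m = -0.3 * x + 0.4 * l + rng.normal(size=n)
    y = -0.8 * x + 0.2 * l + 0.6 * m + rng.normal(size=n)
    return x, l, m, y


x, l, m, y = simulate()
tce, nde, nie = gcomp_effects(x, l, m, y)
print(f"TCE={tce:.3f} NDE={nde:.3f} NIE={nie:.3f}")
```

In this simulated setting the values implied by the data-generating coefficients are a TCE of -0.76, an NDE of -0.70 and an NIE of -0.06. The key design point is that the intermediate confounder is simulated forward from the exposure rather than conditioned on as a fixed covariate, which is what allows its role on the exposure-outcome pathway to be handled.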
Confidence intervals (CIs) were obtained using a bootstrap approach with 1000 draws. The proportion mediated was calculated using the same approach as in the traditional mediation analysis.
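A bootstrap confidence interval with 1000 draws can be sketched generically as follows. The text does not state which type of bootstrap interval was used, so the percentile interval here is an assumption, as are the function name, the example statistic and the simulated data.

```python
import numpy as np


def bootstrap_ci(data: np.ndarray, statistic, n_draws=1000, level=0.95, seed=0):
    """Percentile bootstrap CI; rows of `data` are resampled with replacement."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_draws)
    for b in range(n_draws):
        rows = rng.integers(0, n, size=n)   # indices of one bootstrap resample
        estimates[b] = statistic(data[rows])
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Hypothetical example: CI for a mean from simulated data (not study data).
rng = np.random.default_rng(42)
sample = rng.normal(loc=2.0, scale=1.0, size=500)
print(bootstrap_ci(sample, np.mean))
```

In a mediation setting `statistic` would re-run the whole effect estimation on each resample, so that the interval reflects uncertainty in every fitted model rather than in a single coefficient.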
Analyses were restricted to complete records after assessing the extent of the likely selection bias. All analyses were conducted using Stata version 12 [57]. Estimation by G-computation was performed using the 'gformula' command [55].

Results
Growth data availability by age and by subject is shown in S1 Table and S2 Table respectively. 24,703 subjects with at least one valid height observation had height models fitted and height at age 4.5 years predicted. 24,509 subjects with fitted height and weight at age 4.5 years had BMI (and then overweight) at age 4.5 years derived. Of the subjects with derived height and/or overweight at age 4.5 years, 16,628 had complete data on birth weight, all the potential confounders and one or more of the socioeconomic disadvantage measures. The distributions of all the analysis variables are given in Table 1. The distributions of variables in the analysis sample were in general very similar to those in the sample overall (S3 Table). One exception was the relative preponderance of later births in the analysis (due to the phased implementation of the CHSP Pre-School). The distributions of height and overweight by each explanatory variable are given in S4 Table.

Traditional mediation analysis
There was evidence of a strong, graded association between all measures of socioeconomic disadvantage and predicted height at age 4.5 years in the confounder adjusted models with, for example, the most disadvantaged category for mother's education estimated to be associated with a 0.98 cm (95% CI 0.75, 1.22) lower height relative to the least disadvantaged category (Table 2). The estimated associations were attenuated on additional adjustment for birth weight. Comparing the most disadvantaged category to the least disadvantaged category the estimated proportion mediated was 43% for maternal education, 40% for SIMD and 88% for synthetic income (Table 2).
Table 3 shows the estimated ORs for the associations between each measure of socioeconomic disadvantage and predicted overweight at age 4.5 years. There was only weak evidence of associations for maternal education and SIMD in the confounder adjusted models with, for example, the most disadvantaged category for mother's education estimated to be associated with 26% (95% CI 5, 52) higher odds of overweight relative to the least disadvantaged category. These associations were markedly amplified on additional adjustment for birth weight. Comparing the most disadvantaged category to the least disadvantaged category the estimated proportion mediated was 27% for maternal education, 25% for SIMD and 35% for synthetic income.
There was no evidence of interactions between the exposures of interest and either birth weight or the confounders in any models.

Counterfactual-based mediation analysis
The TCEs for each measure of socioeconomic disadvantage estimated by G-computation (Table 4) were very similar to the corresponding estimated regression coefficients in the traditional mediation analysis (confounder adjusted model, Table 2), as would be expected. The estimated NDEs, however, were more pronounced than their traditional counterparts, leading to a smaller percentage mediated. Comparing the most disadvantaged category to the least disadvantaged category, the estimated proportion mediated was 20% for maternal education, 20% for SIMD and 16% for synthetic income.
Table 5 reports the estimated NDEs and NIEs for overweight at age 4.5 years (expressed on the OR scale). In each model, in contrast to the results for height, the estimated NDE was less pronounced than its traditional counterpart. Moreover, the estimated NDEs and NIEs were in opposite directions, as expected under inconsistent mediation, with evidence of mediation by birth weight (absolute proportion mediated in most disadvantaged category: 18-29%).

Main substantive findings
In this large-scale representative sample of the Scottish population we have elucidated the mediatory role of birth weight in the relationship between socioeconomic disadvantage and childhood growth in terms of height and overweight. Using two different approaches (traditional and counterfactual-based mediation analyses) we found broad agreement that: i) there was a strong, graded, negative 'effect' of each measure of socioeconomic disadvantage on height at age 4.5 years which was mediated to some extent by birth weight and ii) there was a strong, graded, positive direct 'effect' of each measure of socioeconomic disadvantage (with the exception of synthetic income) on overweight at age 4.5 years which was partly masked by inconsistent mediation by birth weight. These findings suggest that increased birth weight in more disadvantaged groups may be associated with reduced social inequalities in height but also with increased inequalities in overweight. Birth weight could potentially be a modifiable target of intervention through, for example, programmes designed to improve maternal nutrition or deter maternal smoking during pregnancy [58,59,60].

Main methodological findings
Several assumptions must hold in order for us to obtain unbiased estimates of the NDE and NIE using G-computation, in particular: i) no unmeasured exposure-mediator confounding, ii) no unmeasured exposure-outcome confounding, iii) no unmeasured mediator-outcome confounding, iv) correct specification of the parametric models. We believe that we included the main exposure-mediator-outcome confounders (sex, year of birth, Health Board, ethnicity) and additional mediator-outcome confounders (maternal age, parity), which should reasonably capture biological, geographical and temporal sources of confounding, so are hopeful that assumptions i), ii) and iii) hold. All covariates entered the models as categorical variables so there was limited scope for model misspecification and we found no evidence of exposure-mediator or exposure-confounder interactions in any models, so these were not included (assumption iv)). For continuous outcomes, in the absence of intermediate confounding (but with the above assumptions holding) the direct and indirect effects estimated under the traditional mediation analysis are consistent estimates of the NDE and NIE [61]. When considering height at age 4.5 years the estimated extent of mediation was greater under the traditional mediation analysis than under the counterfactual-based mediation analysis. This suggests that the observed differences in the results are due to intermediate confounding by maternal age and parity, which is correctly controlled for only in the counterfactual-based mediation analysis. This approach thus provides the more appropriate estimate of the extent of mediation. Comparison of the results relating to overweight at age 4.5 years between the two analysis approaches is less straightforward.
Even if the above assumptions hold and there is no intermediate confounding then the 'direct effect' and 'indirect effect' estimated under the traditional mediation analysis are not consistent estimates of the NDE and NIE due to the non-collapsibility of the OR [62]. It has recently been shown that the traditional mediation analysis will provide an overestimate (in absolute terms) of the NDE, leading to a corresponding bias in the estimated NIE [44]. In the present study we found the traditional mediation analysis to give an overestimate of the direct effect relative to the counterfactual-based mediation analysis, but it is not possible to distinguish between bias due to the non-collapsibility of the OR and bias due to incorrect control of intermediate confounding.
Although much previous research has suggested birth weight as a plausible mediator of the effect of social disadvantage on height or overweight, few studies have explicitly examined this mediation. Armstrong et al [35] used data from the CHSP Pre-School and analysed health records of 74,500 children aged 3-4 years in 1998-99. In comparing the most deprived (based on area-level Carstairs Deprivation Category) with the least deprived, the odds ratio for obesity was 1.30 (95% CI: 1.05, 1.60), and when adjusting for birth weight the adjusted OR increased to 1.43 (95% CI: 1.16, 1.77), giving a proportion mediated of 0.21. Considering the differences in analysis approach, these results are very similar to those in the present study. However, the analysis of Armstrong et al [35] does not correctly account for intermediate confounding.
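The quoted proportion mediated of 0.21 can be reproduced from the two reported ORs, assuming it was obtained with the inconsistent-mediation formula on the log-OR scale described in the Methods:

```latex
\begin{align}
\text{`total effect'} &= \ln(1.30) \approx 0.2624, \qquad
\text{`direct effect'} = \ln(1.43) \approx 0.3577,\\
\text{`indirect effect'} &= 0.2624 - 0.3577 = -0.0953,\\
\text{proportion mediated} &= \frac{|-0.0953|}{|-0.0953| + |0.3577|}
  = \frac{0.0953}{0.4530} \approx 0.21 .
\end{align}
```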

Representativeness and generalisability
The CHSP Pre-School was implemented in different years in different Health Boards (mean 1994, range 1991 to 2000), meaning that not all births between 1991 and 2001 were eligible to have all childhood anthropometric measurements taken. Those measurements which could not be observed as they preceded the implementation of the CHSP Pre-School are systematically missing. The growth models required a minimum of one observation so that as many children as possible contributed to the analysis, with information borrowed from other children with more complete growth data. The lack of evidence for an interaction between Health Board and any of the exposures with respect to either of the outcomes suggests that our results remain representative of the target population.

Strengths and limitations
There are many strengths to this analysis. The large sample size meant that we could estimate effects precisely. Although the analysis sample was a relatively small proportion of the overall sample, the main reason for this (phased implementation of the CHSP Pre-School, described above) should not result in a biased sample, as borne out by the similarity of distributions of variables between the analysis sample and the overall sample. The data were from reliable sources (Census and birth records as part of the SLS, along with linked health data from maternity and CHSP Pre-School records).
There were also some limitations to the analysis. Although we included the main confounders and intermediate confounders, there may be others which we were unable to fully capture, potentially leading to bias. Potential unmeasured confounders could include maternal birth weight, parental anthropometrics, maternal lifestyle factors, and maternal morbidity. There may also be residual confounding due to measurement error or lack of granularity in those confounders that we did include.
Some of the variables included in the analyses may be subject to measurement error. This is a particular concern for synthetic income, for which error may be due to either the observed data on occupation or to the model from which this is converted into synthetic income. Such measurement error would in general lead to attenuation of the association between synthetic income and height or overweight, which may lead to some bias in the estimated direct and indirect effects.
Including a relatively small number of subjects with as few as a single growth measurement, allowing us to retain a larger sample size and reduce the potential for selection bias, may lead to mischaracterisation of the growth trajectories in those individuals. However, down-weighting these subjects in the traditional mediation analysis made very little difference (relative to the unweighted analysis) suggesting that the inclusion of subjects with fewer data points does not significantly affect the results (and certainly not the conclusions) of the study.
When using a two-stage modelling approach (e.g. growth modelling followed by mediation analysis using the derived growth outcomes) one should appropriately propagate uncertainty in estimates from the first stage into the second stage model. In the traditional mediation analysis we weighted by the inverse of the approximate variance of the estimated growth outcomes to account for between-subject differences in their precision. While such weighting deals with relative variability in precision it does not allow for complete propagation of the first stage uncertainty, so the precision of the estimated effects may be somewhat overestimated in both analyses. Solutions include bootstrapping the whole process and joint modelling [63], but these may not be practicable for analyses using G-computation. Further research is required in this area.

Conclusions
In this large-scale representative sample of the Scottish population we have elucidated the mediatory role of birth weight in the relationship between socioeconomic disadvantage and childhood growth in terms of height and overweight, and highlighted the importance of correctly accounting for intermediate confounding in mediation analyses. We found a strong, graded, negative effect of socioeconomic disadvantage on height which was mediated to some extent by birth weight and a strong, graded, positive direct effect of socioeconomic disadvantage on overweight which was partly masked by inconsistent mediation by birth weight.
Our findings suggest that higher birth weight in more disadvantaged groups may be associated with reduced social inequalities in height but also with increased inequalities in overweight.