Using graphic modelling to identify modifiable mediators of the association between area-based deprivation at birth and offspring unemployment

Background Deprivation can perpetuate across generations; however, the causative pathways are not well understood. Directed acyclic graphs (DAGs) with mediation analysis can help elucidate and quantify complex pathways in order to identify modifiable factors at which to target interventions. Methods and findings We linked ten Scotland-wide databases (six health and four education) to produce a cohort of 217,226 pupils who attended Scottish schools between 2009 and 2013. The DAG comprised 23 potential mediators of the association between area deprivation at birth and subsequent offspring ‘not in education, employment or training’ status, covering maternal, antenatal, perinatal and child health, school engagement, and educational factors. Analyses were performed using modified g-computation. Deprivation at birth was associated with a 7·3 percentage point increase in offspring ‘not in education, employment or training’. The principal mediators of this association were smoking during pregnancy (natural indirect effect of 0·016, 95% CI 0·013, 0·019) and school absences (natural indirect effect of 0·021, 95% CI 0·018, 0·024), explaining 22% and 30% of the total effect respectively. The proportion of the association potentially eliminated by addressing these factors was 19% (controlled direct effect when set to non-smoker 0·058; 95% CI 0·053, 0·063) for smoking during pregnancy and 38% (controlled direct effect when set to no absences 0·043; 95% CI 0·037, 0·049) for school absences. Conclusions Combining a DAG with mediation analysis helped disentangle a complex public health problem and quantified the modifiable factors of maternal smoking and school absence that could be targeted for intervention. This study also demonstrates the general utility of DAGs in understanding complex public health problems.


Introduction
Socio-economic deprivation is a risk factor for a wide range of health indicators from birth through adolescence [1][2][3][4][5][6], as well as poorer educational outcomes [2][7][8][9][10][11]. Deprivation can perpetuate between generations. In 2017, the Scottish government reported a 14·9 percentage point difference in the rate of participation in employment, education or training between pupils from the most deprived areas and those from the least deprived [12]. Stewart et al. demonstrated that, in a Scottish school leavers cohort, increased deprivation at birth was associated with poorer attainment and that poorer attainment on leaving school was associated with increased unemployment; however, the analyses were not adjusted for confounders [13].
Elucidating the pathways through which parental deprivation predisposes to offspring deprivation could help to identify modifiable factors that, if tackled, could break the current cycle of 'inherited' health inequalities. Constructing a directed acyclic graph (DAG) has been recommended to guide analyses of neighbourhood health effects [14] and to understand bias and confounding [15][16][17]. When coupled with g-formula analysis, a DAG can be used to estimate the proportion of an effect that is explained by a mediator. To the best of our knowledge, this approach has not previously been used to identify factors that mediate the association between deprivation at birth and offspring 'not in education, employment or training' (NEET).
Scotland is well placed to undertake this type of research due to its large number of high-quality, national administrative datasets that can be linked at an individual level. This study used record linkage of routinely collected data to construct and analyse a graphical model of the factors that mediate the association between parental and offspring deprivation, measured by area-based deprivation at birth and offspring NEET respectively.

Databases and inclusion criteria
Individual-level data were linked from six Scotland-wide administrative health databases, held by the Information Services Division of the National Health Service (NHS), and four Scotland-wide education databases held by the Scottish Exchange of Educational Data. The linkage process has been described previously [13,18]. The Scottish Morbidity Record (SMR) collects data on admissions to hospital including dates of admission and discharge and International Classification of Diseases (ICD) diagnostic codes for acute (SMR01) and psychiatric (SMR04) hospitals and neonatal (age 0-28 days) units (SMR11). SMR02 collects additional antenatal, obstetric and neonatal data for admissions to maternity hospitals. The Prescribing Information System records data on all medications dispensed in the community. The Child Health Surveillance Programme Pre-School Dataset collects information obtained by health visitors on developmental milestones and feeding. The School Pupil Census, conducted annually in September, collects data on all children attending Scottish local authority-run primary, secondary and special schools. This includes any record, and type, of special educational need and whether a child is looked after by the care system. The Scottish Qualifications Authority collates exam results for all children and the school-leavers' database collects information on school leaver status six months after leaving school. Data on school absences and exclusions are collected prospectively and appended to the School Pupil Census at the end of the relevant school year.
Study inclusion was restricted to singleton children who attended Scottish schools between 2009 and 2013 inclusive and who were born in Scotland. Since pupils are permitted to leave school between the ages of 16 to 18 years, study participants were born between 1991 and 1998.

Exposure, outcome and confounder variables
The exposure was area-based deprivation at birth, derived from the postcode of residence recorded on SMR02 at the time of delivery using the 2012 Scottish Index of Multiple Deprivation. The Scottish Index of Multiple Deprivation is a measure of relative deprivation derived for all postcodes of residence across Scotland. It is calculated from neighbourhood-level measures of 38 indicators across seven domains: income; employment; health; education; housing; geographic access; and crime. General population quintiles were derived and dichotomised into the most deprived quintile and the four less deprived quintiles. Since it is common to reside in student accommodation or with parents for several years after leaving school, postcode of residence was not considered to be a good measure of the offspring's personal socioeconomic status. Therefore, the outcome of offspring not in education, employment or training was used instead. School leaver status was dichotomised into unemployed versus in further/higher education, training or employment six months after leaving school. Ethnicity and sex were treated as potential confounding variables because they did not lie on any of the causal pathways.

Directed acyclic graph
A directed acyclic graph (DAG) was used to visually depict assumptions about causal relationships. Construction of the DAG began prior to data analyses. All possible pairs of variables were systematically assessed applying known and/or published evidence of causal relationships. Unclear associations were discussed by the research team, with input from two Consultant Neonatologists where appropriate.
The DAG was created using Dagitty software [19]. Each node represented a variable of interest. Arrows between nodes denoted causal relationships, pointing from cause to effect, and included even weak assumptions of causal relationships and causal relationships present only in a sub-group. Lack of an arrow denoted confidence that no causal relationship existed based on evidence [20]. The full DAG (S1 Fig) was used in the analyses but a simplified version, with collapsed variables, is also provided (Fig 1) for ease of interpretation.
Antenatal nodes included self-reported smoking during pregnancy, maternal age (<25, 25-29, 30-34, or >34 years), and parity (nulliparous, parous, or multiparous). Perinatal nodes included mode of delivery (assisted versus non-assisted) and APGAR score at 5 minutes (1-3, 4-6, or 7-10). Gestational age and birth weight were used to derive sex- and gestation-specific birthweight centiles as a measure of intrauterine growth restriction. Health visitors recorded developmental milestones across four domains (gross motor, hearing and communication, manipulative skills, and social and behavioural) as normal, abnormal, doubtful, or incomplete. Children classified as doubtful or abnormal for any domain at the 6-8 week, 8-9 month, 22-24 month, or 39-42 month assessments were then coded as having milestone concerns. Due to large amounts of missing data, we were unable to analyse nodes in the DAG for maternal body mass index, drug and alcohol consumption during pregnancy, and breastfeeding; these were therefore marked as unmeasured.
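The milestone coding rule described above can be illustrated with a short sketch. The function name, the data layout and the example child are hypothetical and are not taken from the study's code; in particular, treating an 'incomplete' rating as no concern is an assumption, since the text only states that doubtful or abnormal ratings trigger the flag.

```python
# Hypothetical names throughout; only the rating labels come from the text.
CONCERN_RATINGS = {"abnormal", "doubtful"}


def has_milestone_concern(assessments):
    """Return True if any domain at any assessment is doubtful or abnormal.

    `assessments` maps an assessment label (e.g. "6-8 week") to a mapping
    of domain name -> rating ("normal", "abnormal", "doubtful" or
    "incomplete"). ASSUMPTION: "incomplete" does not count as a concern.
    """
    return any(
        rating in CONCERN_RATINGS
        for domains in assessments.values()
        for rating in domains.values()
    )


# Illustrative record: a single doubtful rating is enough to set the flag.
child = {
    "6-8 week": {"gross motor": "normal", "social and behavioural": "normal"},
    "22-24 month": {"hearing and communication": "doubtful"},
}
print(has_milestone_concern(child))
```

Collapsing four domains and four assessment ages into a single binary node keeps the DAG tractable, at the cost of losing information on which domain or age raised the concern.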
Neonatal records (SMR11) were used to ascertain congenital anomalies (ICD9 740-758 or ICD10 Q00-Q99). Hospital admissions were coded as the total number of admissions to acute or psychiatric units recorded on SMR01 and SMR04 respectively. Admissions secondary to trauma were identified using ICD9 800-999 and ICD10 S00-T98 codes, and an additional binary measure created for ever admitted to hospital secondary to trauma. Prescribing data were used to ascertain whether children had received at least one prescription to treat epilepsy (British National Formulary (BNF) section 4.8), diabetes (BNF section 6.1.1), attention deficit hyperactivity disorder (BNF section 4.4), and depression (tricyclic antidepressant, selective serotonin reuptake inhibitor, mirtazapine or venlafaxine), and two or more prescriptions for inhalers to treat asthma (corticosteroid in addition to long/short acting beta agonist) [21]. Neurodevelopmental disorders were defined as receipt of medication for attention deficit hyperactivity disorder or any school record of special educational need due to autistic spectrum disorder or learning disability. Mental health problems were defined as receipt of medication for depression, previous admission to a psychiatric ward/hospital or a record of special educational need attributed to mental health. The School Pupil Census was also used to identify sensory impairments, adolescent substance misuse, young carers, and children looked after by the care system.
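As an illustration of the code-range definitions above, the following sketch flags trauma admissions and congenital anomalies from ICD codes. The function names and the record layout (a code string paired with an ICD version) are assumptions made for the example; only the code ranges are taken from the text.

```python
def in_icd_range(code, version, icd9_range, icd10_range):
    """Check whether a diagnostic code falls in the given ICD-9 or ICD-10 range.

    ASSUMPTION: codes are strings whose first three characters give the
    category (e.g. "820.0" or "S72.0"), and the ICD version is known.
    """
    category = code.strip().upper()[:3]
    if version == 9:
        low, high = icd9_range
        return category.isdigit() and low <= int(category) <= high
    if version == 10:
        low, high = icd10_range
        well_formed = (
            len(category) == 3
            and category[0].isalpha()
            and category[1:].isdigit()
        )
        return well_formed and low <= category <= high
    raise ValueError("version must be 9 or 10")


def is_trauma(code, version):
    # Ranges as stated in the text: ICD9 800-999, ICD10 S00-T98.
    return in_icd_range(code, version, (800, 999), ("S00", "T98"))


def is_congenital_anomaly(code, version):
    # Ranges as stated in the text: ICD9 740-758, ICD10 Q00-Q99.
    return in_icd_range(code, version, (740, 758), ("Q00", "Q99"))


def ever_admitted_for_trauma(admissions):
    """Binary 'ever admitted secondary to trauma' flag for one child."""
    return any(is_trauma(code, version) for code, version in admissions)


# Hypothetical admission history: (diagnostic code, ICD version) pairs.
history = [("J45.9", 10), ("S72.0", 10)]
print(len(history), ever_admitted_for_trauma(history))
```

The example prints the total number of admissions alongside the binary trauma flag, mirroring the two hospital admission measures described above.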
School performance variables included the annual number of absences, annual number of exclusions for challenging/disruptive behaviour, and academic attainment across the last three years of secondary school derived from the total number of awards attained at each level of the Scottish Credit Qualifications Framework and converted into a binary variable: low/basic versus broad/general/high attainment. Absence and exclusion data were only available for years 2009, 2010 and 2012.
Maternal smoking, maternal age, parity, mode of delivery, number of admissions to hospital, ever admitted for trauma, substance misuse, looked after child, absences, exclusions and attainment were considered potentially modifiable factors, in that education, prevention, policy or practice interventions could be directed at these.

Statistical analyses
We adhered to the statistical methods advocated for using DAGs to translate causal assumptions into statistical relationships using non-experimental data [22,23]. Mediation models were specified for each potential mediator for which we had data. Confounders were classified as either background or intermediate confounders. Background confounders confound at least one causal pathway and do not descend from the exposure (deprivation at birth). In contrast, intermediate confounders confound a mediated pathway and are, themselves, descendants of the exposure [24]. Identifying these variables is performed by systematically applying 'd-separation' rules using the Dagitty software [19]. The variables included in each model are listed in S1 Table in S2 File. The mediation models were then estimated using the user-written gformula command in Stata software. Separate models were run for each mediator. Gformula is an implementation of the G-computation procedure which permits mediation modelling in the presence of intermediate confounding, subject to certain assumptions, detailed in the 'Sensitivity analyses' section. Gformula estimates the total causal effect (TCE) of the exposure on the outcome and decomposes this into a natural direct effect (NDE) and a natural indirect effect (NIE) [25]. The NIE is the portion of the TCE that is transmitted through the mediator, whereas the NDE is the portion of the TCE that is transmitted through all other paths that do not involve that mediator.
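The distinction between background and intermediate confounders can be sketched on a toy graph. The edges below are purely illustrative and are not the study DAG (S1 Fig). The rule used here, that a confounder is a direct common cause of mediator and outcome, is a deliberate simplification of the d-separation rules that Dagitty applies. All function and variable names are hypothetical.

```python
def descendants(dag, node):
    """Return every node reachable from `node` by following arrows."""
    seen, stack = set(), list(dag[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return seen


def classify_confounders(dag, exposure, mediator, outcome):
    """Split mediator-outcome confounders into background and intermediate.

    SIMPLIFICATION: a confounder is taken to be any direct common cause of
    the mediator and the outcome; full d-separation is not implemented.
    """
    downstream_of_exposure = descendants(dag, exposure)
    result = {"background": [], "intermediate": []}
    for node, children in dag.items():
        if node in (exposure, mediator, outcome):
            continue
        if mediator in children and outcome in children:
            kind = "intermediate" if node in downstream_of_exposure else "background"
            result[kind].append(node)
    return result


# Toy DAG as an adjacency mapping: cause -> list of direct effects.
toy_dag = {
    "deprivation": ["smoking", "absences", "neet"],
    "smoking": ["absences", "neet"],
    "sex": ["absences", "neet"],
    "absences": ["neet"],
    "neet": [],
}
print(classify_confounders(toy_dag, "deprivation", "absences", "neet"))
```

In this toy graph, sex is a background confounder of the absences pathway because it does not descend from deprivation, whereas smoking is an intermediate confounder because it does.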
In the presence of intermediate confounding, the estimated effects are interpreted as randomised interventional analogues of the NDE and NIE [26]. The NDE and NIE are termed 'natural' because the mediator is allowed to take on the value that it would naturally take for each individual if their exposure were set to zero. Gformula also estimates the controlled direct effect (CDE), which is the unmediated effect if the mediator were fixed at the same specified value for all individuals [24]. The CDE and NDE will diverge if there is an interaction between the exposure and the mediator, and the CDE will take on different values depending on the value at which the mediator is fixed. The CDE is informative if one wishes to calculate the potential effects of a population intervention targeting the mediator: the proportion eliminated (PE) is conceptualised as the proportion of the TCE that would be removed if an intervention were implemented to set the mediator to the same value for all members of the population (e.g. an intervention that resulted in everyone becoming a non-smoker) [27]. The proportion eliminated (on the risk difference scale) is calculated as (TCE − CDE(m))/TCE, where m is the value at which the mediator is to be fixed. Since the outcome of all mediation models was binary (unemployed yes/no), gformula's logistic function was used, which returns effect sizes in the form of absolute risk differences with 95% confidence intervals (CIs) and p values. Statistical significance was defined as p<0·05. Analyses were performed on a complete case basis and missing values were not imputed, leading to different sample sizes across models.
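To make these quantities concrete, the sketch below computes the TCE, NDE, NIE, CDE(m) and PE on the risk difference scale for a binary exposure and a binary mediator. It is a deliberately simplified illustration: it assumes no confounders of any kind, it works from known conditional probabilities rather than fitted models, and it is not the Stata gformula implementation used in the study. All probabilities are hypothetical and were chosen only to show an exposure-mediator interaction.

```python
def g_computation(p_m, p_y):
    """Decompose a total effect for binary exposure A and binary mediator M.

    p_m[a]      = P(M = 1 | A = a)
    p_y[(a, m)] = P(Y = 1 | A = a, M = m)
    ASSUMPTION: no background or intermediate confounding.
    """
    def mean_outcome(a_outcome, a_mediator):
        # E[Y] when exposure is set to a_outcome and the mediator follows
        # the distribution it would have under exposure a_mediator.
        p_m1 = p_m[a_mediator]
        return p_y[(a_outcome, 1)] * p_m1 + p_y[(a_outcome, 0)] * (1 - p_m1)

    tce = mean_outcome(1, 1) - mean_outcome(0, 0)
    nde = mean_outcome(1, 0) - mean_outcome(0, 0)  # mediator as if unexposed
    nie = mean_outcome(1, 1) - mean_outcome(1, 0)  # only the mediator shifts
    cde = {m: p_y[(1, m)] - p_y[(0, m)] for m in (0, 1)}
    pe = {m: (tce - cde[m]) / tce for m in (0, 1)}
    return {"TCE": tce, "NDE": nde, "NIE": nie, "CDE": cde, "PE": pe}


# Hypothetical inputs (not study estimates).
p_m = {0: 0.2, 1: 0.4}
p_y = {(0, 0): 0.08, (0, 1): 0.14, (1, 0): 0.12, (1, 1): 0.22}

results = g_computation(p_m, p_y)
for name, value in results.items():
    print(name, value)
```

With these inputs the NDE and NIE sum to the TCE, the CDE differs between the two mediator levels because the exposure effect is larger when the mediator is present, and the PE therefore depends on the value at which the mediator is fixed.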

Sensitivity analyses
Where intermediate confounders exist, mediation analysis rests on certain assumptions regarding the inter-relationships between the variables in the model. One assumption is that there is no exposure-mediator interaction [24]. This was tested for each model using logistic regression. Where evidence of a statistically significant exposure-mediator interaction existed, we included an exposure-mediator interaction term within the relevant mediation model using gformula. However, its inclusion relied on a further assumption of no interaction between the exposure and any intermediate confounders [24]. These interactions were also tested using logistic regression. Where statistically significant, sensitivity analyses were performed to compare the gformula mediation models with and without the exposure-intermediate confounder interaction terms. Where there was evidence of variations in effect size between those models, we ran the mediation model without an exposure-intermediate confounder interaction term but stratified by each level of the intermediate confounders.
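A minimal sketch of an exposure-mediator interaction test is given below. It relies on the fact that, for a binary exposure and a binary mediator, the interaction coefficient of a saturated logistic model equals the difference between the stratum-specific log odds ratios, with a Wald standard error given by the square root of the summed reciprocal cell counts. The example is unadjusted, whereas the regressions used in the study may have included covariates, and the counts are hypothetical.

```python
import math


def interaction_test(counts):
    """Wald test for exposure-mediator interaction on the log-odds scale.

    counts[(a, m)] = (number with outcome, number without outcome)
    for exposure a and mediator m, both coded 0/1.
    ASSUMPTION: unadjusted analysis with no empty cells.
    """
    def log_odds_ratio(m):
        cases_exposed, noncases_exposed = counts[(1, m)]
        cases_unexposed, noncases_unexposed = counts[(0, m)]
        return math.log(
            (cases_exposed * noncases_unexposed)
            / (noncases_exposed * cases_unexposed)
        )

    beta = log_odds_ratio(1) - log_odds_ratio(0)
    se = math.sqrt(sum(1 / cell for pair in counts.values() for cell in pair))
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return beta, se, p_value


# Hypothetical 2 x 2 x 2 table (not study data).
example_counts = {
    (0, 0): (80, 920),
    (0, 1): (70, 430),
    (1, 0): (120, 880),
    (1, 1): (110, 390),
}
beta, se, p_value = interaction_test(example_counts)
print(beta, se, p_value)
```

A p value below the chosen threshold would be taken as evidence of interaction and would prompt inclusion of the product term in the corresponding mediation model.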

Approvals
The study was approved by the NHS National Services Scotland Privacy Advisory Committee. A data processing agreement between Glasgow University and the Information Services Division and a data sharing agreement between Glasgow University and Scottish Exchange of Educational Data were drafted. The NHS West of Scotland Research Ethics Service confirmed that formal NHS ethics approval was not required since the study involved anonymised extracts of routinely collected data with an acceptably negligible risk of identification.

Demographics
The study cohort comprised 217,226 former school pupils of whom 22,719 (10·5%) were NEET six months after leaving school. Study participants whose mothers were in the most deprived quintile were less likely to be Asian and their mothers were younger and more likely to have been multiparous, and to have smoked during pregnancy. They were more likely to have been delivered without obstetric assistance and to have had lower gestational age and birthweight. Children born to the most deprived mothers were also more likely to have had milestone concerns, more admissions to hospital (particularly for trauma) and neurodevelopmental conditions, and to have been treated for asthma. They were more likely to have been looked after, had more exclusions from school, and had poorer attainment (Table 1).

Controlled direct effects and proportion eliminated
As with the NDEs reported in Table 2, all p-values for the CDEs remained at <0·001. Of the antenatal factors, the PE for smoking during pregnancy was 19% (CDE when set to non-smoker 0·058; 95% CI 0·053, 0·063), the PE for parity was 27% (CDE when set to nulliparous 0·054; 95% CI 0·051, 0·058) and the PE for younger maternal age was 6% (CDE when set to [...] opposite directions. In light of this, and the very small size of the indirect effect in Table 2, the PE is not reported here for exclusions.

Sensitivity analyses
As indicated in Table 2, exposure-mediator interactions were found for smoking during pregnancy, maternal age, birth weight, hospital admissions, developmental milestone concerns, treated asthma and epilepsy, neurodevelopmental disorders, mental health problems, looked after child status, poor attainment, and school absences and exclusions. This indicates that any causal effect of deprivation at birth on subsequent offspring NEET that is transmitted through these mediators is non-linear, and these results will be sensitive to the prevalence of the mediator in question [28]. Consideration of interactions between the exposure and the intermediate confounders in each of these models found no notable variation in the results with and without the interaction term (see S2 Table in S2 File). Therefore Table 2 reports results from models that omitted these interactions and assumed that the conditions for causal identification were met.

Discussion
Our study demonstrated a small but statistically significant association between deprivation at birth and offspring NEET. Offspring NEET was 7·3 percentage points higher among children born to women in the most deprived quintile. We identified some key mediators: in particular, smoking during pregnancy and school absence explained 22% and 30% of the total effect respectively. Importantly, both are modifiable factors, and the estimated proportion of the total effect that could be eliminated through population interventions targeted at these mediators was 19% for smoking and 38% for school absence. Half of the total effect was mediated indirectly through lower educational attainment, which is another potential target for interventions. Therefore, our results are encouraging since they identified a small number of modifiable mediators that make substantial contributions. There are interventions known to reduce smoking during pregnancy, such as nicotine replacement therapy and counselling [29,30]. The Scottish Government has published guidance for education authorities to promote attendance, outlining a number of strategies including parental engagement, pastoral care and providing supported learning [31].
A number of theories have been postulated as to how antenatal and early life factors can affect later health, education and employment outcomes including: altered structural brain development, a cumulative impact of multiple risk factors, family investment versus family stress mechanisms, or structural disadvantages as displayed in the social determinants of health model [4,11,32,33]. The reality is that most public health problems are complex and result from multiple underlying mechanisms that are not easily studied empirically. Our study tackled this complexity through construction of a DAG depicting existing evidence and understanding of pathways that could be systematically analysed. Whilst the use of DAGs is relatively novel in this field, they are increasingly being recognised as a useful tool [14,22,34].
The exposure was area-based deprivation at birth whilst the outcome was offspring NEET, an individual-level indicator of deprivation. We did not have access to individual-level indicators of deprivation for mothers, such as educational level, income or employment status. It can be problematic to use maternal indicators to assess deprivation status; women's earnings may be a poorer reflection of joint income because they earn less than their partners or are housewives [35]. Area-based measures of deprivation can be used as proxies for individual socioeconomic position, although they may underestimate the true individual-level effect [36]. Measuring area-based deprivation in the offspring would not have been appropriate because six months after leaving school offspring commonly still live with their parents or in student accommodation. The outcome used, NEET, incorporates two commonly used individual-level measures of deprivation: employment status and educational level. Our study used routine data on a very large, unselected, national cohort. However, records for multiple births, children born outside of Scotland or born at home, and students attending private schools were unavailable. We estimated these to be around 3%, 12% and 5% of the national population respectively. Record linkage of 10 datasets provided data on a wide range of variables. Nonetheless, 4 variables needed to be omitted due to missing data. Data completeness was improved by combining data from multiple sources. For example, mental health problems were ascertained from school records, hospital admissions and prescriptions.
We performed sensitivity analyses with respect to exposure-mediator and exposure-confounder interactions, relevant to causal identification. There was minimal impact on the initial effect sizes, suggesting that the initial analyses were reliable and robust. The large sample size improved the precision of the effect size estimates. We believe our DAG is a plausible, albeit simplified and not fully comprehensive, representation of the real-world relationship between deprivation at birth and subsequent offspring NEET. Our models were limited to the data available to us; we therefore acknowledge that there is likely to be residual confounding present due to the omission of several key factors, such as cognition and parenting style. Some of these residual confounders are depicted in the DAG because we suspect that they play an important role in the pathway between deprivation at birth and offspring NEET. It is also worth noting that the variables for which we had data may simply be proxy measures of other unmeasured factors that are more significant to a child's development. Further research using additional datasets could help elicit some of these factors, and it may also be beneficial to expand data collection within existing datasets. For example, maternal and childhood adiposity are important to health but currently not well recorded in administrative datasets [37]. It is also likely that the cognitive ability of both parents and children plays an important role. This information was not available to us and should be included in future studies. A further limitation is that we analysed each mediator separately. It would be informative to work towards more comprehensive models with simultaneous estimation of multiple mediators; however, methods for achieving this in the presence of intermediate confounding are not yet well developed.
A related issue is that proportions eliminated do not necessarily have a straightforward additive relationship with each other, particularly if individual mediators cause, or interact with, each other.

Conclusions
To the best of our knowledge this is the first time a DAG has been used to understand the factors that mediate transmission of socio-economic deprivation from parents to offspring. This study illustrates the potential contribution of this novel approach in helping to disentangle such complex problems and specifically identifies key targets for intervention to obviate the perpetuation of health inequalities across generations.