Over the past decade there has been a marked growth in the use of linked population administrative data for child protection research. This is the first systematic review of studies to report on research design and statistical methods used where population-based administrative data is integrated with longitudinal data in child protection settings.
The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The electronic databases Medline (Ovid), PsycINFO, Embase, ERIC, and CINAHL were systematically searched in November 2019 to identify all relevant studies. The protocol for this review was registered and published with Open Science Framework (Registration DOI: 10.17605/OSF.IO/96PX8).
The review identified 30 studies reporting on child maltreatment, mental health, drug and alcohol abuse and education. The quality of almost all studies was strong; however, the studies rated poorly on the reporting of data linkage methods. The statistical analysis methods described failed to take into account mediating factors that may have an indirect effect on the outcomes of interest, and there was a lack of utilisation of multi-level analysis.
Citation: Chikwava F, Cordier R, Ferrante A, O’Donnell M, Speyer R, Parsons L (2021) Research using population-based administration data integrated with longitudinal data in child protection settings: A systematic review. PLoS ONE 16(3): e0249088. https://doi.org/10.1371/journal.pone.0249088
Editor: Abraham Salinas-Miranda, University of South Florida, UNITED STATES
Received: September 16, 2020; Accepted: March 11, 2021; Published: March 24, 2021
Copyright: © 2021 Chikwava et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The principal author (FC) is in receipt of the Australian Government Research Training Program (RTP)(https://www.education.gov.au/research-training-program) Scholarship and The Australian Housing and Urban Research Institute (AHURI)(https://www.ahuri.edu.au/) Scholarship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Population-based administrative data is routinely collected by organisations to deliver services and to monitor, evaluate and improve those same services. Examples include administrative health data, disease registries, primary care databases, electronic health records, population registries, and birth and death registries. The data may be linked within a single service sector, such as health, or with surveys and across sectors such as education, child protection and corrective services [1, 3, 4]. Bringing together data from various administrative sources provides a rich repository that can be used for research purposes. Linked data enable researchers to study risk and protective factors and to examine outcomes across the combined databases [5, 6]. The use of administrative data for research purposes has increased rapidly [7–13]. To date, there has not been a systematic review focusing on methods of analysis of population-based administrative data integrated with longitudinal data in child protection settings.
Population-based administrative data is invaluable in research as it offers complete coverage of a given population, which overcomes the imprecision associated with sampling error. It offers superior statistical power and precision for determining associations between rare exposures and outcomes, and can also serve as a sampling frame for subsequent surveys [1, 15–18]. Administrative data is useful for studying the causes of complex diseases and conditions and for assessing the outcomes of clinical or therapeutic interventions [17, 19–21]. Using multiple linked administrative datasets allows researchers to explore comorbidity and variability in outcomes within target populations, and to compare these between specific clinical population groups and against outcomes in the general population [22–25]. Because this systematic review concerns child protection settings, child protection is used as an example to illustrate the benefits and limitations of using population-based administrative data integrated with longitudinal data in research.
Population-based administrative data allows the study of outcomes among cohorts of hard-to-reach or high-risk populations, such as those in the juvenile justice system and those involved with the child protection system [15, 26, 27]. For example, child protection administrative data allow longitudinal examination of population-level patterns and trends in child maltreatment, as well as complex multi-level analysis, particularly where the data are linked to related individuals [27–31]. The data allow the cumulative incidence of risk and protective factors to be determined among population subgroups with different levels of child protection involvement [22, 32, 33]. The data therefore allow researchers to trace the trajectories of specific cohorts from birth to adulthood.
Use of child protection administrative data in research reduces the burden on individuals to disclose sensitive or traumatic experiences, and also reduces the risk of recall bias, social desirability bias and stigma, which may occur, for instance, in retrospective self-reports of child maltreatment [4, 27]. Administrative data is less prone to selection bias, since the data include the entire population served by the child protection agency. Such data are also used to evaluate the frequency of use, effectiveness and costs of services across populations and over time. Further, using administrative data is more cost-effective and efficient because the data are readily available when needed, and the cost and burden associated with face-to-face data collection can be avoided.
Despite these advantages, there are some limitations to using and accessing population-based administrative data. Key variables of interest to researchers are often not recorded, since administrative data are primarily collected for the delivery of programs and services. The data may be subject to biases, such as under-reporting of the incidence of child maltreatment in child protection research, or a lack of available data for some individuals, particularly members of vulnerable groups who are difficult to reach. In addition, routinely collected data may lack the depth of information required to answer important research questions. Another important limitation of administrative data is that individual-level socio-economic status (SES) parameters are often not available.
Linked administrative data may be subject to linkage error, either when records belonging to the same individual are not linked (missed matches) or when records are linked incorrectly (false matches); both can lead to biased estimates of association [7, 38]. There are also data access challenges, such as delays in obtaining approval to link datasets, especially cross-jurisdictional linked datasets. Data custodians may also restrict who can access linked data, limiting researchers' ability to obtain all the data they need. Despite these limitations of using population-based administrative data alone, there are advantages to linking population-based administrative data with longitudinal data.
The benefits of conducting longitudinal research in child protection settings are well documented: this type of research allows researchers to analyse trends and changes in early exposures, risks, behaviours and outcomes over a long period of time [18, 39]. Longitudinal studies are also powerful in that they overcome common issues around temporal associations and causal risk factors for outcomes of child abuse and neglect. They also allow researchers to update information about participants, such as socio-demographic characteristics, and to obtain in-depth information about particular topics and service involvement that could not otherwise be collected from administrative data alone [18, 40].
Despite their notable benefits, longitudinal studies are notoriously expensive, as they involve several waves of data collection and may run for several years before the outcomes of a study are determined. It may be difficult to obtain sufficient numbers of eligible participants, particularly when recruiting hard-to-reach populations, and access to children in out-of-home care is generally tightly controlled, resulting in low response rates. Longitudinal data are also subject to biases such as under-reporting, recall error and high attrition rates, which result in biased estimates if they are not appropriately accounted for in the analysis. A systematic review by Farzanfar, Abumuamar highlighted the potential for bias in, and the quality of reporting of, longitudinal studies. Another review, by Karahalios, Baglietto, found that 56% of studies had a high risk of bias with regard to attrition. Longitudinal studies also place a high burden on respondents due to frequent contact.
Combining population-based administrative data with longitudinal data has several advantages. For example, linking child protection administrative data to longitudinal data allows retrospective administrative data on prenatal or early childhood experiences to be used to determine trajectories of long-term adult outcomes measured in the longitudinal data [44–47]. Young people who have had child protection contact are known to have worse outcomes than young people in the general population [48, 49]; integrating longitudinal and administrative data therefore enables their outcomes to be compared using population-level data. Other benefits of linking longitudinal data with administrative data include: i) cross-validation of self-reported information from longitudinal surveys with administrative data [26, 38, 50, 51]; ii) reducing the data incompleteness and biases inherent in longitudinal data, as described earlier [40, 52, 53]; and iii) overcoming the high attrition rates common in longitudinal data [52, 54, 55]. In summary, combining these two data sources increases the usability and possible applications of the data.
Using population-based administrative data integrated with longitudinal data has its own limitations. One challenge is the bias introduced when data are linked only for respondents who have provided consent [1, 56]. Further, the linkage may be of poor quality, and administrative records may not exist or may be incomplete for many longitudinal participants.
A wide variety of factors affect the accuracy of reported results in child protection settings, including the reference population, data source, sampling strategy, sample size and analytical factors [41, 57]. While data integration offers unique advantages, it is important to consider the techniques and methods of analysis used to report study outcomes and to correct for biases that may be introduced by bringing together data from various sources. When modelling outcomes using administrative data integrated with longitudinal data, it is important to consider the time between events (survival analysis), all possible confounders, and mediating and moderating factors. These may include early childhood experiences, prenatal and parental risk factors, and socio-demographic and environmental factors. Failure to account for these factors may lead to biased estimates and false inference. Sensitivity analysis may be conducted to investigate the extent to which reported outcomes change when the definitions of exposure or confounding variables are modified. For example, a regression model using all child maltreatment notifications as the risk factor may be compared with one using only substantiated maltreatment [45, 59].
Analysing these datasets also requires methods for dealing with biases in the data. Missing data can lead to biased estimates of regression parameters when the probability of missingness is associated with the outcomes. Different strategies are used to handle missing data in statistical analyses, such as: i) imputation of missing data [60, 61]; ii) maximum likelihood estimation methods that model data from subjects who drop out of the study alongside those who complete it; and iii) weighting the available data using non-response weights to account for missing data [62, 63]. Concordance or agreement tests may also be needed to assess the validity of responses from either data source [64–66].
Some studies have demonstrated that longitudinal data analysis should account for possible within-subject correlation and for different covariance structures of repeated outcome measurements over time. Analytical methods used for this purpose include generalised estimating equations (GEE) and mixed-effects models [67–71].
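To illustrate why within-subject correlation matters, the sketch below simulates a hypothetical cohort of 300 children measured at five waves and fits an ordinary regression. With an identity link and an independence working correlation, GEE yields the same point estimates as ordinary least squares with cluster-robust (sandwich) standard errors, so the block computes both the naive and the cluster-robust standard errors using only NumPy. The function name, data and effect sizes are illustrative assumptions, not values from the reviewed studies.

```python
import numpy as np

def ols_cluster_robust(X, y, cluster_ids):
    """OLS estimates with naive and cluster-robust (sandwich) standard errors."""
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # Naive model-based SEs treat every observation as independent.
    sigma2 = resid @ resid / (n - p)
    naive_se = np.sqrt(np.diag(sigma2 * bread))
    # Sandwich "meat": sum over children of outer products of their score contributions.
    meat = np.zeros((p, p))
    for c in np.unique(cluster_ids):
        rows = cluster_ids == c
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    robust_se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, naive_se, robust_se

# Hypothetical cohort: 300 children, each measured at 5 waves.
rng = np.random.default_rng(0)
n_children, n_waves = 300, 5
child = np.repeat(np.arange(n_children), n_waves)
exposed = np.repeat(rng.integers(0, 2, n_children), n_waves)     # child-level exposure
child_effect = np.repeat(rng.normal(0, 2, n_children), n_waves)  # shared across a child's waves
y = 1.0 + 0.5 * exposed + child_effect + rng.normal(0, 1, child.size)

X = np.column_stack([np.ones(child.size), exposed])
beta, naive_se, robust_se = ols_cluster_robust(X, y, child)
print(f"estimate {beta[1]:.3f}, naive SE {naive_se[1]:.3f}, robust SE {robust_se[1]:.3f}")
```

Because the exposure is constant within each child and the shared child effect is large, the naive standard error understates the uncertainty, while the sandwich estimate accounts for the clustering. Mixed-effects models take a different route, modelling the child effect explicitly.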
Previous reviews have focused on how diagnoses of diseases or outcomes are measured, including the characteristics of administrative data and the strengths and limitations of the two data sources [17, 72–74]. A systematic review by Tew, Dalziel focused on the use of linked hospital data for research in Australia, which limits the generalisability of its findings. Young and Flack conducted a review of recent trends in the use of linked data; although it used a systematic search strategy, it was not published as a systematic review. That study highlighted areas where linked data are commonly used, particularly cross-sectoral linked data, and areas where their use could be improved; however, it did not mention the use of longitudinal data to enhance the reporting of outcomes. A systematic review by Andrade, Elphinston highlighted the need for future research to focus on collecting better outcome measures and on linking data to multiple administrative databases. A systematic review by da Silva, Coeli examined consent for data linkage, which is one of the sources of bias in using linked data.
Selecting appropriate statistical methods for analysing administrative data integrated with longitudinal data can improve the reporting of risk and protective factors related to child protection outcomes. This can be achieved through careful selection of variables and optimal use of the data extracted from the administrative and longitudinal sources. The overarching aim of this review is to synthesise the different methods of analysis used when administrative data are integrated with longitudinal data, and to recommend approaches that enhance research findings while minimising the risk of bias and other limitations. Specifically, the following objectives will be investigated: i) to describe the study designs and methods used in reporting linked administrative data combined with longitudinal data in child protection settings; and ii) to identify statistical methods, gaps and opportunities in the analysis of administrative data integrated with longitudinal data in child protection settings.
Although research combining administrative data with longitudinal data in child protection settings is available, to the best of our knowledge no systematic review has reported on the statistical methods used when the two data sources are combined. This systematic review is an essential step towards informing policy, practice and future research directions on the methodological aspects of using administrative data integrated with longitudinal data in child protection settings.
The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, which outlines minimum standards for reporting systematic reviews and meta-analyses. A completed PRISMA checklist is provided in S1 Table. The protocol for this review was registered and published with Open Science Framework (Registration DOI: 10.17605/OSF.IO/96PX8).
To be included in this review, peer-reviewed studies needed to have at least one administrative database integrated with a longitudinal dataset. Studies were limited to those involving child protection settings and published in English. Systematic reviews and meta-analyses were excluded, as were anecdotes, reviews, book chapters, letters to the editor, editorials and conference abstracts. Studies had to meet all eligibility criteria to be included in the review.
Information sources and search strategy
The electronic databases Medline (Ovid), PsycINFO, Embase, ERIC, and CINAHL were systematically searched in November 2019 to identify all relevant studies. In line with the objective of this review, search terms were identified for three concepts: i) data source (administrative data or population based data); ii) study design (longitudinal study or cohort study or prospective study); and iii) setting (child protection). Free-text searches were conducted in all databases because there were too few relevant subject headings for our purposes. In addition, websites that provide a publication repository for studies involving linked data, such as the Population Health Research Network, were searched. The reference lists of included studies were manually searched to find additional relevant studies. A full search strategy for all databases is shown in S2 Table.
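The three-concept structure of the search can be sketched as a Boolean query: synonyms within a concept are combined with OR, and the concepts are combined with AND. The terms below are only the concept labels named in the text; the full database-specific strategy (field tags, truncation and any additional synonyms) is in S2 Table, and the helper `build_query` is hypothetical.

```python
# Concept groups named in the text; the real term lists are in S2 Table.
CONCEPTS = [
    ["administrative data", "population based data"],            # data source
    ["longitudinal study", "cohort study", "prospective study"],  # study design
    ["child protection"],                                         # setting
]

def build_query(concepts):
    """OR the phrase-quoted synonyms within each concept, then AND the concepts."""
    blocks = ["(" + " OR ".join(f'"{term}"' for term in terms) + ")" for terms in concepts]
    return " AND ".join(blocks)

query = build_query(CONCEPTS)
print(query)
```

Each database interface (Ovid, EBSCO and so on) has its own syntax for phrase searching and field restriction, so a generic string like this would still need adapting per database.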
Titles and abstracts of the retrieved studies were screened between December 2019 and March 2020. The first author screened all titles and abstracts, while the second reviewer (LP) independently screened a random selection of 40% of studies to identify candidate studies for full-text review. The reviewers graded each abstract as eligible, possibly eligible or not eligible, using the inclusion and exclusion criteria defined above. Both reviewers independently screened 100% of full-text studies. Any disagreements about the eligibility of full-text studies were settled by discussing the differences in assessment and reaching consensus on which studies to include. Five studies were used to pilot the screening criteria and the data extraction process, which were then modified after consultation between the researchers. Inter-rater reliability between the two independent reviewers was established using weighted kappa for abstract selection and for the quality appraisal of included studies. Weighted kappa measures chance-corrected agreement between two raters on an ordinal scale, weighting each disagreement by its severity: the further apart two ratings are, the larger the disagreement weight.
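As an illustration of the agreement statistic, the following sketch computes Cohen's weighted kappa for two reviewers' abstract gradings on the three-level scale used here. The review does not state whether linear or quadratic weights were used, so both are supported; the function name and example ratings are hypothetical.

```python
# Ordinal screening categories, from most to least eligible.
CATEGORIES = ["eligible", "possibly eligible", "not eligible"]

def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES, scheme="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k, n = len(categories), len(ratings_a)
    pos = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions (rows: rater A, columns: rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[pos[a]][pos[b]] += 1 / n
    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        # Zero on the diagonal, growing with the distance between categories.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement_weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(disagreement_weight(i, j) * row_marg[i] * col_marg[j] for i, j in cells)
    return 1 - observed_dis / expected_dis

reviewer_1 = ["eligible", "eligible", "possibly eligible", "not eligible"]
reviewer_2 = ["eligible", "possibly eligible", "possibly eligible", "not eligible"]
print(round(weighted_kappa(reviewer_1, reviewer_2), 3))  # 5/7, about 0.714, with linear weights
```

The single one-step disagreement costs only half as much as a two-step disagreement would under linear weights, which is why weighted kappa suits ordered categories better than unweighted kappa.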
Since there are no standard criteria for assessing the quality of study designs that integrate population-based administrative data with longitudinal data, a combination of three critical appraisal methods was used to assess the methodological quality of studies: the “Qualsyst” critical appraisal tool by Kmet et al. (henceforth referred to as the Kmet checklist); the Guidance for Information about Linking Data sets (GUILD), which focuses on the methodological process of linking data; and the Reporting of studies Conducted using Observational Routinely-collected health Data (RECORD).
The Kmet checklist has 14 items scored on a 3-point ordinal scale (0 = no, 1 = partial, 2 = yes), three of which were not applicable to our study design. The checklist items assess the study design, description of participants’ characteristics, appropriateness of sampling strategy and sample size, robustness of outcome and exposure variables, analytical methods, estimates of variance, control for confounding and whether the conclusions drawn reflect the results reported. A Qualsyst score of > 80% was interpreted as strong quality, 60–79% as good quality, 50–59% as adequate quality, and < 50% as poor methodological quality.
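The Kmet summary score is the total score divided by the maximum possible score over the applicable items, so with three items marked not applicable the denominator here is 2 x 11 = 22. The sketch below shows this scoring and the quality bands quoted above; the function names are hypothetical, and the treatment of values falling between the published bands (for example exactly 80%) is an assumption.

```python
def kmet_summary_score(item_scores):
    """Percentage of the maximum possible score over applicable items.

    item_scores: 0 (no), 1 (partial) or 2 (yes) per item; None = not applicable.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable or any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("scores must be 0, 1, 2 or None, with at least one applicable item")
    return 100 * sum(applicable) / (2 * len(applicable))

def kmet_quality(score):
    """Map a summary score to the quality bands used in this review."""
    # Assumption: values between published bands (e.g. exactly 80) go to the lower band.
    if score > 80:
        return "strong"
    if score >= 60:
        return "good"
    if score >= 50:
        return "adequate"
    return "poor"

# Hypothetical study: 11 applicable items (3 N/A), 6 rated 'yes' and 5 'partial'.
example = [2] * 6 + [1] * 5 + [None] * 3
score = kmet_summary_score(example)  # 17 / 22
print(f"{score:.1f}% -> {kmet_quality(score)}")  # 77.3% -> good
```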
The GUILD statement has three broad domains, with items in each domain focusing on the data source population and the linkability of the dataset, the data linkage process, and the quality of data linkage, including accounting for linkage error. The RECORD statement, an extension of the STROBE guidelines, consists of a checklist of 13 items related to the title, abstract, introduction, methods, results and discussion sections of studies, and other items relating to routinely collected health data. Three items were selected from the RECORD checklist because they were the only items that did not overlap with the GUILD items; these items were combined with the GUILD statements. Because there is no standard scoring system for the GUILD and RECORD statements, a scoring method similar to that of the Kmet checklist was used. Prior to conducting the quality appraisal, the two reviewers (FC and LP) met to discuss the scoring method for these guidelines.
The second reviewer conducted quality assessment (using the Kmet, GUILD and RECORD statements) on a random selection of 40% of the included studies. Any differences in ratings between the two reviewers were settled by discussing the differences in assessment and reaching consensus on the final score for each quality appraisal method. For Kmet, a difference was defined as the two reviewers placing a study in different quality categories (e.g., one reviewer rating a study as good quality (60–79%) while the other rated it as poor quality (<50%)). However, because most studies received poor GUILD and RECORD ratings, agreement was discussed only where the two reviewers' GUILD or RECORD ratings for a study differed by more than 15%.
Data collection process
Comprehensive data extraction forms were developed to extract relevant data from the included studies under four headings: study characteristics, administrative data, longitudinal data and statistical methods. The included studies were heterogeneous in study design and quality; therefore, a narrative synthesis of their findings was conducted.
A total of 1,123 studies were retrieved from the electronic database search and eight from other sources. After duplicates were removed, 698 studies remained. Of these, 664 records did not meet the inclusion criteria, leaving 34 full-text studies that were assessed for eligibility. Thirty studies met the inclusion criteria and were included in the data synthesis; of these, 10 were identified by manually scrutinising the reference lists of eligible studies. Fig 1 below shows a flowchart of the search and selection process of the included studies.
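The flow counts above can be checked arithmetically. This sketch derives the intermediate PRISMA figures from the reported numbers; the derived values (1,131 records identified, 433 duplicates removed and 4 full-text exclusions) are not stated in the text, so Fig 1 remains the authoritative source. The function name is hypothetical.

```python
def prisma_counts(from_databases, from_other_sources, after_deduplication,
                  excluded_at_screening, included):
    """Derive each PRISMA flow stage from the counts reported in the text."""
    identified = from_databases + from_other_sources
    counts = {
        "identified": identified,
        "duplicates_removed": identified - after_deduplication,
        "screened": after_deduplication,
        "full_text_assessed": after_deduplication - excluded_at_screening,
        "included": included,
    }
    counts["excluded_at_full_text"] = counts["full_text_assessed"] - included
    # A negative stage would mean the reported counts are internally inconsistent.
    assert all(v >= 0 for v in counts.values()), counts
    return counts

flow = prisma_counts(1123, 8, 698, 664, 30)
print(flow)
```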
Characteristics of included studies
The studies were conducted in a variety of countries, with Australia having the highest number of publications (50%), followed by the USA (20%) and the United Kingdom (17%). While all studies were conducted in child protection settings, only a few were specific to out-of-home care settings (20%). The outcomes investigated varied; the most common were child maltreatment (30%), mental health (20%), drug and alcohol abuse (20%), education (17%), domestic violence (7%), and health insurance (7%). Table 1 below shows a summary of all included studies, and Table 2 has more detailed information for each study.
Almost all studies were birth cohorts, and each measured different variables at different points in time. In the majority of studies, baseline data consisted of prenatal or postnatal data reported by the mothers, while outcome data were obtained during follow-up waves. Six major longitudinal studies were reported across the publications, the main one being the Mater-University Study of Pregnancy (MUSP), conducted in Queensland, Australia, from 1981 to 2004 [58, 80–82]. Although these studies had multiple follow-up waves, the authors mostly reported on the baseline wave and one follow-up wave. The duration of follow-up from baseline to the last wave ranged from 3 to 21 years. Each longitudinal study had multiple publications, demonstrating that a range of exposures and outcomes can be investigated in linked child protection datasets. The numbers of males and females were almost equal in 70% of studies, while the gender split was unknown in 9 studies.
Cohort sizes ranged from 1,200 to approximately 14,000 children. Most studies (83%) reported only one administrative database integrated with the longitudinal data, while 17% linked multiple datasets, including census data, psychiatric registers, educational databases, medical aid data, and child birth and death reviews. Almost all (97%) of the studies reported a state-wide child protection dataset integrated with the longitudinal data. About 23% of studies, drawn from two longitudinal studies, reported using a systematic random sampling method; these were the Alaska Pregnancy Risk Assessment Monitoring System (PRAMS) and the Evaluation through Follow-up (ETF) studies.
GUILD recommends reporting on three aspects of studies using linked datasets: i) a description of the population included in the dataset, i.e., how the data were generated, processed and quality controlled; ii) the data linkage processes; and iii) the quality of data linkage, including accounting for linkage error. Most studies reported on only one of these aspects, namely the data linkage method used. Fifty-seven percent reported using a deterministic linkage method, which mainly involved using a unique personal identification number to link datasets. This linkage method is well established in Scandinavian countries [24, 83] and is increasingly common in other countries. Only two studies reported using probabilistic matching, which uses a set of non-unique identifiers to link data. Two studies [55, 85] reported using a combination of probabilistic and deterministic methods, and nine studies did not report any linkage method.
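To make the distinction between the two approaches concrete, the sketch contrasts deterministic linkage on a unique identifier with a simplified probabilistic score in the style of the Fellegi-Sunter model, where each agreeing or disagreeing field contributes a log-likelihood-ratio weight. The field names, m- and u-probabilities and decision thresholds are hypothetical; in practice they are estimated from the data and tuned by the linkage unit.

```python
import math

# Hypothetical m-probabilities: P(field agrees | records belong to the same person).
M = {"surname": 0.95, "given_name": 0.90, "birth_date": 0.97, "sex": 0.98}
# Hypothetical u-probabilities: P(field agrees | records belong to different people).
U = {"surname": 0.01, "given_name": 0.02, "birth_date": 0.003, "sex": 0.50}

def deterministic_link(rec_a, rec_b, key="person_id"):
    """Link only when both records carry the same unique identifier."""
    return rec_a.get(key) is not None and rec_a.get(key) == rec_b.get(key)

def match_weight(rec_a, rec_b):
    """Sum of log2 agreement or disagreement weights over non-unique identifiers."""
    total = 0.0
    for field in M:
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            total += math.log2(M[field] / U[field])
        else:
            total += math.log2((1 - M[field]) / (1 - U[field]))
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Hypothetical thresholds; pairs in between go to clerical (manual) review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"

a = {"person_id": "P001", "surname": "Smith", "given_name": "Ann", "birth_date": "2001-05-04", "sex": "F"}
b = {"person_id": None, "surname": "Smith", "given_name": "Anne", "birth_date": "2001-05-04", "sex": "F"}
print(deterministic_link(a, b), classify(match_weight(a, b)))  # False link
```

Here the deterministic rule misses the pair because one record lacks the identifier, while the probabilistic score still links it despite the name variant. The middle band corresponds to the manual review of suspected matches described in the next paragraph.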
Only four studies reported on linkage quality. Parrish, Young reported the proportion of successful matches and the manual review of suspected matches that met a probability score threshold, while two studies by Raghavan, Brown reported the number of records that were linked and unlinked from the source file, including statistical differences between linked and unlinked data on key variables.
Several biases commonly occur in longitudinal studies. For the purposes of this review, we report on three of the most commonly occurring: attrition, missing data and selection bias.
Incomplete data are common in longitudinal research, as reflected in this review, where missing data were reported in 87% of the studies (Table 3). Three mechanisms of missing data are conventionally distinguished. When missingness is unrelated to the data, it is termed missing completely at random (MCAR); when the probability of a value being missing is unrelated to that value itself but may be related to the values of other variables in the dataset, it is termed missing at random (MAR). A mechanism that should not be ignored in longitudinal analysis is missing not at random (MNAR) [87, 88], in which missingness depends on the unobserved values themselves. Examples in this review include studies in which children with child protection involvement were over-represented among those with missing data, resulting in over-estimation of outcomes in this group compared with the general population [89, 90], and missing data due to attrition.
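A small simulation can make the three mechanisms concrete. The sketch generates a hypothetical outcome that depends on an observed socio-economic covariate, deletes outcome values under MCAR, MAR and MNAR, and compares the complete-case mean with a simple regression-based estimate that uses the covariate. All parameters are illustrative assumptions, and the regression-based estimate is a simplified stand-in for imputation rather than multiple imputation.

```python
import numpy as np

rng = np.random.default_rng(2021)
n = 100_000
ses = rng.normal(size=n)                   # covariate observed for everyone
outcome = 0.8 * ses + rng.normal(size=n)   # true population mean is 0

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

missing = {
    "MCAR": rng.random(n) < 0.3,                          # unrelated to any data
    "MAR": rng.random(n) < expit(-1.0 - 1.5 * ses),       # depends on observed SES only
    "MNAR": rng.random(n) < expit(-1.0 - 1.5 * outcome),  # depends on the missing value itself
}

results = {}
for mechanism, is_missing in missing.items():
    observed = ~is_missing
    complete_case = outcome[observed].mean()
    # Fit outcome ~ SES on observed cases, then average predictions over everyone.
    slope, intercept = np.polyfit(ses[observed], outcome[observed], 1)
    regression_based = (intercept + slope * ses).mean()
    results[mechanism] = (complete_case, regression_based)
    print(f"{mechanism}: complete-case {complete_case:+.3f}, SES-adjusted {regression_based:+.3f}")
```

Under MCAR both estimates are close to the true mean of 0; under MAR only the covariate-adjusted estimate recovers it; under MNAR neither does, because missingness depends on values that were never observed.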
Studies in this review reported missing data on certain covariates (MCAR), such as child maltreatment, parental race, paternal income and education, and breastfeeding status [47, 52, 81, 91–96]. Missing data were also reported on outcome variables, such as those from the Strengths and Difficulties Questionnaire. A range of analytical methods, from simple to more sophisticated, can be applied to handle missing data and reduce bias in reported outcomes. The simplest methods reported were listwise deletion [4, 21, 59, 97, 98] and including missing data as a separate category for each covariate in regression analysis (the missing indicator method) [47, 81, 93–96]. More sophisticated methods included multiple imputation using Markov chain Monte Carlo (MCMC) methods, multiple imputation using chained equations (MICE), and multiple imputation using the fully conditional specification (FCS) method (S3 Table).
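The two simplest approaches can be sketched on hypothetical records; the variable names and values are illustrative and not drawn from the reviewed studies. Listwise deletion drops any record with a missing analysis variable, while the missing indicator method keeps every record and adds a "missing" category to the covariate's dummy coding.

```python
def listwise_delete(records, variables):
    """Keep only records with no missing value (None) in any analysis variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

def missing_indicator_dummies(records, variable, levels, reference):
    """Dummy-code a categorical covariate with 'missing' as an extra category.

    The reference level gets no column, as in a regression design matrix.
    """
    columns = [lvl for lvl in levels + ["missing"] if lvl != reference]
    rows = []
    for r in records:
        value = "missing" if r.get(variable) is None else r[variable]
        rows.append({f"{variable}={lvl}": int(value == lvl) for lvl in columns})
    return rows

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "breastfed": "yes", "paternal_income": "low"},
    {"id": 2, "breastfed": None, "paternal_income": "high"},
    {"id": 3, "breastfed": "no", "paternal_income": None},
    {"id": 4, "breastfed": "no", "paternal_income": "high"},
]
complete = listwise_delete(records, ["breastfed", "paternal_income"])
dummies = missing_indicator_dummies(records, "breastfed", ["yes", "no"], reference="yes")
print([r["id"] for r in complete])
print(dummies)
```

Listwise deletion here keeps only two of the four records. The missing indicator method retains all four but can still give biased estimates in observational data, which is one reason multiple imputation is generally preferred when data are plausibly MAR.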
Missing data due to attrition.
Attrition is a type of missingness in longitudinal studies that typically results from loss to follow-up, death, emigration, non-return of a survey or withdrawal from the study. Attrition rates were reported for 53% of the studies and ranged from 18% to 65% (Table 3). Although the attrition rate was not stated in almost half of the studies, attrition was described in 63% of all studies. The review identified attrition due to loss to follow-up, and differential attrition among families with substantiated maltreatment, those from more socio-economically disadvantaged backgrounds, and males and Indigenous people (particularly in the MUSP studies) [4, 21, 46, 82, 97, 98, 101, 102]. Other reported causes of attrition were death or early infant loss [47, 55, 93, 96], non-response and emigration [47, 55].
Forty-seven percent (47%) of all studies reported conducting some attrition analysis, while 40% reported methods of correcting for attrition loss. Although these methods were described, the analysis output was not shown in all studies. Attrition analysis was conducted to determine whether outcomes differed significantly between participants lost to follow-up and those remaining in the study. The main methods of correcting for attrition were inverse probability weighting [46, 58, 59, 81, 101, 103, 104] and propensity score analysis [21, 97, 98], while no specific method was described in some studies. Inverse probability weighting was applied to the subjects remaining in the cohort: each was weighted by the inverse of their estimated probability of remaining in the study, so that the retained sample again represented the original cohort. Propensity score analysis was used to assess the impact of differential attrition by including a weight that takes account of baseline covariates.
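The weighting step can be sketched as follows, assuming a single baseline covariate that predicts drop-out and is observed for every child. A logistic model for remaining in the study is fitted by Newton-Raphson, and each retained child is weighted by the inverse of their predicted retention probability. The simulated cohort, its parameters and the helper `fit_logistic` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (iteratively reweighted least squares)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

# Hypothetical cohort: baseline disadvantage is recorded for everyone, but the
# follow-up outcome is only observed for children still in the study.
rng = np.random.default_rng(7)
n = 100_000
disadvantage = rng.normal(size=n)
outcome = disadvantage + rng.normal(size=n)  # true cohort mean is 0
p_stay = 1.0 / (1.0 + np.exp(-(0.5 - 1.2 * disadvantage)))
retained = rng.random(n) < p_stay  # more disadvantaged children drop out more often

# 1. Model the probability of remaining in the study from baseline covariates.
X = np.column_stack([np.ones(n), disadvantage])
beta = fit_logistic(X, retained.astype(float))
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))

# 2. Weight each retained child by the inverse of that probability.
w = 1.0 / p_hat[retained]
naive_mean = outcome[retained].mean()
ipw_mean = np.sum(w * outcome[retained]) / np.sum(w)
print(f"complete-case mean {naive_mean:.3f}, IPW mean {ipw_mean:.3f}")
```

Because more disadvantaged children drop out, the complete-case mean is biased downwards, while the weighted mean is close to the true cohort mean of 0. The correction only works if the retention model includes the covariates that actually drive attrition.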
Selection bias occurs when there is a systematic difference between those who participate in a study and those who do not, which affects generalisability [105, 106]. Selection bias was reported for 33% of the studies (Table 3). Selection bias may result in over-estimation of outcomes among young people exposed to child protection compared with young people in the general population. Restricting the study to population groups that may not be representative of the entire population of interest may lead to selection bias [55, 85]. Selection bias also occurs if a population of interest has characteristics that give it a higher chance of recruitment to a study than the population without those characteristics [93, 95, 96]. Some authors reported conducting weighted analysis to account for potential selection bias [46, 103, 104].
Sensitivity analysis is conducted to determine whether small changes in exposure or confounding variables alter the significance of reported outcomes in situations where there could be measurement error. Sensitivity analysis was reported for 43% of the studies, but only eight of these thirteen studies reported the method of analysis used. Sensitivity analysis was conducted by modifying covariates such as child maltreatment, either by expanding or narrowing the definition to include or exclude notified or suspected cases of maltreatment, or by comparing multiple forms of abuse with a single form [21, 52, 58, 59, 81, 104].
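A sensitivity analysis of the exposure definition can be sketched with unadjusted odds ratios on hypothetical counts: the same cohort is analysed once with only substantiated maltreatment as the exposure, and once with any notification. The reviewed studies used adjusted regression models; this sketch only shows how changing the definition moves the estimate, and the helper names are hypothetical.

```python
def odds_ratio(pairs):
    """Unadjusted odds ratio from (exposed, outcome) boolean pairs."""
    a = sum(1 for e, y in pairs if e and y)
    b = sum(1 for e, y in pairs if e and not y)
    c = sum(1 for e, y in pairs if not e and y)
    d = sum(1 for e, y in pairs if not e and not y)
    return (a * d) / (b * c)

def group(status, n, n_with_outcome):
    return [(status, True)] * n_with_outcome + [(status, False)] * (n - n_with_outcome)

# Hypothetical cohort: "notified" means notified but not substantiated.
cohort = group("none", 800, 80) + group("notified", 100, 20) + group("substantiated", 100, 40)

definitions = {
    "substantiated only": lambda s: s == "substantiated",
    "any notification": lambda s: s in ("notified", "substantiated"),
}
results = {name: odds_ratio([(is_exposed(s), y) for s, y in cohort])
           for name, is_exposed in definitions.items()}
for name, value in results.items():
    print(f"{name}: OR = {value:.2f}")
```

With these hypothetical counts the odds ratio falls from about 5.33 to about 3.86 when unsubstantiated notifications are added to the exposed group, the kind of shift a sensitivity analysis is designed to reveal.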
Other authors reported restricting the analysis to groups with certain characteristics, or adding or removing one or more covariates, in order to reduce bias. In some studies, adding covariates at subsequent waves strengthened, weakened or did not change the effect sizes. The main sensitivity analysis methods presented in the eight studies were logistic regression [21, 45, 58, 59, 81, 98, 102] and multiple regression analysis controlling for known confounders and effect modifiers (S3 Table).
Two groups of statistical methods were identified in the included studies: data preparation methods and the main statistical analysis method reported.
Data preparation methods.
Most authors conducted some preliminary data preparation, descriptive or bivariate analysis to address missing data and to identify significant covariates to include as confounders in final multivariate models. The data preparation methods described ranged from descriptive statistics to bivariate and simple regression analysis (S3 Table). In addition, multiple imputation, data weighting and propensity score procedures were applied to correct for missing data, although some authors did not provide full details of these methods. Common descriptive parameters were frequencies, percentages, means, incidence rates and population attributable risk. Chi-square tests (53%) were commonly used to assess the association between confounders and outcome variables. Other methods included two-sample t-tests (13%), correlation analysis (7%) and, to a lesser extent, concordance analysis (3%), logistic regression (3%) and cumulative risk factor analysis (3%).
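As an illustration of the most common bivariate screening step, the sketch below computes a Pearson chi-square test of independence for a 2x2 table of a candidate confounder against an outcome, using only the standard library. The counts are hypothetical, and no continuity (Yates) correction is applied; with one degree of freedom the upper-tail probability can be written as erfc(sqrt(X²/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = covariate present/absent, columns = outcome yes/no.
stat, p = chi_square_2x2([[30, 70], [10, 90]])
print(round(stat, 2), p < 0.001)  # 12.5 True
```

A covariate flagged this way would then be carried into the multivariate model; the bivariate test alone says nothing about confounding once other variables are adjusted for.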
Main analytical method.
The main method of analysis for each study is shown in Table 4. The main analytical method reported by most studies was logistic regression (63%), followed by multiple regression methods (10%). Logistic regression was used to analyse risk factors and associated outcomes, and for attrition and sensitivity analyses. Advanced analytical methods included generalised linear models (GLM), multinomial logistic regression using Vermunt's three-step latent class analysis approach and growth mixture modelling, and survival analysis using Kaplan-Meier, Cox (proportional hazards) regression and Nelson-Aalen estimation methods [55, 99]. A few studies used a combination of methods, in most cases including logistic regression as one of the main methods [45, 47, 55, 82]. Only one study reported descriptive statistics as its main method of analysis.
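For readers less familiar with the dominant method, the sketch below fits a logistic regression with one binary predictor by Newton-Raphson maximum likelihood, using only the standard library. It is an illustration, not the software the reviewed studies used. With a single binary predictor, exp(b1) reproduces the crude odds ratio of the corresponding 2x2 table, which makes the result easy to check; the counts are hypothetical.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for logit P(y = 1) = b0 + b1 * x (one predictor)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score vector X'(y - p)
        h00 = h01 = h11 = 0.0  # information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} X'(y - p)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: exposed (x = 1) 30 of 100 with outcome; unexposed 10 of 100.
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic(x, y)
print(round(math.exp(b1), 3))  # 3.857, i.e. (30 * 90) / (70 * 10)
```

Adding covariates extends the same Newton step to a larger information matrix; statistical packages do exactly this, along with standard errors from the inverse of X'WX.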
The main outcomes evaluated in the studies were standardised and self-reported measures from the main research areas reported in Table 5. The reported confounding variables were notably similar across studies, and most studies (93%) used individual and family characteristics as confounders. These included early childhood experiences, socio-demographic variables, prenatal exposure and parental (mostly maternal) risk factors. Five studies reported on potential mediating variables. These included school mobility [47, 89]; parenting age, education, psychiatric history and poverty; gender; young people's income, education, marital status and neighbourhood characteristics; smoking and alcohol use [97, 102]; receipt of social welfare, education and marital status; and race and receipt of public aid. One study found that parenting and social stress did not moderate the relationship between intimate partner violence and maltreatment. One study reported the following as potential mediating variables: receipt of social welfare, the young person's educational achievement, and the young person's marital status. Only three studies [47, 90, 92] reported checking the assumptions of statistical tests, such as normality and homogeneity of variance, before conducting data analysis.
The Kmet, GUILD and RECORD checklists were used to rate the methodological quality of included studies; the results are shown in Table 6. Based on the "QualSyst" standard quality assessment for evaluating primary research papers by Kmet, Cook, the final quality scores ranged from 55% (adequate quality) to 100% (strong quality), with a median score of 91%, indicating high quality across the studies reviewed. The final quality scores for the GUILD and RECORD checklists ranged from 10% to 79%, and only three studies scored above 50%. The median score was 23%, indicating poor quality on these checklists across the studies reviewed. Inter-rater reliability was 81% (95% CI: 75% to 88%) for the Kmet scores and 77% (95% CI: 70% to 85%) for the GUILD and RECORD scores.
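For context on how percentage scores of this kind arise, the sketch below computes a QualSyst-style summary score. It assumes the usual QualSyst rules for quantitative studies: each checklist item is rated "yes" (2 points), "partial" (1) or "no" (0), items rated not applicable are excluded, and the summary score is the total divided by the maximum possible for the applicable items. The study names and ratings are hypothetical.

```python
from statistics import median

# Assumed QualSyst point values; "n/a" items are excluded from the denominator.
POINTS = {"yes": 2, "partial": 1, "no": 0}


def qualsyst_summary(ratings):
    """Summary score = total points / (2 * number of applicable items)."""
    applicable = [r for r in ratings if r != "n/a"]
    return sum(POINTS[r] for r in applicable) / (2 * len(applicable))


# Hypothetical ratings for three studies on a 14-item checklist.
studies = {
    "study_1": ["yes"] * 10 + ["partial"] * 2 + ["n/a"] * 2,
    "study_2": ["yes"] * 14,
    "study_3": ["yes"] * 6 + ["partial"] * 3 + ["no"] * 3 + ["n/a"] * 2,
}
scores = {name: qualsyst_summary(r) for name, r in studies.items()}
for name, score in scores.items():
    print(f"{name}: {100 * score:.1f}%")  # 91.7%, 100.0%, 62.5%
print(f"median: {100 * median(scores.values()):.1f}%")  # 91.7%
```

Excluding not-applicable items keeps studies comparable even when some checklist items do not fit their design.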
This systematic review sought to describe the study designs and statistical methods used when administrative data is integrated with longitudinal data in child protection settings and make recommendations about approaches to improve the quality of reporting of research findings, thereby minimising risk of bias and other limitations. There has been a steady growth in the number of studies which use administrative data integrated with longitudinal data in child protection settings since 2000. A total of 30 studies were identified that integrated these data to determine outcomes in the areas of child maltreatment, mental health, drug and alcohol abuse and education. Since the focus of the review was on studies in child protection settings, the main administrative data reported was child protection data.
While most studies had multiple data collection points, the median number of waves reported for the longitudinal studies was two. The findings from this review can be grouped under three themes: i) quality of reporting on data linkage procedures; ii) biases reported; and iii) statistical methods used. Though some systematic reviews have been conducted on administrative data alone or longitudinal data alone in child protection or other settings [26, 110, 111], this is the first systematic review of studies utilising administrative data integrated with longitudinal data in child protection settings.
Quality of reporting on data linkage procedures
Overall, the quality of the studies was strong (QualSyst median score = 93%), but most rated poorly on the reporting of data linkage methods (GUILD and RECORD median score = 23%). Only three of the 30 studies [55, 92, 99] described the data linkage procedures in sufficient detail. This is of concern, as even a small number of linkage errors may lead to significant bias and inconsistencies in estimating the parameters of a statistical model. As described in the GUILD guidance, researchers using linked data should take account of biases inherent in the data linkage process and account for such biases in the analysis. The GUILD guidelines recommend three key steps when reporting analyses using linked data: i) describing the population included in the data set (i.e., how the data were generated, processed and quality controlled); ii) describing the data linkage processes; and iii) describing the quality of data linkage, including accounting for linkage error. Similar reporting items are recommended in the RECORD statement.
Harron, Dibben support accounting for linkage errors as recommended by GUILD and RECORD, but note that it may be difficult for researchers to determine the quality of linked data because they may not have access to identifiable data. The authors therefore recommend three methods to evaluate data linkage quality and identify potential sources of bias: i) post-linkage validation, ii) sensitivity analyses, and iii) comparison of the characteristics of linked and unlinked records.
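The third method, comparing linked and unlinked records, is often summarised with a standardised difference for each characteristic, which, unlike a p-value, does not depend on sample size. The sketch below uses the pooled-variance form for a binary characteristic. The records and the `remote` field (for example, living in a remote area) are hypothetical; one commonly cited rule of thumb treats absolute values above 0.1 as a meaningful imbalance.

```python
import math


def proportion(records, field):
    """Share of records with field == 1."""
    return sum(r[field] for r in records) / len(records)


def standardised_difference(p_linked, p_unlinked):
    """Standardised difference between two proportions (pooled-variance form)."""
    pooled_sd = math.sqrt((p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2)
    return (p_linked - p_unlinked) / pooled_sd


# Hypothetical records with a linkage flag and a binary characteristic.
records = (
    [{"linked": True, "remote": 1} for _ in range(10)]
    + [{"linked": True, "remote": 0} for _ in range(90)]
    + [{"linked": False, "remote": 1} for _ in range(6)]
    + [{"linked": False, "remote": 0} for _ in range(14)]
)
linked = [r for r in records if r["linked"]]
unlinked = [r for r in records if not r["linked"]]

d = standardised_difference(proportion(linked, "remote"), proportion(unlinked, "remote"))
print(round(d, 2))  # -0.52: remote-area records are under-represented among links
```

A large difference of this kind suggests that linkage error is not random with respect to that characteristic, so estimates for that subgroup may be biased.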
Most authors did not report sufficiently on the population included in the data set and how the data were generated and quality controlled. Most described the population in the source data and how the data were collected, but did not report how the data were updated, processed and quality controlled. Only a few authors explained how data were cleaned, including standardisation of missing data and treatment of special characters [55, 92, 99], and how manual linkages were conducted, by reporting on data mismatches and duplicate cases.
The second GUILD step, which focuses on data linkage processes, was described in sufficient detail by the same authors [55, 92, 99], who reported how linkage rates were calculated and how probability match scores were used for weighting. Benchimol, Smeeth state that authors should report the methods of linkage and of linkage quality evaluation, though this information may not be provided by the data linkage unit. Furthermore, none of the studies reported information on disclosure controls to reduce the re-identification of individuals from linked data. However, the majority (80%) of studies reported the method of data linkage (deterministic, probabilistic, or both), including the unique ID used as the variable for deterministic linkage.
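Probabilistic match scores are commonly based on the Fellegi-Sunter framework, in which each compared field contributes an agreement weight log2(m/u) or a disagreement weight log2((1 - m)/(1 - u)). Here m is the probability that a field agrees for a true match and u the probability that it agrees for a non-match. The sketch below sums these weights for one candidate record pair and applies two cut-offs. The fields, m and u values, and thresholds are all hypothetical, and the studies reviewed do not state which implementation their linkage units used.

```python
import math

# Hypothetical (m, u) probabilities per comparison field.
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(agreements):
    """Total Fellegi-Sunter weight (log2 scale) for one candidate record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Hypothetical thresholds: accept, reject, or send to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


pair = {"surname": True, "first_name": False, "birth_date": True, "sex": True}
w = match_weight(pair)
print(round(w, 2), classify(w))
```

Reporting the m and u estimates and the chosen thresholds is what allows readers to judge the trade-off between false and missed matches.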
The last GUILD step involves analysis of linked data that takes linkage error into account. While the quality of data linkage can be assessed before and during linkage, this step allows researchers to report on linkage error after linkage. To ensure transparency, the analysts who conduct data linkage should provide researchers with reports of the linkage process, including estimates of false and missed matches. If there are linkage errors, analysts can determine methods or procedures to correct for them before any analysis is conducted, while acknowledging that this may not always be possible. Analysts could identify linkage errors by analysing differences or similarities between linked and unlinked data, though this method may introduce additional bias caused by missing records. A simulation exercise developed by Parrish, Shanahan enables post-estimation of linkage errors. Incorporating linkage errors into research analyses is a relatively new and evolving area of methodological research. Some of the methods developed so far model simple linkage errors derived from one-to-one matches, rather than the more complex many-to-many or many-to-spine match scenarios that exist in modern production linkage systems [112, 113].
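Where a validation ("gold standard") sample of known true matches is available, the false and missed matches mentioned above can be estimated directly. The sketch below assumes the common definitions: the false match rate is the share of accepted links that are wrong (1 - precision), and the missed match rate is the share of true matches that were not linked (1 - sensitivity). Definitions vary between linkage units, and the record identifiers here are hypothetical.

```python
def linkage_error_rates(linked_pairs, true_pairs):
    """Compare accepted links with a validation set of known true matches."""
    linked = set(linked_pairs)
    truth = set(true_pairs)
    false_matches = linked - truth   # links that should not have been made
    missed_matches = truth - linked  # true matches the linkage did not find
    return {
        "false_match_rate": len(false_matches) / len(linked),
        "missed_match_rate": len(missed_matches) / len(truth),
    }


# Hypothetical pairs of (cohort ID, child protection record ID).
true_pairs = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
linked_pairs = [(1, "a"), (2, "b"), (3, "x"), (4, "d")]

print(linkage_error_rates(linked_pairs, true_pairs))
# {'false_match_rate': 0.25, 'missed_match_rate': 0.4}
```

Breaking these rates down by subgroup (for example by Aboriginal status or remoteness) shows whether linkage error is concentrated in particular populations.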
In longitudinal studies, data are commonly missing for various reasons, such as the unavailability of data for specific variables or participant attrition. Missing data may result in loss of statistical power and biased parameter estimates, and may diminish the representativeness of study samples. Almost all studies described missing data, and a few conducted some analysis to correct for it. Biases may occur because certain population groups are over-represented; for instance, Aboriginal children are over-represented in child protection or out-of-home care systems compared with other young people in Australia. Systematic bias may occur because Aboriginal young people are more often reported and therefore have more frequent contact with child protection services. Some studies reported over-representation of children in OHC among those with missing school grades, and corrected for this by replacing the missing grades with estimated grades under a missing-at-random (MAR) assumption [89, 90]. Had the missing data not been accounted for in the analysis, outcomes among the OHC group could have been over- or under-estimated.
This review shows some variability in the reporting and analysis of missing data. A review conducted by Karahalios, Baglietto highlighted that reporting of missing data in cohort studies is generally inconsistent and that the methods used to handle missing data in some studies may be inappropriate. While weighting was described as one technique to account for missing data, this method has limitations. For example, the standard errors of estimates such as means and proportions are larger than they would be if the data were not weighted.
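One way to quantify why weighting inflates standard errors is Kish's approximation: with weights w, the effective sample size is (Σw)²/Σw², and the design effect due to unequal weighting is n divided by that quantity (equivalently 1 + CV² of the weights). The sketch below is that approximation only; it ignores any correlation between weights and outcomes, and the weights are hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def design_effect(weights):
    """Approximate variance inflation from unequal weights (= 1 + CV^2)."""
    return len(weights) / kish_effective_n(weights)


# Hypothetical weights: half the sample weighted 1, half weighted 3
# (e.g. an under-retained group up-weighted after attrition).
weights = [1.0] * 50 + [3.0] * 50
print(kish_effective_n(weights))  # 80.0
print(design_effect(weights))     # 1.25
print(round(design_effect(weights) ** 0.5, 3))  # 1.118: standard errors about 12% larger
```

The more unequal the weights, the smaller the effective sample, which is why extreme weights are often trimmed or stabilised in practice.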
Listwise deletion as a method of handling missing data also has limitations, as it requires data to be missing completely at random (MCAR). While some studies in this review applied this method, it may not be appropriate, particularly if missing values occur among populations with certain characteristics, such as those lost to follow-up who are mostly disadvantaged or hard to reach. In addition, listwise deletion reduces the sample size (and ultimately statistical power), which is a particular concern for young people with child protection contact, for whom sample sizes are smaller than for comparison groups in the general population.
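The compounding loss described above can be seen in a small sketch: each analysis variable below is missing for at most 20% of rows, yet listwise deletion drops 40% of the sample, and because the missingness is concentrated in the disadvantaged group, that group's share of the analysed sample falls sharply. The variable names and values are hypothetical.

```python
def listwise_delete(rows, variables):
    """Keep only rows with no missing value (None) in any analysis variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def row(disadvantaged, grade=70, income=1, attendance=0.9):
    """Hypothetical participant record; pass None to mark a value as missing."""
    return {"disadvantaged": disadvantaged, "grade": grade,
            "income": income, "attendance": attendance}


rows = [
    row(True, grade=None), row(True, income=None), row(True, attendance=None), row(True),
    row(False, income=None), row(False), row(False), row(False), row(False), row(False),
]
variables = ["grade", "income", "attendance"]

complete = listwise_delete(rows, variables)
share_before = sum(r["disadvantaged"] for r in rows) / len(rows)
share_after = sum(r["disadvantaged"] for r in complete) / len(complete)
print(len(rows), len(complete))              # 10 6
print(share_before, round(share_after, 3))   # 0.4 0.167
```

Any estimate from the six complete rows describes a population with far fewer disadvantaged young people than the cohort actually contains.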
Most studies reported using logistic regression to analyse the factors associated with reported outcomes. While this method was appropriate for analysing binary outcomes while controlling for multiple confounders, more sophisticated methods of analysis were expected, particularly where mediating or moderating effects of some variables were of interest. One limitation in the reporting of logistic regression analyses was a lack of description of whether the assumption of a linear relationship between the logit of the outcome and each continuous predictor variable was met. Likewise, multiple regression methods require the assumption of linearity to be satisfied, and this was rarely described where linear regression methods were used.
Survival analysis methods were well described and were used where there were more than two pre-specified time points; these included the Nelson-Aalen estimation method, the Kaplan-Meier method and the Cox regression method. Three studies described more advanced methods of analysis: a multinomial logistic regression model using Vermunt's three-step latent class analysis approach, growth mixture modelling, and a generalised linear model [92, 108]. Sensitivity analysis was conducted particularly when definitions of child maltreatment were altered to include either substantiated maltreatment or reported allegations. Conducting sensitivity analysis before data modelling may not be necessary, since sensitivity analysis is usually done after a statistical model has been estimated and the results interpreted.
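The Kaplan-Meier and Nelson-Aalen estimators can both be computed from the same risk-set counts. At each distinct event time t, with d events among n participants still at risk, the Kaplan-Meier survival estimate is multiplied by (1 - d/n) and the Nelson-Aalen cumulative hazard is increased by d/n. The sketch below follows these textbook definitions with hypothetical follow-up data; records censored at time t are treated as still at risk at t, the usual convention. Cox regression, which additionally adjusts for covariates, is not shown.

```python
def km_and_na(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard.

    times: follow-up time per participant; events: 1 if the event was
    observed, 0 if the participant was censored at that time.
    Returns (time, survival, cumulative hazard) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, cum_hazard = 1.0, 0.0
    results = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            cum_hazard += deaths / at_risk
            results.append((t, survival, cum_hazard))
        at_risk -= leaving  # events and censorings both leave the risk set
    return results


# Hypothetical follow-up (years) for five participants.
for t, s, h in km_and_na([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3), round(h, 3))
# 2 0.8 0.2
# 3 0.6 0.45
# 5 0.3 0.95
```

Both estimators use all participants until they are censored, which is why survival methods suit cohorts with varying lengths of follow-up.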
The statistical methods applied in most of the included studies lack the sophistication expected of longitudinal studies with particular covariance structures. The methods used fail to take into account random or systematic error that may be inherent in the observed variables. Failure to account for such errors in the analysis may lead to under- or over-estimation of the true values of the measured outcomes. This limitation can only be overcome by using techniques such as structural equation modelling (SEM), which estimates latent variables that are not directly observed and provides a closer estimate of the measurement error for each observed variable. Only one study used multi-level modelling, an analytical approach with similar benefits to SEM. These methods were not explored in other studies as a technique for analysing longitudinal data where outcomes are studied over time (i.e., involving multiple data collection points), or for accounting for the correlation of individual responses over time. This is surprising given the usefulness of these methods when analysing participants with varying lengths of follow-up due to death and MAR outcomes.
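The correlation of an individual's responses over time, which multi-level models account for, can be quantified with an intraclass correlation coefficient (ICC). A minimal sketch, assuming a balanced design (every participant measured at the same number of waves) and the one-way random-effects estimator ICC(1) = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between- and within-participant mean squares and k is the number of waves. The scores are hypothetical.

```python
def icc_oneway(groups):
    """ICC(1) for balanced data: one list of k repeated measurements per participant."""
    n = len(groups)
    k = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    ss_between = k * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for three participants at three waves.
scores = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
print(round(icc_oneway(scores), 3))  # 0.99
```

An ICC this high means repeated observations from the same person carry little independent information; analysing waves as if they were independent would understate standard errors, which is the problem multi-level models address.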
SEM also allows the estimation of the indirect effect of mediating variables on outcomes of interest [121, 122]. Seven studies [21, 47, 58, 89, 93, 97, 102] reported the role of mediating variables, without reporting on the indirect effects that these variables have on outcomes. Most authors reported several logistic regression models per study, whereas SEM is able to model multiple regression equations simultaneously, and hence provides a flexible framework for testing a range of possible relationships between the variables in the model, including mediating effects and possible latent confounding variables [123, 124].
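To illustrate what reporting an indirect effect involves, the sketch below uses the regression-based product-of-coefficients approach for a single mediator with continuous variables. Here a is the effect of exposure X on mediator M, b is the effect of M on outcome Y adjusting for X, and the indirect effect is a·b. This is a simpler technique than full SEM, and the data are hypothetical. In ordinary least squares the total effect equals the direct effect plus a·b exactly, which the example reproduces. In practice a confidence interval for a·b (for instance by bootstrapping) would also be reported.

```python
def centred_cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))


def mediation_effects(x, m, y):
    """Product-of-coefficients mediation with OLS and a single mediator."""
    sxx, smm = centred_cross(x, x), centred_cross(m, m)
    sxm, sxy, smy = centred_cross(x, m), centred_cross(x, y), centred_cross(m, y)
    a = sxm / sxx                             # path X -> M
    det = sxx * smm - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det         # path M -> Y, adjusted for X
    direct = (smm * sxy - sxm * smy) / det    # X -> Y, adjusted for M
    total = sxy / sxx                         # X -> Y, ignoring M
    return {"total": total, "direct": direct, "indirect": a * b}


# Hypothetical exposure, mediator and outcome scores for five participants.
x = [0, 1, 2, 3, 4]
m = [1, 2, 2, 4, 5]
y = [1, 3, 4, 6, 8]
effects = mediation_effects(x, m, y)
print({k: round(v, 3) for k, v in effects.items()})
# {'total': 1.7, 'direct': 1.2, 'indirect': 0.5}
```

A logistic model for Y would not give this exact decomposition, which is one reason mediation with binary outcomes is usually handled within SEM or counterfactual frameworks.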
Logistic regression and multiple linear regression analyses assume a direct pathway and therefore fail to take into account mediating factors that may have an indirect effect on the outcomes of interest. More recently, Bayesian methods have been proposed as important complementary approaches for testing mediation and computing the size of the mediation effect (often referred to as Bayesian mediation analysis) [125, 126]. The literature suggests that Bayesian methods are better suited than frequentist methods to analysing data with small sample sizes, though the prior distribution must be correctly specified to avoid less accurate estimates [117, 127].
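The dependence on the prior noted above is easiest to see in a conjugate example, which is not Bayesian mediation analysis but shows how a prior weighs against small versus large samples. Assume a binary outcome with a Beta(α, β) prior on its probability. Observing k events among n participants then gives a Beta(α + k, β + n - k) posterior with mean (α + k)/(α + β + n). The priors and counts below are hypothetical.

```python
def beta_binomial_posterior(alpha, beta, events, n):
    """Conjugate update: Beta(alpha, beta) prior, `events` out of `n` observed."""
    post_alpha = alpha + events
    post_beta = beta + n - events
    return post_alpha, post_beta, post_alpha / (post_alpha + post_beta)


# Hypothetical priors: a vague one and an informative one centred on 0.1.
priors = {"vague Beta(1, 1)": (1, 1), "informative Beta(2, 18)": (2, 18)}

# Same observed proportion (40%) in a small and a large sample.
for n, events in [(10, 4), (1000, 400)]:
    for label, (a0, b0) in priors.items():
        *_, post_mean = beta_binomial_posterior(a0, b0, events, n)
        print(f"n={n:5d} {label}: posterior mean {post_mean:.3f}")
```

With 10 participants the two priors give posterior means of about 0.417 and 0.200, while with 1000 participants both are close to 0.4. A well-founded prior can therefore stabilise small-sample estimates, but a misspecified one can dominate them.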
Strengths and limitations
This review has several strengths. The systematic search covered a comprehensive range of databases, supplemented by directed searches of linked child protection data and longitudinal study websites and by manual scrutiny of reference lists. The integrity of the review process was maintained through quality control procedures, including independent assessment of the included and excluded studies. However, the review was limited to peer-reviewed studies published in English, which excluded unpublished studies and studies from non-English speaking countries. Future reviews should consider targeted searches that may uncover literature from other geographic regions, such as Asia, Africa and South and Central America.
Recommendations for future research
Overall, the quality of studies was good, but the reporting of data linkage procedures was poor. Future researchers should conduct adequate data preparation, including checking for errors and missing data and describing how these were addressed. Additionally, the generalisability of the reported studies' findings may be questionable, as the reporting omitted important aspects of mediation analysis and ways to overcome bias due to small sample sizes.
The review has shown that it is important for researchers to follow the GUILD and RECORD guidelines when reporting the quality of data linkage, so that the reporting process is transparent. While some data linkage communities have recognised the need to improve their reporting of linkage quality to researchers, better communication and engagement between researchers and data linkage units is still needed so that linkage quality can be reported more routinely and consistently. Poor transparency in reporting data linkage processes, such as the absence of reports on linkage errors, may lead to the quality of studies being under- or overestimated, particularly for the hard-to-reach populations examined in these studies. More vulnerable or hard-to-reach populations are often missed or mismatched during linkage, resulting in reduced sample sizes and loss of statistical power [10, 129].
In addition, our review has shown a lack of reporting or referencing of validated data quality assessments for administrative data. For transparency, accuracy and reliability of measurement from administrative data sources, it is important to reference validated appraisal tools. Because quality criteria for child protection administrative data sets vary, we recommend that future researchers implement a data quality framework [130, 131]. With the growing use of administrative data, data quality indicators need to be operationalised and reported in studies. For example, leaders in the use of linked administrative data at the Manitoba Centre for Health Policy have identified five dimensions of data quality: accuracy, internal validity, external validity, timeliness and interoperability.
These dimensions of data quality can serve as an important starting point for future reporting of administrative data. However, whether these dimensions are comprehensive, which criteria should be used for each dimension, and how they should be operationalised into measurable data quality criteria remain unresolved. There is therefore a need to conduct a Delphi study [132, 133] among leading experts in administrative data to establish consensus on these data quality indicators, either to integrate them into tools such as the GUILD and RECORD guidelines or to develop a new comprehensive data quality appraisal tool.
Missing data may be reported by following recommended guidelines such as STROBE and RECORD. According to these guidelines, the number of individuals used for analysis at each stage of the study should be reported, along with reasons for non-participation or non-response. Methods for handling missing data range from simple to complex, and the method chosen should take into account the mechanism of missingness. Applying the wrong technique may lead to biased inferences.
If data are MCAR, listwise deletion can be used because the reason for the missing data is unrelated to the data themselves. Pairwise deletion is an alternative that preserves more information than listwise deletion. If data are MAR, analysis of complete records only may be invalid, and techniques such as multiple imputation and likelihood-based methods should be applied, though these can also produce biased estimates if not carried out appropriately. If the reason for missing data depends on the missing values themselves (NMAR), it is important to model the missing-data mechanism to avoid biased parameter estimates.
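When multiple imputation is used under a MAR assumption, the analysis is run on each of M imputed data sets and the results are combined with Rubin's rules. The pooled estimate is the mean of the M estimates, and its total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The sketch below implements only this pooling step; the imputation model itself, and the estimates and variances shown, are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool M point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical estimates (e.g. log odds ratios) from M = 5 imputed data sets,
# each with squared standard error 0.01.
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
variances = [0.01] * 5
est, se = rubin_pool(estimates, variances)
print(round(est, 3), round(se, 4))  # 0.5 0.1084
```

The between-imputation term is what makes the pooled standard error honest about the uncertainty in the imputed values; single imputation omits it and therefore understates uncertainty.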
Basic regression methods of analysis were reported in most studies. More advanced statistical techniques, such as SEM and Bayesian methods, should be incorporated in the analysis of cohort studies, particularly where sample sizes are small and where there are multiple data collection time points and multiple covariates. Multilevel structural equation modelling (ML-SEM) combines the advantages of multi-level modelling and SEM and further enables researchers to scrutinise complex relationships between latent variables at different levels.
Studies utilising administrative data integrated with longitudinal data in child protection settings were homogeneous in nature; most were birth cohort studies integrated with child protection data. Reporting of data linkage processes was poor, with only three studies (10%) reporting the data linkage process in sufficient detail. A few techniques to account for missing data were reported, but these generally lacked sufficient analytical detail. The main statistical method reported in most studies was regression analysis, which fails to take into account mediating factors that may have an indirect effect on the outcomes of interest. Furthermore, multi-level analysis was rarely used, although it would be expected in longitudinal studies where an individual's responses over time are correlated with each other. While a few studies (10%) reported advanced statistical analysis methods, there is an opportunity to implement other advanced techniques in future studies involving small samples. These methods should also account for measurement and linkage errors and for missing data due to attrition. The review emphasises the need for greater effort to improve the reporting of data linkage processes by following recommended, standardised data linkage procedures, which can be achieved through greater co-ordination between data providers and researchers.
S1 Table. PRISMA checklist.
S2 Table. Search strategy from all databases.
S3 Table. Data preparation methods.
- 1. Calderwood L, Lessof C. Enhancing Longitudinal Surveys by Linking to Administrative Data. 2009. p. 55–72.
- 2. Nicholls SG, Quach P, von Elm E, Guttmann A, Moher D, Petersen I, et al. The REporting of Studies Conducted Using Observational Routinely-Collected Health Data (RECORD) Statement: Methods for Arriving at Consensus and Developing Reporting Guidelines. PLoS One. 2015;10(5):1. pmid:25965407.
- 3. Malvaso CG, Delfabbro PH, Day A. The child protection and juvenile justice nexus in Australia: A longitudinal examination of the relationship between maltreatment and offending. Child Abuse Negl. 2017;64:32–46. pmid:28017908.
- 4. Mills R, Kisely S, Alati R, Strathearn L, Najman JM. Child maltreatment and cannabis use in young adulthood: a birth cohort study. Addiction. 2017;112(3):494–501. pmid:27741369.
- 5. Brownell MD, Jutte DP. Administrative data linkage as a tool for child maltreatment research. Child Abuse & Neglect. 2013;37(2–3):120–4. pmid:23260116.
- 6. Doiron D, Raina P, Fortier I. Linking Canadian population health data: maximizing the potential of cohort and administrative data. Can J Public Health. 2013;104(3):e258–61. Epub 2013/07/05. pmid:23823892; PubMed Central PMCID: PMC3880355.
- 7. Gilbert R, Lafferty R, Hagger-Johnson G, Harron K, Zhang L-C, Smith P, et al. GUILD: GUidance for Information about Linking Data sets. Journal of public health (Oxford, England). 2018;40(1):191–8. pmid:28369581.
- 8. Maclean MJ, Taylor CL, O’Donnell M. Relationship between out-of-home care placement history characteristics and educational achievement: A population level linked data study. Child Abuse & Neglect. 2017;70:146. pmid:28609694.
- 9. Maclean MJ, Taylor CL, O’Donnell M. Out-of-Home Care and the Educational Achievement, Attendance, and Suspensions of Maltreated Children: A Propensity-Matched Study. The Journal of pediatrics. 2018;198:287–93.e2. pmid:29724484.
- 10. Randall S, Brown A, Boyd J, Schnell R, Borgs C, Ferrante A. Sociodemographic differences in linkage error: an examination of four large-scale datasets. BMC health services research. 2018;18(1):678. pmid:30176856.
- 11. Ferrante A. The Use of Data-Linkage Methods in Criminal Justice Research: A Commentary on Progress, Problems and Future Possibilities. Current Issues in Criminal Justice. 2009;20:378.
- 12. Vinnerljung B, Franzen E, Gustafsson B, Johansson I-M. Out-of-home care among immigrant children in Sweden: A national cohort study. International Journal of Social Welfare. 2008;17(4):301–11. http://dx.doi.org/10.1111/j.1468-2397.2008.00568.x.
- 13. Young A, Flack F. Recent trends in the use of linked data in Australia. Australian health review: a publication of the Australian Hospital Association. 2018;42(5):584–90. pmid:30145995.
- 14. Putnam-Hornstein E, Needell B, Rhodes AE. Understanding risk and protective factors for child maltreatment: the value of integrated, population-based data. Child abuse & neglect. 2013;37(2–3):116–9. pmid:23260115.
- 15. Carr VJ, Harris F, Raudino A, Luo L, Kariuki M, Liu E, et al. New South Wales Child Development Study (NSW-CDS): an Australian multiagency, multigenerational, longitudinal record linkage study. BMJ Open. 2016;6(2):e009023. Epub 2016/02/13. pmid:26868941; PubMed Central PMCID: PMC4762073.
- 16. Simon A. Using administrative data for constructing sampling frames and replacing data collected through surveys. International Journal of Social Research Methodology. 2014;17(3):185–96. http://dx.doi.org/10.1080/13645579.2012.733176. pmid:2008944310.
- 17. Cohen S, Gilutz H, Marelli AJ, Iserin L, Benis A, Bonnet D, et al. Administrative health databases for addressing emerging issues in adults with CHD: a systematic review. Cardiology in the Young. 2018;28(6):844–53. pmid:29704902.
- 18. Findlay L, Beasley E, Park J, Kohen D, Algan Y, Vitaro F, et al. Longitudinal child data: What can be gained by linking administrative data and cohort data? 2018. pmid:32935011.
- 19. Jolley RJ, Quan H, Jetté N, Sawka KJ, Diep L, Goliath J, et al. Validation and optimisation of an ICD-10-coded case definition for sepsis using administrative health data. BMJ Open. 2015;5(12). pmid:26700284.
- 20. Kalilani L, Friesen D, Boudiaf N, Asgharnejad M. The characteristics and treatment patterns of patients with Parkinson’s disease in the United States and United Kingdom: A retrospective cohort study. PLoS One. 2019;14(11). pmid:31756215.
- 21. Kisely S, Abajobir AA, Mills R, Strathearn L, Clavarino A, Najman JM. Child maltreatment and mental health problems in adulthood: birth cohort study. British Journal of Psychiatry. 2018;213(6):698–703. pmid:30475193.
- 22. Maclean MJ, Sims SA, O'Donnell M. Role of pre-existing adversity and child maltreatment on mental health outcomes for children involved in child protection: Population-based data linkage study. BMJ Open. 2019;9(7):e029675. pmid:31362970.
- 23. Green MJ, Hindmarsh G, Kariuki M, Laurens KR, Neil AL, Katz I, et al. Mental disorders in children known to child protection services during early childhood. The Medical journal of Australia. 2020;212(1):22–8. pmid:31680266.
- 24. Egelund T, Lausten M. Prevalence of mental health problems among children placed in out-of-home care in Denmark. Child & Family Social Work. 2009;14(2):156–65. pmid:105501446.
- 25. Vinnerljung B, Hjern A, Lindblad F. Suicide attempts and severe psychiatric morbidity among former child welfare clients—A national cohort study. J Child Psychol Psychiatry. 2006;47(7):723–33. pmid:16790007.
- 26. Tew M, Dalziel KM, Petrie DJ, Clarke PM. Growth of linked hospital data use in Australia: a systematic review. Australian health review: a publication of the Australian Hospital Association. 2017;41(4):394–400. pmid:27444270.
- 27. Hurren E, Stewart A, Dennison S. New Methods to Address Old Challenges: The Use of Administrative Data for Longitudinal Replication Studies of Child Maltreatment. International Journal of Environmental Research & Public Health [Electronic Resource]. 2017;14(9):15. pmid:28178211.
- 28. O’Donnell M, Nassar N, Jacoby P, Stanley F. Western Australian emergency department presentations related to child maltreatment and intentional injury: population level study utilising linked health and child protection data. Journal of Paediatrics & Child Health. 2012;48(1):57–65. pmid:21988059.
- 29. Gilbert R, Fluke J, O’Donnell M, Gonzalez-Izquierdo A, Brownell M, Gulliver P, et al. Child maltreatment: variation in trends and policies in six developed countries. The Lancet. 2012;379(9817):758–72. pmid:22169108.
- 30. Vinnerljung B, Oman M, Gunnarson T. Educational attainments of former child welfare clients—A Swedish national cohort study. International Journal of Social Welfare. 2005;14(4):265–76. http://dx.doi.org/10.1111/j.1369-6866.2005.00369.x.
- 31. Simoila L, Isometsa E, Gissler M, Suvisaari J, Sailas E, Halmesmaki E, et al. Maternal schizophrenia and out-of-home placements of offspring: A national follow-up study among Finnish women born 1965–1980 and their children. Psychiatry Res. 2019;273:9–14. pmid:30639565.
- 32. Segal L, Nguyen H, Mansor MM, Gnanamanickam E, Doidge JC, Preen DB, et al. Lifetime risk of child protection system involvement in South Australia for Aboriginal and non-Aboriginal children, 1986–2017 using linked administrative data. Child Abuse Negl. 2019;97:1. pmid:31465961.
- 33. Brownell MD, Jutte DP. Administrative data linkage as a tool for child maltreatment research. Child Abuse & Neglect. 2013;37(2):120–4. pmid:23260116
- 34. Østergaard SD, Larsen JT, Petersen L, Smith GD, Agerbo E. Psychosocial Adversity in Infancy and Mortality Rates in Childhood and Adolescence: A Birth Cohort Study of 1.5 Million Individuals. Epidemiology (Cambridge, Mass). 2019;30(2):246–55. pmid:30721168.
- 35. Clarke P, Leal J, Kelman C, Smith M, Colagiuri S. Estimating the Cost of Complications of Diabetes in Australia Using Administrative Health-Care Data. Value Health. 2008;11(2):199–206. pmid:18380631.
- 36. Rotermann M, Sanmartin C, Trudeau R, St-Jean H. Linking 2006 Census and hospital data in Canada. Health reports. 2015;26(10):10–20. pmid:26488823.
- 37. Jutte DP, Roos LL, Brownell MD. Administrative record linkage as a tool for public health research. Annu Rev Public Health. 2011;32:91–108. pmid:21219160.
- 38. Harron K, Dibben C, Boyd J, Hjern A, Azimaee M, Barreto ML, et al. Challenges in administrative data linkage for research. Big data & society. 2017;4(2):2053951717745678. pmid:30381794.
- 39. Wagner M, Kutash K, Duchnowski AJ, Epstein MH. The Special Education Elementary Longitudinal Study and the National Longitudinal Transition Study: Study Designs and Implications for Children and Youth With Emotional Disturbance. Journal of Emotional and Behavioral Disorders. 2005;13(1):25–41. pmid:214915549.
- 40. Meer J, Mittag N. Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness and Holes in the Safety Net. W E Upjohn Institute for Employment Research, 2015.
- 41. Taplin S. Methodological design issues in longitudinal studies of children and young people in out-of-home care. NSW Centre for Parenting & Research, Research, Funding and Business Analysis: NSW Department of Community Services, 2005.
- 42. Farzanfar D, Abumuamar A, Kim J, Sirotich E, Wang Y, Pullenayegum E. Longitudinal studies that use data collected as part of usual care risk reporting biased results: a systematic review. BMC Medical Research Methodology. 2017;17. pmid:28877680.
- 43. Karahalios A, Baglietto L, Carlin JB, English DR, Simpson JA. A review of the reporting and handling of missing data in cohort studies with repeated assessment of exposure measures. BMC medical research methodology. 2012;12:96. pmid:22784200.
- 44. Putnam-Hornstein E. Report of maltreatment as a risk factor for injury death: a prospective birth cohort study. Child Maltreatment. 2011;16(3):163–74. pmid:21680641.
- 45. Teyhan A, Boyd A, Wijedasa D, MacLeod J. Early life adversity, contact with children's social care services and educational outcomes at age 16 years: UK birth cohort study with linkage to national administrative records. BMJ Open. 2019;9(10):e030213. pmid:31594881.
- 46. Abajobir AA, Najman JM, Williams G, Strathearn L, Clavarino A, Kisely S. Substantiated childhood maltreatment and young adulthood cannabis use disorders: A pre-birth cohort study. Psychiatry Res. 2017;256:21–31. pmid:28622571.
- 47. Olsen RF, de Montgomery CJ. Revisiting out-of-home placed children’s poor educational outcomes-is school change part of the explanation? Children and Youth Services Review. 2018;88:103–13. 2018-19048-014.
- 48. Mendes P. Examining the experiences of young people transitioning from out-of-home care in rural Victoria. Rural Society. 2012;21(3):198–209. http://dx.doi.org/10.5172/rsj.2012.21.3.198. pmid:1268829248.
- 49. Campo M, Commerford J. Supporting young people leaving out-of-home care. Child Family Community Australia Information Exchange, CFCA PAPER NO 41. 2016.
- 50. Hilder L, Walker JR, Levy MH, Sullivan EA. Preparing linked population data for research: cohort study of prisoner perinatal health outcomes. BMC Medical Research Methodology. 2016;16. pmid:27312027.
- 51. Mars B, Cornish R, Heron J, Boyd A, Crane C, Hawton K, et al. Using Data Linkage to Investigate Inconsistent Reporting of Self-Harm and Questionnaire Non-Response. Archives of suicide research. 2016;20(2):113–41. pmid:26789257.
- 52. Mills R, Scott J, Alati R, O’Callaghan M, Najman JM, Strathearn L. Child Maltreatment and Adolescent Mental Health Problems in a Large Birth Cohort. Child Abuse & Neglect: The International Journal. 2013;37(5):292–302. http://dx.doi.org/10.1016/j.chiabu.2012.11.008. pmid:1361830227; EJ1001117.
- 53. Bell MF, Bayliss DM, Glauert R, Ohan JL. Using linked data to investigate developmental vulnerabilities in children of convicted parents. Developmental Psychology. 2018;54(7):1219–31. pmid:29620388.
- 54. Eyawo O, Hull MW, Salters K, Samji H, Cescon A, Sereda P, et al. Cohort profile: the Comparative Outcomes And Service Utilization Trends (COAST) Study among people living with and without HIV in British Columbia, Canada. BMJ Open. 2018;8(1). pmid:29331972.
- 55. Parrish JW, Shanahan ME, Schnitzer PG, Lanier P, Daniels JL, Marshall SW. Quantifying sources of bias in longitudinal data linkage studies of child abuse and neglect: measuring impact of outcome specification, linkage error, and partial cohort follow-up. Injury Epidemiology. 2017;4(1):1–13. pmid:28066870.
- 56. Knies G, Burton J, Sala E. Consenting to health record linkage: evidence from a multi-purpose longitudinal survey of a general population. BMC Health Services Research. 2012;12:52. pmid:22390416.
- 57. Knight ED, Runyan DK, Dubowitz H, Brandford C, Kotch J, Litrownik A, et al. Methodological and Ethical Challenges Associated with Child Self-Report of Maltreatment: Solutions Implemented by the LongSCAN Consortium. Journal of Interpersonal Violence. 2000;15(7):760–75. pmid:60387533.
- 58. Abajobir AA, Kisely S, Williams GM, Clavarino AM, Najman JM. Substantiated Childhood Maltreatment and Intimate Partner Violence Victimization in Young Adulthood: A Birth Cohort Study. Journal of Youth & Adolescence. 2017;46(1):165–79. pmid:27624702.
- 59. Kisely S, Abajobir AA, Mills R, Strathearn L, Clavarino A, Gartner C, et al. Child Maltreatment and Persistent Smoking From Adolescence Into Adulthood: A Birth Cohort Study. Nicotine & tobacco research: official journal of the Society for Research on Nicotine and Tobacco. 2020;22(1):66–73. pmid:30874810.
- 60. Allen B. Children with Sexual Behavior Problems: Clinical Characteristics and Relationship to Child Maltreatment. Child Psychiatry Hum Dev. 2017;48(2):189–99. pmid:26923833.
- 61. Asendorpf JB, van de Schoot R, Denissen JJA, Hutteman R. Reducing bias due to systematic attrition in longitudinal studies: The benefits of multiple imputation. International Journal of Behavioral Development. 2014;38(5):453–60.
- 62. Wolke D, Waylen A, Samara M, Steer C, Goodman R, Ford T, et al. Selective drop-out in longitudinal studies and non-biased prediction of behaviour disorders. Br J Psychiatry. 2009;195(3):249–56. Epub 2018/01/02. pmid:19721116.
- 63. Gustavson K, von Soest T, Karevold E, Røysamb E. Attrition and generalizability in longitudinal studies: findings from a 15-year population-based study and a Monte Carlo simulation study. BMC public health. 2012;12:918. pmid:23107281.
- 64. Tajima EA, Herrenkohl TI, Huang B, Whitney SD. Measuring child maltreatment: a comparison of prospective parent reports and retrospective adolescent reports. The American journal of orthopsychiatry. 2004;74(4):424–35. pmid:15554804.
- 65. Naicker SN, Norris SA, Mabaso M, Richter LM. An analysis of retrospective and repeat prospective reports of adverse childhood experiences from the South African Birth to Twenty Plus cohort. PLoS One. 2017;12(7). pmid:28746343.
- 66. Baldwin JR, Reuben A, Newbury JB, Danese A. Agreement Between Prospective and Retrospective Measures of Childhood Maltreatment: A Systematic Review and Meta-analysis. JAMA psychiatry. 2019;76(6):584–93. pmid:30892562.
- 67. Ballinger GA. Using Generalized Estimating Equations for Longitudinal Data Analysis. Organizational Research Methods. 2004;7(2):127–50. pmid:195090481.
- 68. Yoon S, Bellamy JL, Kim W, Yoon D. Father Involvement and Behavior Problems among Preadolescents at Risk of Maltreatment. Journal of Child and Family Studies. 2018;27(2):494–504. pmid:29491703.
- 69. Crowne SS, Gonsalves K, Burrell L, McFarlane E, Duggan A. Relationship Between Birth Spacing, Child Maltreatment, and Child Behavior and Development Outcomes Among At-Risk Families. Maternal and Child Health Journal. 2012;16(7):1413–20. pmid:22057656.
- 70. Kohl PL, Kagotho JN, Dixon D. Parenting Practices among Depressed Mothers in the Child Welfare System. Soc Work Res. 2011;35(4):215–25. pmid:923252141.
- 71. Compier-de Block LH, Alink LR, Linting M, van den Berg LJ, Elzinga BM, Voorthuis A, et al. Parent-Child Agreement on Parent-to-Child Maltreatment. Journal of Family Violence. 2017;32(2):207–17. pmid:28163367.
- 72. Macdonald KI, Kilty SJ, van Walraven C. Chronic rhinosinusitis identification in administrative databases and health surveys: A systematic review. The Laryngoscope. 2016;126(6):1303–10. pmid:26649650.
- 73. Leong A, Dasgupta K, Bernatsky S, Lacaille D, Avina-Zubieta A, Rahme E. Systematic Review and Meta-Analysis of Validation Studies on a Diabetes Case Definition from Health Administrative Records. PLoS ONE. 2013;8(10):e75256. http://dx.doi.org/10.1371/journal.pone.0075256. pmid:1500780850.
- 74. van Mourik MSM, van Duijn PJ, Moons KGM, Bonten MJM, Lee GM. Accuracy of administrative data for surveillance of healthcare-associated infections: a systematic review. BMJ open. 2015;5(8):1. pmid:26316651.
- 75. de Andrade D, Elphinston RA, Quinn C, Allan J, Hides L. The effectiveness of residential treatment services for individuals with substance use disorders: A systematic review. Drug and Alcohol Dependence. 2019;201:227. pmid:31254749.
- 76. da Silva ME, Coeli CM, Ventura M, Palacios M, Magnanini MM, Camargo TM, et al. Informed consent for record linkage: a systematic review. J Med Ethics. 2012;38(10):639–42. pmid:22403083.
- 77. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Systematic Reviews. 2015;4. pmid:25554246.
- 78. Kmet LM, Cook LS, Lee RC. Standard Quality Assessment Criteria for Evaluating Primary Research Papers from a Variety of Fields. Edmonton: Alberta Heritage Foundation for Medical Research (AHFMR). 2004;AHFMR—HTA Initiative #13.
- 79. Benchimol EI, Smeeth L, Guttmann A, Harron K, Moher D, Petersen I, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12(10):1. http://dx.doi.org/10.1371/journal.pmed.1001885. pmid:1720449896.
- 80. Kisely S, Mills R, Najman J. The influence of child maltreatment on nicotine and alcohol use disorders in young adulthood: A birth cohort study. Aust N Z J Psychiatry. 2019;53(Suppl 1):102. http://dx.doi.org/10.1177/0004867419836919. pmid:627698563.
- 81. Strathearn L, Mamun AA, Najman JM, O'Callaghan MJ. Does Breastfeeding Protect Against Substantiated Child Abuse and Neglect? A 15-Year Cohort Study. Pediatrics. 2009;123(2):483. pmid:19171613.
- 82. Mills R, Kisely S, Alati R, Strathearn L, Najman JM. Cognitive and educational outcomes of maltreated and non-maltreated youth: A birth cohort study. The Australian and New Zealand journal of psychiatry. 2019;53(3):248–55. pmid:29696988.
- 83. Maret-Ouda J, Tao W, Wahlin K, Lagergren J. Nordic registry-based cohort studies: Possibilities and pitfalls when combining Nordic registry data. Scandinavian Journal of Public Health. 2017;45(17_suppl):14–9. pmid:28683665.
- 84. Sayers A, Ben-Shlomo Y, Blom AW, Steele F. Probabilistic record linkage. International journal of epidemiology. 2016;45(3):954–64. pmid:26686842.
- 85. Raghavan R, Brown DS, Allaire BT. Can Medicaid Claims Validly Ascertain Foster Care Status? Child Maltreatment. 2017;22(3):227–35. pmid:28587521.
- 86. Parrish JW, Young MB, Perham-Hester KA, Gessner BD. Identifying risk factors for child maltreatment in Alaska: a population-based approach. American Journal of Preventive Medicine. 2011;40(6):666–73. pmid:21565660.
- 87. Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test (Madrid, Spain). 2009;18(1):1–43. pmid:1835472332.
- 88. Liu X. Chapter 14: Methods for handling missing data. In: Liu X, editor. Methods and Applications of Longitudinal Data Analysis. Oxford: Academic Press; 2016. p. 441–73.
- 89. Hansson Å, Gustafsson J-E, Nielsen B. Special needs education and school mobility: School outcomes for children placed and not placed in out-of-home care. Children & Youth Services Review. 2018;94:589–97. http://dx.doi.org/10.1016/j.childyouth.2018.08.039. pmid:132804404.
- 90. Hansson Å, Gustafsson J-E. School Mobility and Achievement for Children Placed and Not Placed in Out-of-home Care. Scandinavian Journal of Educational Research. 2020;64(2):167–80.
- 91. Olsen RF, de Montgomery CJ. Revisiting out-of-home placed children's poor educational outcomes: Is school change part of the explanation? Children & Youth Services Review. 2018;88:103–13. pmid:129252766.
- 92. Austin AE, Gottfredson NC, Zolotor AJ, Halpern CT, Marshall SW, Naumann RB, et al. Trajectories of child protective services contact among Alaska Native/American Indian and non-Native children. Child Abuse Negl. 2019;95:1. pmid:31254951.
- 93. Sidebotham P, Heron J. Child maltreatment in the "children of the nineties": A cohort study of risk factors. Child Abuse Negl. 2006;30(5):497–522. pmid:16701895.
- 94. Parrish JW, Lanier P, Newby-Kew A, Arvidson J, Shanahan M. Maternal Intimate Partner Violence Victimization Before and During Pregnancy and Postbirth Child Welfare Contact: A Population-Based Assessment. Child Maltreatment. 2016;21(1):26–36. pmid:26627838.
- 95. Sidebotham P, Heron J, Golding J. Child maltreatment in the "Children of the Nineties:" deprivation, class, and social networks in a UK sample. Child Abuse Negl. 2002;26(12):1243–59. pmid:12464299.
- 96. Sidebotham P, Heron J. Child maltreatment in the "children of the nineties:" The role of the child. Child Abuse Negl. 2003;27(3):337–52. pmid:12654329.
- 97. Kisely S, Mills R, Strathearn L, Najman JM. Does child maltreatment predict alcohol use disorders in young adulthood? A cohort study of linked notifications and survey data. Addiction (Abingdon, England). 2020;115(1):61–8. pmid:31454119.
- 98. Mills R, Kisely S, Alati R, Strathearn L, Najman J. Self-reported and agency-notified child sexual abuse in a population-based birth cohort. Journal of psychiatric research. 2016;74:87–93. pmid:26774419.
- 99. Austin AE, Parrish JW, Shanahan ME. Using time-to-event analysis to identify preconception and prenatal predictors of child protective services contact. Child Abuse Negl. 2018;82:83–91. pmid:29870866.
- 100. Cameron CM, Osborne JM, Spinks AB, Davey TM, Sipe N, McClure RJ. Impact of participant attrition on child injury outcome estimates: a longitudinal birth cohort study in Australia. BMJ open. 2017;7(6):1. pmid:28667218.
- 101. Abajobir AA, Kisely S, Williams G, Strathearn L, Clavarino A, Najman JM. Does substantiated childhood maltreatment lead to poor quality of life in young adulthood? Evidence from an Australian birth cohort study. Qual Life Res. 2017;26(7):1697–702. pmid:28236264.
- 102. Mills R, Alati R, Strathearn L, Najman JM. Alcohol and tobacco use among maltreated and non-maltreated adolescents in a birth cohort. Addiction (Abingdon, England). 2014;109(4):672–80. http://dx.doi.org/10.1111/add.12447. pmid:1506419454.
- 103. Abajobir AA, Kisely S, Scott JG, Williams G, Clavarino A, Strathearn L, et al. Childhood Maltreatment and Young Adulthood Hallucinations, Delusional Experiences, and Psychosis: A Longitudinal Study. Schizophr Bull. 2017;43(5):1045–55. pmid:28338760.
- 104. Abajobir AA, Kisely S, Williams G, Clavarino A, Strathearn L, Najman JM. Gender-based differences in injecting drug use by young adults who experienced maltreatment in childhood: Findings from an Australian birth cohort study. Drug Alcohol Depend. 2017;173:163. pmid:28259090.
- 105. Haine D, Dohoo I, Dufour S. Selection and Misclassification Biases in Longitudinal Studies. Frontiers in veterinary science. 2018;5:99. pmid:29892604.
- 106. Henderson M, Page L. Appraising the evidence: what is selection bias? Evidence-based mental health. 2007;10(3):67–8. pmid:17652553.
- 107. Böhler T, Goldapp C, Mann R, Reinehr T, Bullinger M, Holl R, et al. Sensitivity analysis of weight reduction results of an observational cohort study in overweight and obese children and adolescents in Germany: the evakuj study. Pediatr Rep. 2013;5(3):1. pmid:24198928.
- 108. Raghavan R, Brown DS, Thompson H, Ettner SL, Clements LM, Key W. Medicaid Expenditures on Psychotropic Medications for Children in the Child Welfare System. J Child Adolesc Psychopharmacol. 2012;22(3):182–9. pmid:22537361.
- 109. Sidebotham P. Patterns of child abuse in early childhood, a cohort study of the "children of the nineties". Child Abuse Review. 2000;9(5):311–20. http://dx.doi.org/10.1002/1099-0852%28200009/10%299:5%3C311::AID-CAR627%3E3.0.CO;2-U. 2000-16750-001.
- 110. Kinner SA, Forsyth S, Williams G. Systematic review of record linkage studies of mortality in ex-prisoners: why (good) methods matter. Addiction. 2013;108(1):38–49. pmid:23163705.
- 111. Wilcox HC, Kharrazi H, Wilson RF, Musci RJ, Susukida R, Gharghabi F, et al. Data Linkage Strategies to Advance Youth Suicide Prevention: A Systematic Review for a National Institutes of Health Pathways to Prevention Workshop. Ann Intern Med. 2016;165(11):779–85. pmid:27699389.
- 112. Harron KL, Doidge JC, Knight HE, Gilbert RE, Goldstein H, Cromwell DA, et al. A guide to evaluating linkage quality for the analysis of linked data. International journal of epidemiology. 2017;46(5):1699–710. pmid:29025131.
- 113. Chipperfield J, Hansen N, Rossiter P. Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage. International Statistical Review = Revue Internationale de Statistique. 2018;86(2):219–36. http://dx.doi.org/10.1111/insr.12246. pmid:2083689331.
- 114. Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–6. Epub 2013/05/24. pmid:23741561.
- 115. Rothman S. Estimating Attrition Bias in the Year 9 Cohorts of the Longitudinal Surveys of Australian Youth. Longitudinal Surveys of Australian Youth Technical Paper 48. Camberwell, Victoria: Australian Council for Educational Research (ACER Press); 2009 Apr.
- 116. Gemici S, Bednarz A, Lim P. A primer for handling missing values in the analysis of education and training data. International Journal of Training Research. 2012;10(3):233–50. pmid:1266350797.
- 117. Miocevic M, MacKinnon DP, Levy R. Power in Bayesian Mediation Analysis for Small Sample Research. Structural Equation Modeling. 2017;24(5):666–83. pmid:29662296.
- 118. Cole DA, Maxwell SE. Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. J Abnorm Psychol. 2003;112(4):558–77. pmid:14674869.
- 119. Verdam MGE, Oort FJ, Sprangers MAG. Item bias detection in the Hospital Anxiety and Depression Scale using structural equation modeling: comparison with other item bias detection methods. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2017;26(6):1439–50. pmid:27943018.
- 120. Jamsen KM, Ilomäki J, Hilmer SN, Jokanovic N, Tan ECK, Bell JS. A systematic review of the statistical methods in prospective cohort studies investigating the effect of medications on cognition in older people. Research in social & administrative pharmacy: RSAP. 2016;12(1):20–8. pmid:26003045.
- 121. Cameranesi M, Lix LM, Piotrowski CC. Linking a History of Childhood Abuse to Adult Health among Canadians: A Structural Equation Modelling Analysis. Int J Environ Res Public Health. 2019;16(11). pmid:31159325.
- 122. Herrenkohl TI, Jung H, Lee JO, Kim M-H. Effects of Child Maltreatment, Cumulative Victimization Experiences, and Proximal Life Stress on Adult Crime and Antisocial Behavior. 2017. p. 21.
- 123. Kupek E. Beyond logistic regression: structural equations modelling for binary variables and its application to investigating unobserved confounders. BMC medical research methodology. 2006;6:13. pmid:16539711.
- 124. Lang AJ, Aarons GA, Gearity J, Laffaye C, Satz L, Dresselhaus TR, et al. Direct and Indirect Links Between Childhood Maltreatment, Posttraumatic Stress Disorder, and Women's Health. Behavioral Medicine. 2008;33(4):125–35. pmid:18316270.
- 125. Enders CK, Fairchild AJ, MacKinnon DP. A Bayesian Approach for Estimating Mediation Effects With Missing Data. Multivariate Behavioral Research. 2013;48(3):340. pmid:24039298.
- 126. Wang L, Preacher KJ. Moderated Mediation Analysis Using Bayesian Methods. Structural Equation Modeling. 2015;22(2):249. pmid:1672167747.
- 127. McNeish D. On Using Bayesian Methods to Address Small Sample Problems. Structural Equation Modeling: A Multidisciplinary Journal. 2016;23(5):750–73.
- 128. Population Health Research Network. Linkage Quality in the PHRN: A report on how high linkage quality is achieved and maintained in the Population Health Research Network. PHRN Centre for Data Linkage, Curtin University; 2017.
- 129. Harron K, Goldstein H, Dibben C. Methodological Developments in Data Linkage. John Wiley & Sons, Incorporated; 2015.
- 130. Smith M, Lix LM, Azimaee M, Enns JE, Orr J, Hong S, et al. Assessing the quality of administrative data for research: a framework from the Manitoba Centre for Health Policy. Journal of the American Medical Informatics Association: JAMIA. 2018;25(3):224–9. pmid:29025002.
- 131. Laitila T, Wallgren A, Wallgren B. Quality Assessment of Administrative Data. 2011.
- 132. Hsu C-C, Sandford BA. The Delphi Technique: Making Sense of Consensus. Practical Assessment, Research & Evaluation. 2007;12:10. pmid:2366822829.
- 133. Grisham T. The Delphi technique: a method for testing complex and multifaceted topics. International Journal of Managing Projects in Business. 2009;2(1):112–30. http://dx.doi.org/10.1108/17538370910930545. pmid:232630764.
- 134. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. International journal of surgery (London, England). 2014;12(12):1500–24. pmid:25046751.
- 135. Lee KJ, Roberts G, Doyle LW, Anderson PJ, Carlin JB. Multiple imputation for missing data in a longitudinal cohort study: a tutorial based on a detailed case study involving imputation of missing outcome data. International Journal of Social Research Methodology. 2016;19(5):575–91. http://dx.doi.org/10.1080/13645579.2015.1126486. pmid:1805051217.
- 136. Meuleman B. Multilevel Structural Equation Modeling for Cross-National Comparative Research. Kölner Zeitschrift für Soziologie und Sozialpsychologie. 2019;71(1):129–55. http://dx.doi.org/10.1007/s11577-019-00605-x. pmid:2258227653.