Research using population-based administration data integrated with longitudinal data in child protection settings: A systematic review

Introduction: Over the past decade there has been marked growth in the use of linked population administrative data for child protection research. This is the first systematic review to report on the research designs and statistical methods used where population-based administrative data is integrated with longitudinal data in child protection settings.

Methods: The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The electronic databases Medline (Ovid), PsycINFO, Embase, ERIC, and CINAHL were systematically searched in November 2019 to identify all relevant studies. The protocol for this review was registered and published with the Open Science Framework (Registration DOI: 10.17605/OSF.IO/96PX8).

Results: The review identified 30 studies reporting on child maltreatment, mental health, drug and alcohol abuse and education. The quality of almost all studies was strong; however, the studies rated poorly on the reporting of data linkage methods. The statistical analysis methods described failed to take into account mediating factors that may have an indirect effect on the outcomes of interest, and there was a lack of utilisation of multi-level analysis.

Conclusion: We recommend that data linkage processes be reported, and that reporting follow recommended and standardised data linkage processes; this can be achieved through greater co-ordination among data providers and researchers.


Introduction
Over the past decade, there has been marked growth in the use of linked administrative data for child protection research to understand, longitudinally, the factors associated with outcomes in adulthood. Linked data allows researchers to analyse sensitive topics without directly asking study participants; however, administrative data is not collected for research purposes. Administrative data linked to longitudinal surveys provides rich data, allowing more in-depth analysis and a reduction in the biases that may be present in either data source. No systematic review has investigated the reporting of outcomes for children with child protection involvement in studies where administrative datasets have been linked to longitudinal surveys.

Objectives
The primary objectives of the systematic review are:
1. To synthesise and describe the different research designs reported when integrating administrative data with longitudinal surveys in out-of-home care or child protection settings.
2. To describe and assess the statistical methods used when retrospective administrative data is linked to prospective longitudinal surveys.
We will discuss the suitability of the methods identified in the included studies. The aim is not to document methods of the data linkage process, but rather to provide evidence on the different ways in which linked data can be used to enhance survey data, thereby minimising the risk of bias and other reported limitations.
The findings from this review will improve the understanding and use of data obtained by linking multiple administrative databases with self-reported survey measures to examine measures and associations across multiple domains. This systematic review is an essential step towards informing policy, practice and future research directions in methodological aspects of using linked data and survey data.
Although research on linked administrative data and longitudinal surveys in child protection/ out-of-home care settings is abundant, to the best of our knowledge, no systematic reviews have documented integration of administrative data and prospective longitudinal surveys. In addition, no reviews have reported on the statistical methods used for integration and reporting on outcomes associated with combining the two data sources.

Methods
The methodology and reporting of this systematic review comply with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and checklist, which is designed to guide researchers in evidence-based and transparent reporting of systematic reviews.

Eligibility Criteria
To be included in this review, peer-reviewed journal articles needed to have at least one administrative database linked to a longitudinal survey. Studies were limited to those involving contact with child protection services or out-of-home care settings. They were also limited to English-language publications, with no restriction on publication date. In addition, since the review is focussed on methodology, there is no reporting of specific interventions, comparison groups or outcomes. Anecdotes, reviews (including systematic reviews), book chapters, letters to the editor, editorials and conference abstracts were excluded. Articles had to meet all eligibility criteria to be included. The exclusion criteria were:
• Non-peer-reviewed articles (anecdotes, reviews, book chapters, letters to the editor, editorials and conference proceedings)
• Unpublished articles or articles not available online
• Articles published in any language other than English
• Articles for which only an abstract could be found
• Reviews, including systematic reviews

Information sources
Research articles were retrieved from the Medline (Ovid), Embase, ERIC, CINAHL and PsycINFO databases. In addition, searches were conducted on websites that provide a publication repository where data linkage units may list peer-reviewed publications, such as the Population Health Research Network (PHRN). The reference lists of included studies were manually scrutinised to find any further relevant studies; this provided an invaluable source of articles for inclusion. Searches were conducted using free text in all databases.

Search Strategy
In line with the objectives of this review, three strings were used to search for articles in the relevant journals:
1. Data Source
2. Longitudinal Survey/Study
3. "Out-of-home care" or "child protection"
A full search strategy for all databases is attached.
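
As an illustration only, the three search strings can be thought of as concept blocks that are joined into a single free-text query. The sketch below assumes that synonyms within a block are combined with OR and that the blocks are combined with AND, which is a common convention but is not stated in this review. The function name `build_query` and all search terms shown are hypothetical placeholders, not the terms of the attached search strategy.

```python
def build_query(concept_blocks):
    """Join synonyms with OR inside each block, then join blocks with AND.

    Multi-word or hyphenated terms are wrapped in double quotes so that
    they are searched as phrases.
    """
    groups = []
    for terms in concept_blocks:
        quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Placeholder terms for the three strings (assumed, for illustration only).
blocks = [
    ["administrative data", "linked data"],         # 1. Data Source
    ["longitudinal survey", "longitudinal study"],  # 2. Longitudinal Survey/Study
    ["out-of-home care", "child protection"],       # 3. Setting
]
query = build_query(blocks)
print(query)
```

Database-specific syntax (field tags, truncation symbols, subject headings) differs between Ovid, EBSCO and other platforms and is deliberately left out of this sketch.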

Study Selection
Two review authors independently screened the titles and abstracts of the retrieved studies to identify candidate studies for full-text review. The second reviewer reviewed a randomly selected 40% of the abstracts to ensure accuracy in study selection. The reviewers graded each article as eligible, not eligible, or might be eligible, using the inclusion and exclusion criteria defined above. There were no disagreements on the eligibility of particular studies. Both reviewers were independently involved in full-text review, data retrieval and quality assessment of the included studies.
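
Agreement between two reviewers on a three-level eligibility grade can be summarised with a chance-corrected statistic. This review does not name the statistic used; the sketch below assumes Cohen's kappa, and the function name `cohens_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters grading the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical grades for six abstracts (not data from this review).
reviewer_1 = ["eligible", "not eligible", "might be eligible",
              "not eligible", "eligible", "not eligible"]
reviewer_2 = ["eligible", "not eligible", "eligible",
              "not eligible", "eligible", "not eligible"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

When the reviewers agree on every item, as reported here for study eligibility, the statistic equals 1.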

Data collection process
Using a standardised form, the two reviewers will extract the data independently. A third reviewer will independently check the data for consistency and clarity.

Data Items
Data extracted will include the following broad summary measures: study/ population characteristics, population/ sample size, measurement scales, data collection tools, validation methods and analysis method. Specific variables that will be extracted include the following:

Study/ Population characteristics:
Citation, year of publication, aims/objectives, research area, study location, out-of-home care or child protection setting

Administrative Data:
Source and name of administrative data, sample/population size of data, number of administrative datasets, linkage type and linkage quality

Longitudinal Survey data:
Name of study, study duration, sampling method, study population (age at baseline, gender, cohort size at baseline), number of waves reported and sample sizes for each, attrition rate, standardised and non-standardised outcome measures, timeframe between waves, biases reported, and sensitivity analysis

Statistical Methods:
Domain of statistical analysis, analysis procedure, statistical parameters of reporting, tests for assumptions, confounding variables, mediating/moderating variables, outcome variables, independent variables.

Methodological Quality and risk of bias
Since there are no standard criteria for assessing the quality of this unique integration of population-based administrative data and longitudinal studies, we used a combination of items from the RECORD checklist, the GUILD guidance and the Kmet checklist. The Kmet checklist has 14 items, each scored on a 3-point ordinal scale (0 = no, 1 = partial, 2 = yes), giving a structured and quantifiable means of assessing the quality of studies across a variety of research designs. The checklist items assess the sampling strategy, participant characteristics described, sample size calculations, sample size collection, description and justification of analytic methods, result reporting, controls for confounding variables, and whether the conclusions drawn reflect the results reported. Interrater reliability between the two independent assessors will be established for both the abstract selection and the Kmet ratings of each included study. The following convention will be used for the classification of methodological quality: a score of >80% is considered strong quality, 70–79% good quality, 50–69% fair quality, and <50% poor methodological quality. The GUILD and RECORD items will assess the method of data linkage, whether linkage quality and accuracy are reported, and the representativeness of the linked data.
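
The Kmet scoring and quality classification described above can be sketched as follows. This is a minimal illustration under stated assumptions: all 14 items are treated as applicable, and the band boundaries are read as lower bounds of 80%, 70% and 50%, because the convention above does not say how a score of exactly 80% or one between 79% and 80% is classified. The function names and example scores are hypothetical.

```python
def kmet_summary_score(item_scores):
    """Return the Kmet summary score as a percentage of the maximum.

    Each of the 14 items is scored 0 (no), 1 (partial) or 2 (yes).
    Assumption: every item is applicable to the study being rated.
    """
    if len(item_scores) != 14:
        raise ValueError("the Kmet checklist has 14 items")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    return 100.0 * sum(item_scores) / (2 * len(item_scores))


def classify_quality(percent):
    """Map a percentage score to a quality band.

    Assumption: bands are closed at their lower bound (80, 70, 50).
    """
    if percent >= 80:
        return "strong"
    if percent >= 70:
        return "good"
    if percent >= 50:
        return "fair"
    return "poor"


# Hypothetical ratings for one study: ten "yes", three "partial", one "no".
scores = [2] * 10 + [1] * 3 + [0]
percent = kmet_summary_score(scores)
band = classify_quality(percent)
```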

Data Synthesis
It is anticipated that the included studies will be heterogeneous in terms of study design and quality. Therefore, narrative synthesis of the findings of the included studies will be an appropriate strategy. The synthesis will be structured to report the findings of each study on the data abstraction items defined above. If there are a number of studies that are homogeneous in terms of study design, quality, and outcome measures, and the number is substantial enough to conduct a meta-analysis, the team will pool the results using random-effects meta-analysis, with standardised mean differences for continuous outcomes and risk ratios for binary outcomes, and calculate 95% confidence intervals and two-sided P values for each outcome.
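
A random-effects pooling step of the kind described above can be sketched as follows. The review does not specify the between-study variance estimator; this sketch assumes the DerSimonian-Laird estimator and a normal approximation for the confidence interval and P value. The function names and the study counts are hypothetical, and risk ratios are pooled on the log scale.

```python
import math


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its approximate variance from 2x2 counts."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance


def random_effects_pool(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k == 0 or k != len(variances):
        raise ValueError("need one variance per effect and at least one study")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    if k < 2:
        tau2 = 0.0  # heterogeneity cannot be estimated from one study
    else:
        # Cochran's Q and the method-of-moments estimate of tau^2.
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "p_two_sided": math.erfc(abs(z) / math.sqrt(2)),
    }


# Hypothetical binary-outcome studies: (events, n) in exposed and comparison groups.
studies = [(12, 100, 20, 100), (8, 80, 15, 85), (30, 200, 41, 210)]
pairs = [log_risk_ratio(*s) for s in studies]
result = random_effects_pool([y for y, _ in pairs], [v for _, v in pairs])
pooled_rr = math.exp(result["pooled"])  # back-transform to the risk-ratio scale
```

Standardised mean differences can be passed to `random_effects_pool` in the same way, together with their variances, without the log transformation.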