Evaluation of the Growth Assessment Protocol (GAP) for antenatal detection of small for gestational age: The DESiGN cluster randomised trial

Background
Antenatal detection and management of small for gestational age (SGA) is a strategy to reduce stillbirth. Large observational studies provide conflicting results on the effect of the Growth Assessment Protocol (GAP) in relation to detection of SGA and reduction of stillbirth; to the best of our knowledge, there are no reported randomised control trials. Our aim was to determine if GAP improves antenatal detection of SGA compared to standard care.

Methods and findings
This was a pragmatic, superiority, 2-arm, parallel group, open, cluster randomised control trial. Maternity units in England were eligible to participate in the study, except if they had already implemented GAP. All women who gave birth in participating clusters (maternity units) during the year prior to randomisation and during the trial (November 2016 to February 2019) were included. Multiple pregnancies, fetal abnormalities or births before 24+1 weeks were excluded. Clusters were randomised to immediate implementation of GAP, an antenatal care package aimed at improving detection of SGA as a means to reduce the rate of stillbirth, or to standard care. Randomisation by random permutation was stratified by time of study inclusion and cluster size. Data were obtained from hospital electronic records for 12 months prerandomisation, the washout period (interval between randomisation and data collection of outcomes), and the outcome period (last 6 months of the study). The primary outcome was ultrasound detection of SGA (estimated fetal weight <10th centile using customised centiles (intervention) or Hadlock centiles (standard care)) confirmed at birth (birthweight <10th centile by both customised and population centiles). Secondary outcomes were maternal and neonatal outcomes, including induction of labour, gestational age at delivery, mode of birth, neonatal morbidity, and stillbirth/perinatal mortality. A 2-stage cluster-summary statistical approach calculated the absolute difference (intervention minus standard care arm) adjusted using the prerandomisation estimate, maternal age, ethnicity, parity, and randomisation strata. Intervention arm clusters that made no attempt to implement GAP were excluded in modified intention to treat (mITT) analysis; full ITT was also reported. Process evaluation assessed implementation fidelity, reach, dose, acceptability, and feasibility. Seven clusters were randomised to GAP and 6 to standard care. Following exclusions, there were 11,096 births exposed to the intervention (5 clusters) and 13,810 exposed to standard care (6 clusters) during the outcome period (mITT analysis). Age, height, and weight were broadly similar between arms, but in the intervention arm there were fewer women of white ethnicity (56.2% versus 62.7%) and fewer women in the least deprived quintile of the Index of Multiple Deprivation (7.5% versus 16.5%) during the outcome period. Antenatal detection of SGA was 25.9% in the intervention and 27.7% in the standard care arm (adjusted difference 2.2%, 95% confidence interval (CI) −6.4% to 10.7%; p = 0.62). Findings were consistent in full ITT analysis. Fidelity and dose of GAP implementation were variable, while a high proportion (88.7%) of women were reached. Use of routinely collected data is both a strength (cost-efficient) and a limitation (occurrence of missing data); the modest number of clusters limits our ability to study small effect sizes.

Conclusions
In this study, we observed no effect of GAP on antenatal detection of SGA compared to standard care. Given the variable implementation observed, future studies should incorporate standardised implementation outcomes such as those reported here to determine generalisability of our findings.

Trial registration
This trial is registered with the ISRCTN registry, ISRCTN67698474.

Why was this study done?
• Antenatal detection and appropriate management of small for gestational age (SGA) infants is a recognised strategy to prevent stillbirth; previous reports have suggested the rate of stillbirth is halved when SGA is antenatally detected, compared to undetected SGA.
• Large observational studies provide conflicting results on the effect of the Growth Assessment Protocol (GAP), an antenatal care package, with findings of both an increase and no difference in detection of SGA and reduction of stillbirth.
• The observational nature of all previous studies about GAP limits the assessment of causality in any observed associations.

What did the researchers do and find?
• To the best of our knowledge, this is the first randomised control trial of GAP, comparing 11,096 births exposed to the intervention (5 clusters) to 13,810 exposed to standard care (6 clusters) during the outcome period.
• The lack of effect should be interpreted in the context of the variable implementation of GAP.

What do these findings mean?
• This randomised control trial of GAP compared to standard care did not observe improvement in ultrasound detection of SGA; variable implementation of GAP was observed, consistent with previous studies.

Introduction
In 2014, the World Health Organization (WHO) launched the Every Newborn Action Plan with the aim to end preventable perinatal deaths by 2030; reducing stillbirth is thus a global priority [1]. While national strategies to tackle stillbirth vary according to leading causes locally, the importance of risk stratification and screening strategies that target improved detection of small for gestational age (SGA) (birthweight <10th centile) and appropriate management and timely delivery has been emphasised for high-income countries [2,3]. Antenatal detection of SGA has been associated with a halved risk of stillbirth compared to undetected SGA [4,5]. A review of guidelines from 6 high-income countries described a consensus on recommendations for stratifying women by risk of SGA, but noted variation in other aspects of screening and management, such as the use of customised fetal charts to identify SGA and the role of universal third trimester ultrasound [6]. The Growth Assessment Protocol (GAP), developed by the Perinatal Institute [7], is a complex intervention that includes the use of customised centile charts for fundal height and estimated fetal weight (EFW) measurements (Gestation-Related Optimal Weight (GROW) charts), evidence-based protocols and risk assessment, training and accreditation of clinical staff, a rolling audit programme and benchmarking of performance [8]. A nonrandomised control trial in the United Kingdom (UK) of standardised fundal height measurements plotted on customised charts demonstrated an increase in antenatal detection of SGA (29% versus 48%, odds ratio 2.2; 95% confidence interval (CI) 1.1 to 4.5) [9]. A recent study in New Zealand reported an almost 3-fold increase in detection of SGA (22.9% versus 57.9%; p < 0.001) when comparing rates before and after implementation of GAP [10]. 
In the UK, national uptake of GROW charts or GAP increased between 2007 and 2012 with a concomitant 22% reduction of stillbirth rates in regions of high uptake [11]. However, a study comparing the trend of stillbirth rates during 2010 to 2015 in England and Wales to that in Scotland, where uptake of GAP was very low, reported a greater decline in Scotland [12]. The authors concluded that any association between GAP and reductions in stillbirth rates was coincidental rather than causal. To our knowledge, there has been no randomised control trial studying the impact of GAP versus standard care on detection of SGA. There is also a paucity of data on the impact of GAP on service usage (e.g., number of ultrasound scans and induction of labour) and on unwanted potential effects, such as a possible increase in neonatal adverse outcomes related to iatrogenic late preterm/early term birth.
The primary aim of the DESiGN trial (DEtection of Small for GestatioNal age fetus) was to determine whether implementation of GAP results in improved ultrasound detection of SGA, when compared to standard care. We also planned to explore the effect on related maternal and neonatal outcomes and to conduct a process evaluation of fidelity, reach, dose, acceptability, feasibility, and resource use.

Study design and population
The DESiGN trial was a pragmatic, superiority, 2-arm, parallel group, open, cluster randomised control trial, including 13 maternity units in England [13]. All women who gave birth in participating clusters (maternity units) during the trial (between November 2016 and February 2019) were included. Baseline data were also collected on women who gave birth during the year prior to cluster randomisation. Pregnancies with significant fetal abnormalities, multiple pregnancies, and pregnancies ending before 24+1 weeks of gestation (referred to as weeks in the paper) were excluded. The study design and methodology of this trial have been prospectively registered (ISRCTN67698474), and both the study protocol (S1 Protocol) and the prespecified analysis plan (S1 Appendix) have been approved by the Trial Steering Committee.
We enrolled maternity units primarily in London given the lower uptake of GAP in this area at the time the trial was proposed compared to the whole of the UK, where uptake was 64% [14]. A cluster trial was undertaken because the intervention requires implementation of site-wide guidelines for screening and management of SGA and additional staff training. Within-site contamination would limit the validity of individual randomisation. The trial was pragmatic to capture the reality of the introduction of this complex intervention into clinical practice with support from the Perinatal Institute.

Randomisation and masking
Clusters were randomly allocated by the trial statistician to immediate implementation of GAP (intervention arm) or to continue standard care during the study period (standard care arm). Randomisation occurred in 3 strata according to time of inclusion in the study (8, 3, and 2 clusters, respectively); the randomisation of the first 8 clusters was further stratified by size of maternity unit (number of births during the year 2013 to 2014). Randomisation was by random permutation within strata, providing exact 1:1 allocation except in the second stratum of 3 clusters, where it was determined at random which arm would receive 2 clusters. The random permutation was conducted in Stata v14 (StataCorp LP, College Station, Texas, USA). Due to the nature of the intervention, concealment was not possible.
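The allocation scheme described above can be sketched as follows. This is a minimal illustration in Python, not the Stata code used in the trial; the function name, cluster labels, seed, and the 4/4 split of the first 8 clusters by unit size are all hypothetical assumptions made for the example.

```python
import random

def allocate_stratum(clusters, rng):
    """Randomly permute one stratum and split it between the two arms.

    Even-sized strata get an exact 1:1 split. For an odd-sized stratum,
    the arm that receives the extra cluster is itself chosen at random.
    """
    order = list(clusters)
    rng.shuffle(order)                      # random permutation within the stratum
    half = len(order) // 2
    if len(order) % 2 == 1 and rng.random() < 0.5:
        half += 1                           # intervention arm receives the extra cluster
    return {"intervention": order[:half], "standard_care": order[half:]}

rng = random.Random(2016)                   # hypothetical seed, for reproducibility only

# Hypothetical strata: first 8 clusters split by unit size (assumed 4/4),
# then the later strata of 3 and 2 clusters.
strata = {
    "wave1_larger_units": ["A", "B", "C", "D"],
    "wave1_smaller_units": ["E", "F", "G", "H"],
    "wave2": ["I", "J", "K"],
    "wave3": ["L", "M"],
}

allocation = {name: allocate_stratum(members, rng) for name, members in strata.items()}
for name, arms in allocation.items():
    print(name, arms)
```

With 13 clusters, this scheme yields either 7 versus 6 or 6 versus 7 clusters per arm, depending on the random draw in the stratum of 3.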

Procedures
Data were collected from a prerandomisation period of 12 consecutive months, which differed by randomisation stratum, the washout period (variable duration) during which the intervention arm clusters were implementing GAP, and for an outcome comparison period (outcome period) of 4 to 6 months from 1 September 2018 to 28 February 2019. The outcome period commenced when women giving birth in intervention clusters had had time to receive full antenatal exposure to GAP. One cluster from the control arm provided outcome data earlier due to a previously planned introduction of GAP at the original trial end date. This was a consequence of the washout period being extended after delays in GAP implementation at the last cluster randomised to the intervention.
Data were obtained from 4 types of routinely collected electronic patient record system at each cluster: maternity, ultrasound, neonatal, and administrative [15]. Additional data were collected to assess compliance with the intervention in allocated clusters from review of a subset of women's paper maternity records (n = 120 per cluster). Data were anonymised locally by the trial team before being sent centrally for data management, storage, and analysis.
Following randomisation, maternity units allocated to the intervention were expected to contact the providers of GAP to commence training and implementation support. The components of GAP implementation are detailed in Table 1, by stage of implementation. Following consultation with cluster sites, the e-learning training requirement was amended by the Perinatal Institute to allow compliance with e-learning certification to be achieved within 3 months of going "live." The prespecified requirements that describe how an implementing cluster would be considered as GAP compliant are further detailed in the study protocol (S1 Protocol; page 74). These were GAP recommendations during this trial; there were changes introduced subsequent to this study [16].
In the standard care arm, women received routine antenatal care as per the local guidelines for screening and management of SGA in each cluster. There was no prespecification of policies in this arm, except that these clusters should not implement GAP or use customised centiles for fundal height or ultrasound monitoring of fetal growth. At the time this trial started, standard care for screening and management of SGA was guided by an RCOG guideline [17]. This recommends stratification of pregnant women by presence of risk factors for SGA. Women at low risk of SGA are further screened using measurement of fundal height at each antenatal appointment after 24 weeks. Women with risk factors are either offered serial fetal growth ultrasound scans or further stratification using Doppler assessment of the uterine arteries at 20 weeks of gestation, dependent on the number or significance of the risk factors present. RCOG does not guide frequency of serial growth scans. Following a request from reviewers, a summary description of recommended practice in standard care clusters is provided in S2 Appendix (page 2) based on review of local guidelines for screening and detection of SGA. The Saving Babies' Lives care bundle is a complex antenatal intervention that started to be implemented nationally during the trial. Clusters in the standard care arm were exempted from compliance with element 2 (risk assessment and surveillance of fetal growth restriction) of the Saving Babies' Lives bundle. However, it was considered unethical to stop clusters in the standard care arm that were willing to implement concomitant strategies for improved detection of SGA and prevention of stillbirths initiated locally or nationally, which could include the Saving Babies' Lives care bundle [18].

Process evaluation of implementation
The process evaluation examined implementation compliance, acceptability, feasibility, contextual factors, and mechanisms of impact. To assess compliance with the intervention in implementing sites, we assessed fidelity, reach, and dose [19], by comparing site guidelines to those recommended by GAP, assessing compliance with training targets and by a review of 600 women's maternity records (40 randomly selected singleton nonanomalous births in each of 3 months during the outcome period at 5 implementing clusters). Acceptability and feasibility of GAP implementation were explored through interviews with clinicians including clinical leads. A summary of implementation is provided in this report to support interpretation of the main findings (methodology provided in S2 Appendix; page 3). We also collected guidelines on screening for SGA from clusters in the standard care arm. A more detailed process evaluation analysis will be reported separately.

Table 1. Implementation stage and GAP requirements.

Preparation and planning
• Nominated staff from each cluster to attend "Train the Trainers" GAP workshop.
• Cluster to conduct a baseline audit of SGA detection (10% of annual births).
• Cluster to prepare local guideline for the "Assessment of Fetal Growth" modelled on GAP recommendations.

Implementation
• Cluster trainers to cascade face-to-face training to 75% of colleagues from each professional group (midwives, obstetricians, sonographers).
• GAP e-learning module to also be completed by 75% staff members from each professional group.

Ongoing use of GAP
• Access to GROW chart online programme provided by the Perinatal Institute after cluster compliant with above requirements.
• Each pregnant woman assessed for risk of SGA at antenatal booking appointment using GAP tool.
• Customised GROW chart printed for each pregnant woman at antenatal booking appointment and used to assess fetal growth by plotting fundal height measurements or estimated fetal weight on the chart.
• Women at low risk of SGA expected to have a fundal height measured 3-weekly during pregnancy, commencing between 26 and 28 weeks. If plots deviate from what is expected (first plot below 10th centile, slow/static/accelerative growth), the woman should be referred for a fetal growth scan.
• Women at high risk of SGA expected to have an ultrasound scan to estimate fetal weight 3-weekly during pregnancy, commencing between 26 and 28 weeks.
• Where GROW chart EFW plots deviate from the expected trajectory (as per fundal height deviations), RCOG protocols should be followed for further investigation of suspected SGA [17].
• Birthweight centiles are calculated at the time of birth using the GROW software. This also prompts the clinician to enter whether SGA was detected antenatally, to inform auditing of practice and national benchmarking.
• GAP users are encouraged to use the GAP online proforma to conduct analyses of 'missed cases' in which SGA was not detected antenatally.

PLOS MEDICINE

Outcomes
The primary outcome of this study was antenatal ultrasound detection of SGA (after 24 completed weeks) defined for infants who are SGA (i.e., birthweight less than 10th centile) according to both population (UK1990 birthweight centiles) and customised (GROW) charts [20,21]. This definition was chosen because GAP targets detection of babies who are SGA by customised centiles, whereas standard care largely uses population centile charts. Antenatal detection of SGA was defined as ultrasound-derived EFW <10th centile by customised (GROW) charts in the intervention arm during the outcome period and by population [22] fetal charts for babies born in intervention sites during the prerandomisation period and all babies born in the standard care arm [20][21][22]. For calculation of ultrasound detection of SGA, data were obtained from electronic ultrasound records to identify EFW <10th centile and from electronic maternity records to identify birthweight <10th centile; these were calculated for all births in each cluster. A detailed description of methodology for calculating the rate of antenatal detection of SGA is provided in S2 Appendix (page 4). 
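The primary outcome definition can be illustrated with a short sketch. The record structure, field names, and example values below are hypothetical, and the rule that any EFW below the 10th centile after 24 weeks counts as detection is a simplifying assumption; the trial's actual method is described in S2 Appendix (page 4).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Birth:
    # Hypothetical record layout, not the trial's data schema.
    bw_centile_customised: float        # birthweight centile, customised (GROW) chart
    bw_centile_population: float        # birthweight centile, population (UK1990) chart
    min_efw_centile: Optional[float]    # lowest EFW centile after 24 weeks on the chart
                                        # used in that arm; None if no EFW was recorded

def is_sga_at_birth(b: Birth) -> bool:
    """SGA confirmed at birth: <10th centile on BOTH charts."""
    return b.bw_centile_customised < 10 and b.bw_centile_population < 10

def detected_antenatally(b: Birth) -> bool:
    """Assumed rule: at least one ultrasound EFW below the 10th centile."""
    return b.min_efw_centile is not None and b.min_efw_centile < 10

def detection_rate(births: List[Birth]) -> Optional[float]:
    """Proportion of SGA infants (by both charts) who were detected antenatally."""
    sga = [b for b in births if is_sga_at_birth(b)]
    if not sga:
        return None
    return sum(detected_antenatally(b) for b in sga) / len(sga)

# Made-up example records.
births = [
    Birth(5.0, 8.0, 7.0),     # SGA on both charts, detected
    Birth(9.0, 4.0, None),    # SGA on both charts, never scanned: missed
    Birth(8.0, 15.0, 6.0),    # SGA on the customised chart only: not in the denominator
    Birth(50.0, 50.0, 40.0),  # not SGA
]
rate = detection_rate(births)
print(rate)
```

Only infants who are SGA on both charts enter the denominator, so a birth that is small on one chart alone does not affect the rate in this sketch.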
The 26 planned secondary outcomes included the test positive rate for antenatal detection of SGA (defined by both definitions as per primary outcome), antenatal detection and false positive rate of antenatal ultrasound detection of SGA confirmed at birth by customised centiles and by population centiles, maternal outcomes (induction of labour, mode of birth, postpartum haemorrhage, severe perineal tear (third/fourth degree), epidural and episiotomy), neonatal parameters and measures of condition at birth (gestational age at birth, preterm birth, birthweight, Apgar score <7 at 5 minutes, arterial cord pH <7.1, respiratory support at birth), neonatal unit admission, major neonatal morbidity (defined as one or more of: receipt of supplemental oxygen at 28 days of age, necrotising enterocolitis, sepsis, neonatal retinopathy, hypoxic-ischemic encephalopathy, intraventricular haemorrhage), minor neonatal morbidity (defined as one or more of: hypothermia, hypoglycaemia, nasogastric tube feeding), stillbirth, neonatal death, and perinatal mortality. Utilisation of ultrasound scan was a process outcome (proportion of pregnancies with a scan, proportion of pregnancies with a scan between 18+0 and 24+0 weeks, proportion of pregnancies with a scan after 24+0 weeks with EFW, number of scans per pregnancy after 24+0 weeks with EFW, proportion of pregnancies with no record of ultrasound). Timing of scans after 24 weeks (i.e., utilisation per week gestation) was described following a request from reviewers and the academic editor, with the aim of better understanding differences in practice between trial arms. These process measures were reported to provide context to results.
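The screening-performance measures named among the secondary outcomes all derive from a 2 × 2 table of the antenatal ultrasound result against SGA status at birth. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not trial data.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening measures from a 2 x 2 table.

    tp: SGA at birth, EFW <10th centile antenatally (detected)
    fn: SGA at birth, not flagged antenatally (missed)
    fp: not SGA at birth, but flagged antenatally
    tn: not SGA at birth, not flagged antenatally
    """
    total = tp + fp + fn + tn
    return {
        "antenatal_detection": tp / (tp + fn),       # sensitivity
        "false_positive_rate": fp / (fp + tn),       # 1 - specificity
        "test_positive_rate": (tp + fp) / total,     # share of all pregnancies flagged
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }

# Hypothetical counts for 1,000 pregnancies.
measures = diagnostic_measures(tp=30, fp=50, fn=70, tn=850)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The test positive rate uses all pregnancies as its denominator, which is why it can stay low even when detection among SGA infants is modest.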

Statistical analysis
Data management was performed to harmonise and amalgamate datasets from all clusters. This process has previously been described in detail and published [15]. The approach for multiple imputation of missing data is summarised in the S2 Appendix (page 5).
Characteristics of the individual participants in the prerandomisation and trial outcome period were summarised for each trial arm using means and standard deviations, medians and interquartile ranges or frequencies and percentages, as appropriate. These results are reported using imputed data, where available; results from available case analyses are provided in the Supporting information.
Main analyses. The primary analysis was performed using a modified intention to treat (mITT) approach. This involved excluding any cluster in the intervention arm that did not contact the GAP provider to initiate implementation of the intervention due to changes in local strategy, since such changes are not considered informative of how GAP would have performed in the cluster. Due to the modest number of clusters, the analysis was performed using an unweighted 2-stage cluster-summary statistical approach [23]; detailed description provided in S2 Appendix (page 6). Intervention effects (absolute difference of intervention minus standard care arm) are presented with 95% CIs. A sensitivity analysis was also performed at the request of reviewers, excluding 1 cluster without ultrasound measurement data for the baseline period, which are imputed in our main analysis (S2 Appendix; page 5).
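As an illustration of the general idea behind an unweighted 2-stage cluster-summary analysis, the sketch below computes an unadjusted difference only. It is not the trial's analysis code: the covariate adjustment and multiple imputation described in S2 Appendix are omitted, the cluster counts are made up, and the t critical value is supplied by hand (2.262 for 9 degrees of freedom, corresponding to 5 + 6 clusters).

```python
from math import sqrt
from statistics import mean, variance

def two_stage_difference(intervention, standard_care, t_crit):
    """Unadjusted, unweighted cluster-summary comparison of two arms.

    intervention, standard_care: lists of (events, total) tuples, one per cluster.
    t_crit: two-sided t critical value for (k1 + k0 - 2) degrees of freedom.
    Returns (difference, (ci_lower, ci_upper)), intervention minus standard care.
    """
    # Stage 1: reduce each cluster to a single summary (here, a proportion).
    p_int = [events / total for events, total in intervention]
    p_std = [events / total for events, total in standard_care]

    # Stage 2: compare the arm means of the cluster summaries, giving each
    # cluster equal weight, with a pooled-variance t interval.
    k1, k0 = len(p_int), len(p_std)
    diff = mean(p_int) - mean(p_std)
    pooled_var = ((k1 - 1) * variance(p_int) + (k0 - 1) * variance(p_std)) / (k1 + k0 - 2)
    se = sqrt(pooled_var * (1 / k1 + 1 / k0))
    return diff, (diff - t_crit * se, diff + t_crit * se)

# Hypothetical (detected SGA, all SGA) counts per cluster.
gap_clusters = [(30, 100), (25, 100), (20, 100), (35, 100), (28, 100)]
std_clusters = [(27, 100), (22, 100), (31, 100), (26, 100), (29, 100), (25, 100)]

diff, (lo, hi) = two_stage_difference(gap_clusters, std_clusters, t_crit=2.262)
print(f"difference = {diff:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

Because inference rests on the number of clusters rather than the number of births, the degrees of freedom are small, which is one reason a modest number of clusters limits precision.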
Prespecified secondary, subgroup, and sensitivity analyses. A secondary analysis was planned using a per protocol approach restricting analysis of the intervention arm to clusters that complied with the GAP preimplementation requirements (S1 Protocol; page 74) in full. A further secondary analysis was a full intention to treat (ITT) analysis in which data from all clusters were used as randomised, irrespective of whether or not GAP was implemented. A prespecified subgroup analysis was planned to explore the effect of the intervention on 21 clinical and neonatal outcomes, only in SGA infants. A sensitivity analysis explored the intervention effect when restricted only to women who had an ultrasound scan between 18+0 and 24+0 weeks (presumed fetal anomaly scan) at the cluster where they later gave birth, reflecting antenatal care primarily within a single cluster and consistent exposure to the intervention from 24 weeks. A reviewer requested a further post hoc sensitivity analysis of the stillbirth outcome, concerned that our 2-stage analysis approach may be unsuitable for rare outcomes. After preferred 1-stage methods were found unfeasible or did not converge, we applied the standard logistic regression approach but with robust standard errors to acknowledge clustering (see S2 Appendix, page 6 for details). We used the standard 5% significance level for testing across our secondary outcomes and subgroup and sensitivity analyses. Due to multiple testing, significant results for secondary outcomes should be treated with caution.
These analyses were conducted following a prespecified analysis plan (S1 Appendix). All prespecified subgroup and sensitivity analyses were detailed in the trial protocol (S1 Protocol) and approved by the trial steering committee. This study has been reported as per the Consolidated Standards of Reporting Trials (CONSORT) statement (S1 CONSORT Checklist).
Sample size calculation. The power calculation for this study determined a minimum target sample size of 12 clusters (6 per arm) based on information collected during protocol development [13]. We were unable to identify reports of an intracluster correlation coefficient for detection of SGA; therefore, a coefficient of the most approximate outcome (rate of fetal growth restriction) was used (0.019) [24]. A cluster size that included an average of 126 SGA infants (defined by customised and population centile charts) with 6 clusters per arm provides 84% power to detect an improvement in the detection of SGA, assuming 20% are detected using standard care and 33% detected using GAP (doubling of odds ratio for detection) at the 5% significance level (2-sided test) [13]. We made no explicit allowance for the additional baseline data from each cluster; their inclusion is likely to increase power. Power calculations were performed using the user-written programme "clustersampsi" for Stata.
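The role of the intracluster correlation coefficient in this calculation can be illustrated with a design-effect sketch. This is a simple normal approximation written for illustration, not the algorithm implemented in "clustersampsi", so the power it returns need not match the reported 84%; the function names are hypothetical, and the inputs are the values stated above.

```python
from math import sqrt
from statistics import NormalDist

def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def approx_power(p0, p1, clusters_per_arm, cluster_size, icc, alpha=0.05):
    """Approximate power for comparing two proportions under clustering.

    Deflates the per-arm sample size by the design effect, then applies
    a two-sided normal-approximation test for a difference in proportions.
    """
    n_eff = clusters_per_arm * cluster_size / design_effect(cluster_size, icc)
    se = sqrt(p0 * (1 - p0) / n_eff + p1 * (1 - p1) / n_eff)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)

deff = design_effect(cluster_size=126, icc=0.019)
power = approx_power(p0=0.20, p1=0.33, clusters_per_arm=6, cluster_size=126, icc=0.019)
odds_ratio = (0.33 / 0.67) / (0.20 / 0.80)   # 33% versus 20% detection

print(f"design effect = {deff:.3f}")
print(f"approximate power = {power:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

With an ICC of 0.019 and 126 SGA infants per cluster, the design effect is 3.375, so the 756 SGA infants per arm carry roughly the information of 224 independent observations.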

Protocol changes
The trial protocol was amended during the study period for logistical and methodological reasons, including changes to data flow and storage, and following a change to the trial sponsor in 2017. A further change occurred prior to the randomisation of recruited clusters, whereby the definition of the primary outcome was refined. The registration of this change was delayed until after randomisation because of the change in study sponsor. Nevertheless, the amendment was approved before any women included in the primary analysis had given birth. These and other minor study amendments are recorded in the current version of the study protocol (S1 Protocol). All amendments were approved by the Research Ethics Committee and participating sites' Research and Development departments. Approval was also sought from the trial steering committee, Confidentiality Advisory Group and funders, where appropriate. During data management and analysis, the definition of major neonatal morbidity changed in relation to the study protocol, as the data was insufficiently detailed to determine Bell stage of necrotising enterocolitis, culture status in sepsis, and need for ophthalmic intervention related to retinopathy.

Ethical approval
Ethical approval for this trial was obtained from the Health Research Authority (HRA) through the London Bloomsbury Research Ethics Committee (Ref. 15/LO/1632) and the Confidentiality Advisory Group (Ref. 15/CAG/0195). Individual informed consent was not obtained, but women could request to opt out from sharing their data. A key professional for each cluster provided written cluster consent prior to randomisation.

Patient and public involvement
Patient groups and stakeholders (representing both PPI and professional groups) were involved from the conceptualisation of this study. Patient groups were provided with a summary for the study and procedures in lay terms and asked their opinion about key points including the relevance of the study and the use of data without individual informed consent given the cluster intervention/design. Their feedback was used to inform the final study protocol and ethical application. Stakeholders such as Stillbirth Clinical Study Group from RCOG, SANDS Charity, and Tommy's Charity were also involved in the conceptualisation of this study. We have a patient representative in our coinvestigator group who has provided their perspective throughout the study, including in interpretation and explanation of results to a lay audience.

Results
Among the 16 sites that were invited to participate in the trial, 13 were willing and enrolled in the study (Fig 1). Seven clusters were allocated to the intervention and 6 to standard care. Two sites randomised to the intervention did not contact the GAP provider to initiate implementation. The median washout period was 17 months (range 11 to 18); this included a median interval of 9 months (range 6 to 12 months) between antenatal booking of women (presumed to be at 12 weeks) with the opportunity of exposure to GAP and commencement of the outcome period. Among the 209,314 pregnancies during the study period in the 13 participating sites, 201,209 were included in the study. For the primary analysis (mITT), the outcome period included 13,810 pregnancies in the standard care arm (6 clusters) and 11,096 pregnancies in the intervention arm (5 clusters). No women asked for their data to be excluded from the study.
Demographic characteristics are provided in Table 2. In the prerandomisation period, age, height, and weight were broadly similar between trial arms, but there were fewer women of white ethnicity (55.9% versus 62.8%), with obesity (15.7% versus 18.1%), and in the first (least deprived) quintile of the Index of Multiple Deprivation (7.6% versus 17.4%) in the intervention arm than in the standard care arm. Similar findings were observed in the outcome period. Demographic characteristics were also broadly similar using available case data (for variables that were imputed) and the ITT sample (13 clusters) (Tables A and B in S3 Appendix). A description of the full list of ethnicities used for the customised centiles calculator is provided in Tables C and D in S3 Appendix. There were 4 tertiary level clusters in the trial; these were balanced by randomisation of 2 clusters to each of the 2 trial arms. The proportion of women with an EFW measured by ultrasound after 24 weeks was similar in the intervention and standard care arms during the outcome period (64.0% versus 75.7%; unadjusted difference −11.7%, 95% CI −31.0% to 7.6%; adjusted difference −10.0%, 95% CI −36.2% to 16.1%; adjusted for baseline, age, ethnicity, parity, and stratification factor). In the prerandomisation period, the respective rates were 62.0% versus 43.7% (Table 3). Timing of ultrasound scan after 24 weeks (i.e., utilisation per week of gestation) was broadly similar between trial arms in the outcome period (Fig 2). A common pattern of offering scans at 28, 32, and 36 weeks was observed in both standard care and intervention arms. In the prerandomisation period, a higher proportion of scans at 36 weeks was observed in the intervention arm compared to standard care; no clear difference was observed in other gestations.
The primary outcome of antenatal detection of SGA infants by both customised and population centiles was similar between trial arms (unadjusted difference intervention minus control 1.2%, 95% CI −7.5% to 9.8%; adjusted difference intervention minus control 2.2%, 95% CI −6.4% to 10.7%; adjusted for baseline, age, ethnicity, parity, and stratification factor), as was the test positive rate (unadjusted difference 0.9%, 95% CI −0.6% to 2.5%; adjusted difference 0.8%, 95% CI −0.8% to 2.3%; adjusted for baseline, age, ethnicity, parity, and stratification factor) (Table 4). The association between antenatal detection of SGA at baseline and the comparison period across clusters is displayed in Fig J in S3 Appendix. Measures of diagnostic test performance (antenatal detection, false positive rate, positive predictive value, and negative predictive value) when SGA at birth is defined by customised centiles or by population centiles are provided in Table 4; there were no differences in antenatal detection between trial arms. There were also no differences in the rates of primary and secondary outcomes in most of the prespecified secondary and sensitivity analyses (Tables E, F, and G in S3 Appendix). In the full ITT analysis, the unadjusted difference (intervention minus control) for the primary outcome was −4.0% (95% CI −14.8% to 6.8%), and the adjusted difference was −3.5% (95% CI −14.0% to 7.0%; p = 0.52). There was no difference in the primary outcome in the sensitivity analysis excluding 1 cluster without ultrasound measurement for the prerandomisation period (adjusted difference intervention minus control 2.4%, 95% CI −6.1% to 10.8%; p = 0.58); results were in keeping with the main analysis. All minimum requirements for GAP compliance prior to "going live" were met except the e-learning target, which was only met in 1 cluster; therefore, per protocol analysis could not be performed. The intracluster correlation coefficient observed in the outcome period for mITT analysis was 0.008 (95% CI 0.002 to 0.039).

There were 2 statistically significant differences among the 26 secondary outcomes explored. When compared to standard care, the intervention was associated with a lower rate of overall stillbirth (unadjusted difference −0.05%, 95% CI −0.21% to 0.11%; adjusted difference −0.07%, 95% CI −0.14% to −0.01%; i.e., 0.7 fewer stillbirths per 1,000 births; adjusted for baseline, age, ethnicity, parity, and stratification factor) and of perinatal mortality (unadjusted difference −0.05%, 95% CI −0.27% to 0.17%; adjusted difference −0.09%, 95% CI −0.17% to −0.004%; i.e., 0.9 fewer perinatal deaths per 1,000 births; adjusted for baseline, age, ethnicity, parity, and stratification factor) (Table 5). The post hoc sensitivity analysis of stillbirth gave an unadjusted odds ratio for the intervention effect of 1.30 (95% CI 0.68 to 2.47) and an adjusted odds ratio of 0.77 (95% CI 0.30 to 1.99); we do not attempt to reexpress this effect as a difference between arms as the methodology to do so with imputed data is not yet established.
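The per-1,000 figures quoted above follow from a simple rescaling of the adjusted differences, which are expressed in percentage points. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the input values are the adjusted estimates and confidence limits reported in the text.

```python
def percent_to_per_thousand(diff_percent: float) -> float:
    """Convert a risk difference in percentage points to events per 1,000 births."""
    return diff_percent / 100 * 1000


# Adjusted differences (intervention minus control) reported in the text.
stillbirth_per_1000 = percent_to_per_thousand(-0.07)   # about 0.7 fewer per 1,000
perinatal_per_1000 = percent_to_per_thousand(-0.09)    # about 0.9 fewer per 1,000

# The same rescaling applies to the confidence limits for stillbirth.
stillbirth_ci_per_1000 = (
    percent_to_per_thousand(-0.14),
    percent_to_per_thousand(-0.01),
)
```

On this scale the adjusted confidence interval for stillbirth runs from roughly 1.4 to 0.1 fewer stillbirths per 1,000 births.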
In the subgroup analysis of outcomes for SGA infants (defined by both population and customised centiles; n = 1,802 pregnancies of which 31 were stillborn), SGA infants in the intervention arm were born 2 days earlier, had a lower mean birthweight, and had lower rates of stillbirth compared to SGA infants from standard care (Table 6). There were no differences in other neonatal or maternal outcomes in the subgroup analysis, including preterm birth (<37 weeks; Table 6) and late preterm birth (34+0 to 36+6 weeks; post hoc analysis, 9.1% versus 8.4% for intervention and standard care arms, respectively; adjusted difference 0.3%, 95% CI −1.9% to 2.6%). The change in mean gestational age at birth reflects fewer SGA babies born at or after 39 weeks in the intervention arm compared to the standard care arm (post hoc analysis, 56.3% versus 61.2%; adjusted difference −8.3%, 95% CI −14.9% to −1.7%). Clinical outcomes using available case data and for women with a scan recorded in the cluster between 18+0 and 24+0 weeks are reported in Tables H and I in S3 Appendix, respectively.
Assessment of implementation (fidelity, dose, and reach) of GAP was performed at all implementing clusters. Implementing sites had guidelines in which concordance to the Perinatal Institute guidance ranged from high to low. All clusters achieved the face-to-face training target, but only 1 cluster achieved the e-learning target. Of the 595 women whose maternity records were reviewed, 84.9% were correctly risk stratified according to GAP guidelines (range between clusters 78.6% to 87.5%) and 88.7% had a GROW chart in their notes (range between clusters 62.2% to 98.3%).
Intervention dosage varied; 30.7% (range between clusters 8.2% to 53.2%) of low-risk women had at least the minimum recommended number of fundal height measurements plotted on their GROW chart and 8.5% (range between clusters 0.0% to 16.7%) of women with risk factors for SGA had at least the minimum number of growth scans as recommended by GAP (Table 7). Detailed qualitative data with clinicians and other staff exploring implementation will be reported separately. In the standard care arm, there was wide variation in terms of guidance for screening for SGA, including variation in timing and interpretation of fundal height measurement, factors indicating high-risk status, and number and frequency of ultrasound scans for high-risk women.

Discussion
The DESiGN trial has found that GAP was not superior to standard care for the antenatal detection of SGA, confirmed at birth by both population and customised centiles. All intervention clusters achieved the preimplementation requirements for access to GROW software, except for the e-learning target. In intervention clusters, GAP was implemented with varied levels of fidelity (high rates of face-to-face training, varied concordance of cluster site guidelines with GAP, high concordance with GAP risk stratification protocols), high levels of reach (majority of women had a GROW chart), but variable dose (low number of fundal height measurements plotted, number of growth scans below that which is recommended by GAP, high rates of referral for suspected SGA).

Data are % (n/N) or median (IQR). GAP, Growth Assessment Protocol; GROW, Gestation-Related Optimal Weight chart; SGA, small for gestational age.
* High fidelity (only occasional differences where GAP recommendations were partially included); medium fidelity (with partial or no inclusion of GAP recommendations in less than half of the recommendations); low fidelity (with partial or no inclusion of GAP recommendations throughout the guidelines, affecting over half of the recommendations).
† Around 18/90 women who were not correctly risk stratified by GAP guidelines were correctly risk stratified according to local policy.
‡ Risk status is as classified by clinician at booking.
§ Approximately 11.2% (16/102) additional women did have a growth scan, but documented as another indication, e.g., reduced fetal movements.

To the best of our knowledge, the DESiGN trial is the first randomised control trial that compared the effect of GAP and standard care on the ultrasound detection of SGA. The intervention was not superior to standard care when implemented in this study setting. It is important to note that at the time of the DESiGN trial, there was concurrent national implementation of the "Saving Babies' Lives" care bundle, which aimed to reduce rates of stillbirth through 4 components (smoking cessation, risk assessment for and surveillance of fetal growth restriction, raising awareness of reduced fetal movements, and effective fetal monitoring during labour) [18]; this has been shown to increase use of ultrasound and improve the detection of SGA [25]. The outcome period of this trial was in 2018/2019, at least 2 years after the implementation of the care bundle. While the NHS England and NHS Improvement (London) Clinical Leadership Group exempted the 5 London-based clusters in the standard care arm of this study from implementing the care bundle component related to fetal growth restriction during the study period, most units chose to implement at least some of the care bundle strategies. In previous observational studies reporting increased antenatal detection of SGA or reduced stillbirth following GAP implementation, preimplementation groups were not affected by this care bundle. This may explain some of the differences observed in antenatal detection of SGA between this and previous studies; the different study design between this randomised control trial and previous studies, which were all observational, may also explain the different results observed.
Our process evaluation highlights variation in implementation of GAP, which was also reported in the SPiRE study [25], where 15 of 19 included maternity units had implemented GAP. The SPiRE study group found that most of the 15 local guidelines collected from GAP-implementing sites were only partially compliant with 4 out of 5 components that feature both in the fetal growth restriction element of the Saving Babies' Lives care bundle and in GAP guidelines [26]. We also observed partial concordance with GAP guidelines in this trial, demonstrated through variable implementation fidelity.
In England, multiparous women are routinely offered fewer antenatal appointments than required for compliance with GAP fundal height measurement frequency; this may partly explain why the number of fundal heights plotted is lower than that recommended by GAP (every 3 weeks). Implementation dose in terms of number of scans conducted for each woman at high risk of SGA was lower than that which is recommended by GAP (3 versus 4 scans for women with term birth). This may be explained by common practice in England whereby serial growth scans are offered at 28, 32, and 36 weeks, rather than 3-weekly. Indeed, post hoc exploration of implementation dose data has shown that 74% of high-risk women in the intervention arm of this study had 2 or more growth scans after 24 weeks, suggesting a less frequent surveillance programme than recommended by GAP. The exploratory analysis of timing of ultrasound utilisation requested by the reviewers/academic editor also supports this hypothesis and describes a similar surveillance pattern in the standard care arm. The costs related to GAP include the annual charge from the Perinatal Institute to access the programme, training costs, and any potential increase in use of clinical resources; these need to be considered when evaluating utility of GAP. A detailed economic analysis will be reported separately.
We observed a lower rate of overall stillbirth and perinatal mortality, as well as SGA stillbirth, in the intervention arm compared to the standard care arm during the outcome period. The fact that this was not achieved through the expected pathway of improving detection of SGA at birth, our primary outcome, does raise the possibility of a chance finding, and the finding was not confirmed in the (albeit post hoc) sensitivity analysis. Although we are limited in our ability to ascertain the drivers of this potential effect, it is plausible that the lower proportion of births at or after 39 weeks observed among SGA babies in the intervention arm may have mediated this effect. There is conflicting evidence regarding the benefit of offering earlier iatrogenic birth to women with SGA fetuses: while it may prevent stillbirth/perinatal mortality [27], it may also increase rates of short-term neonatal morbidity and poorer developmental outcomes in childhood [28,29]. Complex interventions such as GAP may have effects that do not necessarily lie on the expected pathway; however, we note the need to replicate these findings before they can be considered robust given the number of secondary outcomes in this study.
We have not performed statistical testing to assess for changes between prerandomisation and outcome period as per the prespecified analysis plan; however, we did observe some differences. In particular, the use of ultrasound seems to have markedly increased during the study in standard care clusters, which likely relates to the rollout of the Saving Babies' Lives care bundle, at least in part. The SPiRE study reported increased utilisation of ultrasound with implementation of the care bundle; the association was related to the overall care bundle and not to any specific component. Despite being exempt from the fetal growth restriction component of the care bundle, clusters in this trial may have increased the utilisation of ultrasound through other related strategies such as the reduced fetal movements component.
The antenatal detection of neonates confirmed to be SGA at birth by customised centiles (secondary outcome) in this study was not higher in the intervention arm, which suggests the choice of growth chart may have limited influence on detection of SGA. Previous observational studies explored the value of customised centiles alone (not as part of GAP). We recognise that these studies have reported that population and customised charts have similar performance in detecting adverse perinatal outcomes after accounting for false positive rates for term births [30] and that the stronger associations between customised centiles and adverse perinatal outcomes (when compared to population centiles) were explained by confounding with preterm birth and maternal obesity [31], even though this is challenged by other authors.
The strength of this study is that, to the best of our knowledge, it is the first randomised trial assessing the effect of GAP. DESiGN was a pragmatic trial capturing the real-life challenges of implementing complex interventions into clinical care and included a robust process evaluation and examination of implementation strength and variability. The trial has primarily used data from routinely collected electronic patient records, which has allowed cost-efficient inclusion of data from a large number of pregnancies. The primary outcome was antenatal ultrasound detection of SGA (after 24 completed weeks). We defined SGA infants as those with a birthweight less than the 10th centile according to both (i) population (UK1990 birthweight centiles) and (ii) customised (GROW) charts; this is considered to identify those at highest risk of adverse perinatal outcomes [32]. This is an important strength as both GAP and standard care target the detection of these infants.
We were unable to assess the impact of complete attainment of the GAP preimplementation requirements because only 1 implementing cluster achieved the training target for e-learning. The optimal interval between commencing GAP use and assessment of its effect is unknown. This study had a median interval of 9 months (range 6 to 12) from antenatal booking of women with the opportunity of exposure to GAP until commencement of outcome data collection. While the learning process of care providers may delay full programme effectiveness, an alternative "pioneering effect" may be working in the opposite direction [33]. Other limitations include issues related to the availability, or format, of data that are inherent in the use of routinely collected data, though we followed clear protocols in harmonisation and linkage of data from multiple electronic systems to minimise any variations in data quality between the randomised arms [15]. Missingness for characteristics (including customisation factors) was dealt with by multiple imputation, which is dependent on the assumption that results after inclusion of variables in the imputation model will be consistent between those with and without missing data. It is unlikely that randomisation to GAP or standard care would alter completeness of routine data collection in any cluster; therefore, this assumption is likely to be met. Ethnicity documented in hospital systems was often not as granular as that required by the customised calculator. One prespecified subgroup analysis exploring the effect of intervention in women stratified as high risk and low risk separately was not possible given lack of detailed data on some risk factors used to stratify women. 
The number of units randomised was modest and power was somewhat reduced by the failure of 2 units to contact the provider of GAP leading to their exclusion from our main analyses; however, the observed intracluster correlation coefficient was lower than that assumed for the power calculation; this would have preserved power to some extent.
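The link between the intracluster correlation coefficient and power can be made explicit with the standard design effect for a cluster randomised trial. The expression below is a textbook approximation that assumes equal cluster sizes; the symbols $m$ (average number of births per cluster), $n$ (total number of births), and $\rho$ (intracluster correlation coefficient) are introduced here for illustration, and the calculation actually used for this trial may have differed, for example by allowing for variation in cluster size.

```latex
\[
  \mathrm{DEFF} = 1 + (m - 1)\,\rho,
  \qquad
  n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}}
\]
```

For fixed $m$ and $n$, a smaller $\rho$ gives a smaller design effect and therefore a larger effective sample size $n_{\mathrm{eff}}$, which is why an observed coefficient of 0.008 that is below the assumed value partly offsets the loss of clusters.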
We are not aware of other studies of GAP implementation that report as detailed an assessment of the standardised implementation outcomes (fidelity, reach, and dose) as that performed in this trial [19], and against which we can benchmark these findings. While it is possible that the variable dose of implementation may explain the results of this trial, DESiGN was a pragmatic trial intended to reflect implementation in the real world. It is therefore possible that the implementation variability seen in this trial reflects the reality of implementing a complex intervention in a health service with competing demands on resources. A recent observational study of GAP implementation across the UK also described variation in implementation using nonstandardised outcomes. Their analysis demonstrated a greater reduction of stillbirth rates in maternity units that had completely implemented GAP (defined by reporting the birthweight and outcomes of more than 75% of births via the GAP online tool) compared with those that did not implement GAP [34]. A third of maternity units (31%; n = 29/94) implementing GAP achieved only partial implementation. The rate of stillbirth was no different between maternity units with partial or no implementation of GAP. The collective evidence from these studies highlights the challenges and variation in implementation of GAP.
This pragmatic study provides the only evidence from a randomised control trial regarding the effect of GAP, to the best of our knowledge. The GAP programme was not superior to standard care in the detection of SGA at birth by both population and customised centiles in this setting. Given the variable implementation observed, it is imperative that future studies assessing implementation of GAP or other interventions to improve perinatal outcomes use standardised implementation outcomes (fidelity, reach, and dose) in order to determine the generalisability of our findings, identify barriers to implementation, and hence better inform policy for improving perinatal outcomes.

Dissemination to participants and related patient and public communities
Participating institutions and maternity units will be informed of the results soon after acceptance and any embargo period. We expect participating maternity units to share results locally in their communities, aiming to also reach women who were pregnant during the study period. We will communicate with relevant stakeholders including SANDS and Tommy's Charities. The main results of the current research will also be disseminated to related patients and the public through blogs, press releases, newspapers, and conferences.