Universal third-trimester ultrasonic screening using fetal macrosomia in the prediction of adverse perinatal outcome: A systematic review and meta-analysis of diagnostic test accuracy

Background The effectiveness of screening for macrosomia is not well established. One of the critical elements of an effective screening program is the diagnostic accuracy of a test at predicting the condition. The objective of this study is to investigate the diagnostic effectiveness of universal ultrasonic fetal biometry in predicting the delivery of a macrosomic infant, shoulder dystocia, and associated neonatal morbidity in low- and mixed-risk populations. Methods and findings We conducted a predefined literature search in Medline, Excerpta Medica database (EMBASE), the Cochrane library and ClinicalTrials.gov from inception to May 2020. No language restrictions were applied. We included studies where the ultrasound was performed as part of universal screening and those that included low- and mixed-risk pregnancies and excluded studies confined to high risk pregnancies. We used the estimated fetal weight (EFW) (multiple formulas and thresholds) and the abdominal circumference (AC) to define suspected large for gestational age (LGA). Adverse perinatal outcomes included macrosomia (multiple thresholds), shoulder dystocia, and other markers of neonatal morbidity. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Meta-analysis was carried out using the hierarchical summary receiver operating characteristic (ROC) and the bivariate logit-normal (Reitsma) models. We identified 41 studies that met our inclusion criteria involving 112,034 patients in total. These included 11 prospective cohort studies (N = 9986), one randomized controlled trial (RCT) (N = 367), and 29 retrospective cohort studies (N = 101,681). The quality of the studies was variable, and only three studies blinded the ultrasound findings to the clinicians. Both EFW >4,000 g (or 90th centile for the gestational age) and AC >36 cm (or 90th centile) had >50% sensitivity for predicting macrosomia (birthweight above 4,000 g or 90th centile) at birth with positive likelihood ratios (LRs) of 8.74 (95% confidence interval [CI] 6.84–11.17) and 7.56 (95% CI 5.85–9.77), respectively. There was significant heterogeneity at predicting macrosomia, which could reflect the different study designs, the characteristics of the included populations, and differences in the formulas used. An EFW >4,000 g (or 90th centile) had 22% sensitivity at predicting shoulder dystocia with a positive likelihood ratio of 2.12 (95% CI 1.34–3.35). There was insufficient data to analyze other markers of neonatal morbidity. Conclusions In this study, we found that suspected LGA is strongly predictive of the risk of delivering a large infant in low- and mixed-risk populations. However, it is only weakly (albeit statistically significantly) predictive of the risk of shoulder dystocia. There was insufficient data to analyze other markers of neonatal morbidity.


Methods and findings
We conducted a predefined literature search in Medline, Excerpta Medica database (EMBASE), the Cochrane library and ClinicalTrials.gov from inception to May 2020. No language restrictions were applied. We included studies where the ultrasound was performed as part of universal screening and those that included low-and mixed-risk pregnancies and excluded studies confined to high risk pregnancies. We used the estimated fetal weight (EFW) (multiple formulas and thresholds) and the abdominal circumference (AC) to define suspected large for gestational age (LGA). Adverse perinatal outcomes included macrosomia (multiple thresholds), shoulder dystocia, and other markers of neonatal morbidity. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Meta-analysis was carried out using the hierarchical summary receiver operating characteristic (ROC) and the bivariate logit-normal (Reitsma) models. We identified 41 studies that met our inclusion criteria involving 112,034 patients in total. These included 11 prospective cohort studies (N = 9986), one randomized controlled trial (RCT) (N = 367), and 29 retrospective cohort studies (N = 101,681). The quality of the studies was variable, and only three studies blinded the ultrasound findings to the clinicians. Both EFW >4,000 g (or 90th centile for the gestational age) and AC >36 cm (or 90th centile) had >50% sensitivity for predicting macrosomia (birthweight above 4,000 g or 90th centile) at birth with positive likelihood ratios (LRs) of 8.74 (95% confidence interval [CI] 6.84-11.17) and 7.56 (95% CI 5.85-9.77), respectively. There was significant heterogeneity at predicting macrosomia, which could reflect the different study designs, the characteristics of the included populations, and differences in the formulas used. An EFW >4,000 g (or 90th centile) had 22% sensitivity at predicting shoulder dystocia with a positive likelihood ratio of 2.12 (95% CI 1. 34-3.35). There was insufficient data to analyze other markers of neonatal morbidity.

Conclusions
In this study, we found that suspected LGA is strongly predictive of the risk of delivering a large infant in low-and mixed-risk populations. However, it is only weakly (albeit statistically significantly) predictive of the risk of shoulder dystocia. There was insufficient data to analyze other markers of neonatal morbidity.

Author summary
Why was this study done?
• There is a debate regarding introducing universal third-trimester screening for macrosomia. An effective screening program requires two elements: an effective test at predicting a condition and an effective intervention.
• There is evidence that early-term induction of labor (IOL) could reduce the rates of shoulder dystocia. However, there is no high-quality evidence regarding the diagnostic effectiveness of fetal biometry at predicting macrosomia and associated morbidity.
What did the researchers do and find?
• We searched more than 10,000 titles and identified 41 studies including 112,034 patients that offered third-trimester ultrasounds for the prediction of macrosomia as part of universal ultrasound screening or were done in low-and mixed-risk populations. The quality of the studies was variable, and only three studies blinded the ultrasound findings to the clinicians.
• We found that the two most common ultrasound markers, the estimated fetal weight (EFW) and the abdominal circumference (AC), could predict the majority of macrosomic infants at birth (sensitivity >50%) with high diagnostic performance (positive LRs between 7 and 10).

Introduction
Macrosomia is usually defined as birthweight >4,000 g or >90th centile for sex and gestational age. Macrosomic birth weight is associated with the risk of adverse outcomes, including perinatal death [1] and injuries related to traumatic delivery [2]. Ultrasonic estimated fetal weight (EFW) was first described in 1975 [3]. The equation for EFW that is in most widespread use was published by Hadlock and colleagues in 1985 [4], and the distribution of EFW in relation to week of gestation was published in 1991 [5]. Hence, the diagnostic tools to identify small for gestational age (SGA) and large for gestational age (LGA) fetuses have been available for many years. One of the main complications associated with macrosomia is shoulder dystocia, and a Cochrane review of four randomized controlled trials (RCTs) including 1,190 women demonstrated that routine induction of labor (IOL) for suspected LGA may prevent this outcome [6]. However, it remains unclear whether screening and intervention for suspected LGA results in better outcomes. An RCT of IOL in women with an ultrasonically suspected LGA infant is in progress in the United Kingdom (The Big Baby trial, ISRCTN18229892). However, the women recruited to this trial will have been scanned because they were high risk for some reason, as the National Institute for Health and Care Excellence (NICE) has recommended that women should not be routinely scanned in late pregnancy [7]. Although the trial will confirm whether IOL is effective in high-risk women, it will not determine whether screening women without risk factors and intervening results in net benefit. It is often the case that screening and intervention programs that work well in high-risk groups do not work as well in low-risk populations, and one explanation for this can be that the screening test is less informative in low-and mixed-risk populations due to the lower prior risk of disease. In this study, we sought to quantify the diagnostic effectiveness of screening for fetal macrosomia and associated complications using universal ultrasonic fetal biometry in late pregnancy.

Sources
The protocol for this review was prospectively written and registered with PROSPERO (the International Prospective Register of Systematic Reviews), and the registration number was CRD42017064093. We searched the literature systematically using the Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL), Medline, EMBASE, and ClinicalTrials.gov from inception to August 2019. An update search was done on May 28, 2020. We applied no restrictions on the language of the report or the location of the study. The studies were identified using a combination of words related to "ultrasound," "pregnancy," "estimated fetal weight," "EFW," "birthweight," "macrosomia," "large for gestational age," "shoulder dystocia," and "brachial plexus injury." The exact search strategy is presented in S1 Text.

Study selection
We set out to include cohort studies where an ultrasound scan was performed �24 weeks' gestation (wkGA), excluding multiple pregnancies. We included studies of low-risk populations, universal screening, and mixed-risk populations (i.e., included both high-risk and low-risk pregnancies). Studies that included only high-risk women, such as patients with preexisting or gestational diabetes, and those in which the ultrasound was performed during labor were excluded. Studies were not selected on the basis of the definition of the index test, i.e., the formula and the threshold used. Finally, we included both blinded and unblinded studies.

Index tests and outcomes
For the purposes of the meta-analysis, we defined suspected LGA as a fetus with an EFW >4,000 g or >90th centile or with an abdominal circumference (AC) >36 cm or >90th centile. However, we have also documented other thresholds used. The outcomes studied included macrosomic birth weight (>4,000 g or >90th centile) and severe macrosomic birth weight (>4,500 g or >97th centile); shoulder dystocia; and perinatal morbidity (neonatal unit admission, 5-minute Apgar score of six or less, metabolic acidosis, neonatal hypoglycaemia, and neonatal jaundice).

Quality assessment
Two authors (AAM and NS) independently performed the literature search, using the software package Review Manager 5.3. Any differences were addressed in consultation with the senior author (GCS). The Quality Assessment of Diagnostic Accuracy Studies (QUADAS 2) tool was used to assess the risk of biases, following the Cochrane Handbook of Diagnostic Test Accuracy Studies [8]. The QUADAS 2 tool was employed to assess potential biases in patient selection, index test, reference standard, and flow and timing. In relation to flow and timing, we assessed the risk from the perspective of universal ultrasound screening near term (i.e., around 36 wkGA). Flow and timing are based on the timing of the ultrasound scan, the timing of delivery, and the length of the interval between scan and delivery. A standardized data extraction form was employed to obtain information on the characteristics of the study (publication year, location, setting, study design, blinding), the participants (inclusion and exclusion rules and number, including inclusion or exclusion of women with diabetes, either preexisting or gestational), the index test (range of wkGA when the scan was conducted, the EFW equation employed, and the threshold for screen positive), reference standard (outcome, wkGA at delivery, and the scan-to-delivery interval).

Data extraction and synthesis
Sensitivity, specificity, positive and negative likelihood ratios (LRs) [9] were calculated from standard two-by-two tables, which had been extracted for each study by tabulating each of the different definitions of screen positive with each of the different outcomes studied. The "hierarchical summary receiver-operating characteristics" (HSROC) model of Rutter and Gatsonis [10] was utilized for data synthesis. This method allows the results of studies to be combined despite variation between studies in the threshold employed for screen positive. The bivariate logit-normal (Reitsma) model [11] was used to calculate average estimates of sensitivity and specificity and respective variances, at a specific threshold, in analyses in which data were available from at least four studies. We also used meta-analysis to obtain a summary of the diagnostic odds ratios (DORs) [12]. Publication bias was assessed using the Deeks' funnel plot asymmetry test when data was available from a sufficient number of studies. Significant asymmetry was assumed at P < 0.05 [13]. Statistical analyses were performed using STATA version 14 (StataCorp LP, College Station, Texas), specifically, its METANDI, METAN, and MIDAS packages. Analysis and reporting was performed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (S1 PRISMA Checklist) [14].

Quality assessment
The risk of bias, as assessed by the QUADAS-2 tool, is summarized in Fig 2 and presented in detail in S1 Fig. The Galvin 2017 study [29] was published as an abstract; hence, we used a different study from the same cohort (GENESIS study) [56] to assess the risk of bias. Two of the included studies [51,52] have been authored by some of the coauthors of this paper. We used the same criteria for the quality assessment and analysis. Only three studies-Sovio 2018 [52] (Pregnancy Outcome Prediction study), Galvin 2017 [29] (GENESIS study), and Peregrine 2007 [47]-blinded the results to the clinicians. Hence, the large majority of studies were at risk of bias in relation to the reference standard. The second most common risk of bias was in relation to flow and timing, as six studies [19,24,36,39,47,55] performed the ultrasound either prior to IOL or less than 72 hours before delivery, resulting in a very short interval between the scan and delivery. Conversely, two studies [18,27] had a very long interval (ultrasound <33 wkGA). Two studies [17,20] did not present data on the gestational age at delivery. Finally, three studies [23, 48,54] were confined to pregnancies progressing beyond 41 wkGA and were classified as having "high applicability concerns due to patient selection".

Meta-analysis results
Full details of the summary diagnostic performance are presented in Table 1. In summary, both definitions of ultrasonically suspected macrosomia (i.e., either EFW >4,000 g or >90th percentile) had >50% sensitivity for predicting LGA at birth. Many associations were similar regardless of the formula employed, but the positive LRs for the Hadlock formulae (ranging between 7.5 and 12) tended to be higher than for the Shepard formula (around 5). The

PLOS MEDICINE
Third-trimester screening for macrosomia  performance of definitions using just the AC was similar to using an ultrasonic EFW. The sensitivity for predicting severe macrosomia at birth of suspected LGA was around 70%. However, macrosomia (EFW >4,000 g or >90th centile) had a lower (22%) sensitivity for predicting shoulder dystocia, although the association was statistically significant and the positive LR was approximately 2. Fig 3 has summary ROC curves for shoulder dystocia and macrosomia. For the prediction of macrosomia at birth, most of the large studies were close to the point estimate, and only a few small studies were outside the prediction intervals. For shoulder dystocia, most studies reported sensitivities below 30%, and only one study [55] reported a sensitivity of >50%. However, in this study, the total number of shoulder dystocia cases was very small (n = 3).

Discussion
The main conclusion of this analysis is that an ultrasonic EFW indicating an increased risk of a large baby was strongly associated with delivering a macrosomic infant, but it was only weakly associated with the risk of shoulder dystocia. When the EFW was calculated using the widely employed Hadlock method, the positive LRs for macrosomia were in the region of 7 to 12, whereas they were approximately 2 in relation to the risk of shoulder dystocia. This is the largest systematic review on the prediction of macrosomia and the only study that was focused on low-and mixed-risk populations from the perspective of using third-trimester ultrasound as routine screening in all pregnancies. We reported on multiple ultrasound markers and formulas. Moreover, we also reported on the prediction of shoulder dystocia, which is a major perinatal complication, the prevention of which would be a major aim of the routine ultrasound screening. The main limitation of this study is that there was significant heterogeneity between the studies in the ability to predict a macrosomic infant, as the forest plot of DORs indicates. The source of this heterogeneity is unclear, but it could relate to differences in the quality of the performance of the diagnostic test, such as the quality of the imaging equipment, the skill and training of sonographers, and the characteristics of the population. Finally, despite the large amount of studies included, only three studies [25, 29,52] reported any outcomes of neonatal morbidity, and a meta-analysis was not feasible.
In the current study, we incorporated previously published data from the POP study (Sovio 2018) [52], which included nulliparous women who had a research scan at 36 wkGA, which was blinded in most cases to the clinicians. We found that the DOR (95% CI) from the POP study was very similar to the summary DOR derived from all of the other studies, which suggests that the results from the POP study are likely to be generalizable. The POP study was one of only a few identified that blinded the ultrasound result. Another blinded study, conducted in seven centers across Ireland between 2012 and 2015, the GENESIS study (Galvin 2017) [29], was a prospective cohort study of 2,772 nulliparous pregnant women. The results of the GENESIS study have only been published in conference proceedings [29] and include the outcome of shoulder dystocia but not macrosomia. Interestingly, neither the POP study nor the GENESIS study observed a statistically significant association between ultrasonic LGA and shoulder dystocia. When blinded and unblinded studies were combined, the meta-analysis demonstrated that ultrasound may be predictive of shoulder dystocia, albeit weakly. However, the associations observed in the other studies may be due to ascertainment bias. Specifically, if the fetus is suspected to be large on the basis of the EFW, the staff attending the delivery may have a lower threshold for using maneuvers for shoulder dystocia in the event of any delay. They may also be more likely to document a given delay as being due to shoulder dystocia. Hence, unblinded studies could result in stronger associations with shoulder dystocia through ascertainment bias. The fact that ultrasonic EFW is relatively poor as a predictor of shoulder dystocia is not unexpected, given that the actual birth weight of the baby is also not strongly predictive of the outcome: the majority of cases of shoulder dystocia involve a normal birth weight infant [57].
Finally, ultrasonic suspicion of a large baby is a clinical situation where there is evidence that knowledge of the scan result may itself cause complications. Multiple studies have demonstrated that women who have a false positive diagnosis of fetal macrosomia based on EFW are more likely to be delivered by emergency caesarean section [58,59]. This finding underlines the potential for harm caused by screening low-risk women. Research studies in which the results of the scan are revealed could lead to associations with adverse outcomes that were caused by an iatrogenic harm from a false positive result. Conversely, analysis of studies in which the scan was revealed may fail to show true associations with adverse outcome as knowledge of the scan result led to interventions that mitigated the risk.
We conclude that ultrasonically suspected LGA in the general population has quite good diagnostic effectiveness for macrosomic birth weight. However, it is not strongly predictive of the risk of associated complications, such as shoulder dystocia. Similar observations have been made in relation to ultrasonically suspected SGA [60,61]. That study indicated that reduced fetal abdominal growth velocity helped discriminate between healthy SGA babies and those that were at increased risk of complications. Interestingly, the analogous finding is also true in LGA babies, in whom the combination of LGA and accelerated abdominal growth velocity was associated with the risk of neonatal morbidity [52]. We believe that future studies should address the other factors which help differentiate those suspected LGA fetuses which are at the greatest risk of complications.