Predictive modeling of emergency cesarean delivery

Objective To increase discriminatory accuracy (DA) for emergency cesarean sections (ECSs). Study design We prospectively collected data on and studied all 6,157 births occurring in 2014 at four public hospitals located in three different autonomous communities of Spain. To identify risk factors (RFs) for ECS, we used likelihood ratios and logistic regression, fitted a classification tree (CTREE), and analyzed a random forest model (RFM). We used the areas under the receiver-operating-characteristic (ROC) curves (AUCs) to assess their DA. Results The magnitude of the LR+ for all putative individual RFs and ORs in the logistic regression models was low to moderate. Except for parity, all putative RFs were positively associated with ECS, including hospital fixed-effects and night-shift delivery. The DA of all logistic models ranged from 0.74 to 0.81. The most relevant RFs (pH, induction, and previous C-section) in the CTREEs showed the highest ORs in the logistic models. The DA of the RFM and its most relevant interaction terms was even higher (AUC = 0.94; 95% CI: 0.93–0.95). Conclusion Putative fetal, maternal, and contextual RFs alone fail to achieve reasonable DA for ECS. It is the combination of these RFs and the interactions between them at each hospital that make it possible to improve the DA for the type of delivery and tailor interventions through prediction to improve the appropriateness of ECS indications.


Introduction
A worrisome issue in obstetrics is the longstanding increase in cesarean section rates, as well as the unjustified variations in these rates in clinical practice across public and private hospitals worldwide [1][2][3]. This is particularly important in the case of emergency (i.e., unscheduled) cesarean section (ECS) rates, assuming that the appropriateness of indications for scheduled C-sections is reasonably acceptable and much higher than that for ECSs [4][5][6][7][8][9][10][11]. Heterogeneity in clinical decision-making should always be investigated when unjustified variations are suspected. Knowing the fetal, maternal, and contextual factors that drive the decision to perform an ECS at each hospital is paramount to designing and implementing hospital-tailored interventions specifically aimed at improving the appropriateness of indications for ECSs in order to avoid unnecessary ECSs and the associated complications and costs [12][13][14][15][16][17][18][19][20][21][22]. Few current clinical guidelines and interventions target these objectives [23][24][25][26][27][28][29]. Those that do are neither based on a comprehensive set of proven fetal and maternal risk factors (RFs) with high discriminant accuracy (DA) nor designed to take into account contextual factors that have been shown to be associated with both an increased rate of unnecessary ECSs and unjustified variations in clinical practice.
Furthermore, most RFs for ECSs should be considered putative, since they have mainly been selected by means of logistic regression models that usually lack information regarding both their goodness-of-fit and their DA [30][31][32][33][34][35][36][37][38]. Traditional measures of association alone are inappropriate to discriminate between who will suffer a given outcome and who will not. Therefore, interventions based on average risk estimates for people both exposed and unexposed to spurious RFs could be ineffective, inefficient, and even potentially harmful [12][13][14][15][16][17][18][19][20][21][22].
To our knowledge, very few studies have sought to improve the ability to predict which women are at higher risk of ECS. Those that do are limited to nulliparas, include only a few of the putative RFs, and report no measures of either calibration or DA of the statistical models developed [30][31][32][33][34][35][36][37][38]. Our objective is not to build an explanatory model of the decisions to perform an ECS, but to increase the predictive accuracy regarding this type of delivery in order to provide more validated information with the ultimate view to improving the appropriateness of indications for ECS and thus preventing unnecessary C-sections.

Material and methods
The present study is part of a large multifaceted intervention intended to improve the appropriateness of the indications for ECSs in 22 public hospitals of the Spanish National Health Service launched by the Spanish Ministry of Health. Of those 22 participating hospitals, four (A, B, C, and D) were included in this study because their databases were the most reliable in terms of consistency and coverage to ensure that robust predictive models of ECSs could be built. In size and complexity, the obstetric services of these four hospitals belong to level II (out of III) of the Spanish National Hospital Catalogue. They can be considered representative of about 42% of all obstetrics services of the Spanish National Health Service that belong to this level, since they all have a very similar case mix, and attend pregnant women with similar obstetric risk.
The study population consisted of all 6,157 singleton births, with no exclusions, occurring in 2014 at four public hospitals located in three different autonomous communities of Spain. According to the Spanish National Institute of Statistics, these 6,157 births account for 1.5% of all yearly births in Spain (around 420,000/year). Hospitals A and B account for 26.5% of all births occurring yearly in the Autonomous Community of the Balearic Islands, Hospital C for 12.6% of those occurring in Galicia, and Hospital D for 2.0% of those occurring in Valencia (https://www.datosmacro.com/demografia/natalidad/espana-comunidades-autonomas). Data were collected prospectively over 2014 and registered in a specifically designed database that included the fetal, maternal, and contextual independent variables (Table 1 and S1 Tables). All presentations were included in the analysis. All variables put forth in the medical literature as predictive variables (putative RFs) of the type of delivery were in principle considered in the study, with few exceptions. Since birth weight is a post-delivery variable, it cannot be predictive of the type of delivery. The estimated preterm fetal weight could be considered a potential predictive variable. However, it is barely used given that its measurement is very imprecise (± 400 g) [1][4][6][7][8][9][10].
Unlike other published predictive models, we additionally included hospital fixed-effects and night-shift delivery as potentially predictive contextual independent variables. Hospital fixed-effects represent unobserved effects of hospital (contextual) characteristics that are not captured by any of the independent variables included in the models. They may be predictive of the type of delivery, account for a certain fraction of the medical variations (total variance) of ECSs often found in small area analysis, and modify the strength of the associations between the independent RFs and the type of delivery. They are not explanatory of the type of delivery, but their association with it may be indicative of different entrenched, difficult-to-measure clinical practices across hospitals that are likely to influence the decision regarding the type of delivery and therefore warrant further investigation. Night-shift delivery was included because it has been shown to be both a good predictor of the delivery mode and an appropriate instrumental variable for inferring the causal average treatment effect of non-medically indicated cesarean sections (compared with vaginal delivery) on newborns' health outcomes [39]. Descriptive statistics were calculated for all fetal, maternal, and contextual variables. Scheduled, emergency, and overall (both scheduled and emergency) C-section rates were estimated for the whole population and for each hospital with their corresponding 95% CIs.
The first step in our analytical approach to identify RFs for ECS was to calculate the prevalence of each putative RF in the overall population and in mothers delivering by vaginal birth and by ECS, as well as their 95% CIs. We then estimated the prevalence ratio of each RF (by dividing the prevalence of the RF by the prevalence of ECS). Finally, we estimated the positive likelihood ratio (LR+) of each RF and its 95% CI. A LR+ >10 is considered high enough to rule in the outcome, 5-10 is considered moderate, and 2-5 is considered low [40][41][42][43][44][45][46][47]. The second step was to build a logistic regression model for each of the four hospitals included in the study (A, B, C, and D), as well as a logistic model for the overall sample, to find out which fetal, maternal, and contextual RFs (independent variables) were associated with the outcome (delivery type: vaginal or ECS), as well as the strength of the associations found. Model specification was based on stepwise top-down variable selection, taking into consideration the clinical relevance of each variable. Crude and adjusted ORs were obtained, as well as their 95% CIs. The models' goodness-of-fit was compared by means of the -2 log-likelihood ratios and the Akaike information criterion (AIC). Their DA was assessed through their areas under the receiver-operating-characteristic (ROC) curves (AUCs) along with their 95% CIs.
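The LR+ calculation described above can be illustrated with a minimal Python sketch. It assumes a 2x2 table for one dichotomous RF and uses the common log-method standard error for the 95% CI; the paper does not state which CI method was used, so that choice, the function names, and the label for LR+ values below 2 are assumptions made here for illustration.

```python
import math


def positive_likelihood_ratio(tp, fn, fp, tn, z=1.96):
    """Return (LR+, lower, upper) for one dichotomous risk factor.

    tp: ECS deliveries with the RF      fn: ECS deliveries without the RF
    fp: vaginal deliveries with the RF  tn: vaginal deliveries without the RF
    The CI uses the log method, which is an assumption of this sketch.
    """
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # 1 - specificity
    lr_plus = sensitivity / false_positive_rate
    # Standard error of ln(LR+)
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lower = lr_plus * math.exp(-z * se_log)
    upper = lr_plus * math.exp(z * se_log)
    return lr_plus, lower, upper


def interpret_lr_plus(lr_plus):
    """Apply the thresholds quoted in the text (>10, 5-10, 2-5)."""
    if lr_plus > 10:
        return "high"
    if lr_plus >= 5:
        return "moderate"
    if lr_plus >= 2:
        return "low"
    return "negligible"  # label for LR+ < 2 is an assumption, not from the paper


# Hypothetical counts, not taken from the study data
lr, lo, hi = positive_likelihood_ratio(tp=60, fn=40, fp=90, tn=810)
print(round(lr, 2), round(lo, 2), round(hi, 2), interpret_lr_plus(lr))
```

The counts in the usage line are invented; with real data, each row of Table 5 would supply one such 2x2 table.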
We then fitted a classification tree (CTREE or conditionally unbiased inference classification tree), a relatively new and useful predictive technique for studying RFs and outcomes based on the unbiased recursive splitting of the study population sample into subgroups according to the independent variables [48]. The underlying mathematical algorithm chooses which independent variable to split on, its discriminatory value, and the order in which the splitting occurs. Outcome discrimination can thus be maximized at each step, making it possible to account for complex relationships between variables and their interactions while preventing both over-fitting and biased variable selection. The process develops a hierarchical tree structure that enables such simultaneous analyses and presents them in a clinically useful format [48][49][50].
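The variable-selection step of such a tree can be sketched as follows. This is a deliberately simplified illustration, not the CTREE implementation: it assumes dichotomous predictors, replaces the permutation-test framework of conditional inference trees with a Pearson chi-square test on a 2x2 table, and uses a Bonferroni adjustment with alpha = 0.05 as an assumed stopping rule. All function and variable names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df, via the normal tail
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def select_split(rows, outcome, predictors, alpha=0.05):
    """Pick the predictor most strongly associated with the outcome.

    Returns (name, adjusted_p); name is None when no adjusted p-value
    reaches alpha, which is the signal to stop splitting this node.
    """
    best_name, best_p = None, 1.0
    for name in predictors:
        a = sum(1 for r in rows if r[name] and r[outcome])
        b = sum(1 for r in rows if r[name] and not r[outcome])
        c = sum(1 for r in rows if not r[name] and r[outcome])
        d = sum(1 for r in rows if not r[name] and not r[outcome])
        _, p = chi_square_2x2(a, b, c, d)
        p_adjusted = min(1.0, p * len(predictors))  # Bonferroni (assumed)
        if p_adjusted < best_p:
            best_name, best_p = name, p_adjusted
    if best_p <= alpha:
        return best_name, best_p
    return None, best_p
```

Separating "which variable is associated with the outcome" from "where to cut it" is what keeps the selection from favoring variables with many possible cut-points; a full tree would call `select_split` recursively on each resulting subgroup.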
Unlike CART models, CTREE can handle datasets with both categorical and numerical variables without producing biased splits, and the interpretation of both odds ratios and likelihood ratios is straightforward. Therefore, we used dichotomous variables to enable comparisons with other published studies despite a small potential loss of information. All births were included in the analysis, and anonymity was preserved. A database was constructed by two computer engineers, who also managed the transfer of data. Database quality was periodically audited and was considered reliable in terms of consistency, coverage, and agreement. The database is available upon request. The Spanish Ministry of Health approved this study under the Strategy for Assistance at Normal Childbirth in the National Health System (PI/01445).
We also developed a random forest model (RFM) that fits n classification trees by randomly selecting predictors for each tree. CTREE was used as the base learner, and 500 different trees were created by bootstrapping, rendering more accurate predictions than a single tree analysis.
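The bootstrapping and random predictor selection described above can be sketched in a few lines. This is an assumption-laden illustration: to keep it short, each "tree" is reduced to a single split (a stump) on the best of the randomly drawn dichotomous predictors, whereas the study used full CTREEs as base learners. The defaults of 500 trees and 2 randomly selected variables follow the values reported in the paper; the function names and the seed are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, outcome, candidates):
    """Choose the candidate predictor whose split gives the fewest errors."""
    best = None
    for name in candidates:
        sides = {}
        for value in (False, True):
            labels = [r[outcome] for r in rows if bool(r[name]) == value]
            # Majority outcome on this side of the split (False if empty)
            sides[value] = Counter(labels).most_common(1)[0][0] if labels else False
        errors = sum(1 for r in rows if sides[bool(r[name])] != r[outcome])
        if best is None or errors < best[0]:
            best = (errors, name, sides)
    _, name, sides = best
    return name, sides


def fit_forest(rows, outcome, predictors, n_trees=500, mtry=2, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and mtry random predictors."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: draw with replacement
        candidates = rng.sample(predictors, min(mtry, len(predictors)))
        forest.append(fit_stump(sample, outcome, candidates))
    return forest


def predict_probability(forest, row):
    """Fraction of trees voting for the outcome (e.g., ECS) for one delivery."""
    votes = sum(1 for name, sides in forest if sides[bool(row[name])])
    return votes / len(forest)
```

Averaging votes over many resampled trees is what makes the ensemble prediction more stable than that of any single tree.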
This algorithm makes it possible to estimate the relative importance of each independent variable in the model (i.e., the contribution of each independent variable to the predictive power of the random forest). The methodology used to compute the relative importance of each variable (known as conditional permutation importance), and more information regarding CART, CTREE, and RFM, can be found elsewhere [48][49][50]. We also compared the models' discriminatory performance by means of their corresponding ROC curves. Goodness-of-fit analysis across the abovementioned models was performed using in-sample (n = 6,157) data with ROC curves. The statistical analyses were performed using R Statistical Software (Foundation for Statistical Computing, Vienna, Austria) [49,50].
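For readers less familiar with the AUC, the following minimal sketch computes it as the probability that a randomly chosen ECS delivery receives a higher predicted score than a randomly chosen vaginal delivery. The confidence interval uses the Hanley-McNeil approximation, which is an assumption of this sketch; the paper does not report which interval method was used, and the function names are hypothetical.

```python
import math


def auc(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_confidence_interval(a, n_cases, n_controls, z=1.96):
    """Approximate 95% CI for an AUC (Hanley-McNeil standard error, assumed)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    variance = (
        a * (1 - a)
        + (n_cases - 1) * (q1 - a * a)
        + (n_controls - 1) * (q2 - a * a)
    ) / (n_cases * n_controls)
    se = math.sqrt(variance)
    # Clip to the valid range of an AUC
    return max(0.0, a - z * se), min(1.0, a + z * se)


# Hypothetical predicted probabilities, not study data
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(a, auc_confidence_interval(a, 3, 3))
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; rank-based formulas give the same value more efficiently on large samples.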

Results
ECS rates varied from 8 to 15% across hospitals, whereas overall C-section rates were higher (12-21%) (Table 2). Descriptive population statistics are shown in Table 3. Mothers delivering by ECS were slightly older, had higher BMIs and weight, were more likely to have had a previous C-section, had more comorbidity, presented greater obstetric risk, more often underwent labor induction and delivered during the night shift, and had a slightly lower gestational age and intrapartum (scalp) pH than those who had eutocic deliveries. No differences were found regarding smoking during pregnancy (Table 4).
The prevalence of the putative RFs for ECS in the overall population, as well as in eutocic and ECS deliveries, is shown in Table 5. In the overall population, the RFs with the highest prevalence (over 40%) were previous pregnancies, night delivery, BMI ≥ 25, and obstetric risk. The prevalence of all RFs except smoking and parity was higher in women delivering by ECS than in those with eutocic deliveries according to their 95% CIs. All prevalence ratios were 6% or lower, and the LR+ of all individual RFs were low (4.14 or lower). The gender of the fetus was not associated with the type of delivery, nor did it improve either the calibration (-2 log-likelihood ratios, AIC) or the discriminant accuracy (C statistic) of the final models. Therefore, it was excluded from the final logistic models. BMI was ultimately included because including height and weight separately rather than BMI made no difference to either the calibration (AIC) or the discriminant accuracy (C statistic) of the models. We chose the most parsimonious models as the final ones. Gestational age was also excluded from the final logistic models due to its high collinearity with the rest of the independent variables that remained in the model for each hospital, and because its inclusion led to biased intercept estimates of these logistic models. According to the final logistic regression model for the overall population (Table 6), all RFs except for the number of previous pregnancies were positively associated with ECS. The strongest associations were those found for scalp pH (OR = 5.56), Hospital C (OR = 2.69), induction (OR = 2.32), and previous ECS (OR = 2.28). The remaining ORs were lower than 1.5, although the lower limits of their 95% CIs were greater than 1.0. The only inverse association found was that between parity and ECS (OR = 0.87). With regard to the contextual variables, hospital fixed-effects and night-shift delivery were also positively associated with ECS.
The strongest association was found with Hospital C, which is consistent with its substantial relative importance in the random forest (Table 7).
The strength of the positive associations was relatively similar in the models for each of the four hospitals and in the model for the overall population. Although pH, induction, and previous ECS appear to be the RFs with the highest ORs, and age and BMI those with the lowest, their relative magnitude at each hospital varied slightly, except for pH, which was substantially higher at one hospital (OR = 7.17). Parity was positively associated with ECS at only one hospital, whereas obstetric risk was positively associated with it at only two.
The logistic model for the overall population and those for each hospital fit the data well, as indicated by both the -2 log-likelihood ratio and the Akaike criterion (Table 6).

Table 6. Logistic regression models to assess the association between the putative risk factors and type of delivery for the overall population and the four hospitals.
Of the two recursive partitioning models (CTREE and Random Forest), CTREE was used as the base learner for the Random Forest algorithm (n = 500). Fig 1 depicts the tree structure of the trained CTREE. The first split (p < 0.001) is on scalp pH, followed by labor induction for pH ≥ 7.20 and previous ECS for pH < 7.20; that is, if the pH ≥ 7.20, the next split is labor induction (p < 0.001), whereas if the pH < 7.20, the next split is previous ECS (p = 0.003). The interpretation extends to the conditional nodes (splits) and leaves. As an example of the meaning and utility of hospital effects, on the extreme right side of Fig 1 it can be seen that mothers whose fetuses had a scalp pH > 7.20 and who had not had a previous ECS had a probability of almost 48% of having an ECS in hospital D, whereas in the other hospitals (A, B, and C) this probability went down to 27%. The mean AUC of the CTREE was 0.88 (95% CI: 0.84-0.92).
The RFM consisted of a set of n = 500 CTREEs with an optimal number of randomly selected variables = 2. Although random forest algorithms tend to be more of a black box in terms of their interpretation, their predictive power (AUC = 0.94; 95% CI: 0.93-0.95) provides reliable predictions even at an individual level. The relative variable importance of all variables included in the RFM is shown in Table 7. The three most relevant RFs (pH, induction, and previous ECS) also showed the strongest associations in the logistic models. Since the LR+ of all the interaction terms found in the RFM were lower than 10, as was the case for the individual RFs (Table 5), they failed to rule in the type of delivery.

Discussion
The strength of the associations between some putative RFs and ECS, their prevalence, their prevalence ratios, and their LR+ in the overall population were low to moderate, indicating, as in other studies, that single RFs alone offer only a low DA for most outcomes, such as ECS [40][41][42][43][44][45][46][47].
With the exception of scalp pH, the magnitude of the strength of these associations was low and similar across the four hospitals. Likewise, all were positive except for the number of pregnancies, which showed an inverse association. Heterogeneity did not seem to play a relevant role in the study population solely on the basis of this initial analysis. Moreover, only the number of pregnancies seemed to increase the odds of a vaginal delivery, as would be expected.
In the final logistic model for the overall population both contextual variables (hospital fixed-effects and night-shift delivery) were positively associated with ECS and increased goodness-of-fit. These variables were associated with higher ECS rates and may thus favor the indication of ECS over vaginal deliveries. Regardless of maternal and fetal characteristics, and as indicated in a number of studies, different entrenched practices across hospitals seem to influence the decision regarding delivery type, similar to how physicians' desire for night-time leisure influences the decision to perform an ECS at the start of the night shift [4][5][6][7][8][9][10][11][39].
No single 100% accurate predictive model of the type of delivery has been published to date. In fact, only a few have been published, all showing low predictive and discriminant accuracy. The contextual (hospital) factors that may contribute to both predicting and explaining variations in the type of delivery and in the appropriateness of C-section indications (as shown by the high variability of C-section rates in several published atlases of variations in medical practice) remain unobserved and unknown. The only available way to account for them is by including hospital fixed-effects in logistic models and in random forests as contextual variables (which are tantamount to the second-level variables in multilevel analyses). Moreover, their inclusion in the models reduced the biases in the estimates of the strength of the associations without resulting in overfitting, and increased the models' discriminant accuracy because they account for the abovementioned unobserved predictive factors [4][5][6][7][8][9][10][11][39].
These results illustrate the usefulness of this analytic approach because they suggest that some hospital characteristics (e.g., method of payment and other incentives, physicians' desire for night-time leisure, established non-evidence-based practices such as performing a C-section on mothers who have had a previous C-section) may explain unjustified variations and inappropriateness of some indications for C-sections that warrant further investigation.
Consequently, all fetal, maternal, and contextual factors alone failed to achieve a reasonable DA for ECS rates in different population subgroups at each hospital even after they were controlled for in these models. This is consistent with the well-known fact that the decision regarding the type of delivery hinges not only on different combinations of these RFs and the interactions between them, but also to some extent on variations across individual hospital practices and even individual clinicians' practices. It can thus be the product of unjustified non-evidence-based clinical practices, which has long been shown in studies of variations in clinical practice with regard to CS using small area analysis [4][5][6][7][8][9][10][11].
Measures of association alone are insufficient to discriminate between those individuals who will develop a given outcome and those who will not (a strong association is not tantamount to high DA, since high DA requires that both the false positive and false negative fractions of the population be low) [40][41][42][43]. It is the set of independent variables included in the final logistic models that could make it possible to achieve acceptable DA, as shown by their high AUC (0.75-0.81). To our knowledge, no logistic regression model published to date has achieved an AUC similar to those reported here.
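The gap between strength of association and discrimination can be made concrete with a short worked calculation. For a single dichotomous RF, the crude odds ratio equals the odds of the true positive fraction divided by the odds of the false positive fraction, so fixing one determines the other. The sketch below assumes a hypothetical false positive fraction of 10% and treats the OR as crude; the function name is illustrative.

```python
def sensitivity_from_odds_ratio(odds_ratio, false_positive_rate):
    """True positive fraction implied by an OR at a given false positive fraction.

    Uses OR = [TPF / (1 - TPF)] / [FPF / (1 - FPF)] for one dichotomous RF.
    """
    odds_fpf = false_positive_rate / (1 - false_positive_rate)
    odds_tpf = odds_ratio * odds_fpf
    return odds_tpf / (1 + odds_tpf)


# OR taken from the overall logistic model for scalp pH; the 10% false
# positive fraction is a hypothetical value chosen for illustration.
tpf = sensitivity_from_odds_ratio(5.56, 0.10)
print(round(tpf, 3), round(tpf / 0.10, 2))  # sensitivity and implied LR+
```

Under these assumptions, an OR of 5.56 corresponds to a sensitivity of roughly 38% and an LR+ below 4, that is, most ECS deliveries would be missed by this RF alone even though the association is the strongest in the model.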
The AUCs of the RFM (0.93-0.95) and the CTREE (0.84-0.92) offer a considerably improved additional analytical approach to the same issue due to the nature of their optimization algorithm, maximum likelihood for logistic and unbiased recursive partitioning for CTREE. Their incremental DA is notably higher than that of logistic models due to the unsupervised detection of interactions in the CTREE model and 500 such CTREEs in the RFM. The reasons for this improvement in DA are mainly twofold. First, it results from detecting associations and interactions among the combinations of RFs used in clinical decision-making regarding the type of delivery at each hospital that are not captured by logistic models. Second, the model also captures heterogeneity (the trees' branches), among both the hospitals and the clinicians' decision-making frameworks, that logistic models likewise cannot capture.
In terms of implications for clinical practice, we found some medically unjustified differences in ECS rates for hospital D compared to the other hospitals, e.g., in induced births between 11 p.m. and 3 a.m. in which the scalp pH was above 7.20 (nodes 2, 16, and 20). Moreover, in the subgroups of deliveries with pH above 7.20 and at least one previous C-section (nodes 25 and 26), the ECS rates climbed to 50% and almost 60%, respectively. The utility of these results lies in the fact that, although they are neither explanatory nor confirmatory, they suggest potential sources of inappropriate ECSs in Hospital D (contextual factors) that should be further investigated (e.g., changes in payment methods, lack of updated clinical guidelines, lack of utilization management, demand-side issues).
One of the main limitations of this study is that only 4 out of 22 obstetrics services were included, as explained in the Material and methods section. These four hospitals could be considered representative of up to 42% of hospitals within the Spanish National Health Service in terms of obstetric case mix, obstetric risk, number of births, and CS rates. However, it is to be expected that studies intended to build a predictive model for the type of delivery will fail to have high external validity with regard to the specific RFs for ECS. As already noted, it is the combination of RFs (fetal, maternal, and contextual) at each particular hospital and the interactions between them that makes it possible to improve the DA for the type of delivery. The more clinical practice varies across centers and clinicians, the more different RF-combination subgroups can be expected to appear in the CTREEs given their greater ability to capture them; hence, the more hospital-specific the combination of RFs and interactions between them yielding the highest DA will be. Given that we performed a 10-fold cross-validation using randomly allocated 90/10% training/test sample sizes, the chances of the RFM being overfitted and the AUCs being overestimated are very low.
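The random 90/10% allocation used in 10-fold cross-validation can be sketched as follows. This is a generic illustration of the splitting scheme only, under the assumption of a simple unstratified shuffle; the paper does not describe whether folds were stratified by hospital or outcome, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every observation appears in exactly one test fold; each model is
    fitted on the remaining folds (about 90% of the data when k = 10).
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random allocation to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# With the study's sample size, each test fold holds 615 or 616 births
for train, test in k_fold_indices(6157):
    print(len(train), len(test))
```

In practice the AUC would be computed on each held-out fold and then averaged, so that no delivery is scored by a model that was trained on it.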
Another limitation of the study is that scalp pH is a very proximate measure likely linked to fetal distress, so it is not surprising that it is highly predictive. We did not include cord pH because it is a post-delivery endpoint and as such cannot be considered a predictive variable of the type of delivery. We agree that scalp pH is linked to fetal distress and can be highly predictive. However, we included it in the models as a predictive variable for several reasons: i) scalp pH is an intrapartum variable, not a final endpoint. Variations in the cut-off points actually used in clinical practice may explain variations both in the diagnosis of fetal distress and in the fraction of appropriate and inappropriate indications for ECSs across hospitals (as has been shown in studies of the appropriateness of the different types of ECS indications, in this particular case, fetal distress); ii) it has also been shown that the clinical management of intrapartum (scalp) pH, and thus of fetal distress, varies across hospitals, and that it accounts for a considerable fraction of inappropriate ECSs for this specific indication, which could make scalp pH a predictive variable for some but not all ECSs; and iii) the tenfold cross-validation performed in the CTREE model prevented overfitted estimates when this variable was included.
Therefore, this study's main contribution is that the information provided by the combination of logistic regressions and CTREES can provide more accurate information than either method alone to help clinicians and managers find the sources of heterogeneity and unjustified variations in ECSs, design and implement hospital-tailored interventions intended to improve the appropriateness of their indications, and reduce unnecessary ECS and their avoidable complications and costs. This comprehensive and complementary statistical methodology, combined with robust data collection and audit processes, makes it possible to analyze an intricate medical decision-making problem with higher discriminant capacity than previous studies.
In conclusion, fetal, maternal, and contextual factors alone fail to achieve a reasonable discriminatory accuracy for type of cesarean delivery. We have met our objective by simultaneously considering these factors at each particular hospital using both logistic regressions and CTREEs, for the following reasons. First, this analytical strategy has improved the final discriminatory accuracy of the models for the type of delivery compared with that of the predictive models published to date. Second, the discriminatory accuracy of these models has been validated in our study by means of ten-fold cross-validation. Third, the results allow for further investigation of the sources of variability and inappropriateness of ECSs. Finally, based on this information, they also allow for tailoring hospital-specific interventions intended to improve the appropriateness of indications for ECS.