Prediction of mortality in very low birth weight neonates in Spain

Objective Predictive models for preterm infant mortality have been developed internationally, albeit not valid for all populations. This study aimed to develop and validate different mortality predictive models, using Spanish data, to be applicable to centers with similar morbidity and mortality. Methods Infants born alive, admitted to NICU (BW<1500 g or GA<30 w), and registered in the SEN1500 database, were included. There were two time periods; development of the predictive models (2009–2012) and validation (2013–2015). Three models were produced; prenatal (1), first 24 hours of life (2), and whilst admitted (3). For the statistical analysis, hospital mortality was the dependent variable. Significant variables were used in multivariable regression models. Specificity, sensitivity, accuracy, and area under the curve (AUC), for all models, were calculated. Results Out of 14953 included newborns, 2015 died; 373 (18.5%) in their first 24 hours, 1315 (65.3%) during the first month, and 327 (16.2%) thereafter, before discharge. In the development stage, mortality prediction AUC was 0.834 (95% CI: 0.822–0.846) (p<0.001) in model 1 and 0.872 (95% CI: 0.860–0.884) (p<0.001) in model 2. Model 3’s AUC was 0.989 (95% CI: 0.983–0.996) (p<0.001) and 0.942 (95% CI: 0.929–0.956) (p<0.001) during the 0–30 and >30 days of life, respectively. During validation, models 1 and 2 showed moderate concordance, whilst that of model 3 was good. Conclusion Using dynamic models to predict individual mortality can improve outcome estimations. Development of models in the prenatal period, first 24 hours, and during hospital admission, cover key stages of mortality prediction in preterm infants.

Introduction Preterm birth is the main cause of perinatal mortality and accounts for more than 50% of child disability [1]. Extremely preterm infants (gestational age (GA) under 28 weeks' gestation), and especially those infants born at the limits of viability (GA between 22 and 25 weeks' gestation), are the most vulnerable infants in neonatal intensive care units (NICUs), with high morbidity and mortality rates. Decisions regarding care and treatment of these infants are especially challenging, both for health care professionals and families.
Predictive models estimate the probability, or risk, of a certain condition occurring in an individual after the combination of different prognostic factors [2]. They are of utmost importance for complex medical situations with high mortality rates. As reviewed by Medlock et al [3], at least fifty predictive models for preterm infant mortality have been developed. Some, such as the CRIB score [4] or the NICHD model [5], are widely used in neonatal units throughout the world. However, about ten models were developed in the pre-surfactant era [3,4,6] and very few have been externally validated in more than one study [4,5,7].
In Spain, around 30000 preterm infants are born each year [8][9]. The Spanish SEN1500 database was created in 2002 and collects data reported by 65 Spanish NICUs. It includes data from over 2500 very low birth weight (VLBW) infants per year, born with a gestational age (GA) under 30 weeks or birth weight (BW) under 1500g [10][11]. Data collected in this database serves as a benchmark for NICUs, allowing local and collaborative quality improvement projects [12][13][14][15][16][17].
The aim of this study was to develop and validate different mortality predictive models to be used in preterm newborns in Spain.

Population and study centres
Data was obtained from the SEN1500 database: Infants born alive with BW <1500g or GA <30 weeks, and admitted to 65 Spanish NICUs between 2009 and 2015, were included. There were two main study periods: Development of predictive models (data from years 2009-2012) and validation of models (2013)(2014)(2015). Exclusion criteria were: Fetal or delivery room deaths, major congenital defects, severe chromosomal abnormalities, and patients transferred to another hospital of whom morbidity and mortality data were missing.
The identity of patients and data collection followed strict confidentiality standards. This study was approved by the Spanish Neonatology Society and the Ethics Committee of Sant Joan de Déu Hospital, Barcelona.

Study variables
The SEN1500 records perinatal, neonatal and follow-up variables. Perinatal and neonatal variables used in this study, as well as main diagnoses, and severity-adjusted diagnoses adapted in accordance with severity criteria, are shown in Table 1. significance of p<0.15 in the Chi-square or Mann-Whitney U test were then introduced as covariates in the regression models. Where possible, evaluation of the model with Nagelkerke's R2 was performed. For all models, cut-off values with the highest Cohen Kappa index [18] in the training set were selected. Model discrimination was assessed using the area under the curve (AUC) and the odds ratio (OR). The Hosmer-Lemeshow test was used to measure model calibration; if p>0.05, the model was considered well-calibrated [19]. The Brier score, which analyses the deviation between predicted and real probabilities uninfluenced by selected cut-off values, was used to test model accuracy; setting values, from 0 for a perfect model to 0.25 for a non-informative model [20,21]. Predictive models were generated using generalized linear model (GLM) classifiers with R 3.5.1 software (Caret package). The output is a maximum-likelihood estimation for the probability of death of each sample, which is calculated with the following formula: Probability of "death" = 1 / (1+e -z ), where z is the output value of the model. Conversely, survival probability (in percentage) would be 100 -mortality (in percentage).
Model 1 is a Prenatal predictive model (dependent variable: Hospital mortality. Independent variables: Maternal and perinatal data, Table 1).
Model 2 is a 24 hours of life predictive model (dependent variable: Hospital mortality. Independent variables: All maternal and perinatal data, including severe respiratory distress syndrome, Table 1).
Model 3 is a predictive model during hospital admission (dependent variable: Hospital mortality. Independent variables: All maternal, perinatal and neonatal data from Table 1, plus length of stay in NICU). This model is a composition of two sub-models, which are alternatively applied before or after the 30 th day of admission. The choice to split the estimation into a two-step model was based on the empirical different nature of symptoms/features affecting patient survival in the first days, or after a long period in the NICU. Length of stay was incorporated as an independent variable to capture the impact of time on patient outcome.
Since we only had cumulative data from the full NICU stay for patients, in order to develop a model with the capability of expressing time-course evolution, we had to run a data simulation during the development stage. Each patient record was multiplied by the number of days they stayed in the NICU, changing the value of 'days of stay' from one to the actual length of  (14-45 days); intraventricular hemorrhage (0-10 days); periventricular leukomalacia (> 21 days) and bronchopulmonary dysplasia (> 28 days). With this method, we managed to bolster the training set and endow the regression module with a pseudo-evolution of all cases, enabling a time-based dynamic estimation. Validation stage. In this stage, the Kappa index, Accuracy, AUC and Brier score for each model has been calculated. The Accuracy and Kappa index were calculated using the cut-off point value from the development set data, to estimate the discrete outcome (i.e. likely to live or to die). As for the Kappa index (k), strength of concordance between the predicted outcome and the true value has been analyzed with the following criteria: "Poor" k = 0-0.20; "Weak" k = 0.21-0.40; "Moderate" k = 0.41-0.60; "Good" k = 0.61-0.80; "Very Good "k = 0.81-1.00.
In all models, the prognosis was reported in categorical alternatives: "Mild: Very high probability of survival" (mortality rate: 0-20%), "Moderate: High probability of survival" (mortality rate: 21-50%), "Severe: Low probability of survival" (mortality rate: 51-80%), or "Very severe: Very low probability of survival" (mortality rate: 81-100%). For all models, we also provided cut-off points, splitting the score into four intervals using the measured False Negative Rate (FNR) as the metric to define the mortality rate thresholds. The FNR measures the number of dead cases that obtained a score smaller than the threshold, divided by the total number of dead. Since the dataset was highly biased towards non-dead individuals, if we had used the Positive Predictive Value (PPV), which assesses the estimated dead probability for each value, this might have resulted in extremely close-to-zero thresholds. Such low thresholds are noninformative when interpreting results for complex cases during the NICU stay.
The prognostic ability of Model 3 was also tested with a simulation of a "real time scenario" for different lengths of stay in the NICU: Day 1, day 8, day 15, day 31, and day 61. Throughout the validation stage we followed the same protocol applied during the development. We simulated patient profiles at different days (i.e. day 1, day 8, day 15, day 31 and day 61) and eliminated those diagnoses that were incompatible with the length of stay being tested at the time. For example, respiratory distress syndrome would only be included in those scenarios contemplating the evolution from days 0 to 7. In all cases, we previously created subsets of the validation set and applied them to patients with length of stay greater than or equal to the time point under consideration.
Finally, an online calculator to access the results of these models was developed.  Table 2.

Results
The median GA for both periods was 29.1 weeks with a median BW of 1.117 g (IQR: 885-1320), and 1.103 g (IQR: 870-1320) in the development and validation stages, respectively. Hospital mortality was 14.8% in the development stage and 11,6% in the validation stage. Table 3 summarizes "regression equation coefficients" and "odds ratios" of the variables with the greatest impact in mortality in each of the predictive models studied. Values of R 2 Nagelkerke, Hosmer-Lemershow, AUC and Brier score are presented for each model.  Predictive values obtained for cut-off points for the different models, concordance (Kappa index), and accuracy in the validation stage are shown in Table 4.
In the validation stage, models 1 and 2 showed a moderate concordance(Kappa index of 0.447 and 0.475 respectively), while for model 3, it was good (Kappa index of 0.718). Table 4 also includes additional columns that define cut-off points, splitting the score into four intervals for prognostic interpretation. The thresholds have been defined using the false negative ratio (FNR) as the metric and are aimed to group the prognoses in four groups: Very high, high, low and very low probability of survival.
We also analyzed the prognostic ability of Model 3 in a 'real time scenario' in which we evaluated the validation set, imposing a specific value for the days of life. The following values

Discussion
Multivariate prediction models are used to predict diagnoses or certain outcomes for a specific group of patients. When applied to a single patient, these models provide individualized prognostic information. When applied to groups of patients, predictive models allow group stratification for clinical trials, as well as case adjustment for quality improvement projects [3].
An excellent systematic review of mortality predictive models for very preterm infants was published in 2011 [3]. It included 41 studies, with 50 models to be applied either prenatally or at different times during NICU admission. After this publication, at least 8 new predictive models have been described [23][24][25][26][27][28][29][30] in a recent systematic review [31]. The most commonly used mortality predictive models today include CRIB I and II [4,32] [4,6], data collected up to 15 years before publication date [24,26,28,29], or no existing external validation [3]. In Spain, predictive models for preterm infants based on local population data do not exist. Moreover, the two most frequently used models in clinical practice (CRIB and the NICHD-2008 model) have not been validated in very preterm Spanish infants.
The Spanish SEN1500 network has generated population studies on extreme premature patient mortality [10,15,16], the impact on morbidity and mortality at the extreme thresholds of resuscitation (22-25 weeks of gestation) [13,14], and associated morbidity during hospital admission [16,17]. Within the SEN1500 network population, the CRIB score showed a better prediction of death when compared to GA, whereas the prediction of severe intraventricular hemorrhage did not differ according to the predictor used [16].
The possibility of using different groups of patients from the SEN1500 database has been essential for the development of the three predictive models described in this paper. What's more, having a short interlude between the two stages, similar to what was done in the Canadian model [25], enhances the relevance of our study in comparison to the rest of studies mentioned in this discussion, that were developed with older data.
At the moment of regression, the GLM method was preferred given its ability to unify the treatment of numerical and categorical variables into proxy numerical encoding. Although the calculator was designed to offer a more accurate prediction, prognosis is also expressed in four categories; very high, high, low and very low survival probability, with a prediction of death of 0-20%, 21-50%, 51-80%, and 81-100%, respectively. Such categories were designed to help interpret the results and guide the clinician in the communication with the family, avoiding the reliance on a sole numeric value for the prediction. Moreover, it also considers the non-linearity of the actual diagnostic process, and takes into account a margin for doubt when making hard decisions that may not be backed by enough numerical precision. The moderate and severe categories were introduced to inform the doctor when the model lacks certainty to set a clear prediction. Caution should be exercised in this process as it may have severe repercussions for both patients and families.
In model 3, length-of stay was included. We are aware that time has a non-linear influence on mortality, and that we do not have the ideal scenario with time-based information (e.g. on which day a patient acquired a disease, or recovered from it). However, we believe that even a linear approximation is useful to generate dynamic predictions, and might favor continuous data gathering and processing to help improve outcome precision.
In all predictive models of this study, hospital mortality was used exclusively as a dependent variable without including other variables, such as disability or neurological injury, as described in other publications [5,24,28,36]. Our developed models showed good results in the AUC and accuracy (Brier score) in predicting mortality, especially model 3 during the first month of life. In addition, model 3 showed the highest concordance when validated (Kappa = 0.718; Accuracy = 93.38%). Model 2 obtained an AUC higher than CRIB-1.
Some of the variables included in this study (respiratory distress syndrome, intraventricular hemorrhage, necrotizing enterocolitis, etc), have been expressed in two ways; absolute terms (yes/no) and severity (mild/severe). Even though they are not mutually exclusive, they add depth to patient status, improving the final outcome. For example, having a mild form of intraventricular hemorrhage has a different effect on outcome, when compared to a more severe degree. If we include IVH as a 3-category variable (no IVH, mild IVH, or severe IVH), order would be disregarded. Conversely, if it is incorporated as a simple numerical variable, the model grants the same relevance to both the gap from having no IVH up to a mild form of IVH, and the gap from having mild IVH up to severe IVH. Therefore, by combining the 2 systems, we bestow the model with a cumulative effect, thus preserving the freedom to estimate differently the increase of mortality according to the degree of severity.
In model 1 (prenatal), younger GA, non-administration of maternal steroids, BW below 10 th centile, multiple pregnancy, and non-tertiary level of care were the variables with the greatest impact on hospital mortality. In contrast, male gender had a minor impact compared to other studies [5,24]. This is important for regionalized maternal care and maternal transport, as strategic interventions can improve neonatal outcomes.
In model 2 (first 24 hours of life), lower BW, multiple pregnancy, lower Apgar at 5 minutes, maternal hypertension, advanced resuscitation, lower admission temperature, severe respiratory distress syndrome, and the significant variables of model 1 were the variables that had the greatest impact on hospital mortality. Interestingly, our analysis demonstrated a protective effect of maternal hypertensive disorders on the risk of infant mortality. This association has also been identified in a 2016 cohort study published by L. Gemmell et al, in which premature infants born at 24-28 weeks' gestation to mothers with hypertensive disorders had a lower risk of mortality [37]. It has been hypothesized that maternal hypertension is an adaptive response that accelerates organ maturation in a fetus under stress, and could explain the lower mortality rate observed in this patient population. In other predicted models, the time period in which predictions were made included; the moment of birth [19,27], admission to NICU [6] or the first 12 hours of life [4,7]. In our project, we chose the first 24 hours of life, to better encompass the effect of severe respiratory distress syndrome (administration of surfactant and the use of mechanical ventilation), which increases mortality in the same way as the Canadian model [25] that also includes the first 24 hours of life.
In model 3, we chose a double stratified model by days of hospital stay (0-30 days and >30 days) because it had shown greater accuracy when compared to a single model in previous analyses. This fact may derive from a different impact of neonatal variables on mortality in the first weeks of life compared to later on. Apart from days of life, the other included variables in model 3 (0-30 days) are those from model 2 (with the exception of level of care, advanced resuscitation and admission temperature), with the addition of other potentially significant morbidity variables linked to that time period. The aforementioned variables would be; severe pneumothorax, necrotizing enterocolitis, severe invasive infection and intraventricular hemorrhage, especially grades 3-4. After 30 days of life, severe respiratory distress syndrome or HIV become less important as predictors of death, in comparison to the more chronic conditions, such as cystic periventricular leukomalacia, severe anemia and BPD (considered as needing additional oxygen or respiratory support at 36 weeks of corrected age). This double stratified model by days of hospital stay reflects such effects.
In another predictive model published by Ambalavanan N et al [24], different times have been used as predictors of mortality apart from birth (7 and 28 days of life, and 36 weeks of postmenstrual age) [24]. Although FiO 2 was the main predictor for death at all specified postnatal time points in that model, other variables had a leading influence on death in different time periods (for instance, BW and intraventricular haemorrhage at 7 days of age). At 28 days of age, there would be the number of clinical late-onset infections and days of parenteral nutrition. Finally, at 36 weeks postmenstrual age, other variables that predominantly influence death would be the total number of late-onset sepsis episodes, a higher number of days on ventilation, and having enlarged ventricles.
In our study, another cohort of patients, born in 2013-15, was used to validate the three predictive models developed using the preterm population of years 2009-2012. Furthermore, as validation measures, AUC, accuracy and Kappa index were calculated. This was the chosen methodology since we believe that it presents a better alternative to bootstrapping the same population in a later period, improving implementation [38].
The strengths of the developed models would be the use of a large and recent population and having good levels of AUC in the mortality prediction, especially model 3 (during admission), which showed an AUC of 0.977 and a good concordance (Kappa index of 0.718). Additionally, the existence of an online tool allows better implementation in daily clinical practice, as described by other authors [5,24,29].
On the down side, there are some major limitations in this model. First, the absence of family perspectives in the model design, already proposed by some authors [39]. Second, an external validation of this model in other groups of patients has not yet been carried out. Third, all models designed with existing data run the risk of having a self-fulfilling prophecy effect. Finally, there are no publications concerning the use of these types of models by families and clinicians, that we know of.
In conclusion, three predictive mortality models were developed and validated for the Spanish population of very preterm infants. We showed that the use of dynamic models in individual mortality can improve outcome predictions. Validation studies testing our online calculator in different settings and prospective populations are needed in order to confirm its accuracy and generalizability.