Clinical risk assessment in early pregnancy for preeclampsia in nulliparous women: A population based cohort study

Objective To evaluate the capacity of multivariable prediction of preeclampsia during pregnancy, based on detailed routinely collected early pregnancy data in nulliparous women.
Design and setting A population-based cohort study of 62 562 pregnancies of nulliparous women with deliveries 2008–13 in the Stockholm-Gotland Counties in Sweden.
Methods Maternal social, reproductive and medical history and medical examinations (including mean arterial pressure, proteinuria, hemoglobin and capillary glucose levels) routinely collected at the first visit in antenatal care constituted the predictive variables. Predictive models for preeclampsia were created by three methods: 1) a logistic regression model using pre-specified variables (similar to the Fetal Medicine Foundation model including maternal factors and mean arterial pressure), 2) a logistic regression model using backward selection starting from the full suite of variables, and 3) a Random forest model using the same candidate variables. The performance of the British National Institute for Health and Care Excellence (NICE) binary risk classification guidelines for preeclampsia was also evaluated. The outcome measures were diagnosis of preeclampsia with delivery <34, <37, and ≥37 weeks' gestation.
Results A total of 2 773 (4.4%) nulliparous women subsequently developed preeclampsia. The pre-specified variables model was superior to the other two models regarding prediction of preeclampsia with delivery <34 and <37 weeks, both with areas under the curve of 0.68, and sensitivity of 30.6% (95% CI 24.5–37.2) and 29.2% (95% CI 25.2–33.4) at a 10% false positive rate, respectively. The performance of these customizable multivariable models at the chosen false positive rate was significantly better than the binary NICE-guidelines for preeclampsia with delivery <37 and ≥37 weeks' gestation.
Conclusion Multivariable models in early pregnancy had a modest performance, although providing advantages over the NICE-guidelines, in predicting preeclampsia in nulliparous women. Use of a machine learning algorithm (Random forest) did not result in superior prediction.


Introduction
Recent evidence suggests that the risk of the generally more severe preterm preeclampsia (delivery <37 weeks) can be substantially reduced by prophylactic use of aspirin from early pregnancy in a defined high-risk population [1,2]. Delay in the diagnosis of preeclampsia further contributes significantly to maternal morbidity and mortality [3][4][5][6]. Thus, accurate prediction of preeclampsia to enable preventive treatment and optimised surveillance is an urgent priority.
According to the National Institute for Health and Care Excellence (NICE) and other current national guidelines, clinical decision rules in early pregnancy for detecting women at high risk of developing preeclampsia are based on maternal and medical history risk factors [7][8][9]. These risk factors are evaluated individually without being incorporated into combined multivariable models, resulting in poor prediction, characterized by low sensitivity and specificity [10][11][12].
Clinical risk prediction models with combined predictor variables, also including medical examinations, have been developed in recent years and have led to improved detection rates [13,14]. The Fetal Medicine Foundation (FMF) has created predictive models using a limited number of maternal factors with the addition of mean arterial pressure (MAP) [15]. More complex FMF models include various combinations of biophysical markers, such as uterine artery Doppler, and biochemical markers, which are not routinely assessed in antenatal care [10]. Since detection rates and cut-off values have been shown to vary between populations, depending on differences in healthcare systems, incidence of disease and overfitting of the original model, the performance of these models has to be validated in other populations [16][17][18][19]. It has been emphasized that the cost-effectiveness of these more complex models has to be established before widespread use in clinical practice [14,20]. Evidence further suggests that when a large number of clinical predictive variables and MAP are used in a model for low-risk nulliparous women, uterine artery Doppler does not improve the predictive capacity for preeclampsia [13]. A recent systematic review points out the need for development of predictive models with the optimal combination of simple maternal factors [14], and the predictors included in the FMF model do not comprise all known maternal risk factors for preeclampsia [12].
Nulliparous women have a higher risk of preeclampsia, and the predictive capacity of both clinical decision rules and multivariable models is better for parous than for nulliparous women [11,20,21]. Few predictive models have been designed for nulliparous women [14], and this group would benefit greatly from improved screening. The performance of multivariable maternal factor models with MAP compared to the NICE-guidelines risk classification in nulliparous women has to be further explored.
Swedish antenatal care is free of charge and almost all women attend [22]. Information on well-recognised, less established and unknown risk factors for preeclampsia, including almost all of the maternal characteristics variables incorporated in the FMF model and more, is collected at the first visit. Construction of predictive models using this comprehensive range of Swedish maternal health care data has not yet been performed. In recent years, advanced prediction techniques including machine learning methods [23] (e.g., Random forest) [24] have been implemented in medicine [25,26]. These methods use data-driven approaches to select maximized predictive models using objective criteria rather than relying on expert opinion, and no assumptions of linearity or arbitrary cut-points are needed. These approaches also enable consideration of a large number of candidate predictors and make it possible to handle complex interactions. To our knowledge, predictive models in early pregnancy for preeclampsia using a machine learning method have not been performed to date.

Study objective
The objective was to create multivariable predictive models using three different methodological approaches (a logistic regression model with pre-specified variables similar to the Fetal Medicine Foundation model including maternal variables and MAP, a backward selection model starting from the full suite of variables, and a Random forest model), and to evaluate the NICE-guidelines, for identifying nulliparous women at increased risk of preeclampsia, using detailed routinely collected information from early pregnancy in a Swedish setting.

Setting
Data were derived from the Stockholm-Gotland Obstetric Cohort, a population-based database with information automatically retrieved from the computerized medical record system in the Stockholm-Gotland counties in Sweden. The database contains detailed, prospectively collected demographic, medical, obstetrical and neonatal data from all antenatal, delivery and postnatal care units in the region. Information is routinely entered into the medical records by midwives or physicians in a standardized way. Approximately one fourth of all 115 000 annual births in Sweden occur in the seven hospitals in the region.

Study population
Live births between January 1st, 2008 and December 31st, 2013 were included in the cohort of 149 298 singleton pregnancies. The population was restricted to pregnancies of nulliparous women delivered from gestational week 22. Pregnancies of women without information on gestational length or without notation of blood pressure before 15 weeks' gestation were excluded. The final study population included 62 562 pregnancies (Fig 1). We were also interested in predicting preeclampsia in pregnancies without major anomalies and among women not receiving aspirin, since these factors can alter the performance of the predictive models. For sensitivity analysis, pregnancies with major fetal malformations or maternal use of aspirin during pregnancy were excluded, giving a restricted population of 58 276 pregnancies (Fig 1).

Data sources
The pregnancies in the Stockholm-Gotland Obstetric Cohort were individually linked using the person-unique national registration numbers with the National Patient Register [27] and the Swedish Prescribed Drug Register [28]. The National Patient Register includes International Classification of Diseases (ICD) diagnoses on inpatient admissions and outpatient visits.
The Swedish Prescribed Drug Register holds data on all prescribed substances, ATC-code (Anatomical Therapeutic Chemical classification) and date of purchase for all dispensed drugs in the outpatient population.

Study variables
Outcome. Diagnosis of preeclampsia was the key variable, classified according to the Swedish version of ICD-10 codes (O140, O141, O149 or O15) by the responsible doctor during pregnancy or at discharge, and was retrieved from the National Patient Register. Preeclampsia was defined as hypertension (systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg on two occasions at least 4 hours apart), combined with proteinuria (≥0.3 g/24 hours or 2+ on dipstick testing) occurring after 20 weeks' gestation. In order to fulfill our definition of preeclampsia, there had to be one diagnosis in the inpatient register or two in the outpatient register, where the date of the first diagnosis was used.
Candidate predictors. At the first visit to antenatal care, around gestational week 10, the woman is interviewed about her social, reproductive and medical background, and medical examinations are performed. The routinely collected information from this visit was included in this study as 36 candidate predictors for preeclampsia in the predictive models, presented in Table 1.
Gestational length was determined using the following hierarchy: a) date of embryo transfer, b) early first or early second trimester ultrasound, c) date of last menstrual period, and d) from postnatal assessment. Information on social factors (family situation and country of birth), smoking, snuff and alcohol habits as well as reproductive history (parity, previous miscarriage or ectopic pregnancy, assisted reproduction and infertility duration) are self-reported. Further, the women are interviewed about their medical history (including pre-existing chronic diseases). The definition of diabetes included pre-gestational diabetes type I and II. The collected information is registered in a standardized way either as tick boxes, pre-specified options, or as numbers. Family history of hypertension or preeclampsia is however registered as free text and two dichotomous variables (family history of hypertension and family history of preeclampsia) were constructed.
Maternal BMI (kg/m²) was calculated from self-reported height and measured or self-reported weight. Maternal blood pressure is measured by the midwife in supine position on the right upper arm using manual blood pressure equipment with a cuff size appropriate for arm circumference. Korotkoff V is used for diastolic blood pressure. The first recorded blood pressure <15 weeks was collected. Mean arterial pressure (MAP), defined as (systolic blood pressure + (2 × diastolic blood pressure))/3, was calculated and used in the predictive models. Capillary blood samples for plasma glucose and haemoglobin, a venous sample for blood group and a urine dipstick test for protein are collected. All the candidate predictors were treated as continuous or categorized as presented in Table 1.
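As a worked illustration of the MAP definition above, the following minimal Python sketch computes MAP from a single blood pressure reading. The function name and the example reading are illustrative assumptions and are not taken from the study's analysis code.

```python
def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """MAP as defined in the text: (systolic + 2 * diastolic) / 3, in mmHg."""
    if systolic <= 0 or diastolic <= 0:
        raise ValueError("blood pressure values must be positive")
    return (systolic + 2 * diastolic) / 3


# Hypothetical first-visit reading of 120/80 mmHg (not study data).
example_map = mean_arterial_pressure(120, 80)
```

Because diastolic pressure is weighted twice, a given change in diastolic pressure shifts MAP twice as much as the same change in systolic pressure.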
Restricted population. Occurrence of major malformation was defined as any recorded congenital anomaly in the National Patient Register (ICD-10 codes Q00-Q99), excluding minor malformations not reported to the Register of Birth Defects [29]. Use of aspirin during pregnancy was defined as purchased prescription of aspirin during pregnancy in the Swedish Prescribed Drug Register (a prescription is needed for aspirin for the doses indicated during pregnancy).

Statistical methods
Statistical analyses were done with STATA 12 (StataCorp, College Station, TX, USA) for univariable and multivariable regression analyses. For Random forest analyses, the statistical software package R (version 3.4.4, R Foundation for Statistical Computing, Vienna, Austria) was used. The chi-squared test and the two-sample t-test were used to compare the variables in the study population between women who did and did not develop preeclampsia. In order to maximize the predictive power of our predictive models, we used three different multivariable statistical methods:
Pre-specified variables model. In this multivariable regression model for nulliparous women we used variables similar to those in the FMF maternal factors and MAP model [11]. The variables included in the two models are specified in Table 2. For internal validation, we performed a 10-fold cross-validation, in which a randomly allocated 90% of the data was used to generate a predictive model and the estimated risk of preeclampsia was then applied to the remaining 10% of the sample. This splitting procedure was repeated a large number of times and the performance of the model was then summarized.
Backward selection model. To select the best variables for this model for each outcome, we used backward selection on a multivariable logistic regression with an exclusion criterion of a p-value of more than 0.2. We submitted the 36 candidate predictors described above to this model-selection procedure. For internal validation, a 10-fold cross-validation was used.
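The 10-fold cross-validation splitting described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the STATA procedure used in the study; the function name, the seed and the fold assignment scheme are assumptions, and model fitting is deliberately left out.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; each index is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random allocation to folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        ]
        yield train_idx, test_idx
```

In each iteration a model would be fitted on `train_idx` (about 90% of the data) and its predicted risks evaluated on `test_idx` (the remaining 10%), after which performance is summarized across the ten folds.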
Random forest model. We used Random forest [24], a machine learning method [23], which is an ensemble method making use of multiple decision trees. We submitted the same 36 candidate predictors employed in the backward selection procedure to the Random forest approach. For each tree, a bootstrap sample was drawn, from which the tree was built. In order to get an unbiased estimate of the area under the receiver operating characteristic curve (AUC), the Out-of-Bag samples were used when predicting the probabilities of the outcomes.
NICE-guidelines. In addition to the multivariable models described above, we created a risk classification system based on the NICE-guidelines binary (high-risk: yes or no) clinical decision rule. According to the NICE-guidelines, a nulliparous woman in early pregnancy is at high risk of preeclampsia if she has any of the following risk factors: chronic kidney disease, systemic lupus erythematosus, antiphospholipid syndrome (not included in our NICE-guidelines model), type 1 or type 2 diabetes, chronic hypertension, age 40 or older, BMI 35 or more at registration, and family history of preeclampsia [9].
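The binary NICE-guidelines rule described above amounts to a logical OR over the listed risk factors. The following Python sketch illustrates this under stated assumptions: the record type and its field names are hypothetical, and antiphospholipid syndrome is omitted on the assumption that the parenthetical in the text applies to that factor only.

```python
from dataclasses import dataclass


@dataclass
class FirstVisitRecord:
    """Hypothetical first-visit record; field names are assumptions."""
    age_years: float
    bmi: float
    chronic_kidney_disease: bool = False
    systemic_lupus_erythematosus: bool = False
    diabetes_type_1_or_2: bool = False
    chronic_hypertension: bool = False
    family_history_preeclampsia: bool = False


def nice_high_risk(r: FirstVisitRecord) -> bool:
    """True if any listed risk factor is present (binary screen-positive)."""
    return any([
        r.chronic_kidney_disease,
        r.systemic_lupus_erythematosus,
        r.diabetes_type_1_or_2,
        r.chronic_hypertension,
        r.age_years >= 40,  # age 40 or older
        r.bmi >= 35,        # BMI 35 or more at registration
        r.family_history_preeclampsia,
    ])
```

Unlike a multivariable model, a rule of this kind returns only yes or no, so its false positive rate is fixed by the population rather than chosen by the user.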
Missing values. To increase the power and minimise selection bias, we used single-chained imputation with mean values for missing observations for the variables with missing information (Table 1).
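A minimal sketch of imputation with the mean of the observed values is given here for a single continuous variable. The text does not state how categorical variables were handled, so this illustration covers the continuous case only; the function name and the use of None for missing entries are assumptions.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```

Mean imputation keeps every pregnancy in the analysis but shrinks the variance of the imputed variable, which is one reason a low proportion of missing values matters.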
AUC. The AUCs for the three multivariable methods were calculated with bias-corrected bootstrap confidence intervals. The detection rate of preeclampsia at a fixed 10% false positive rate (FPR) was calculated with a 95% confidence interval (CI) using the Clopper-Pearson method.
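The two quantities named above, the detection rate at a fixed FPR and its Clopper-Pearson interval, can be sketched in standard-library Python as follows. This is an illustration under assumptions, not the study's code: the function names are hypothetical, the threshold convention (flag scores strictly above a cut-off chosen among non-cases) is one of several reasonable choices, and the exact interval is found by bisection on the binomial distribution rather than with a statistics package.

```python
import math


def sensitivity_at_fpr(scores, labels, fpr=0.10):
    """Detection rate when the threshold flags at most `fpr` of non-cases."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one case and one non-case")
    allowed = int(fpr * len(neg))  # non-cases that may be flagged
    # Flag scores strictly above the (allowed + 1)-th highest non-case score.
    threshold = neg[min(allowed, len(neg) - 1)]
    return sum(s > threshold for s in pos) / len(pos)


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing):
    """Find p in [0, 1] with f(p) close to target, for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - _binom_cdf(x - 1, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _bisect(
        lambda p: _binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper
```

Here `x` would be the number of preeclampsia cases flagged at the chosen threshold and `n` the total number of cases for that outcome.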

Results
In the study population of 62 562 nulliparous women, 2 773 (4.4%) developed preeclampsia during pregnancy. In total 216 (0.3%) developed preeclampsia with delivery <34 weeks, 497 (0.8%) developed preeclampsia with delivery <37 weeks and 2 276 (3.6%) developed preeclampsia with delivery ≥37 weeks, respectively. Aspirin was used by 623 (1.1%) of the women with non-anomalous pregnancies (Fig 1). The social, reproductive and medical background and medical examination variables from the first visit in antenatal care are presented in Table 1, stratified into women who did not develop preeclampsia, women who developed preeclampsia overall and women who developed preeclampsia with delivery <37 weeks. Women who developed preeclampsia overall were significantly (p-value <0.05) older, had higher BMI and longer infertility duration, and more often had assisted reproduction and previous miscarriages compared to women who did not develop preeclampsia. A family history of hypertension or preeclampsia, being born in Africa or in Sweden and chronic diseases were more common among women in the preeclampsia group. Medical examinations at the first visit displayed increased capillary glucose levels, rates of proteinuria, haemoglobin levels and MAP among women who developed preeclampsia.

[Table 2: variables included in the pre-specified model and the corresponding variables included in the FMF model]
The variables used in the backward selection analyses for prediction of preeclampsia with delivery at <34, <37 and ≥37 weeks with 10-fold cross validation are listed in S1 Table. The receiver operating characteristic (ROC) curves of the variables' ability to predict preeclampsia at <34, <37 and ≥37 weeks with the three different multivariable methods are given in Fig 2. Fig 2A-2C include the total study population and Fig 2D-2F present the restricted population without pregnancies with major malformations or treatment with aspirin.
The AUCs for predicting preeclampsia in the total and the restricted populations for the three outcomes are given in Table 3. Based on the variables included, regardless of which of the three multivariable methods was used for prediction of preeclampsia at <34, <37 or ≥37 weeks, the AUC did not reach more than 0.68, indicating uniformly low-to-moderate predictive ability (Table 3).
The sensitivity at a FPR of 10% for preeclampsia <34 and <37 weeks was highest with the pre-specified variables model. For detection of preeclampsia with delivery ≥37 weeks, the pre-specified variables and the backward selection models performed better than the Random forest model (Table 3). When using the binary NICE-guidelines risk classification system for identifying women at risk of preeclampsia in our population, 5.8% of all nulliparous women would be classified as high risk (screen positive). The detection rate for preeclampsia with delivery <34 weeks would be 22.2% (95% CI 16.8-28.4), preeclampsia with delivery <37 weeks 19.5% (95% CI 16.1-23.3) and preeclampsia with delivery ≥37 weeks 12.2% (95% CI 10.9-13.7), all with a fixed FPR of about 5.5%. In our best performing models with a chosen FPR of 10%, the detection rate is higher for preterm and term preeclampsia, but with an overlapping CI for early onset preeclampsia, compared to the NICE-guidelines.

Main findings
In this population-based cohort study of 62 562 nulliparous women, we found that using routinely collected information on well-known and less established or unknown risk factors from first visit to antenatal care as predictive variables generated a modest predictive capacity for preeclampsia, irrespective of type of multivariable statistical method used.
The prediction of preeclampsia with delivery <34, <37 or ≥37 weeks with the three different methods was similar, with AUCs of 0.58-0.68. The sensitivities at a fixed 10% FPR varied between 18.5% and 30.6%. The performance of the customizable multivariable risk prediction approach at the FPR of 10% was however significantly better than using the binary NICE-guidelines for preeclampsia with delivery <37 weeks and ≥37 weeks.

Strengths and limitations
Given the nature of predictive research, this comprehensive set of fine-grained, prospectively and routinely collected variables, with generally a minimal level of missing values, collected on a large population represents a distinct strength, increasing the likelihood of accurate prediction [30]. The use of a database with automatically retrieved information from the computerized medical record system with standardized data registration reduced erroneous data entry and transcription errors. There is no consensus regarding the best method for selection of variables for a predictive model [31]. The large number of preeclampsia cases enabled use of the full suite of the 36 variables in the predictive models, giving more than ten events per candidate predictor for preeclampsia with delivery <37 and ≥37 weeks, and six events for preeclampsia with delivery <34 weeks. Too few events per candidate predictor have been described as a limitation in risk models for preeclampsia [32]. We internally validated the performance of the three multivariable predictive methods with 10-fold cross validation for the two logistic regression analyses and bootstrap in the Random forest analyses, preventing the over-fitting observed in single-tree approaches. No external validation was carried out in the scope of our study; however, our inclusion criteria were precisely defined, facilitating future external validations.
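The events-per-candidate-predictor figures quoted in this section follow directly from the case counts reported in the Results and the 36 candidate predictors. The short Python sketch below reproduces that arithmetic; the variable names are illustrative assumptions.

```python
CANDIDATE_PREDICTORS = 36

# Preeclampsia cases per outcome, as reported in the Results.
EVENTS = {"<34 weeks": 216, "<37 weeks": 497, ">=37 weeks": 2276}

# Events per candidate predictor for each outcome.
EPV = {outcome: n / CANDIDATE_PREDICTORS for outcome, n in EVENTS.items()}
```

The <34 weeks outcome gives exactly six events per candidate predictor, below the commonly cited threshold of ten, which is why estimates for that outcome are the least stable.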
Despite the richness of our data resources, data quality issues are nonetheless a concern. Potential misclassification of the mainly self-reported variables in the interview with the midwife could have occurred, but since the predictive models were based on routinely collected information, this would probably reflect the outcome of the models in the clinical setting. Since this is a retrospective study we could not influence the procedure of blood pressure examinations, where a potential misclassification bias by measurement errors could have been introduced. However, midwives have guidelines for conducting blood pressure examinations and a differential misclassification seems implausible.
Using the ICD-10 diagnosis instead of data from medical records to determine the diagnosis of preeclampsia could have introduced misclassification bias of the outcome. In order to improve the accuracy of the diagnosis, one diagnosis in the inpatient register or two diagnoses in the outpatient register were required. The Swedish version of ICD-10 still defines preeclampsia with mandatory proteinuria, which is less sensitive but more specific than the definition in current international recommendations [11,20]. With few exceptions, previous predictive models have used, or done sensitivity analyses with, the same definition [13,14]. Overall rates of preeclampsia in nulliparous women in our study were consistent with previous populations from western countries [13,33].
The effect of aspirin treatment during pregnancy has not been taken into account in the creation of previous predictive models for preeclampsia [11,14]. When adjusting for the assumed effect of aspirin (i.e. the reduced risk of preterm preeclampsia in high risk women) in one study, the detection rates for preeclampsia were slightly, but not significantly, reduced [10]. To address this in our study, we made a sensitivity analysis in a restricted cohort excluding pregnancies with use of aspirin and also major malformations, without significant alterations of the models' predictive capacity. This could possibly reflect poor selection of women for aspirin use in current Swedish clinical practice.

Interpretation
In univariate analyses we found that well established risk factors for preeclampsia, such as increasing BMI and maternal age, assisted reproduction, country of birth, chronic hypertension, pre-existing diabetes, chronic kidney disease and family history of preeclampsia, were individually associated with development of preeclampsia [12,20,34,35]. Compared to women without preeclampsia, women who developed preeclampsia had higher MAP and haemoglobin levels in early pregnancy, in accordance with previous knowledge [13,36-39].
Prior studies using the FMF model with maternal factors and MAP for mixed parities at a 10% FPR have demonstrated a sensitivity of 47-59% and 37-43% for preterm and term preeclampsia, respectively [10,15,21,40]. Nulliparous women have a higher risk of preeclampsia, and no marker of individual risk based on previous obstetric history [20,41]. The predictive capacity of both clinical guidelines and multivariable models is better for parous than for nulliparous women, and there are only a few predictive studies on nulliparous women separately, making the evidence gap more pressing for this group [11]. Previous predictive studies on nulliparous women, using different inclusion criteria and combinations of maternal factors, demonstrate an AUC of 0.71 and sensitivity of 31-37% with a FPR of 10-11.5% for preeclampsia overall [11,13]. For preeclampsia with delivery <37 weeks, an AUC of 0.76 and sensitivity of 34-36% at a FPR of 5-11.5% have been reported, indicating better prediction than in our study [11,42].
Defining predictive performance according to how preeclampsia prediction is used in practice by a clinician, we argue that using a customizable multivariable risk-prediction approach is superior to the binary NICE classification system, which here results in a fixed false-positive rate of 5.5%. One advantage of using a multivariable risk-prediction framework to address this question is that it is more flexible to investigator/clinician choices about optimal sensitivity/specificity. The NICE-guidelines' predictive capacity for preeclampsia in our cohort of nulliparous women was inferior compared to our multivariable models tested at a FPR of 10%, in accordance with previous knowledge, but also inferior compared to the NICE-guidelines' performance in other populations [11,13]. The generally lower sensitivities in our study compared to other nulliparous populations could be due to several factors: differences in demographic factors, the population-based routinely collected material in our study compared to a prospective design with volunteer enrolment, and different methodological approaches. Further, internal validation with bootstrapping and 10-fold cross validation, as used in our study, generally tends to reduce the risk of overestimating the predictive capacity of a predictive model [19,43]. The lower detection rate could also reflect a need for validation of maternal health care data and possibly improved collection of data.
The need for first trimester screening studies for the risk of preeclampsia in different clinical settings is emphasised by the International Society for the Study of Hypertension in Pregnancy (ISSHP) [20] and, to our knowledge, no previous multivariable predictive model has been developed or evaluated in a Swedish setting.
Machine learning methods [23] such as Random forest [24] have been demonstrated to yield accurate prediction in some biomedical applications [25], but in this study resulted in prediction that was less accurate than the logistic regression models. There are many potential explanations for this finding, e.g., overreliance on data-based considerations, over-fitting, and too many features unrelated to the outcome that swamp the true signals [44].

Conclusion
The capacity of our multivariable predictive models for preeclampsia with delivery <34, <37 and ≥37 weeks in nulliparous women was overall modest, but it is worth noting that these models have advantages as compared to the binary NICE-guidelines risk classification. The logistic regression models performed better than models using Random forest, indicating that future work on this topic should continue to incorporate clinical expertise as well as newer prediction approaches.
The models to predict preeclampsia reported here provide a step towards a personalised risk score for preeclampsia in nulliparous women, a group that would largely benefit from an improved screening and for whom only a few predictive models have been designed [14].
To improve overall accuracy and detection of cases, the variables for the clinical model have to be validated, potentially with the addition of biochemical or biophysical markers.
Supporting information
S1 Table. Variables used in the backward selection models for prediction of preeclampsia with delivery at <34, <37 and ≥37 weeks' gestation with 10-fold cross validation. (DOCX)