Prediction of preterm birth in nulliparous women using logistic regression and machine learning

  • Reza Arabi Belaghi,

    Roles Formal analysis, Methodology, Software, Writing – original draft

    Affiliations Department of Obstetrics and Gynecology, McMaster University, Hamilton, Ontario, Canada, Department of Statistics, University of Tabriz, Tabriz, Iran

  • Joseph Beyene,

    Roles Supervision, Writing – review & editing

    Affiliations Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, Ontario, Canada, Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada

  • Sarah D. McDonald

    Roles Supervision, Writing – review & editing

    mcdonals@mcmaster.ca

    Affiliations Department of Obstetrics and Gynecology, McMaster University, Hamilton, Ontario, Canada, Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, Ontario, Canada, Department of Obstetrics and Gynecology (Division of Maternal-Fetal Medicine), McMaster University, Hamilton, Ontario, Canada, Department of Radiology, McMaster University, Hamilton, Ontario, Canada


Abstract

Objective

To predict preterm birth in nulliparous women using logistic regression and machine learning.

Design

Population-based retrospective cohort.

Participants

Nulliparous women (N = 112,963) with a singleton gestation who gave birth at 20–42 weeks of gestation in Ontario hospitals from April 1, 2012 to March 31, 2014.

Methods

We used data during the first and second trimesters to build logistic regression and machine learning models in a “training” sample to predict overall and spontaneous preterm birth. We assessed model performance using various measures of accuracy including sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve (AUC) in an independent “validation” sample.

Results

During the first trimester, logistic regression identified 13 variables associated with preterm birth, of which the strongest predictors were diabetes (Type I: adjusted odds ratio (AOR): 4.21; 95% confidence interval (CI): 3.23–5.42; Type II: AOR: 2.68; 95% CI: 2.05–3.46) and abnormal pregnancy-associated plasma protein A concentration (AOR: 2.04; 95% CI: 1.80–2.30). During the first trimester, the maximum AUC was 60% (95% CI: 58–62%) with artificial neural networks in the validation sample. During the second trimester, 17 variables were significantly associated with preterm birth, among which complications during pregnancy had the highest AOR (13.03; 95% CI: 12.21–13.90). During the second trimester, the AUC increased to 65% (95% CI: 63–66%) with artificial neural networks in the validation sample. Including complications during the pregnancy yielded an AUC of 80% (95% CI: 79–81%) with artificial neural networks. All models yielded 94–97% negative predictive values for spontaneous PTB during the first and second trimesters.

Conclusion

Although artificial neural networks provided slightly higher AUC than logistic regression, prediction of preterm birth in the first trimester remained elusive. However, including data from the second trimester improved prediction to a moderate level by both logistic regression and machine learning approaches.

Introduction

Preterm birth (PTB), defined as birth before 37 weeks of gestation, is the leading cause of neonatal death and disability [1]. Approximately 50% of all perinatal deaths are caused by PTB [2]. In the U.S., almost 10% of babies are born preterm [3], costing the healthcare system at least $26 billion yearly [4]. In Canada, PTB comprises 8% of all births and results in direct costs of $580 million annually [5]. Risk factors for PTB are heterogeneous and include previous PTB, race, age, nulliparity, urinary tract infection, smoking, and bleeding during early pregnancy [6–8]. Prediction of PTB would facilitate the use of therapeutic interventions to reduce infant morbidity and mortality, thereby benefitting families, society, and the healthcare system.

Previous studies have found the prediction of PTB to be challenging, whether by logistic regression or machine learning. The area under the receiver operating characteristic curve (AUC) for prediction of PTB in previous studies ranged from 62% to 72% depending on the number of predictors and study design [9–15]. The predictive power of the machine learning model developed by Fergus et al. [16] was promising (AUC, 95%), but measuring uterine electrical signals (electrohysterography) is not practical on a large scale. Another drawback was the synthetic oversampling of the whole dataset, rather than just the training dataset, thereby calling into question the 95% AUC of that work.

Machine learning is a computer programming approach whereby computers learn from “big data” to make better predictions [17]. In 2019, machine learning was identified as one of the most advanced tools for prenatal diagnosis [18]. Moreover, machine learning has been broadly applied in medicine, from cancer detection [19, 20] to prediction of cardiovascular diseases [21], among others. In this study, we considered several state-of-the-art machine learning methods, including decision trees, random forests, and artificial neural networks, that are frequently used in medicine to develop prediction models [21–28]. We also considered logistic regression as a traditional statistical approach to develop prediction models [29]. Unlike logistic regression, machine learning approaches are free of statistical assumptions (such as linearity and uncorrelated predictors) and can handle complex interactions between predictive factors without these interactions being explicitly specified [27, 30].

We aimed to overcome the challenges of predicting PTB, especially for nulliparous women, by evaluating logistic regression and multiple machine learning algorithms. To this end, we considered variables available in clinical care, including some not previously assessed in other studies. Our study aimed to: 1) identify important predictors associated with PTB during the first and second trimester in nulliparous women from a large population cohort; and 2) construct models to predict PTB based on logistic regression and robust machine learning algorithms.

Methods and materials

Data and population

Ontario comprises 40% of the Canadian population and has approximately 140,000 births each year [31]. We performed a population-based retrospective cohort study using Ontario’s Better Outcomes Registry and Network (BORN) database, which includes a wide range of maternal, antenatal, and birth data [32]. We included all nulliparous women with singleton births who gave birth between 20 and 42 weeks gestation in an Ontario hospital between April 1, 2012 and March 31, 2014.

Outcome.

PTB was the primary outcome variable in this study, defined as gestational age at birth (from ultrasound estimation or calculation from the first day of the last menstrual period) <37 weeks. We also considered spontaneous PTB as a secondary outcome. Spontaneous PTB was identified using the definition of Maghsouldu et al. [33], i.e.: not “induced”, not “caesarean section” and not “augmented labor”.

Predictors.

We considered predictors based on our literature review of PTB risk factors during the first and second trimesters [7, 34]. We considered socio-demographic variables including maternal age, height, pre-pregnancy body mass index (BMI), gestational weight gain during the first trimester, income, education, race, and immigration status. Further, we included the number of previous abortions (which includes miscarriages), conception type, smoking status, alcohol consumption, folic acid use, pre-existing medical health conditions, diabetes, pre-existing mental health conditions (such as anxiety, depression, and addiction) and antenatal health care provider type.

Pregnancy-associated plasma protein A and free beta-subunit of human chorionic gonadotropin were measured during the first trimester as part of the screen for Down syndrome [30], but we considered them as potential markers of placental and preeclamptic diseases [35]. We also included ultrasound measurement of nuchal translucency as another predictor [36]. For the second-trimester models, we included all of the predictors from the first trimester plus information that became available during the second trimester including dimeric inhibin A, unconjugated estriol, human chorionic gonadotropin, alpha-fetoprotein concentration, hypertensive disorders of pregnancy, gestational diabetes, infections, medication exposure, sex of the fetus, and complications during pregnancy [37].

We grouped maternal height into four categories: <150 cm, 150–159 cm, 160–169 cm, and ≥170 cm. We classified pre-pregnancy BMI as underweight (<18.5 kg/m²), normal weight (18.5–24.9 kg/m²), overweight (25–29.9 kg/m²), and obese (≥30 kg/m²), according to World Health Organization criteria [38, 39]. We used the Institute of Medicine guidelines [40] to categorize gestational weight gain into three groups: recommended weight gain, less than recommended weight gain, and more than recommended weight gain. For income, education, race, and immigration status, we used neighbourhood income quartiles, neighbourhood education quartiles, neighbourhood immigrant concentration, and neighbourhood minority quartiles, respectively (see S1 Table for the definition of these variables).
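The categorization rules above can be illustrated with a short sketch. This is not the authors' code (the analysis was done in R); the function names are hypothetical, and the sketch assumes that the height bands are contiguous (<150, 150–159, 160–169, ≥170 cm) and that BMI values falling between the printed bounds (for example 24.95 kg/m²) belong to the lower category.

```python
def bmi_category(bmi):
    """Classify pre-pregnancy BMI (kg/m^2) using the cut-offs quoted in the text.

    Assumption: categories are contiguous, so a value such as 24.95 is
    treated as normal weight rather than left unclassified.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"


def height_category(height_cm):
    """Classify maternal height (cm) into four contiguous bands (assumed)."""
    if height_cm < 150:
        return "<150 cm"
    if height_cm < 160:
        return "150-159 cm"
    if height_cm < 170:
        return "160-169 cm"
    return ">=170 cm"
```

For example, `bmi_category(27.3)` returns `"overweight"` and `height_category(155)` returns `"150-159 cm"`.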

We categorized the number of previous abortions (including spontaneous and therapeutic abortions) into four groups based on Oliver et al. [41], including 0, 1, 2, and 3+. We grouped the pre-existing health conditions variable in the BORN database into “Yes” or “No” since that variable had more than 1000 possible entries (S2 Table). We treated pre-existing mental health conditions (S3 Table) as a binary categorical variable. We classified the conception type into: spontaneous, in vitro fertilization (IVF, or a combination of IVF and other methods), and other methods (such as Surrogate, Intrauterine insemination alone, or unknown) [42].

We classified protein concentrations (pregnancy-associated plasma protein A, free beta-subunit of human chorionic gonadotropin, dimeric inhibin A, unconjugated estriol, human chorionic gonadotropin, and alpha-fetoprotein) and nuchal translucency as normal, abnormal, and missing (cut-off values shown in S4 Table). The variable “complications during pregnancy” had more than 600 categories, and we therefore categorized data for this variable into three groups based on maternal-fetal expertise (SDM) as follows: no complications, mild-moderate complications, and severe complications [37].

Statistical analysis

We used the Chi-square test and univariate logistic regression to measure associations between predictors and PTB. We assessed statistical significance using 2-sided p-values, with a p-value <0.05 considered statistically significant. We then proceeded with variable selection using stepwise multivariable logistic regression based on the Akaike Information Criterion (AIC). We also utilized the Boruta algorithm to select important variables for the machine learning models [43]. In short, Boruta is based on the random forest machine learning method and selects the relevant variables that significantly affect the prediction power of the model [43].

We followed the guidelines for the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis [44] for establishing prediction models. Based on these guidelines, we selected 2/3 of the data as the training set and the remaining 1/3 of the data as the test (validation) set. We balanced the training samples using a random over-sampling technique [45]. We then used ten-fold cross-validation to establish machine learning models. Finally, we used the test data to evaluate the performance of the proposed prediction models by comparing the sensitivity, specificity, positive predictive values, negative predictive values, and AUC. We performed all machine learning computations in R software using the caret package [46].
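The splitting and balancing steps described above can be sketched as follows. This is an illustrative Python sketch rather than the authors' R/caret code: the function name, the `ptb` label key, the seed, and the toy records are all hypothetical. The point it illustrates is the order of operations: the data are split first, and only the training portion is oversampled, so the test set contains no duplicated records.

```python
import random


def split_and_oversample(records, label_key="ptb", train_fraction=2 / 3, seed=0):
    """Split records into train/test, then balance ONLY the training set.

    Balancing is done by random over-sampling: minority-class training
    records are drawn with replacement until both classes are equal in size.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)

    n_train = round(len(shuffled) * train_fraction)
    train, test = shuffled[:n_train], shuffled[n_train:]

    positives = [r for r in train if r[label_key] == 1]
    negatives = [r for r in train if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)

    # Draw duplicates of minority-class records (with replacement).
    extra = []
    if minority:
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced_train = train + extra
    rng.shuffle(balanced_train)
    return balanced_train, test


# Toy data (hypothetical): 300 records, roughly 6% with the outcome.
records = [{"id": i, "ptb": 1 if i % 16 == 0 else 0} for i in range(300)]
balanced_train, test = split_and_oversample(records)
```

Because the test records are set aside before any duplication, performance measured on `test` reflects the original class distribution.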

We applied multiple imputation with 10 imputations [47–49] to replace missing observations on the predictors. However, for plasma proteins and nuchal translucency, missing data were treated as a new category since a large proportion of women chose not to enroll in screening for Down syndrome. We also treated gestational weight gain during the first trimester in a similar manner, since the lack of recording of weight gain may reflect less than optimal care. The Hamilton Integrated Research Ethics Board approved the study before study commencement (approval #: 14-714-C).

Results

Study participants and univariate analysis

Of 112,963 nulliparous women with singleton pregnancies, PTB occurred in 6,955 (6.2%, Table 1). Out of all PTBs, there were 3,695 (53%) spontaneous PTBs. Approximately 5% of patients were younger than 20 years of age, while 13% were over age 35 years. Approximately 2% of patients had three or more previous abortions including miscarriages. More than 50% of patients had a non-ideal pre-pregnancy BMI, of which 17.34% and 12.58% were overweight and obese, respectively. Approximately 17% of the cohort had at least one pre-existing medical condition. Only 78.67% of the patients had a documented first-trimester appointment.

Table 1. Distribution of maternal baseline characteristics, demographics, and clinical variables in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.t001

During the first trimester, we examined 23 predictors (Table 2). Women who were under 25 years of age, shorter in stature (<160 cm), had pre-pregnancy obesity, conceived with IVF, had prior medical conditions including diabetes, and those with low pregnancy-associated plasma protein A concentrations were more likely than women without these conditions to experience PTB. During the second trimester, we examined 35 predictors of PTB. Women who were over 29 years of age, had abnormal concentrations of the assessed proteins, diabetes, hypertensive disorders of pregnancy, women carrying male fetuses, and those with pregnancy complications were more likely than women without these conditions to experience PTB (Table 3).

Table 2. Univariate analyses of associations between each predictor and preterm birth during the first trimester in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.t002

Table 3. Univariate analyses of associations between each predictor and preterm birth during the second trimester in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.t003

Multivariable analysis.

Stepwise logistic regression identified 13 significant predictors during the first trimester (Fig 1). Diabetes (Type I: adjusted odds ratio (AOR): 4.21; 95% confidence interval (CI): 3.23–5.42; Type II: AOR: 2.68; 95% CI: 2.05–3.46) and abnormal pregnancy-associated plasma protein A concentrations (AOR: 2.04; 95% CI: 1.80–2.30) were the most significant predictors of PTB. The following factors were also associated with an increased risk of PTB: pregnancies conceived through IVF, being obese or underweight, maternal drug (substance) use, lower neighbourhood education level, lower neighbourhood immigration level, low maternal height, diabetes, and other pre-existing medical or mental health conditions.
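For readers less familiar with logistic regression output, the sketch below shows how an adjusted odds ratio and its 95% confidence interval relate to a model coefficient. It assumes a Wald-type interval that is symmetric on the log-odds scale; the text does not state how the intervals were computed, so this is an illustration only, and the function names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower, upper, z=1.96):
    """Back out the coefficient and standard error implied by a Wald-type CI."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported interval for Type I diabetes in the first trimester: 3.23-5.42.
beta, se = coefficient_from_ci(3.23, 5.42)
aor, lower, upper = odds_ratio_with_ci(beta, se)
```

Under this assumption the implied odds ratio is the geometric mean of the interval bounds, which is close to, but not exactly, the reported 4.21; the small gap is consistent with rounding or with a different interval method.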

Fig 1. Selected variables and adjusted odds ratios during the first trimester for prediction of preterm birth in nulliparous women.

BMI: Body mass index; IVF: In vitro fertilization; Ref: Reference group; Pre-existing maternal health conditions shown in S2 Table. Pre-existing mental health conditions shown in S3 Table. Number of previous abortions: includes the number of miscarriages.

https://doi.org/10.1371/journal.pone.0252025.g001

During the second trimester, we identified 17 significant predictors related to PTB (Fig 2) using stepwise logistic regression. Many of the selected variables were the same as those selected for the first-trimester model, with slight changes in the odds ratios. Furthermore, severe complications of pregnancy were strongly associated with PTB (AOR: 13.03; 95% CI: 12.21–13.90). Women with abnormal alpha-fetoprotein, those carrying a male fetus, and those who did not attend prenatal classes were at increased odds of PTB. Exposure to medication during pregnancy, including vitamins and herbal supplements, was associated with a decreased risk of PTB.

Fig 2. Selected variables and odds ratios during the second trimester for prediction of preterm birth in nulliparous women.

BMI: Body mass index; IVF: In vitro fertilization; Ref: Reference group; Pre-existing maternal health conditions shown in S2 Table. Pre-existing mental health conditions shown in S3 Table. Number of previous abortions: includes the number of miscarriages.

https://doi.org/10.1371/journal.pone.0252025.g002

Machine learning (Boruta) identified 17 and 27 important predictors of PTB during the first and second trimesters, respectively (S5 and S6 Tables). Unlike with logistic regression, machine learning models selected previous abortions (including miscarriages) as the most important predictor of PTB during the first trimester (importance: 28.23 for previous abortions (including miscarriages) vs. 7.79 for diabetes). During the second trimester, complications during pregnancy and hypertensive disorders were the most important predictors of PTB.

Prediction models and performance measures in the training and validation samples.

In the training sample, we found that random forests had a higher AUC than other models (99%), including logistic regression, which had the third highest AUC (S7 Table). We evaluated the proposed prediction models in the testing sample and found that during the first trimester the AUCs ranged from 53% (random forests) to 60% (artificial neural networks, Fig 3 and Table 4). However, all models had very high negative predictive values of ~95%. During the second trimester, artificial neural networks had the highest sensitivity of 63% (95% CI: 61–65%, Fig 3 and Table 4), but slightly lower specificity and positive predictive value than logistic regression. Random forests exhibited the lowest sensitivity among the models; however, the positive predictive value of the random forests model was the highest, but still relatively low at 36%.
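The performance measures reported here follow directly from a confusion matrix. The sketch below, with hypothetical function names and hypothetical counts, shows the definitions and also how negative predictive value depends on outcome prevalence as well as on sensitivity and specificity. The only figures taken from the study are the cohort counts (6,955 PTBs among 112,963 women).

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def npv_from_rates(sensitivity, specificity, prevalence):
    """Negative predictive value implied by test characteristics and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=60, fp=300, tn=600, fn=40)

# Prevalence of PTB in the cohort, from the counts given in the Results.
prevalence = 6955 / 112963

# Reference point: a rule that labels every pregnancy as "term"
# (sensitivity 0, specificity 1) has an NPV equal to 1 - prevalence.
baseline_npv = npv_from_rates(0.0, 1.0, prevalence)
```

The reference value `baseline_npv` gives a point of comparison when reading negative predictive values for an outcome with low prevalence.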

Fig 3. Comparison of prediction models during the first and second trimester for preterm birth in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.g003

Table 4. Predictive power of preterm birth models during the first and second trimesters in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.t004

Overall, there was an increase in the AUC from the first trimester to the second trimester in logistic regression and artificial neural networks (60% vs. 80%). The notable improvement of the AUC to 80% with artificial neural networks and logistic regression was due to the addition of complications during pregnancy (S1 and S3 Figs). All models provided negative predictive value of ~97% during the second trimester. In a sensitivity analysis, we compared the predictive power of all models without complications during pregnancy, and found that the AUC ranged from 58% (decision trees) to 65% (artificial neural networks, S1 Fig).
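As a reminder of what the AUC measures, it equals the probability that a randomly chosen woman with PTB receives a higher predicted risk than a randomly chosen woman without PTB, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for a cohort of this size.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to ranking no better than chance, which is why values near 60% are described as poor discrimination.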

Prediction of spontaneous PTB

For models predicting spontaneous PTB, during the first trimester the AUC ranged from 55% (random forests) to 59% (logistic regression, S2 Fig). During the second trimester, AUC ranged from 58% (decision trees) to 64% (logistic regression, S3 Fig). Both machine learning and logistic regression generated negative predictive values of approximately 94% for spontaneous PTB during the first and second trimesters (S8 Table). We emphasize that pregnancy complications, hypertensive disorder, and other medically induced PTB were not included in these analyses.

Discussion

We used population-based data to predict PTB in nulliparous women using logistic regression and machine learning approaches during the first and second trimesters. We found that diabetes mellitus, a history of spontaneous or therapeutic abortions, and abnormal pregnancy-associated plasma protein A concentrations were the strongest predictors of PTB during the first trimester. Thirteen selected predictors yielded a maximum AUC of 60% with artificial neural networks, thus providing poor prediction of PTB during the first trimester, even using machine learning approaches. During the second trimester, 17 variables were significantly associated with PTB, among which complications during pregnancy had the highest AOR (13.03; 95% CI: 12.21–13.90). During the second trimester, the AUC increased from 65% (95% CI: 63–66%) to 80% (95% CI: 79–81%) with the inclusion of complications during pregnancy, which is a moderate predictor [50] of PTB.

Machine learning identified more variables associated with PTB than logistic regression in our data set. During the first trimester, machine learning identified previous abortions (which includes miscarriages) as the strongest predictor of PTB, while logistic regression identified diabetes as the strongest predictor. A history of prior abortions (including miscarriages) may be a more important predictor of PTB because the incidence of prior abortions was substantially higher than that of diabetes.

We found that conventional logistic regression and machine learning had comparable performance for prediction of PTB. Other studies comparing machine learning methods to conventional logistic regression for the prediction of a variety of clinical conditions showed that in general, no single method consistently provided the best prediction [51–58]. Although logistic regression is a frequently used method, it requires linearity and independence between the predictors. Conversely, machine learning is a non-parametric approach that can handle complex and non-linear models.

There was a significant decrease in the AUC between the training and the testing data, possibly due to the overfitting problem of machine learning methods [54]. Specifically, random forests are “greedy”, and thus, try to minimize the error in the training sample, which may cause overfitting (high performance in training but lower performance in the validation sample, as we observed in our models) [30].

Accurate prediction of PTB in nulliparous women has been lacking. Woolery and Grzymala-Busse [55] found machine learning had 53–88% accuracy in predicting PTB. Using data mining methods, Goodwin et al. found that seven demographic variables produced an AUC of 72% [10]. In contrast, Grobman et al. [12] found that logistic regression provided poor performance (AUC, 63%) for prediction of PTB in nulliparous women with a short cervix. Catley et al. [15] explored artificial neural networks for the prediction of PTB in high-risk pregnant women and found model sensitivity of 20% before 22 weeks of gestation. Weber et al. [13] recently applied machine learning to predict early (<32 weeks) spontaneous PTB among nulliparous women and found an AUC of only 63–65%, similar to Courtney et al. [56] (AUC, 60%) using logistic regression and a support vector machine approach.

Strengths of the study

Our study had several strengths. Firstly, our models generated high negative predictive values, higher than fetal fibronectin for spontaneous PTB [57], and thus may lead to reduction in unnecessary resource use [58]. Secondly, we considered a wide range of variables available in standard clinical care databases (e.g., proteins for screening for Down syndrome or placental diseases, gestational weight gain) that were not considered in previous studies. Another strength of the current work is the consideration of different time points (first and second trimesters) for the prediction of PTB. In addition, we evaluated a relatively large cohort, particularly compared to many of the previous studies [8–14]. We considered multiple methods for variable selection and prediction to maximize accuracy. We addressed several limitations of previous studies in this area: Courtney et al. [56] found that logistic regression and machine learning models based on demographic data were not able to predict PTB adequately (AUC, 60%). Those authors suggested that prenatal demographic factors such as maternal health behaviors and medical history could be used to construct accurate models, and thus, we included such factors in our study. By performing a large cohort study, we also addressed the “lack of data” problem identified in the work of Lee et al. [11]. We applied multiple imputation (repeated ten times), which is a robust technique for handling missing data [48]. Unlike Fergus et al. [16], we used random oversampling in the training set only; thus, the AUC from our models was generated from clinical data and not artificial samples.

Limitations

Our study also has several limitations, including the low predictive power of the proposed models, particularly during the first trimester. The predictive ability of all models strongly depends on the predictor variables [30]. Although we had a large number of variables and a relatively large number of subjects, one of the limitations of our prediction models was the lack of information on the interventions used for pregnancies at high risk of PTB. However, data suggest relatively low rates of use of such preventive measures in our study population [59]. We categorized PTB as <37 or ≥37 weeks of gestation, which may lead to loss of statistical power [60]. Further, binary categorization collapses all types of PTB into one group despite different rates of neonatal mortality and morbidity for each category of PTB [61] and despite potentially different predictors of extremely preterm birth compared to PTB overall. Although low pregnancy-associated plasma protein A concentration is associated with trisomies, which themselves are associated with preterm birth, the majority of such cases are in euploid pregnancies [62–66]. Finally, we were unable to examine ultrasonographic measurement of the uterine cervix, which is a strong predictor of PTB [67], as it is not available in the BORN database.

Conclusion

Including data from the second trimester improved prediction power to a moderate level of 80% AUC by both logistic regression and machine learning. However, developing an accurate prediction model during the first trimester will require further investigation. Inclusion of data from additional biomarkers may increase prediction accuracy.

Supporting information

S1 Fig. Receiver operating characteristic curves for second-trimester prediction models without the “complications during pregnancy” variable in the validation sample.

https://doi.org/10.1371/journal.pone.0252025.s001

(DOCX)

S2 Fig. Receiver operating characteristic curves for first-trimester prediction models for spontaneous preterm birth in the validation sample.

https://doi.org/10.1371/journal.pone.0252025.s002

(DOCX)

S3 Fig. Receiver operating characteristic curves for second-trimester prediction models for spontaneous preterm birth in the validation sample.

https://doi.org/10.1371/journal.pone.0252025.s003

(DOCX)

S1 Table. Definitions of neighbourhood income, immigration, education, and minority quartiles.

https://doi.org/10.1371/journal.pone.0252025.s004

(DOCX)

S2 Table. Pre-existing maternal health conditions.

https://doi.org/10.1371/journal.pone.0252025.s005

(DOCX)

S3 Table. Pre-existing mental health conditions.

https://doi.org/10.1371/journal.pone.0252025.s006

(DOCX)

S4 Table. Cut-off points for nuchal translucency and protein concentrations.

https://doi.org/10.1371/journal.pone.0252025.s007

(DOCX)

S5 Table. Variables selected by the machine learning algorithm for prediction of preterm birth during the first trimester in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.s008

(DOCX)

S6 Table. Variables selected by the machine learning algorithm for prediction of preterm birth during the second trimester in nulliparous women.

https://doi.org/10.1371/journal.pone.0252025.s009

(DOCX)

S7 Table. Optimal hyperparameters, sensitivity, specificity, and area under the receiver operating characteristic curve in training samples.

https://doi.org/10.1371/journal.pone.0252025.s010

(DOCX)

S8 Table. Predictive power of spontaneous preterm birth models during the first and second trimesters in the testing data.

https://doi.org/10.1371/journal.pone.0252025.s011

(DOCX)

Acknowledgments

We greatly appreciate the assistance of our Associate Editor and two anonymous referees for careful reading and valuable suggestions on our manuscript that significantly improved the presentation of the paper.

References

  1. 1. Saigal S, Doyle LW. An overview of mortality and sequelae of preterm birth from infancy to adulthood. Lancet Lond Engl. 2008 Jan 19;371(9608):261–9.
  2. 2. Greenough A. Long term respiratory outcomes of very premature birth (<32 weeks). Semin Fetal Neonatal Med. 2012 Apr;17(2):73–6. pmid:22300711
  3. 3. The impact of premature birth on society [Internet]. [cited 2020 Jan 22]. Available from: https://www.marchofdimes.org/mission/the-economic-and-societal-costs.aspx
  4. 4. Russell RB, Green NS, Steiner CA, Meikle S, Howse JL, Poschman K, et al. Cost of hospitalization for preterm and low birth weight infants in the United States. Pediatrics. 2007 Jul;120(1):e1–9. pmid:17606536
  5. 5. Shah PS, McDonald SD, Barrett J, Synnes A, Robson K, Foster J, et al. The Canadian Preterm Birth Network: a study protocol for improving outcomes for preterm infants and their families. CMAJ Open. 2018 Jan 18;6(1):E44–9. pmid:29348260
  6. Goldenberg RL, Culhane JF, Iams JD, Romero R. Epidemiology and causes of preterm birth. Lancet. 2008 Jan 5;371(9606):75–84. pmid:18177778
  7. Ferrero DM, Larson J, Jacobsson B, Di Renzo GC, Norman JE, Martin JN, et al. Cross-Country Individual Participant Analysis of 4.1 Million Singleton Births in 5 Countries with Very High Human Development Index Confirms Known Associations but Provides No Biologic Explanation for 2/3 of All Preterm Births. PLoS One. 2016;11(9):e0162506. pmid:27622562
  8. Martin J, D’Alton M, Jacobsson B, Norman J. In Pursuit of Progress Toward Effective Preterm Birth Reduction. Obstet Gynecol. 2017 Apr 1;129(4):715–9. pmid:28277357
  9. Woolery LK, Grzymala-Busse J. Machine learning for an expert system to predict preterm birth risk. J Am Med Inform Assoc. 1994;1(6):439–46. pmid:7850569
  10. Goodwin LK, Iannacchione MA, Hammond WE, Crockett P, Maher S, Schlitz K. Data mining methods find demographic predictors of preterm birth. Nurs Res. 2001 Dec;50(6):340–5. pmid:11725935
  11. Lee KA, Chang MH, Park M-H, Park H, Ha EH, Park EA, et al. A model for prediction of spontaneous preterm birth in asymptomatic women. J Womens Health (Larchmt). 2011 Dec;20(12):1825–31.
  12. Grobman WA, Lai Y, Iams JD, Reddy UM, Mercer BM, Saade G, et al. Prediction of Spontaneous Preterm Birth Among Nulliparous Women With a Short Cervix. J Ultrasound Med. 2016 Jun;35(6):1293–7. pmid:27151903
  13. Weber A, Darmstadt GL, Gruber S, Foeller ME, Carmichael SL, Stevenson DK, et al. Application of machine-learning to predict early spontaneous preterm birth among nulliparous non-Hispanic black and white women. Ann Epidemiol. 2018;28(11):783-789.e1. pmid:30236415
  14. Vovsha I, Salleb-Aouissi A, Raja A, Koch T, Rybchuk A, Radeva A, et al. Using Kernel Methods and Model Selection for Prediction of Preterm Birth. In: Machine Learning for Healthcare Conference [Internet]. 2016 [cited 2019 Jan 21]. p. 55–72. Available from: http://proceedings.mlr.press/v56/Vovsha16.html
  15. Catley C, Frize M, Walker RC, Petriu DC. Predicting High-Risk Preterm Birth Using Artificial Neural Networks. IEEE Trans Inf Technol Biomed. 2006 Jul;10(3):540–9. pmid:16871723
  16. Fergus P, Cheung P, Hussain A, Al-Jumeily D, Dobbins C, Iram S. Prediction of Preterm Deliveries from EHG Signals Using Machine Learning. PLoS One. 2013 Oct 28;8(10):e77154. pmid:24204760
  17. Alpaydın E. Introduction to Machine Learning [Internet]. 3rd ed. The MIT Press; 2016 [cited 2019 Jan 21]. Available from: https://mitpress.mit.edu/books/introduction-machine-learning-third-edition
  18. Chitty LS, Hui L, Ghidini A, Levy B, Deprest J, Van Mieghem T, et al. In case you missed it: The Prenatal Diagnosis editors bring you the most significant advances of 2019. Prenat Diagn. 2020;40(3):287–93. pmid:31875323
  19. Deo RC. Machine Learning in Medicine. Circulation. 2015 Nov 17;132(20):1920–30. pmid:26572668
  20. McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020 Jan;577(7788):89–94. pmid:31894144
  21. Al’Aref SJ, Anchouche K, Singh G, Slomka PJ, Kolli KK, Kumar A, et al. Clinical applications of machine learning in cardiovascular disease and its relevance to cardiac imaging. Eur Heart J [Internet]. [cited 2019 Jan 21]; Available from: https://academic.oup.com/eurheartj/advance-article/doi/10.1093/eurheartj/ehy404/5060564 pmid:30060039
  22. Ragab DA, Sharkas M, Marshall S, Ren J. Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ [Internet]. 2019 Jan 28 [cited 2019 Jun 5];7. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6354665/ pmid:30713814
  23. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res. 2016 Dec 16;18(12):e323. pmid:27986644
  24. Naimi AI, Platt RW, Larkin JC. Machine Learning for Fetal Growth Prediction. Epidemiology. 2018 Mar;29(2):290–8. pmid:29199998
  25. Deo RC. Machine Learning in Medicine. Circulation. 2015 Nov 17;132(20):1920–30. pmid:26572668
  26. Frizzell JD, Liang L, Schulte PJ, Yancy CW, Heidenreich PA, Hernandez AF, et al. Prediction of 30-Day All-Cause Readmissions in Patients Hospitalized for Heart Failure: Comparison of Machine Learning and Other Statistical Approaches. JAMA Cardiol. 2017 Feb 1;2(2):204–9. pmid:27784047
  27. Shah NH, Milstein A, Bagley SC. Making Machine Learning Models Clinically Useful. JAMA. 2019 Aug 8; pmid:31393527
  28. Baxt WG. Application of artificial neural networks to clinical medicine. Lancet. 1995 Oct 28;346(8983):1135–8. pmid:7475607
  29. Venkatesh KK, Strauss RA, Grotegut CA, Heine RP, Chescheir NC, Stringer JSA, et al. Machine Learning and Statistical Models to Predict Postpartum Hemorrhage. Obstet Gynecol. 2020 Apr;135(4):935–44. pmid:32168227
  30. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction [Internet]. 2nd ed. Springer; 2009 [cited 2019 Jan 21]. Available from: https://web.stanford.edu/~hastie/ElemStatLearn/
  31. Health Statistics Division. Low Birth Weight Newborns in Canada 2000 to 2013: Health Fact Sheets [Internet]. 2016 [cited 2019 Oct 24]. Available from: https://www150.statcan.gc.ca/n1/pub/82-625-x/2016001/article/14674-eng.htm
  32. Miao Q, Fell DB, Dunn S, Sprague AE. Agreement assessment of key maternal and newborn data elements between birth registry and Clinical Administrative Hospital Databases in Ontario, Canada. Arch Gynecol Obstet. 2019 Jul;300(1):135–43. pmid:31111244
  33. Maghsoudlou S, Yu ZM, Beyene J, McDonald SD. Phenotypic Classification of Preterm Birth Among Nulliparous Women: A Population-Based Cohort Study. J Obstet Gynaecol Can. 2019 Oct;41(10):1423-1432.e9. pmid:31053564
  34. Frey HA, Klebanoff MA. The epidemiology, etiology, and costs of preterm birth. Semin Fetal Neonatal Med. 2016 Apr;21(2):68–73. pmid:26794420
  35. Pillay P, Moodley K, Moodley J, Mackraj I. Placenta-derived exosomes: potential biomarkers of preeclampsia. Int J Nanomedicine. 2017 Oct 31;12:8009–23. pmid:29184401
  36. Bilagi A, Burke DL, Riley RD, Mills I, Kilby MD, Morris RK. Association of maternal serum PAPP-A levels, nuchal translucency and crown-rump length in first trimester with adverse pregnancy outcomes: retrospective cohort study. Prenat Diagn. 2017 Jul;37(7):705–11. pmid:28514830
  37. Vahanian SA, Lavery JA, Ananth CV, Vintzileos A. Placental implantation abnormalities and risk of preterm delivery: a systematic review and metaanalysis. Am J Obstet Gynecol. 2015 Oct;213(4):S78–90. pmid:26428506
  38. World Health Organization. Obesity: preventing and managing the global epidemic [Internet]. Geneva; 2000 [cited 2019 Oct 23]. (Report of a World Health Organization Consultation). Report No.: 894. Available from: http://www.who.int/entity/nutrition/publications/obesity/WHO_TRS_894/en/index.html
  39. Gilani N, Haghshenas R, Esmaeili M. Application of multivariate longitudinal models in SIRT6, FBS, and BMI analysis of the elderly. Aging Male. 2019 Dec;22(4):260–5.
  40. Institute of Medicine (US) and National Research Council (US) Committee to Reexamine IOM Pregnancy Weight Guidelines. Weight Gain During Pregnancy: Reexamining the Guidelines [Internet]. Rasmussen KM, Yaktine AL, editors. Washington (DC): National Academies Press (US); 2009 [cited 2019 Oct 1]. (The National Academies Collection: Reports funded by National Institutes of Health). Available from: http://www.ncbi.nlm.nih.gov/books/NBK32813/
  41. Oliver-Williams C, Fleming M, Wood AM, Smith G. Previous miscarriage and the subsequent risk of preterm birth in Scotland, 1980–2008: a historical cohort study. BJOG. 2015 Oct;122(11):1525–34. pmid:25626593
  42. Cavoretto P, Candiani M, Giorgione V, Inversetti A, Abu-Saba MM, Tiberio F, et al. Risk of spontaneous preterm birth in singleton pregnancies conceived after IVF/ICSI treatment: meta-analysis of cohort studies. Ultrasound Obstet Gynecol. 2018;51(1):43–53. pmid:29114987
  43. Kursa MB, Rudnicki WR. Feature Selection with the Boruta Package. J Stat Softw. 2010;36(11):1–13.
  44. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015 Jan 6;162(1):W1–73. pmid:25560730
  45. Shanab AA, Khoshgoftaar TM, Wald R, Napolitano A. Impact of noise and data sampling on stability of feature ranking techniques for biological datasets. In: 2012 IEEE 13th International Conference on Information Reuse Integration (IRI). 2012. p. 415–22.
  46. Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, et al. caret: Classification and Regression Training [Internet]. 2019 [cited 2019 Sep 30]. Available from: https://CRAN.R-project.org/package=caret
  47. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011 Dec 12;45(1):1–67.
  48. Rubin DB. Multiple Imputation After 18+ Years. J Am Stat Assoc. 1996;91(434):473–89.
  49. Stuart EA, Azur M, Frangakis C, Leaf P. Multiple Imputation With Large Data Sets: A Case Study of the Children’s Mental Health Initiative. Am J Epidemiol. 2009 May 1;169(9):1133–9. pmid:19318618
  50. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley and Sons; 2013.
  51. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002 Oct 1;35(5):352–9. pmid:12968784
  52. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019 Jun 1;110:12–22. pmid:30763612
  53. Gilani N, Kazemnejad A, Zayeri F, Asghari Jafarabadi M, Izadi Avanji FS. Predicting Outcomes in Traumatic Brain Injury Using the Glasgow Coma Scale: A Joint Modeling of Longitudinal Measurements and Time to Event. Iran Red Crescent Med J. 2017 Feb 1;19(2).
  54. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996 Nov 1;49(11):1225–31. pmid:8892489
  55. Woolery LK, Grzymala-Busse J. Machine learning for an expert system to predict preterm birth risk. J Am Med Inform Assoc. 1994;1(6):439–46. pmid:7850569
  56. Courtney KL, Stewart S, Popescu M, Goodwin LK. Predictors of preterm birth in birth certificate data. Stud Health Technol Inform. 2008;136:555–60. pmid:18487789
  57. Melchor JC, Khalil A, Wing D, Schleussner E, Surbek D. Prediction of preterm delivery in symptomatic women using PAMG-1, fetal fibronectin and phIGFBP-1 tests: systematic review and meta-analysis. Ultrasound Obstet Gynecol. 2018 Oct;52(4):442–51. pmid:29920825
  58. Melchor JC, Navas H, Marcos M, Iza A, De Diego M, Rando D, et al. Predictive performance of PAMG-1 vs fFN test for risk of spontaneous preterm birth in symptomatic women attending an emergency obstetric unit: retrospective cohort study. Ultrasound Obstet Gynecol. 2018 May;51(5):644–9. pmid:28850753
  59. Feng YY, Jarde A, Seo YR, Powell A, Nwebube N, McDonald SD. What Interventions Are Being Used to Prevent Preterm Birth and When? J Obstet Gynaecol Can. 2018 May;40(5):547–54.
  60. Naggara O, Raymond J, Guilbert F, Roy D, Weill A, Altman DG. Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable: An Example from the Natural History of Unruptured Aneurysms. Am J Neuroradiol. 2011 Mar 1;32(3):437–40. pmid:21330400
  61. Moutquin J-M. Classification and heterogeneity of preterm birth. BJOG. 2003;110(s20):30–3. pmid:12763108
  62. Lucaroni F, Morciano L, Rizzo G, D’Antonio F, Buonuomo E, Palombi L, et al. Biomarkers for predicting spontaneous preterm birth: an umbrella systematic review. J Matern Fetal Neonatal Med. 2018 Mar;31(6):726–34. pmid:28274163
  63. Atis A, Tandogan T, Aydin Y, Sen C, Turgay F, Eren N, et al. Late pregnancy associated plasma protein A levels decrease in preterm labor. J Matern Fetal Neonatal Med. 2011 Jul;24(7):923–7. pmid:21557695
  64. Grisaru-Granovsky S, Halevy T, Planer D, Elstein D, Eidelman A, Samueloff A. PAPP-A levels as an early marker of idiopathic preterm birth: a pilot study. J Perinatol. 2007 Nov;27(11):681–6. pmid:17703186
  65. Jelliffe-Pawlowski LL, Shaw GM, Currier RJ, Stevenson DK, Baer RJ, O’Brodovich HM, et al. Association of Early Preterm Birth with Abnormal Levels of Routinely Collected First and Second Trimester Biomarkers. Am J Obstet Gynecol. 2013 Jun;208(6):492.e1-492.e11. pmid:23395922
  66. Kaijomaa M, Rahkonen L, Ulander V-M, Hämäläinen E, Alfthan H, Markkanen H, et al. Low maternal pregnancy-associated plasma protein A during the first trimester of pregnancy and pregnancy outcomes. Int J Gynaecol Obstet. 2017 Jan;136(1):76–82. pmid:28099695
  67. Mella MT, Berghella V. Prediction of preterm birth: cervical sonography. Semin Perinatol. 2009 Oct;33(5):317–24. pmid:19796729