
Predicting preterm births from electrohysterogram recordings via deep learning

  • Uri Goldsztejn,

    Roles Conceptualization, Formal analysis, Methodology, Software, Validation, Writing – original draft

    Affiliation Department of Biomedical Engineering, McKelvey School of Engineering, Washington University in St. Louis, St. Louis, MO, United States of America

  • Arye Nehorai

    Roles Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing

    nehorai@wustl.edu

    Affiliation Preston M. Green Department of Electrical and Systems Engineering, McKelvey School of Engineering, Washington University in St. Louis, St. Louis, MO, United States of America

Abstract

About one in ten babies is born preterm, i.e., before completing 37 weeks of gestation, which can result in permanent neurologic deficit and is a leading cause of child mortality. Although imminent preterm labor can be detected, predicting preterm births more than one week in advance remains elusive. Here, we develop a deep learning method to predict preterm births directly from electrohysterogram (EHG) measurements of pregnant mothers recorded at around 31 weeks of gestation. Our prediction model, which includes a recurrent neural network, predicts preterm births from short-time Fourier transforms of EHG recordings and clinical information from two public datasets. We predicted preterm births with an area under the receiver-operating characteristic curve (AUC) of 0.78 (95% confidence interval: 0.76–0.80). Moreover, we found that the spectral patterns of the measurements were more predictive than the temporal patterns, suggesting that preterm births can be predicted from short EHG recordings in an automated process. We show that preterm births can be predicted for pregnant mothers around their 31st week of gestation, which could prompt beneficial treatments to reduce the incidence of preterm births and improve their outcomes.

Introduction

Around 10% of all live births, about 15 million babies per year, are preterm, that is, they happen before 37 weeks of gestation are completed [1, 2]. Preterm births are a leading cause of newborn mortality [3]. Moreover, many preterm babies suffer from long-term morbidity, including permanent neurological damage [2, 4]. Because treatments can delay preterm births and improve their outcomes, identifying pregnant mothers at high risk of preterm birth is compelling, as recognized by the World Health Organization (WHO) [2, 5].

Although several methods can predict preterm births, they have limitations. Broad historical risk factors, such as previous preterm births or multiple gestations, can identify mothers at higher risk of preterm birth, but these risk factors alone are not sufficient to accurately predict which individual mothers will deliver preterm [6–8].

In clinical practice, preterm birth is usually predicted by measuring cervical length or the concentration of cervico-vaginal fibronectin alpha [8]. In mothers with symptoms of preterm labor, these minimally invasive tests can predict births that will occur within one week [9, 10]. Moreover, the combination of these tests has been reported to produce more accurate results than each method separately and could be used to predict preterm births in symptomatic mothers within two weeks of testing [11]. These measurements are helpful because they inform physicians and guide treatments to reduce the risk of preterm labor and to improve its outcomes. However, these measurements are not cost-effective screening tools for the general population of pregnant mothers because they have low predictive values among mothers at low risk for preterm labor, such as nulliparous women with singleton pregnancies [8, 12].

Home uterine activity monitors (HUAMs) were developed to measure uterine contractions and predict preterm births. The first such devices were based on tocodynamometer recordings, which measure the pressure changes associated with uterine contractions [13]. Unfortunately, these devices could not predict preterm births, and current clinical guidelines discourage their use for this purpose [8, 13, 14].

More recently, electrohysterogram (EHG) recordings have been proposed to predict preterm births [15, 16]. EHG recordings use abdominal electrodes to measure the electrical activity associated with uterine contractions, and they can be recorded with portable devices equipped with algorithms to monitor uterine contractions [15, 17]. A variety of algorithms have been developed with the aim of predicting preterm births from various features derived from EHG measurements [16, 18]. These features are generally calculated from uterine contraction intervals, either manually selected or identified using dedicated algorithms [16, 19, 20]. These intervals can also be identified with the aid of simultaneous tocodynamometer recordings [17, 19].

To the best of our knowledge, EHG measurements have not yet been shown to predict preterm births more than two weeks in advance with a performance comparable to the clinical standards, i.e., measurements of cervical length or fibronectin alpha for detecting imminent labor. Although many researchers have reported nearly perfect predictions of preterm births based on EHG measurements from the “Term-Preterm EHG Database,” meticulous analysis revealed that these results were overoptimistic and resulted from data leakage [21, 22]. Namely, these works inadvertently introduced strong correlations between the data used to train the prediction models and the data used to test their performance, as shown by Vandewiele et al. [16, 21, 22]. This problem was caused by inappropriate attempts to improve the models’ performance by balancing the number of term and preterm samples used to develop these models. After Vandewiele et al. corrected this problem, these models were no longer able to predict preterm births accurately [21, 22]. Additional works with sound methodology suggest that some features derived from EHG measurements can be used to distinguish recordings of mothers who eventually delivered at term from those of mothers who delivered preterm [23–26]. However, none of these works could predict preterm births with clinically useful accuracy. More recently, Xu et al. and Lou et al. developed methods for predicting preterm births that avoid data leakage [27, 28]. Although Xu et al. and Lou et al. achieved high classification performance on test sets including real and synthetic measurements, the performance of their approaches on test sets including only real measurements is not reported. Moreover, Fischer et al. used an end-to-end deep learning model to predict preterm births from EHG measurements without artificially increasing the number of preterm samples, to avoid possible data leakage, and achieved only moderate accuracy [29].

Here, we present an end-to-end deep learning model that predicts preterm births directly from EHG measurements, without handcrafted features. Therefore, our model is not sensitive to varying implementations of specific features or to how uterine contractions are segmented. We developed our work using EHG measurements and supplementary clinical information from two public databases. Importantly, we developed our model with care to avoid data leakage. Using our model, we could predict preterm births in pregnant mothers around their 31st week of gestation. Our predictive accuracy was close to that achieved by using cervical length and fibronectin alpha measurements to predict preterm labors in mothers with symptoms of preterm labor and within one week of delivery. Moreover, by investigating the measurement components that contribute to the predictions of our model, we showed that it is possible to predict preterm births using short recording times, thus facilitating clinical adoption and at-home implementation of EHG measurements. This finding is aligned with the observations of Jager et al., who proposed that preterm births can be predicted from short contractile or non-contractile intervals of EHG measurements with similar accuracy as when using 30-minute long recordings [19]. Our work and results encourage using EHG measurements and deep learning for predicting preterm births in real-world scenarios. Their successful employment could help reduce newborn morbidity and mortality, especially in populations with limited access to healthcare, who suffer more from preterm birth [2].

Materials and methods

Study participants

In developing our work, we used two datasets in the Physionet repository, aggregating data from the “Term-Preterm EHG Database” (TPEHG DB) [23, 30] and from the “Term-Preterm ElectroHysteroGram DataSet with Tocogram” (TPEHGT DS) [19, 30]. These datasets contain bipolar EHG measurements, with nearly every recording lasting 30 minutes, and clinical information obtained from pregnant mothers during regular pregnancy checkups, as well as from mothers hospitalized for threatened preterm labor. Both datasets were acquired at the University Medical Centre Ljubljana, using the same recording protocol and device. The TPEHG DB consists of 300 records, each obtained from a different mother at either around the 22nd or the 32nd week of gestation. Additionally, the TPEHGT DS contains 26 records from 18 different mothers, obtained around the 31st week of gestation. Half of the samples in the TPEHGT DS correspond to mothers who eventually delivered preterm, while the other 13 records correspond to term deliveries. When compiling these datasets, the datasets’ authors excluded the mothers whose labors were induced or whose deliveries were performed using a Cesarean section [19, 23].

We included the records from both datasets obtained after the 26th week of gestation. We used this threshold of 26 weeks following the grouping of gestational ages at the time of the recordings in the TPEHG DB [23]. Since each record in the TPEHG DB was obtained from a different mother, we included all the records from this database that were obtained after the 26th week of gestation. On the other hand, when there were multiple records for the same mother in the TPEHGT DS, we included only the latest record during the pregnancy, provided that the record was made after the 26th week of gestation. We identified the records in the TPEHGT DS that corresponded to a particular mother by comparing the clinical information. By using a single record per mother, we prevented our models from learning features that characterize mothers rather than features that are predictive of pregnancy outcomes. Overall, we used 159 records from different mothers. Each of these records lasted 30 minutes, except for two records that were 26 and 33 minutes long. To facilitate the data analysis, we zero-padded the 26-minute-long record and truncated the 33-minute-long record, so that all the records were 30 minutes long. Among these mothers, 18.9% delivered preterm. We detail the clinical information of these mothers in Table 1. Additionally, we illustrate the distribution of gestational ages of the mothers included at the times of recording and at birth in S1 and S2 Figs.
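The record-selection rule above (keep one record per mother, namely her latest record made after the 26th week, then bring every record to exactly 30 minutes) can be sketched in Python as follows. This is a minimal sketch, not the authors' MATLAB code: the dictionary keys `mother_id`, `rec_weeks`, and `signal` are hypothetical (in the datasets, mothers had to be matched by comparing clinical information), and the 20 Hz sampling rate is an assumption about the raw recordings.

```python
import numpy as np

FS_HZ = 20.0        # assumed sampling rate of the raw EHG signals
TARGET_MIN = 30     # every record is brought to 30 minutes

def select_latest_per_mother(records, min_weeks=26.0):
    """Keep, for each mother, only her latest record made after `min_weeks`.

    `records` is a list of dicts with hypothetical keys 'mother_id',
    'rec_weeks' (gestational age at recording), and 'signal'.
    """
    latest = {}
    for rec in records:
        if rec["rec_weeks"] <= min_weeks:
            continue  # recorded too early in the pregnancy
        prev = latest.get(rec["mother_id"])
        if prev is None or rec["rec_weeks"] > prev["rec_weeks"]:
            latest[rec["mother_id"]] = rec
    return list(latest.values())

def fix_length(signal, fs=FS_HZ, minutes=TARGET_MIN):
    """Zero-pad a short signal or truncate a long one to exactly `minutes`."""
    n = int(round(fs * 60 * minutes))
    signal = np.asarray(signal, dtype=float)
    if signal.size >= n:
        return signal[:n]
    return np.concatenate([signal, np.zeros(n - signal.size)])

# Toy records: mother "a" has two eligible records, mother "b" has none.
records = [
    {"mother_id": "a", "rec_weeks": 31.0, "signal": np.ones(int(26 * 60 * FS_HZ))},
    {"mother_id": "a", "rec_weeks": 32.1, "signal": np.ones(int(33 * 60 * FS_HZ))},
    {"mother_id": "b", "rec_weeks": 22.0, "signal": np.ones(int(30 * 60 * FS_HZ))},
]
kept = select_latest_per_mother(records)
signals = [fix_length(r["signal"]) for r in kept]
```

Keeping a single record per mother is what prevents the model from recognizing individual mothers across the training and test sets.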

Table 1. Clinical information from the records included in our work.

https://doi.org/10.1371/journal.pone.0285219.t001

Prediction models

We developed classification and regression models to predict whether a birth would occur at term or preterm. The classification models were trained specifically to predict categorical outcomes, i.e., delivery at term or preterm. Pursuing a different approach, we trained the regression models to predict the gestational age at delivery, labeling predictions lower than 37 completed weeks, or 259 days, as preterm, and those at or above 37 weeks as term. In developing the classification and regression models, we used clinical information alone, EHG measurements alone, and clinical information combined with EHG measurements. These prediction models, developed using MATLAB 2020a, are detailed in the next subsections and summarized in a block diagram in Fig 1.
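The rule for turning a regression output into a class label can be written in a few lines. In this sketch, which assumes predictions are expressed in days, `preterm_score` is a hypothetical helper showing one natural way to use the predicted age for ROC analysis: a lower predicted age at delivery is treated as a higher preterm score.

```python
import numpy as np

PRETERM_CUTOFF_DAYS = 37 * 7  # 37 completed weeks = 259 days

def label_from_predicted_age(pred_days):
    """Map predicted gestational ages at delivery (days) to 'preterm'/'term'."""
    pred_days = np.asarray(pred_days, dtype=float)
    return np.where(pred_days < PRETERM_CUTOFF_DAYS, "preterm", "term")

def preterm_score(pred_days):
    """Lower predicted age -> higher preterm score (for ROC analysis)."""
    return -np.asarray(pred_days, dtype=float)

print(label_from_predicted_age([250.0, 259.0, 275.5]))  # ['preterm' 'term' 'term']
```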

Fig 1. Block diagram of the three classification and regression models developed in this work.

The details of these models are provided in Methods. The clinical information model is illustrated in the upper part of the diagram, using shapes with blue outlines. This model uses clinical information, in tabular format, to predict preterm births by using logistic or linear regression, represented as a block with a blue outline and schematic illustrations below it. Preprocessing the clinical information consists of completing missing entries and normalizing the predictors, as described in Methods. The EHG model is illustrated in the lower part of the diagram, using shapes with black outlines. This model uses EHG measurements, represented by an input block with a schematic illustration below it, that are first preprocessed. This preprocessing step includes bandpass filtering (BPF) and downsampling. The preprocessed measurements are used to compute STFTs, illustrated by a block and a schematic representation, that are used as input to the RNN. This network is composed of an input layer, a BiLSTM layer, a fully connected (FC) layer, and an output layer, which are illustrated using light blue shapes with black outlines and enclosed within a dashed light blue outline. The combined model uses clinical information and EHG measurements to predict preterm births and is illustrated in the middle part of the diagram using shapes with red outlines. The dotted black outline represents the cross-validation technique employed, indicating that the operations within are applied separately for each data partition, whereas the operations outside are applied to all the data, independent of the data partition.

https://doi.org/10.1371/journal.pone.0285219.g001

Clinical information models.

First, using only the clinical information of the records, we predicted whether each mother delivered preterm or at term. We used most of the predictors shown in Table 1, namely maternal age, gestational age at the time of the recording, weight, and whether the mothers had given birth previously (parous), had previous aborted pregnancies, had reported vaginal bleeding in the first trimester, had reported vaginal bleeding in the second trimester, or were smokers. We excluded diagnoses of hypertension and diabetes because these diagnoses are mostly absent in this dataset. We also excluded diagnoses of funnelling because they are made through transvaginal sonography and because, in this dataset, these diagnoses have low predictive power [8]. Similar to funnelling, we excluded the variable in the dataset indicating the placental position, which takes the values “front” (considered as the positive value in Table 1) and “end.” To prevent data leakage, we completed the missing entries for each variable in both the training and testing datasets using the mode of that variable in the training set, rather than the mode over the entire dataset [31]. Therefore, our training data contain no information from the test set, and our model makes predictions on each test sample using only information from the training set, both to complete missing entries and to make the predictions.
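Completing missing entries with training-set modes can be sketched as follows. This minimal sketch assumes numeric predictors with missing entries coded as NaN (an assumption about the data layout) and breaks ties in favor of the smallest value; it also assumes every column has at least one observed value in the training set.

```python
import numpy as np

def impute_with_train_modes(X_train, X_test):
    """Fill NaNs column by column using the mode computed on X_train only."""
    X_train = np.array(X_train, dtype=float)   # copies, so inputs stay untouched
    X_test = np.array(X_test, dtype=float)
    for j in range(X_train.shape[1]):
        observed = X_train[~np.isnan(X_train[:, j]), j]
        values, counts = np.unique(observed, return_counts=True)
        mode = values[np.argmax(counts)]       # ties -> smallest value (sorted)
        X_train[np.isnan(X_train[:, j]), j] = mode
        X_test[np.isnan(X_test[:, j]), j] = mode   # test set reuses the train mode
    return X_train, X_test

nan = np.nan
X_tr, X_te = impute_with_train_modes([[1, 0], [1, nan], [2, 1], [nan, 1]],
                                     [[nan, nan]])
```

Because the test rows are filled with statistics from the training rows only, nothing about the test set influences training.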

Next, we trained a logistic regression to predict whether deliveries were preterm and a linear regression model to predict the gestational age at birth. These models are represented using a block with a blue outline on the upper part of Fig 1. In the logistic regression model, we discarded the redundant predictors, using lasso regularization. We regularized only the classification model, and not the regression model, because we observed that the lasso regularization improved the performance of the logistic regression model slightly but marginally worsened the performance of the linear regression model. Since we regularized the logistic regression model, we also normalized the predictors in this model to prevent the regularization term from penalizing the model parameters based on the scale of the predictors. Again, to prevent data leakage, we normalized both the training and testing sets using the means and standard deviations of the samples in the training set, thus avoiding revealing information from the test set to the training set.
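The train-only normalization and the lasso-regularized logistic regression can be sketched in NumPy as below. The authors used MATLAB; this sketch swaps in a plain proximal-gradient (ISTA) solver, and the penalty `lam`, step size `lr`, and iteration count are illustrative assumptions rather than values from the paper. The intercept is left unpenalized, and the toy data are synthetic.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Z-score both sets using the training-set mean and standard deviation only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return (X_train - mu) / sd, (X_test - mu) / sd

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))              # predicted probabilities
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = w - lr * grad_w
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-thresholding
        b -= lr * grad_b                                    # intercept not penalized
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)               # only feature 0 is informative
X_tr, X_te = standardize_with_train_stats(X[:160], X[160:])
w, b = fit_lasso_logistic(X_tr, y[:160])
```

The soft-thresholding step is what drives the coefficients of uninformative predictors to exactly zero, which is how the lasso discards redundant predictors; standardizing first keeps that penalty from depending on each predictor's units.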

The operations to complete missing entries and to normalize the input data, described above, constitute the preprocessing step for the clinical information. This preprocessing step is represented in Fig 1 as a block that is executed once for each partitioning of the data into training and testing datasets, as described below.

EHG measurements models.

Then, using only the EHG measurements, we predicted whether the mothers delivered preterm or at term. We used solely the first signal (s1) in the databases, following the recommendation of Garcia-Casado et al. to use simple systems for predicting preterm birth [18]. This signal measures the electric potential difference between two electrodes aligned horizontally on the abdomen, 3.5 cm above the navel, and separated by 7 cm.

We preprocessed all the EHG measurements to improve the data quality. We removed the first minute of the recordings to eliminate transient effects. Next, we filtered the measurements to remove baseline wander and high-frequency noise. Specifically, we filtered the recordings using a fourth-order, zero-phase Butterworth bandpass filter with cutoff frequencies of 0.05 Hz and 4 Hz. Although most uterine activity is concentrated between 0.05 Hz and 0.7 Hz, we included a higher frequency range because higher frequency components have been shown to be predictive of preterm birth [19, 32]. Finally, we downsampled the measurements to 10 Hz to improve computational speed; because the filter removes components above 4 Hz, below the 5 Hz Nyquist frequency of the downsampled signal, this downsampling preserves the information in the passband. These preprocessing operations are represented using a block with a black outline at the bottom of Fig 1.
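The preprocessing chain described above can be approximated with SciPy as sketched below. This is not the authors' MATLAB implementation, and the 20 Hz input rate is an assumption. Note that SciPy's `butter` with `order=4` and `btype="bandpass"` produces a filter of order 8; whether "fourth-order" refers to the prototype or to the final filter is not stated, so `order` is a tunable assumption. `sosfiltfilt` runs the filter forward and backward to obtain zero phase, and `decimate` applies its own anti-aliasing filter before keeping every second sample.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess_ehg(x, fs=20.0, skip_s=60.0, band=(0.05, 4.0), order=4, fs_out=10.0):
    """Drop the first minute, zero-phase bandpass filter, and downsample."""
    x = np.asarray(x, dtype=float)[int(round(skip_s * fs)):]  # remove start-up transients
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)                 # forward-backward pass -> zero phase
    q = int(round(fs / fs_out))             # integer decimation factor (2 here)
    return decimate(x, q, zero_phase=True)  # anti-alias filter, keep every q-th sample

fs = 20.0
t = np.arange(int(30 * 60 * fs)) / fs
x = np.sin(2 * np.pi * 0.2 * t) + 0.5 * np.sin(2 * np.pi * 8.0 * t)  # in-band + out-of-band
y = preprocess_ehg(x)   # 29 minutes at 10 Hz -> 17400 samples
```

The second-order-sections form (`output="sos"`) is used because a 0.05 Hz edge at this sampling rate is numerically delicate for a transfer-function design.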

Next, as illustrated in the bottom part of Fig 1, we transformed the preprocessed EHG measurements to the time-frequency domain, using the short-time Fourier transform (STFT). We used the STFT following the positive results previously reported using this transformation for predicting preterm births from EHG measurements [19, 27]. The STFT usefully represents how the spectral components of the measurements change over time by constructing a matrix where each column corresponds to a sliding time interval and contains the estimated spectral content of the measurements during the corresponding time interval. This transformation is helpful in analyzing non-stationary processes, such as the contractile activity during the recordings. We estimated the STFT using Hamming windows of 60 s that were slid using a 75% overlap. We chose this configuration since uterine contractions usually last around one minute and because this configuration resulted in satisfactory temporal and spectral resolutions based on visual inspection [33].
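A NumPy sketch of the STFT configuration above (60 s Hamming windows, 75% overlap, 10 Hz input) is shown below. It keeps only full windows and returns magnitudes; whether the authors used magnitudes, power, or a log scale, and how they handled the recording edges, are assumptions not specified in the text.

```python
import numpy as np

def stft_magnitude(x, fs=10.0, win_s=60.0, overlap=0.75):
    """Magnitude STFT with Hamming windows; rows = frequencies, columns = time."""
    x = np.asarray(x, dtype=float)
    n_win = int(round(win_s * fs))                 # 600 samples at 10 Hz
    hop = int(round(n_win * (1.0 - overlap)))      # 150 samples -> 75% overlap
    window = np.hamming(n_win)
    n_frames = 1 + (x.size - n_win) // hop         # only full windows, no padding
    frames = np.stack([x[k * hop:k * hop + n_win] * window
                       for k in range(n_frames)], axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=0))     # one-sided spectrum per column
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)     # 1/60 Hz resolution
    return spec, freqs

fs = 10.0
t = np.arange(17400) / fs                          # 29 min at 10 Hz
spec, freqs = stft_magnitude(np.sin(2 * np.pi * 0.5 * t))
print(spec.shape)                                  # (301, 113)
```

Each of the 113 columns summarizes a 60 s stretch of the recording, so a single contraction of about one minute occupies roughly one column while still being resolved in frequency.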

We predicted the pregnancies’ outcomes from EHG recordings using a deep neural network, rather than using handcrafted features, because neural networks automatically learn the most informative features from the data [34, 35]. Given the limited success of various methods designed to predict preterm births from the EHG measurements in the TPEHG DB using handcrafted features, Vandewiele et al. suggested using deep learning to achieve better results [22].

In agreement with the suggestions from Vandewiele et al., we used a deep recurrent neural network (RNN) to predict the pregnancies’ outcomes from EHG measurements, developing a dedicated network architecture for this task. This RNN uses the training set, consisting of data samples labeled with their respective pregnancy outcome, to learn features from the input data that predict the pregnancies’ outcomes. The RNN consists of a series of layers that are trained to learn multiple abstractions of the data that are helpful in relating the input data to the predictions [34]. The first layer in our network is a sequence input layer that rearranges the matrices of STFTs so that the columns of the STFT matrices, which capture the spectral content of the measurements during the sliding time intervals, become a set of features for the corresponding time step in the RNN. This input layer feeds into a series of bidirectional long short-term memory (BiLSTM) cells with 100 hidden states. The BiLSTM cells are able to learn patterns from sequential data: in our case, these cells are intended to learn patterns from the spectral changes of the EHG measurements over time. Similar network architectures, using long short-term memory (LSTM) and BiLSTM cells, have been used to successfully learn informative data representations from STFTs in other applications [36, 37].

Next, following an approach similar to that of Zhu et al., we connected the last BiLSTM cell to a fully connected layer consisting of two neurons, and finally we connected the fully connected layer to an output layer [36]. The fully connected layer encodes the data abstraction inferred by the BiLSTM cells into a pair of scalar values, which are then used by the output layer to make a prediction. In the classification model, this pair of scalar values scores the association of each EHG recording with the preterm and term categories. This architecture is illustrated in Fig 1.
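To make the data flow concrete, the forward pass of the classification network (STFT columns as time steps, one BiLSTM layer with 100 hidden units per direction, a two-neuron fully connected layer, softmax) can be sketched in NumPy. This is an untrained illustration with random weights, not the authors' MATLAB network. The sequence is summarized by concatenating the final hidden state of each direction, which is one common reading of "the last BiLSTM cell" and is an assumption here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_final_state(x_seq, W, U, b):
    """Run one LSTM direction over x_seq (T x D) and return its final hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b                 # stacked pre-activations, shape (4H,)
        i, f, g, o = np.split(z, 4)             # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def init_lstm(rng, in_dim, hidden):
    """Random weights for one direction (illustrative; no training here)."""
    s = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-s, s, (4 * hidden, in_dim)),
            rng.uniform(-s, s, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

def bilstm_classify(stft_mag, params):
    """STFT (freq x time) -> BiLSTM -> fully connected (2 neurons) -> softmax."""
    x_seq = stft_mag.T                                     # each STFT column is one step
    h_fwd = lstm_final_state(x_seq, *params["fwd"])        # reads time forward
    h_bwd = lstm_final_state(x_seq[::-1], *params["bwd"])  # reads time backward
    features = np.concatenate([h_fwd, h_bwd])              # 2H-dimensional summary
    scores = params["fc_W"] @ features + params["fc_b"]    # the two FC activations
    e = np.exp(scores - scores.max())
    return e / e.sum()                                     # class probabilities

rng = np.random.default_rng(0)
H, D, T = 100, 301, 113                                    # hidden units, freq bins, frames
params = {"fwd": init_lstm(rng, D, H), "bwd": init_lstm(rng, D, H),
          "fc_W": rng.normal(0.0, 0.01, (2, 2 * H)), "fc_b": np.zeros(2)}
probs = bilstm_classify(rng.random((D, T)), params)
```

The two values in `scores` play the role of the fully connected activations that the combined models later concatenate with the clinical information.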

We used two different output layers, depending on whether we intended to predict the categorical outcome of the pregnancy or the gestational age at birth. For the classification problem, we used a softmax output layer and trained the network using a weighted cross-entropy loss function that penalized errors in the preterm birth predictions more heavily. We determined the weights of the loss function based on the relative frequency of each class in the training set, a strategy that addresses the class-imbalance problem of predicting preterm births. Namely, because term births are more frequent in the general population and in the database, classification models trained on these data are naturally biased towards predicting term births and may learn to predict a term birth for every input. This loss function is given by:

$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\Big[w_1\,T_n \ln y_n + w_0\,(1 - T_n)\ln(1 - y_n)\Big], \tag{1}$$

where N is the number of samples in each training batch, w_i is the penalization weight of class i, T_n ∈ {0, 1} is the label of sample n, and y_n is the output score of sample n. We set the penalization weight w_i for class i from S_i, the number of samples from class i in the training set, so that the less frequent class receives the larger weight, as suggested in [38].
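A binary weighted cross-entropy of the kind described above can be sketched as follows. Two assumptions are made: label 1 denotes a preterm birth with `y` its predicted probability, and the weights take the common inverse-frequency form w_i = N_train / (2 S_i). That weight formula is an illustrative choice, not necessarily the expression the authors took from [38].

```python
import numpy as np

def class_weights(train_labels):
    """Hypothetical inverse-frequency weights w_i = N / (2 * S_i) for labels 0 and 1."""
    labels = np.asarray(train_labels)
    n = labels.size
    s1 = np.sum(labels == 1)
    s0 = n - s1
    return n / (2.0 * s0), n / (2.0 * s1)

def weighted_cross_entropy(y, t, w0, w1, eps=1e-12):
    """Mean over the batch of -[w1 * t * ln(y) + w0 * (1 - t) * ln(1 - y)]."""
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    t = np.asarray(t, dtype=float)
    return -np.mean(w1 * t * np.log(y) + w0 * (1.0 - t) * np.log(1.0 - y))

w0, w1 = class_weights([1, 0, 0, 0])     # one preterm, three term -> w1 = 2, w0 = 2/3
loss = weighted_cross_entropy([0.5] * 4, [1, 0, 0, 0], w0, w1)
```

With these weights, the single preterm sample contributes as much to the loss as the three term samples together, which counteracts the bias towards always predicting term.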

For the regression problem, we used a regression layer as the output of the network. This layer implements a mean square error (MSE) loss function to train the network. Since the regression models are trained on a continuous output, i.e., the gestational age at birth, these models are less sensitive to the class imbalance problem. The classification and regression output layers are represented by a single blue block with a black outline at the bottom of Fig 1.

In developing our prediction models using EHG measurements, we evaluated alternative model designs based on a single run of a five-fold cross-validation. We evaluated alternative time-frequency representations, namely wavelet transforms and the empirical mode decomposition, as described in [39, 40]. Additionally, we tested other neural network architectures, namely networks using long short-term memory (LSTM) cells and convolutional neural networks (CNN), with varying network parameters, such as the number of layers and the number of LSTM cells. Here, we report the model that produced the best prediction results.

We also fine-tuned the learning parameters based on a single run of a five-fold cross-validation. Namely, we selected an appropriate mini-batch size, number of training iterations, learning rate, and regularization hyperparameter.

Combined models.

We developed both a classification and a regression model that combine clinical information with EHG measurements to predict pregnancies’ outcomes. We hypothesized that combining all the available information could improve the performance of our models, as previously suggested [18]. We first trained the network described in the previous subsection. Then, we extracted the activation values of the fully connected layer and concatenated these values with the clinical information. Next, we used the combined data to train the logistic regression model to predict the outcome of the pregnancy, and the linear regression model to predict the gestational age at delivery. We implemented these logistic and linear regression models as described before. The difference between these models and those used for predictions based only on clinical information is that, in this case, the data vectors included the activations of the fully connected layer in addition to the clinical information. The stages of these classification and regression models, which combine clinical information and EHG measurements, are illustrated in the middle part of Fig 1.

Cross-validation

We evaluated the performance of our models using a stratified, five-fold cross-validation. We partitioned the data into a training set, containing 80% of the data, and a test set, containing the remaining 20% of the data, so that both the training and testing sets included the same proportion of preterm samples. We illustrate our data partitioning in S3 Fig. We used the training set to train our models and the testing set to evaluate the models’ performance. We repeated this process five times, each time using a different set of samples for the training and testing set, so that all the samples were used for testing throughout the five runs. This cross-validation routine is indicated in Fig 1 by a dotted black outline. This outline symbolizes that the operations represented within are applied separately for each partition of the data, whereas the operations represented outside the outline are applied once for all the data.
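A stratified five-fold partition like the one described above can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' MATLAB routine. The label vector assumes 30 preterm records out of 159, which matches the reported 18.9% preterm rate, and passing a different `seed` per repetition yields the different random partitions used when the cross-validation is repeated.

```python
import numpy as np

def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k disjoint test folds with similar class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle within class
        for f, chunk in enumerate(np.array_split(idx, k)):
            folds[f].extend(chunk.tolist())
    return [np.sort(np.array(f, dtype=int)) for f in folds]

labels = np.array([1] * 30 + [0] * 129)        # 159 records, 30 preterm (about 18.9%)
folds = stratified_kfold_indices(labels)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(labels.size), test_idx)
    # Imputation, normalization, and model fitting would use train_idx only here.
    n_train, n_test = train_idx.size, test_idx.size
```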

Statistical analysis

To evaluate the performance of the prediction models with confidence intervals and to reduce the risk of bias, we repeated the cross-validation routine 20 times, as recommended in [41]. Each time, we used a different random partition of the data. By repeating the cross-validation routine with various random partitions, we prevented our models from possibly producing over-optimistic results due to fitting of the training hyperparameters and model specifications to a specific cross-validation partition. We then calculated the mean and 95% confidence interval (CI) of the performance statistics, assuming that the performance statistics had Gaussian distributions with unknown means and variances.
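Under the stated assumption of Gaussian statistics with unknown mean and variance, a 95% confidence interval for the mean of a performance statistic over the repeated cross-validations follows from Student's t distribution. The sketch below assumes the interval is computed for the mean over repetitions (for example, 20 AUC values); the five toy values only check the arithmetic.

```python
import numpy as np
from scipy import stats

def mean_and_ci(values, level=0.95):
    """Mean and t-based confidence interval (Gaussian, unknown mean and variance)."""
    v = np.asarray(values, dtype=float)
    m = v.mean()
    se = v.std(ddof=1) / np.sqrt(v.size)                 # standard error of the mean
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=v.size - 1)
    return m, (m - t_crit * se, m + t_crit * se)

m, (lo, hi) = mean_and_ci([1, 2, 3, 4, 5])   # t_crit ~ 2.776 with 4 degrees of freedom
```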

Results

Performance of the prediction models

First, we attempted to predict preterm births by using only the clinical information, which supplements the EHG measurements and is described in Table 1. We developed two models: a logistic regression model to determine whether a pregnancy would result in a preterm birth, and a linear regression model to predict the gestational age at delivery, as detailed in Methods. When using the regression model, we predicted that a birth would be at term if the estimated gestational age at delivery was at least 37 complete weeks, or 259 days. The classification model predicted preterm births with an area under the receiver-operating characteristic curve (AUC) of 0.65 (95% CI: 0.63–0.67), whereas the regression model predicted preterm births with an AUC of 0.67 (95% CI: 0.65–0.70).
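The AUC values above can be read as the probability that a randomly chosen preterm record receives a higher preterm score than a randomly chosen term record. A small pairwise implementation is sketched below; for the regression models, it assumes the negated predicted gestational age is used as the score. The four predictions are toy values, not study data.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()   # all positive/negative pairs
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Regression model: a lower predicted age at delivery means a higher preterm score.
pred_days = np.array([250.0, 270.0, 255.0, 280.0])
is_preterm = np.array([True, False, False, False])
print(auc(-pred_days, is_preterm))  # 1.0
```

An AUC of 0.5 corresponds to random guessing, which is the reference point used when comparing the clinical-information, EHG, and combined models.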

Next, we examined whether EHG measurements could be used to predict preterm births using end-to-end deep-learning models, directly from EHG measurements and without requiring handcrafted features. Specifically, we trained a recurrent neural network to predict whether the pregnant mothers would deliver preterm and to predict their gestational ages at delivery, as described in Methods. This network’s predictions surpassed those of the clinical information models. The classification model trained on EHG measurements was able to predict preterm births with an AUC of 0.74 (95% CI: 0.73–0.76), whereas the regression model predicted preterm births with an AUC of 0.70 (95% CI: 0.68–0.73).

We also developed models to predict preterm births based on clinical information combined with EHG measurements, as described in Methods. We hypothesized that integrating the clinical information and the EHG measurements would yield more accurate prediction models, because the models trained independently on clinical information alone and EHG measurements alone could predict preterm births better than random guessing. Moreover, the clinical information and the EHG measurements provide complementary information about the pregnancy. Consistent with our hypothesis, the prediction models trained on both clinical information and EHG measurements slightly outperformed the models trained on clinical information alone and on EHG measurements alone. Our classification model predicted preterm births with an AUC of 0.78 (95% CI: 0.76–0.80), and the regression model predicted preterm births with an AUC of 0.75 (95% CI: 0.73–0.77).

To better evaluate the performance of our prediction models, we estimated a performance bound on this classification problem. In our work, as well as in the obstetrics literature and clinical practice, births are considered preterm if the mother delivers before completing 37 weeks of gestation. However, gestational age is estimated with an uncertainty that depends on the estimation method. Generally, gestational age is estimated based on a first-trimester ultrasound examination or on the timing of the last menstrual period (LMP) [42]. Estimates based on early ultrasound examination have a standard deviation of about five days, whereas estimates based on the LMP have standard deviations of about seven days [43]. Notably, the incidence of preterm births depends on the method used to estimate the gestational age [44].

This estimation error translates into uncertainty in the ground truth labels and limits the attainable performance of classification algorithms. We estimated the resulting upper bound on the AUC by measuring the AUC obtained when predicting the gestational age at delivery using a noisy version of the true gestational ages at delivery. We corrupted the gestational ages at delivery by adding independent and identically distributed (i.i.d.) Gaussian noise with zero mean and a standard deviation of six days, which lies between the standard deviations of ultrasound-based and LMP-based estimates. Repeating this procedure 20 times to estimate the mean and 95% CI of the AUC, we found that the upper AUC bound for this classification problem is 0.98 (95% CI: 0.98–0.98).
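The label-noise bound can be simulated as sketched below: labels come from the true gestational ages, scores come from the same ages corrupted with i.i.d. Gaussian noise (SD of six days), and the AUC is averaged over 20 repetitions. The gestational ages here are synthetic and hypothetical, so this sketch illustrates the procedure and does not reproduce the reported 0.98.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

def noisy_label_auc(true_days, noise_sd=6.0, reps=20, seed=0, cutoff_days=259.0):
    """Mean AUC when noisy ages 'predict' preterm labels derived from the true ages."""
    rng = np.random.default_rng(seed)
    true_days = np.asarray(true_days, dtype=float)
    is_preterm = true_days < cutoff_days                 # labels from the true ages
    aucs = [auc(-(true_days + rng.normal(0.0, noise_sd, true_days.shape)), is_preterm)
            for _ in range(reps)]
    return float(np.mean(aucs)), aucs

# Hypothetical gestational ages at delivery in days (synthetic, not the study data).
ages = np.random.default_rng(1).normal(268.0, 14.0, size=159)
mean_auc, _ = noisy_label_auc(ages)
```

Even a predictor that knows the true gestational age cannot exceed this AUC once the labels themselves carry estimation noise, which is why the bound is drawn as the unattainable region in Fig 2.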

In Fig 2, we present the receiver-operating characteristic (ROC) curves for the classification and regression models trained on clinical information alone, EHG measurements alone, and clinical information combined with EHG measurements. We observe that the classification models that leverage EHG measurements outperform the regression models trained on the same data. Moreover, regardless of whether we use the classification or regression approach, the EHG-based models outperform the clinical information-based models, and the models that leverage both the clinical information and the EHG measurements achieve the best performance.

Fig 2. Performance of the models for predicting preterm births.

(a) ROC curves for predicting preterm births using the classification models trained with clinical information alone, EHG measurements alone, and clinical information combined with EHG measurements. (b) ROC curves for the same tasks as in (a), but using the regression models instead of the classification models. (a), (b) The performance bound is shown in both panels by a black ROC curve. The greyed area delimited by this bound indicates unattainable performance due to the uncertainty in the ground truth labels. The AUCs of the models are presented with 95% CIs.

https://doi.org/10.1371/journal.pone.0285219.g002

To further assess the performance of our models, we measured the sensitivity, positive predictive value (PPV), and negative predictive value (NPV) at various specificity levels, as shown in Table 2 [45]. We include the PPV and NPV in our analysis because, unlike sensitivity and specificity, these statistics depend on the incidence of preterm births in the dataset [45]. Since the classification models that use EHG measurements outperformed the regression models, we present the results only for the classification models.
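Reading sensitivity, PPV, and NPV off a model's scores at a fixed specificity can be sketched as follows. The thresholding rule (choose the cut-off from the term-birth scores so that the requested fraction of them is classified as term) is an assumption about one common convention, not necessarily how Table 2 was computed, and the function name is hypothetical.

```python
import numpy as np


def metrics_at_specificity(labels, scores, target_specificity):
    """Sensitivity, specificity, PPV, and NPV at a threshold that reaches the target specificity.

    labels: 1 for preterm (positive), 0 for term; scores: higher means more likely preterm.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    neg_scores = np.sort(scores[~labels])
    # Smallest count of term births that must score at or below the threshold;
    # the tiny tolerance guards against floating-point round-up.
    k = int(np.ceil(target_specificity * neg_scores.size - 1e-9)) - 1
    threshold = neg_scores[max(k, 0)]
    predicted = scores > threshold  # strictly above the threshold -> predicted preterm
    tp = np.sum(predicted & labels)
    fp = np.sum(predicted & ~labels)
    tn = np.sum(~predicted & ~labels)
    fn = np.sum(~predicted & labels)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }
```

With tied term-birth scores, the achieved specificity can exceed the target, which is why the function also returns the specificity it actually reached.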

Table 2. Performance of classification models in predicting preterm birth.

https://doi.org/10.1371/journal.pone.0285219.t002

In Table 2, we observe that the combined model outperforms the models trained on clinical information alone or EHG measurements alone in sensitivity, PPV, and NPV at various specificity levels. Moreover, we observe that our models have a much higher NPV than PPV, which results from the low incidence of preterm births. In other words, our predictions of term births are more reliable than our predictions of preterm births.
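The dependence of PPV and NPV on incidence follows from Bayes' theorem. Writing Se for sensitivity, Sp for specificity, and π for the incidence of preterm births in the evaluated population, the standard expressions are given below; the numbers in the worked example are illustrative only and are not taken from Table 2.

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}, \qquad
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi}

% Illustrative values: Se = 0.5, Sp = 0.9, \pi = 0.1
\mathrm{PPV} = \frac{0.05}{0.05 + 0.09} \approx 0.36, \qquad
\mathrm{NPV} = \frac{0.81}{0.81 + 0.05} \approx 0.94
```

When π is small, even a modest false-positive rate applied to the large term group dilutes the PPV, while the NPV stays high.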

We verified that our model was not discriminating between the two datasets used in our work. The TPEHG DB and the TPEHGT DS were acquired with the same device and following the same protocol, so we did not expect our model to discriminate between the samples of the two datasets. We confirmed that our model does not assign one label to the samples from one dataset and the other label to the samples from the other dataset. Moreover, when we trained the classification models using only the TPEHG DB, we obtained AUCs similar to those obtained when we trained the models using data from both datasets.

Although our regression models could predict preterm births more accurately than random guessing, they could not predict the gestational ages at delivery with an MSE much lower than that obtained by always predicting the mean gestational age at delivery in the training set, i.e., the minimum MSE estimator when no predictors are used. The predicted and true gestational ages at delivery are positively correlated, but the predictions are inaccurate, as shown in S4 Fig.
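The comparison with the mean predictor, together with the Bland–Altman limits of agreement reported in S4 Fig, can be sketched as below. The function name and the skill score (one minus the ratio of the two MSEs) are illustrative additions, and gestational ages are assumed to be in days.

```python
import numpy as np


def regression_summary(y_true, y_pred, y_train):
    """Compare a regressor's MSE with the constant training-mean baseline and compute Bland–Altman limits."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    baseline = np.full_like(y_true, np.mean(y_train))  # constant prediction: the training-set mean
    mse_model = np.mean((y_pred - y_true) ** 2)
    mse_baseline = np.mean((baseline - y_true) ** 2)
    diff = y_pred - y_true  # Bland–Altman differences
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement: bias ± 1.96 SD
    return {
        "mse_model": mse_model,
        "mse_baseline": mse_baseline,
        "skill": 1.0 - mse_model / mse_baseline,  # > 0 means better than the mean predictor
        "bias": bias,
        "loa": (bias - half_width, bias + half_width),
    }
```

A skill score near zero corresponds to the situation described above: the model barely improves on predicting the training mean for every mother.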

Predictive components of EHG measurements

Further, we investigated how various components of the EHG measurements contribute to the preterm birth predictions by altering the STFT representations of the data. We first explored the predictive power of various frequency bands, as shown in Fig 3a and 3b. We extracted four frequency bands (B0 through B3) by using only the relevant rows of the STFT for training and testing. We considered frequency bands similar to those of Jager et al.: in our case, B0, B1, B2, and B3 cover the frequency ranges 0.05–1.0 Hz, 1.0–2.2 Hz, 2.2–3.5 Hz, and 3.5–5.0 Hz, respectively [19]. The only difference between our spectral partition and that proposed by Jager et al. is that our lower cutoff for B0 is 0.05 Hz instead of 0.08 Hz [19]. According to Jager et al., B0 mostly contains electrical activity associated with uterine contractions, whereas the higher bands contain harmonic frequencies of uterine reverberation caused by maternal cardiac activity [19]. Notably, the models trained on higher frequency bands achieved higher AUCs, as shown in Fig 3b.
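Restricting an STFT to one of these bands amounts to masking its rows by frequency. The sketch below assumes a magnitude STFT with frequencies along the rows and treats each band as the half-open interval [low, high); the 20 Hz sampling rate and 256-sample window in the usage lines are hypothetical choices, not parameters reported in this section.

```python
import numpy as np

# Band edges in Hz, following the partition described above (B0 starts at 0.05 Hz).
BANDS_HZ = {"B0": (0.05, 1.0), "B1": (1.0, 2.2), "B2": (2.2, 3.5), "B3": (3.5, 5.0)}


def band_rows(stft_mag, freqs_hz, band):
    """Keep only the STFT rows whose frequency lies in [low, high) for the named band."""
    low, high = BANDS_HZ[band]
    freqs_hz = np.asarray(freqs_hz)
    mask = (freqs_hz >= low) & (freqs_hz < high)
    return stft_mag[mask, :]


# Hypothetical example: 20 Hz sampling and 256-sample windows give 129 frequency rows.
fs, n_fft = 20.0, 256
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
stft_mag = np.abs(np.random.default_rng(0).normal(size=(freqs.size, 100)))
b3 = band_rows(stft_mag, freqs, "B3")  # only the 3.5–5.0 Hz rows, all 100 time frames
```

Because only rows are removed, each band-limited input keeps the full temporal resolution of the original STFT.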

Fig 3. Effects of information loss on the prediction of preterm births.

(a) A representative STFT of an EHG recording overlaid with the limits of the frequency bands examined. (b) AUCs obtained using the classification model trained on the various frequency bands. (c) The same STFT as in (a), but with all the columns randomly rearranged. (d) AUCs obtained using the classification model trained on STFTs with varying fractions of columns randomly rearranged. (e) The same STFT as in (a), but with ten minutes of the recording removed. The colorbar in this panel also applies to panels (a) and (c). (f) AUCs obtained using the classification model trained on STFTs with varying durations. (b), (d), (f) The AUCs are presented as black dots with error bars denoting the 95% CIs.

https://doi.org/10.1371/journal.pone.0285219.g003

Next, we examined how the temporal patterns of the measurements contribute to the models’ predictions. We disrupted the temporal patterns by randomly rearranging a random subset of columns of the STFTs, as illustrated in Fig 3c. The AUC of the model did not significantly change as larger fractions of columns of the STFTs were rearranged, as shown in Fig 3d. Notably, when all the columns of the STFTs were randomly rearranged, i.e., when all the temporal patterns were disrupted, our classification model trained on disrupted EHG measurements alone was able to predict preterm births with an AUC of 0.74 (95% CI: 0.72–0.76).
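The column-rearrangement experiment can be sketched as follows: a random subset containing a given fraction of the time frames is permuted among its own positions, leaving the other frames and every frame's spectrum unchanged. The function name and the rounding of the subset size are assumptions, not details taken from the paper.

```python
import numpy as np


def shuffle_columns(stft_mag, fraction, rng):
    """Randomly rearrange a random subset of STFT columns (time frames) among themselves.

    fraction=1.0 permutes every column, destroying temporal order while keeping
    each frame's spectrum intact.
    """
    n_cols = stft_mag.shape[1]
    n_shuffle = int(round(fraction * n_cols))
    chosen = rng.choice(n_cols, size=n_shuffle, replace=False)  # columns to disturb
    out = stft_mag.copy()
    out[:, chosen] = stft_mag[:, rng.permutation(chosen)]  # move chosen frames into each other's slots
    return out


rng = np.random.default_rng(0)
example = np.random.default_rng(1).random((64, 180))
half_shuffled = shuffle_columns(example, 0.5, rng)
```

Since the spectral content of each frame is preserved, any performance that survives full shuffling must come from the spectral composition rather than from the order of the frames.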

Based on our observations from disrupting the spectral and temporal patterns, we hypothesized that the predictions of our model are guided more by the spectral composition of the measurements than by their temporal patterns. Hence, we sought to predict preterm births using shorter EHG recordings. The duration of EHG recordings, usually between 30 and 60 minutes, is an important hindrance to their implementation in clinical settings, where personnel resources are often limited [23, 46].

To test this hypothesis, we trained and tested our model using cropped STFTs, as shown in Fig 3e. We removed columns at the beginning and at the end of the STFTs to simulate shorter EHG measurements. Since the initial point selected for these shortened STFTs slightly affects the resulting AUC, we selected a random initial point for each shortened sample. Remarkably, the performance of our model decreased only marginally with decreasing measurement duration, as shown in Fig 3f. When we trained our model using one-minute long recordings, we could predict preterm births with an AUC of 0.71 (95% CI: 0.69–0.73), which is only slightly lower than the 0.74 (95% CI: 0.73–0.76) AUC we obtained using the entire 30-minute long recordings.
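Simulating a shorter recording by cropping the STFT with a random starting frame might look like the sketch below. The hop size between STFT columns is a hypothetical parameter, since the frame spacing is not stated in this section; the 10 s hop in the usage lines is only an example.

```python
import numpy as np


def random_crop(stft_mag, duration_s, hop_s, rng):
    """Keep a contiguous block of frames covering `duration_s`, starting at a random frame."""
    n_cols = stft_mag.shape[1]
    n_keep = min(n_cols, max(1, int(round(duration_s / hop_s))))
    start = rng.integers(0, n_cols - n_keep + 1)  # random initial frame for each shortened sample
    return stft_mag[:, start:start + n_keep]


rng = np.random.default_rng(0)
full = np.random.default_rng(1).random((64, 180))  # e.g. 30 min of frames at a hypothetical 10 s hop
one_minute = random_crop(full, duration_s=60, hop_s=10, rng=rng)  # 6 contiguous frames
```

Drawing a fresh start for every sample averages out the small dependence of the AUC on where the crop begins.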

Discussion

We developed a deep learning method to predict preterm births from EHG measurements and clinical information obtained from two public databases. We predicted preterm births with good accuracy directly from the data and without using handcrafted features, manual annotations, or simultaneous tocography measurements. Thus, our method potentially enables automatic prediction of preterm births from EHG recordings.

To assess the performance of our method from the perspective of clinical practice, we compared it with the performances reported for other technologies and methods to predict preterm births, as shown in Table 3. For this comparison, we included results only from studies published in peer-reviewed journals, with sound methodology, that reported the AUC of the predictions and included at least 50 pregnant mothers. As with the datasets used in this work, the results reported in these studies correspond to obstetric populations excluding medically induced births. However, whereas many of these studies included only mothers with, or only mothers without, symptoms of preterm labor, the TPEHG DB and the TPEHGT DS contain EHG recordings obtained both during regular checkups and from mothers hospitalized with symptoms of preterm labor. Unfortunately, we could not distinguish EHGs based on whether the mothers had symptoms of preterm labor during the recordings because the datasets do not provide this information.

Table 3. Accuracy of several technologies and methods to predict preterm births.

https://doi.org/10.1371/journal.pone.0285219.t003

From this comparison, we observe that our method outperforms existing methods for predicting preterm births before 37 complete weeks of gestation. Importantly, in this task our method also outperforms the gold standard biomarkers of preterm birth, i.e., cervical length and fetal fibronectin. Moreover, the performance of our method in predicting preterm births in mothers around their 31st week of gestation is relatively close to the performance of the gold standard tests in predicting preterm birth within only one week in mothers with symptoms of preterm labor. Our results support previous findings suggesting that preterm birth can be predicted using EHG measurements from around the 31st week of gestation [16, 23].

Additionally, we investigated how the temporal and spectral components of the EHG measurements contribute to our model's predictions. We observed that the higher frequency components of the EHG measurements are more predictive of preterm births. One possible explanation is that the higher frequency bands contain spectral harmonics of the electrical activity in EHG measurements, so more spectral information may be encoded in these bands. However, further research is needed to decipher the sources of the various spectral components of EHG measurements.

Importantly, we observed that the temporal patterns in EHG measurements are not crucial for predicting preterm births. This observation agrees with the results published by Iams et al., who showed that the frequency of uterine contractions is not predictive of preterm births [49]. This observation might also explain the inability of tocography, which measures the temporal patterns of uterine contractions, to predict preterm births [13]. Motivated by this observation, and specifically by our results presented in Fig 3d, we explored whether we could use shorter EHG measurements to predict preterm births.

Notably, we found that shortening the EHG measurements did not substantially degrade the performance of our model, which suggests that shorter EHG recordings could be sufficient to predict preterm births. From the perspective of clinical adoption, a shorter recording is easier for users and reduces cost [18]. Moreover, the shorter recording time, combined with the fully automated nature of our method, could facilitate at-home implementations.

Whereas both the classification and regression models could predict preterm births with good accuracy, surprisingly, the regression models could not predict the gestational ages at delivery accurately, as shown in S4 Fig. This effect can be explained by the pathology of preterm births and by the distribution of the gestational ages at delivery. Preterm birth is an abnormal physiological condition, not just a pregnancy that happens to end early. Therefore, we can expect physiological measurements, such as EHG recordings, to show a stronger dichotomy between pregnancies that end in preterm and term deliveries than a continuous trend correlated with gestational age at delivery.

We observe the dichotomous aspect of preterm and term births in the distribution of the gestational ages at delivery, shown in S1 and S2 Figs. The distribution of the gestational ages at birth for the mothers from the TPEHG DB included in this work is left-skewed and does not appear to follow a Gaussian distribution, as shown in panel d of S2 Fig. This skewness could be caused either by an excess of preterm births relative to what a Gaussian distribution of gestational ages at birth would predict, or by the induction of postterm births, which shifts the distribution towards earlier deliveries. However, when we exclude the preterm births, the distribution of gestational ages at birth follows a Gaussian distribution more closely, as shown in panel h of S2 Fig. This observation suggests that the skewness results from an over-representation of preterm births rather than from the induction of postterm births. Since the gestational ages at delivery do not follow a Gaussian distribution whose left tail accounts for the preterm births, we suggest that the dynamics that dictate the gestational age at delivery do not form a continuum between preterm and term births. Therefore, we propose that predicting the gestational age at delivery is more difficult than predicting preterm births using categorical outputs.
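The skewness argument can be checked numerically. The sketch computes the Fisher–Pearson coefficient of skewness (the biased estimator g1; which estimator produced the values in S2 Fig is not stated here) and shows, on synthetic data, how adding a tail of early deliveries to an otherwise Gaussian set of term births produces a negative skewness. For the normality test, `scipy.stats.shapiro` implements the Shapiro–Wilk test; it is omitted to keep the sketch NumPy-only. The distributions used below are hypothetical.

```python
import numpy as np


def sample_skewness(x):
    """Fisher–Pearson coefficient of skewness g1 = m3 / m2**1.5 (negative means left-skewed)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment
    m3 = np.mean(d ** 3)  # third central moment
    return m3 / m2 ** 1.5


# Synthetic gestational ages at delivery in days (not the study data).
rng = np.random.default_rng(0)
term = rng.normal(280, 8, 500)  # roughly Gaussian term births
preterm = rng.uniform(200, 259, 60)  # an excess of early deliveries
mixed = np.concatenate([term, preterm])
print(sample_skewness(term), sample_skewness(mixed))
```

The term-only sample stays close to zero skewness, while the added preterm tail pulls the combined distribution to the left, mirroring the contrast between panels d and h of S2 Fig.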

Predicting preterm births several weeks before delivery is valuable because it can help delay preterm births and improve their outcomes. For example, clinical providers can prescribe progesterone to these mothers to prolong their pregnancies [50, 51]. Additionally, medical providers could screen mothers at high risk of preterm birth more frequently to identify and treat hypertensive disease and cervical insufficiency [52, 53]. Moreover, anticipating preterm births can help plan for the birth to take place at a hospital with a neonatal intensive care unit (NICU), rather than at home, in a birthing center, or in a hospital without a NICU, thus avoiding ambulance transport and admission delays and improving outcomes [54–56]. Furthermore, identifying mothers at high risk of preterm birth may help researchers assess the efficacy of potential approaches and therapies to delay preterm births and improve their outcomes.

Although machine learning algorithms can contribute to improving healthcare and much research is yielding advances in this field, important challenges remain [57, 58]. For example, machine learning predictions usually lack interpretability, meaning that it is challenging to identify the factors underlying the algorithms' predictions [57, 58]. In our case, although our predictions could influence pregnancy management, they would need to be supplemented with additional medical examinations to determine which therapies are more likely to reduce the risk of preterm birth and improve its outcomes. Additionally, machine learning algorithms in healthcare settings need to be developed carefully to protect data privacy and to prevent social biases from driving the predictions [58–60].

Despite the limitations of machine learning algorithms for developing medical devices, the number of medical products based on machine learning is steadily increasing thanks to their good performance [61]. By predicting preterm births with good accuracy directly from the measurements, while avoiding data leakage, our work is a step forward towards developing a medical device for predicting preterm births from EHG measurements using deep learning.

Our work is limited by the etiology of preterm birth and the dataset that we used to develop our models. Because preterm birth is a syndrome with many causes, it is most likely that no single physiological measurement will predict preterm births with perfect or nearly perfect accuracy [5, 62]. A combination of measurements of various physiological processes is likely to produce better results [5, 63].

Our work is also limited by the small size of the datasets employed. We evaluated our prediction models using cross-validation rather than setting aside a subset of the data exclusively for testing after developing our models, because such a test set would be too small for accurate performance evaluation [41]. For example, if we set aside 20% of the data for final testing, this test set would contain only five preterm and 25 term samples. Moreover, all the samples in the dataset were acquired in a single hospital, and thus our model may not generalize well to measurements from mothers in different populations. Additionally, the datasets used in our work and in the studies listed in Table 3 excluded medically induced births; therefore, these populations may differ from general obstetric populations. However, Erkamp et al. found similar screening performance for preterm birth using sonographic measurements whether medically induced births were included in or excluded from their analysis [47]. Furthermore, our dataset contains a larger proportion of preterm births than the general population because it includes the TPEHGT DS, which has equal numbers of term and preterm samples. This overrepresentation of preterm births can bias our results with respect to the expected performance in the general population, especially the PPV and NPV, which depend on the incidence of preterm births. A larger database, preferably acquired across multiple healthcare centers, could address these limitations. Specifically, a larger database would enable us to set aside a subset of samples to further evaluate the generalizability of our model. Moreover, because of the limited size of the database, we trained a small neural network with a limited number of parameters. In the future, a larger database would also enable us to train larger and more complex prediction models, which may perform better [64].
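Stratified partitioning like that shown in S3 Fig can be sketched by shuffling each class and dealing it round-robin into folds, which keeps the preterm proportion nearly constant across folds. This is a minimal NumPy sketch with a hypothetical function name and seed, not the authors' implementation; scikit-learn's `StratifiedKFold` provides an equivalent, well-tested alternative.

```python
import numpy as np


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions kept roughly equal across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.size, dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle this class's sample indices
        fold_of[idx] = np.arange(idx.size) % n_splits  # deal them round-robin into folds
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


# Synthetic labels (1 = preterm, 0 = term), not the study data.
demo_labels = np.array([1] * 25 + [0] * 125)
for train_idx, test_idx in stratified_kfold(demo_labels):
    print(len(train_idx), len(test_idx), int(demo_labels[test_idx].sum()))
```

Every sample is tested exactly once, so all preterm cases contribute to the performance estimate instead of only the handful that a single held-out split would contain.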

Our work can be expanded to improve its performance and clinical value. First, following the same approach we used to combine EHG measurements with clinical data, our method could incorporate other data, such as cervical length and fetal fibronectin measurements, which are likely to improve its performance. Additionally, multiple EHG measurements could be recorded for each mother throughout pregnancy to track the evolution of EHG activity towards birth and to develop a dynamic prediction model. Moreover, alternative techniques for preprocessing EHG measurements could potentially improve the performance of our prediction model [16, 65]. Lastly, our work could be integrated with models connecting surface EHGs with uterine sources to include anatomical and physiological information in the predictions [66, 67].

Conclusions

In summary, we developed a deep learning model to predict preterm births using clinical information and EHG measurements. Our method predicted preterm births more accurately than existing technologies. We also showed that preterm births can be predicted using short EHG recordings. Our work and results are useful for developing applications to predict preterm births early during pregnancy and for ultimately improving their outcomes.

Supporting information

S1 Fig. Distribution of gestational ages at the times of recording and at the times of birth.

(a) Gestational ages at birth plotted against gestational ages at the times of recording. (b) Elapsed times between recordings and births, plotted against gestational ages at recording. (a), (b) The dashed black lines separate preterm (red circles) and term births (blue diamonds).

https://doi.org/10.1371/journal.pone.0285219.s001

(TIF)

S2 Fig. Distribution of gestational ages at the times of recording and delivery.

Note the different scale ranges between graph pairs. (a) Histogram of the gestational ages when the EHGs were recorded. (b) Same as (a), but using only the samples used from the TPEHG DB. (c) Histogram of the gestational ages at the times of delivery. This distribution is left-skewed (skewness = -1.5) and does not appear to follow a Gaussian distribution (p = 5.1 × 10⁻¹⁰). (d) Same as (c), but using only the samples used from the TPEHG DB. This distribution is also left-skewed (skewness = -1.8) and does not appear to follow a Gaussian distribution (p = 8.5 × 10⁻¹⁰). (e) Histogram of the gestational ages at the times of delivery for the preterm births. Preterm births are more common at older gestational ages. (f) Same as (e), but using only the samples used from the TPEHG DB. (g) Histogram of the gestational ages at delivery for the term births. This distribution appears to follow a Gaussian distribution more closely (p = 4.5 × 10⁻³). (h) Same as (g), but using only the samples used from the TPEHG DB. This distribution also appears to follow a Gaussian distribution more closely (p = 4.2 × 10⁻²). (c), (d), (g), (h) Normality was assessed using the Shapiro–Wilk test.

https://doi.org/10.1371/journal.pone.0285219.s002

(TIF)

S3 Fig. Stratified partitioning of the data for the five-fold cross-validation.

(a) Distribution of term and preterm samples in the training set of each cross-validation fold. (b) The corresponding distributions for the test set. In each fold of the cross-validation the training and test sets contain approximately the same proportion of term and preterm samples.

https://doi.org/10.1371/journal.pone.0285219.s003

(TIF)

S4 Fig. Predictions of gestational ages at delivery using the regression models.

(a) Predicted gestational ages at birth using the clinical information alone, plotted against the true gestational ages at birth. Each blue circle shows, for a single mother, the gestational age at delivery predicted from the clinical information against the true gestational age at delivery. The solid black line represents the linear fit between the predictions and the true values. The dashed black line represents perfect correspondence between predictions and true values. The legend shows the root mean square error (RMSE) of the predictions, the coefficient of determination (R²) of the predictions, and the slope of the linear fit. (b) Bland–Altman plot for the gestational ages at birth predicted using the clinical information alone and the true gestational ages at birth. Each blue circle plots the difference between the predicted and true gestational ages at birth against the mean of these two values. The solid and dashed black lines show the mean difference between the predicted and true values and the 95% limits of agreement (mean ± 1.96 standard deviations), respectively. (c) Similar to (a), but using the predictions based on EHG measurements alone. (d) Similar to (b), but using the predictions based on EHG measurements alone. (e) Similar to (a), but using the predictions based on clinical information combined with EHG measurements. (f) Similar to (b), but using the predictions based on clinical information combined with EHG measurements. All values are presented as mean with 95% CI.

https://doi.org/10.1371/journal.pone.0285219.s004

(TIF)

References

  1. 1. Walani SR. Global burden of preterm birth. Int J Gynaecol Obstet. 2020;150(1):31–33. pmid:32524596
  2. 2. World Health Organization, et al. Born too soon: the global action report on preterm birth. 2012;.
  3. 3. Lawn JE, Wilczynska-Ketende K, Cousens SN. Estimating the causes of 4 million neonatal deaths in the year 2000. Int J Epidemiol. 2006;35(3):706–718. pmid:16556647
  4. 4. Saigal S, Doyle LW. An overview of mortality and sequelae of preterm birth from infancy to adulthood. Lancet. 2008;371(9608):261–269. pmid:18207020
  5. 5. Green NS, Damus K, Simpson JL, Iams J, Reece EA, Hobel CJ, et al. Research agenda for preterm birth: recommendations from the March of Dimes. Am J Obstet. 2005;193(3):626–635. pmid:16150253
  6. 6. Goldenberg RL, Culhane JF, Iams JD, Romero R. Epidemiology and causes of preterm birth. Lancet. 2008;371(9606):75–84. pmid:18177778
  7. 7. Blondel B, Macfarlane A, Gissler M, Breart G, Zeitlin J. General obstetrics: Preterm birth and multiple pregnancy in European countries participating in the PERISTAT project. BJOG: Int J Obstet. 2006;113(5):528–535.
  8. 8. American College of Obstetricians and Gynecologists, et al. Prediction and prevention of spontaneous preterm birth: ACOG Practice Bulletin, Number 234. Obstet Gynecol. 2021;138(2):e65–e90. pmid:34293771
  9. 9. Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS. Accuracy of cervicovaginal fetal fibronectin test in predicting risk of spontaneous preterm birth: systematic review. BMJ. 2002;325(7359):301. pmid:12169504
  10. 10. Sotiriadis A, Papatheodorou S, Kavvadias A, Makrydimas G. Transvaginal cervical length measurement for prediction of preterm birth in women with threatened preterm labor: a meta-analysis. Ultrasound Obstet Gynecol. 2010;35(1):54–64. pmid:20014326
  11. 11. DeFranco EA, Lewis DF, Odibo AO. Improving the screening accuracy for preterm labor: is the combination of fetal fibronectin and cervical length in symptomatic patients a useful predictor of preterm birth? A systematic review. Am J Obstet. 2013;208(3):233–e1. pmid:23246314
  12. 12. Esplin MS, Elovitz MA, Iams JD, Parker CB, Wapner RJ, Grobman WA, et al. Predictive accuracy of serial transvaginal cervical lengths and quantitative vaginal fetal fibronectin levels for spontaneous preterm birth among nulliparous women. JAMA. 2017;317(10):1047–1056. pmid:28291893
  13. 13. The Collaborative Home Uterine Monitoring Study, et al. A multicenter randomized controlled trial of home uterine monitoring: active versus sham device. Am J Obstet. 1995;173(4):1120–1127. pmid:7485304
  14. 14. Ressel G. ACOG issues recommendations on assessment of risk factors for preterm birth. Am Fam Physician. 2002;65(3):509. pmid:11858631
  15. 15. Huber C, Shazly SA, Ruano R. Potential use of electrohysterography in obstetrics: a review article. J Matern-Fetal Neonatal Med. 2021;34(10):1666–1672. pmid:31303075
  16. 16. Xu J, Chen Z, Lou H, Shen G, Pumir A. Review on EHG signal analysis and its application in preterm diagnosis. Biomed Signal Process Control. 2022;71:103231.
  17. 17. Hao D, An Y, Qiao X, Qiu Q, Zhou X, Peng J. Development of electrohysterogram recording system for monitoring uterine contraction. J Healthc Eng. 2019;2019. pmid:31354930
  18. 18. Garcia-Casado J, Ye-Lin Y, Prats-Boluda G, Mas-Cabo J, Alberola-Rubio J, Perales A. Electrohysterography in the diagnosis of preterm birth: a review. Physiological measurement. 2018;39(2):02TR01. pmid:29406317
  19. 19. Jager F, Libenšek S, Geršak K. Characterization and automatic classification of preterm and term uterine records. PLoS One. 2018;13(8):e0202125. pmid:30153264
  20. 20. La Rosa PS, Nehorai A, Eswaran H, Lowery CL, Preissl H. Detection of uterine MMG contractions using a multiple change point estimator and the K-means cluster algorithm. IEEE Trans Biomed Eng. 2008;55(2):453–467. pmid:18269980
  21. 21. Vandewiele G, Dehaene I, Janssens O, Ongenae F, Backere FD, Turck FD, et al.; Springer. A critical look at studies applying over-sampling on the TPEHGDB dataset. Conference on artificial intelligence in medicine in Europe. 2019; p. 355–364.
  22. 22. Vandewiele G, Dehaene I, Kovács G, Sterckx L, Janssens O, Ongenae F, et al. Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling. Artif Intell Med. 2021;111:101987. pmid:33461687
  23. 23. Fele-Žorž G, Kavšek G, Novak-Antolič Ž, Jager F. A comparison of various linear and non-linear signal processing techniques to separate uterine EMG records of term and pre-term delivery groups. Med Biol Eng Comput. 2008;46(9):911–922. pmid:18437439
  24. 24. Ryu J, Park C. Time-frequency analysis of electrohysterogram for classification of term and preterm birth. IEIE Trans Smart Process Comput. 2015;4(2):103–109.
  25. 25. Janjarasjitt S. Examination of Single Wavelet-Based Features of EHG Signals for Preterm Birth Classification. IAENG Int J Comput Sci. 2017;44(2).
  26. 26. Nieto-del Amor F, Ye Lin Y, Garcia-Casado J, Díaz-Martínez MdA, González Martínez M, Monfort-Ortiz R, et al.; SCITEPRESS. Dispersion entropy: A measure of electrohysterographic complexity for preterm labor discrimination. Proc Int Conf Eng Sci Appl, Volume 4: BIOSIGNALS. 2021; p. 260–267.
  27. 27. Xu J, Wang M, Zhang J, Chen Z, Huang W, Shen G, et al. Network theory based EHG signal analysis and its application in preterm prediction. IEEE J Biomed Health Inform. 2022;26(7):2876–2887. pmid:34986107
  28. 28. Lou H, Liu H, Chen Z, Dong B, Xu J, et al. Bio-process inspired characterization of pregnancy evolution using entropy and its application in preterm birth detection. Biomed Signal Process Control. 2022;75:103587.
  29. 29. Fischer AM, Rietveld AL, Teunissen PW, Bakker PCAM, Hoogendoorn M. End-to-end learning with interpretation on electrohysterography data to predict preterm birth. Comput Biol Med. 2023;158:106846. pmid:37019011
  30. 30. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–e220. pmid:10851218
  31. 31. Géron A. Hands-on machine learning with scikit-learn and tensorflow: Concepts. Tools, and Techniques to build intelligent systems. 2017; p. 60.
  32. 32. Batista AG, Najdi S, Godinho DM, Martins C, Serrano FC, Ortigueira MD, et al. A multichannel time–frequency and multi-wavelet toolbox for uterine electromyography processing and visualisation. Comput Biol Med. 2016;76:178–191. pmid:27474810
  33. 33. Raines DA, Cooper DB. Braxton Hicks Contractions. In: StatPearls [Internet]. StatPearls Publishing; 2017.
  34. 34. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. pmid:26017442
  35. 35. Jang HJ, Cho KO. Applications of deep learning for the analysis of medical data. Arch Pharm Res. 2019;42(6):492–504. pmid:31140082
  36. 36. Zhu W, Li X, Liu C, Xue F, Han Y. An STFT-LSTM system for P-wave identification. IEEE Geosci Remote Sens Lett. 2019;17(3):519–523.
  37. 37. Bhatti SG, Bhatti AI. Radar Signals Intrapulse Modulation Recognition Using Phase-Based STFT and BiLSTM. IEEE Access. 2022;10:80184–80194.
  38. 38. King G, Zeng L. Logistic regression in rare events data. Polit Anal. 2001;9(2):137–163.
  39. 39. Limem M, Hamdi MA. Uterine Electromyography signals denoising using discrete wavelet transform. In: 2015 International Conference on Advances in Biomedical Engineering (ICABME). IEEE; 2015. p. 101–103.
  40. 40. Hassan M, Boudaoud S, Terrien J, Karlsson B, Marque C. Combination of canonical correlation analysis and empirical mode decomposition applied to denoising the labor electrohysterogram. IEEE Trans Biomed Eng. 2011;58(9):2441–2447. pmid:21558055
  41. 41. Kim JH. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal. 2009;53(11):3735–3745.
  42. 42. American College of Obstetricians and Gynecologists. Methods for estimating the due date. Committee Opinion No. 700. Obstet Gynecol. 2017;129(5):e150–e4. pmid:28426621
  43. 43. Hunter LA. Issues in pregnancy dating: revisiting the evidence. J Midwifery Womens Health. 2009;54(3):184–190. pmid:19410210
  44. 44. Duryea EL, McIntire DD, Leveno KJ. The rate of preterm birth in the United States is affected by the method of gestational age assignment. Am J Obstet. 2015;213(2):231–e1. pmid:25935778
  45. 45. Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health. 2017;5:307. pmid:29209603
  46. 46. Alexandersson A, Steingrimsdottir T, Terrien J, Marque C, Karlsson B. The Icelandic 16-electrode electrohysterogram database. Sci Data. 2015;2(1):1–9. pmid:25984349
  47. 47. Erkamp JS, Voerman E, Steegers EA, Mulders AG, Reiss IK, Duijts L, et al. Second and third trimester fetal ultrasound population screening for risks of preterm birth and small-size and large-size for gestational age at birth: a population-based prospective cohort study. BMC Med. 2020;18(1):1–12. pmid:32252740
  48. 48. Fergus P, Cheung P, Hussain A, Al-Jumeily D, Dobbins C, Iram S. Prediction of preterm deliveries from EHG signals using machine learning. PLoS One. 2013;8(10):e77154. pmid:24204760
  49. 49. Iams JD, Newman RB, Thom EA, Goldenberg RL, Mueller-Heubach E, Moawad A, et al. Frequency of uterine contractions and the risk of spontaneous preterm delivery. N Engl J Med. 2002;346(4):250–255. pmid:11807149
  50. 50. Society for Maternal-Fetal Medicine Publications Committee, et al. Progesterone and preterm birth prevention: translating clinical trials data into clinical practice. Am J Obstet. 2012;206(5):376–386. pmid:22542113
  51. 51. Barros FC, Bhutta ZA, Batra M, Hansen TN, Victora CG, Rubens CE. Global report on preterm birth and stillbirth (3 of 7): evidence for effectiveness of interventions. BMC Pregnancy Childbirth. 2010;10(1):1–36. pmid:20233384
  52. Haas DM. Preterm birth. BMJ Clin Evid. 2011;2011. pmid:21463540
  53. Goya M, Pratcorona L, Merced C, Rodó C, Valle L, Romero A, et al. Cervical pessary in pregnant women with a short cervix (PECEP): an open-label randomised controlled trial. Lancet. 2012;379(9828):1800–1806. pmid:22475493
  54. Chien LY, Whyte R, Aziz K, Thiessen P, Matthew D, Lee SK, et al. Improved outcome of preterm infants when delivered in tertiary care centers. Obstet Gynecol. 2001;98(2):247–252. pmid:11506840
  55. Sasaki Y, Ishikawa K, Yokoi A, Ikeda T, Sengoku K, Kusuda S, et al. Short- and long-term outcomes of extremely preterm infants in Japan according to outborn/inborn birth status. Pediatr Crit Care Med. 2019;20(10):963. pmid:31232855
  56. Chung MY, Fang PC, Chung CH, Chen CC, Hwang KP, Chen FS. Comparison of neonatal outcome for inborn and outborn very low-birthweight preterm infants. Pediatr Int. 2009;51(2):233–236. pmid:19405922
  57. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–1246. pmid:28481991
  58. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15(141):20170387. pmid:29618526
  59. Qayyum A, Qadir J, Bilal M, Al-Fuqaha A. Secure and robust machine learning for healthcare: A survey. IEEE Rev Biomed Eng. 2020;14:156–180.
  60. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med. 2018;178(11):1544–1547. pmid:30128552
  61. Zhu S, Gilbert M, Chetty I, Siddiqui F. The 2021 landscape of FDA-approved artificial intelligence/machine learning-enabled medical devices: an analysis of the characteristics and intended use. Int J Med Inform. 2022;165:104828. pmid:35780651
  62. Villar J, Papageorghiou AT, Knight HE, Gravett MG, Iams J, Waller SA, et al. The preterm birth syndrome: a prototype phenotypic classification. Am J Obstet Gynecol. 2012;206(2):119–123. pmid:22177191
  63. Glover AV, Manuck TA. Screening for spontaneous preterm birth and resultant therapies to reduce neonatal morbidity and mortality: A review. Semin Fetal Neonatal Med. 2018;23(2):126–132. pmid:29229486
  64. Sun C, Shrivastava A, Singh S, Gupta A. Revisiting unreasonable effectiveness of data in deep learning era. Proc IEEE Int Conf Comput Vis. 2017:843–852.
  65. Goldsztejn U, Nehorai A. Estimating uterine activity from electrohysterogram measurements via statistical tensor decomposition. Biomed Signal Process Control. 2023;85:104899.
  66. Zhang M, La Rosa PS, Eswaran H, Nehorai A. Estimating uterine source current during contractions using magnetomyography measurements. PLoS One. 2018;13(8):e0202184. pmid:30138376
  67. Zhang M, Tidwell V, La Rosa PS, Wilson JD, Eswaran H, Nehorai A. Modeling magnetomyograms of uterine contractions during pregnancy using a multiscale forward electromagnetic approach. PLoS One. 2016;11(3):e0152421. pmid:27019202