Dynamic classification of fetal heart rates by hierarchical Dirichlet process mixture models

Kezi Yu; J. Gerald Quirk; Petar M. Djurić

doi:10.1371/journal.pone.0185417

Abstract

In this paper, we propose an application of non-parametric Bayesian (NPB) models for classification of fetal heart rate (FHR) recordings. More specifically, we propose models that are used to differentiate between FHR recordings from fetuses with and without adverse outcomes. In our work, we rely on models based on hierarchical Dirichlet processes (HDP) and the Chinese restaurant process with finite capacity (CRFC). Two mixture models were inferred from real recordings, one representing healthy and the other non-healthy fetuses. The models were then used to classify new recordings and to provide the probability of the fetus being healthy. First, we compared the classification performance of the HDP models with that of support vector machines on real data and concluded that the HDP models achieved better performance. Then we demonstrated the use of mixture models based on CRFC for dynamic classification of FHR recordings in a real-time setting.

Citation: Yu K, Quirk JG, Djurić PM (2017) Dynamic classification of fetal heart rates by hierarchical Dirichlet process mixture models. PLoS ONE 12(9): e0185417. https://doi.org/10.1371/journal.pone.0185417

Editor: Elena Tolkacheva, University of Minnesota, UNITED STATES

Received: January 18, 2017; Accepted: September 12, 2017; Published: September 27, 2017

Copyright: © 2017 Yu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The studied data are from a publicly available website: https://physionet.org/physiobank/database/ctu-uhb-ctgdb/.

Funding: This work was supported by National Institutes of Health (US) 1R21HD080025-01A1 (https://www.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Fetal heart rate (FHR), along with other physiological signals, is routinely monitored before and during labor to assess fetal health. The first fetal monitor became commercially available in 1968 [1], and ever since, electronic fetal monitoring (EFM) has been widely used in hospitals in the U.S. Its rate of use in obstetric practice climbed from 68.4% in 1989 to 85.2% in 2002 [2].

Nowadays, FHR signals are primarily evaluated visually by experienced physicians, following guidelines published by various medical institutions, including the National Institute of Child Health and Human Development (NICHD) [3] and the International Federation of Gynecology and Obstetrics (FIGO) [4]. These guidelines define different FHR patterns, such as baseline, variability, accelerations and decelerations. Depending on which combination of these patterns appears, FHR tracings are classified into three classes: “normal”, “indeterminate” and “abnormal.”

Notwithstanding the long presence of FHR tracings in obstetrics, their use for assessing the well-being of fetuses has constantly been questioned. For example, a recent study reported that the subjective assessment of FHR tracings exhibits large inter- and intra-observer variability [5]. The same study also showed that the sensitivity of the clinicians’ majority vote with respect to objective outcomes was only 39%. These and similar findings suggest that high false positive rates have led to an increase in the rate of cesarean section deliveries [6], which altogether has put the benefits of EFM under criticism [7].

The deficiencies of subjective assessment of FHR tracings raise the need for modern, computerized methods for processing them. Such methods could provide objective and consistent evaluation and capture hidden dynamics in FHR signals that are often too subtle for visual inspection. Furthermore, machine learning techniques have proven extremely successful in real-world applications in various fields in recent years. These advances have also been reflected in research on FHR classification, which has produced a number of new computerized methods.

In one approach, advanced feature extraction algorithms were applied to quantify the underlying patterns of FHR, in addition to the morphological features proposed in the guidelines [3]. In [8], the authors worked with several linear and nonlinear features. The former included short-term and long-term variability, whereas the latter were features related to power spectra and entropy. In [9], a more comprehensive collection of non-linear features was used to model the non-linearity of FHR signals. These features included fractal dimension, approximate entropy, sample entropy, and Lempel-Ziv complexity. The features were classified by a support vector machine (SVM).

SVMs have not been the only machine learning methodology for classification of FHR signals. In [10], artificial neural networks were employed as the classifier, with 6 FHR features and 6 clinical variables as inputs. Two types of generative models, naïve Bayes and hidden Markov models, were implemented in [11]; these were novel attempts because the majority of the methods in the literature were based on discriminative models. In [12], the authors explored the performance of linear regression and SVMs with different kernels. That work also included feature selection and dimensionality reduction with methods such as random forests and principal component analysis (PCA).

The search for machine learning solutions with more flexibility and robustness, as well as better overall performance, has continued. Hierarchical Dirichlet process (HDP) mixture models [13], for instance, free classic mixture models from having to fix the number of mixing components in advance, and they allow grouped data to be modeled jointly. These models have shown excellent performance in areas such as information retrieval and topic modeling [14].

In this paper, we propose two novel HDP-based approaches for classifying FHR signals. We describe the underlying principles of the approaches and show on real-world data that they perform very well in terms of accuracy and provide probabilistic interpretations of the results.

The paper is organized as follows. First, in the section Hierarchical Dirichlet process mixture models, we provide some background on the non-parametric Bayesian statistical models used in the paper and discuss their advantages over traditional parametric models. Then, in the section Features for classification, we describe the set of features used in our experiments. The details of our experimental setting, including the database, the pre-processing of the data, the dimensionality reduction of the feature space, and the performance assessment, are explained in the section Experimental settings. Results obtained by our approach and comparisons with an existing method are provided in the section Results. In the last section, we discuss the results and draw final conclusions.

The main contribution of this paper is the application of hierarchical Dirichlet process-based mixture models to FHR classification. In our previous research, we explored a set of suitable features for our models and obtained several preliminary results with HDP mixture models [15, 16]. Here, we continue to explore the potential of these models and present classification results with probabilistic interpretations. An important extension of our work is the use of the time-varying models from [17] to achieve dynamic, real-time classification.

Hierarchical Dirichlet process mixture models

In this section, we describe the two mixture models that we implemented in our experiments, the well-known HDP mixture model and our time-varying modification of it. We explain how one can generate data by using the models and then how one can conduct inference about the model unknowns from data that are generated by the model.

Notation

In the problems of our interest, the observations are organized into groups. We adopt the notation from [13], where x_j,i denotes the ith observation in the jth group. We consider that each observation is drawn independently from a mixture model. In the context of FHR problems, each group x_j = (x_j,1, x_j,2, …) corresponds to features of one FHR recording, and each observation x_j,i to features of one segment of the recording.

Models

We start with the HDP mixture models proposed in [13] and then describe their modification proposed in [17] to accommodate time-evolving statistics of the data.

Hierarchical Dirichlet process mixture models.

A hierarchical Dirichlet process defines a set of random probability measures G_j linked to a global random probability measure G₀. Specifically, G₀ is distributed as a Dirichlet process (DP) with concentration parameter γ and base probability measure H, i.e.,

$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H). \tag{1}$$

The random measures G_j are conditionally independent given G₀ and distributed according to

$$G_j \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0), \tag{2}$$

where α is also a concentration parameter. We explain this model and its extension to mixture models by way of the Chinese restaurant franchise (CRF) metaphor.

Suppose that there is a restaurant franchise with a menu shared across the restaurants. With x_j,i we denote the ith customer in the jth restaurant, and with θ_j,i the dish type served to this customer. In this setup, the customers correspond to the observations x_j,i, a restaurant corresponds to an FHR recording x_j, and a dish type corresponds to the parameter set of a distribution used for drawing the observations. The index z_j,i identifies this parameter set and is associated with the observation x_j,i.

Next, we introduce K iid random variables ϕ₁, …, ϕ_K, which represent global dishes and which are distributed according to H. Each customer x_j,i is seated at a table, denoted by t_j,i, and each table is paired with one dish ϕ_k. Furthermore, let ψ_j,t represent the dish served on table t in restaurant j, and k_j,t be the indicator of the dish served on table t in restaurant j. For example, t_3,4 = 6 means that customer 4 in restaurant 3 sits at table 6, ψ_3,6 = ϕ_{k_3,6} signifies that on table 6 in restaurant 3 dish ϕ_{k_3,6} is served, where k_3,6 ∈ {1, 2, ⋯, K}. With this notation, we have that θ_3,4 = ϕ_{z_3,4}, where z_3,4 ∈ {1, 2, ⋯, K}. Note the difference between k_j,t and z_j,i. The former is the index of the dish served in restaurant j on table t, and the latter, the index of the dish served to customer i in restaurant j.

We also need a notation for counts. With n_j,t,k we denote the number of customers in restaurant j at table t serving dish k, and with m_j,k, the number of tables in restaurant j serving dish k. We represent marginal counts by dots. For example, n_j,⋅,k represents the number of customers in restaurant j eating dish k; m_j,⋅ represents the number of tables in restaurant j, and so on. At each table in each restaurant, one dish from the menu is ordered by the first customer at that table and shared by the remaining customers sitting at the same table.

A customer entering a restaurant either sits at an occupied table with probability proportional to the number of customers already seated there, or takes a new table with probability proportional to the concentration parameter α. Specifically, in restaurant j, the ith customer chooses a dish (and thereby a table) according to

$$\theta_{j,i} \mid \theta_{j,1}, \ldots, \theta_{j,i-1}, \alpha, G_0 \sim \sum_{t=1}^{m_{j,\cdot}} \frac{n_{j,t,\cdot}}{i-1+\alpha}\,\delta_{\psi_{j,t}} + \frac{\alpha}{i-1+\alpha}\,G_0, \tag{3}$$

where δ_{ψ_j,t} is the probability measure concentrated at ψ_j,t. If a customer chooses an existing table, say t, then we increment n_j,t,⋅ by one and set θ_j,i = ψ_j,t, t_j,i = t, and z_j,i = k_j,t. If a new table is chosen, we draw the dish for that table, ψ_{j,m_j,⋅+1} ∼ G₀, set θ_j,i = ψ_{j,m_j,⋅+1}, t_j,i = m_j,⋅ + 1, and z_j,i = k_{j,m_j,⋅+1}, where k_{j,m_j,⋅+1} is the index of the dish drawn from G₀, and then increment m_j,⋅ by one.

Now let us consider the dish-level distributions. Similarly, a table is served an existing dish with probability proportional to the number of tables already serving that dish in the whole franchise, or a new dish with probability proportional to the concentration parameter γ. Specifically, the dish served at table t in restaurant j is distributed according to

$$\psi_{j,t} \mid \psi_{1,1}, \psi_{1,2}, \ldots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot,k}}{m_{\cdot,\cdot}+\gamma}\,\delta_{\phi_k} + \frac{\gamma}{m_{\cdot,\cdot}+\gamma}\,H, \tag{4}$$

where m_⋅,⋅ is the total number of tables in the franchise. If an existing dish is served, i.e., k_j,t ∈ {1, 2, ⋯, K}, we increment the count of that dish, m_{⋅,k_j,t}, by one and set ψ_j,t = ϕ_{k_j,t}. If a new dish is chosen, we draw it as ϕ_{K+1} ∼ H, set k_j,t = K + 1, ψ_j,t = ϕ_{K+1} and m_{⋅,K+1} = 1, and then increment K by one.

This completes the description of the CRF metaphor. We summarize the variables, their meanings and how they relate to our problem in Table 1. We reiterate that the dishes are shared among the restaurants, which corresponds to a key property of the HDP.
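The seating process of Eqs (3) and (4) can be simulated directly. The Python sketch below is illustrative only: it uses NumPy, treats each recording as a restaurant, and picks a one-dimensional base measure H = N(0, 5²) and emission F(ϕ) = N(ϕ, 1) purely as assumed examples; the models in the paper are multivariate Gaussian mixtures fitted to data.

```python
import numpy as np


def sample_crf(n_customers, alpha, gamma, rng):
    """Simulate the CRF seating of Eqs (3)-(4).

    n_customers[j] is the number of customers (segments) in restaurant (recording) j.
    Returns, per restaurant, the table indices t_ji and dish indices z_ji, plus K.
    """
    dish_tables = []                      # m_{.k}: tables serving dish k in the franchise
    out = []
    for n_j in n_customers:
        table_sizes, table_dish = [], []  # n_{jt.} and k_{jt} for restaurant j
        t_j, z_j = [], []
        for _ in range(n_j):
            # Eq (3): existing table with prob. proportional to n_{jt.}, new table to alpha
            w = np.array(table_sizes + [alpha], dtype=float)
            t = int(rng.choice(w.size, p=w / w.sum()))
            if t == len(table_sizes):
                # New table: pick its dish with Eq (4), existing ~ m_{.k}, new ~ gamma
                v = np.array(dish_tables + [gamma], dtype=float)
                k = int(rng.choice(v.size, p=v / v.sum()))
                if k == len(dish_tables):
                    dish_tables.append(0)     # a new dish phi_{K+1} ~ H enters the menu
                dish_tables[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            t_j.append(t)
            z_j.append(table_dish[t])
        out.append((t_j, z_j))
    return out, len(dish_tables)


# Usage: three recordings of 50 segments each (illustrative sizes).
rng = np.random.default_rng(0)
assign, K = sample_crf([50, 50, 50], alpha=1.0, gamma=1.0, rng=rng)
phi = rng.normal(0.0, 5.0, size=K)                          # global dishes phi_k ~ H (assumed)
x = [rng.normal(phi[np.array(z)], 1.0) for _, z in assign]  # x_ji ~ F(phi_{z_ji}), Eq (5)
```

Because dishes are drawn from a single franchise-wide list, the same component index can appear in several restaurants, which is the sharing property emphasized above.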


Table 1. Meaning of the variables of the HDP process and their relationship to the FHR classification problem.

https://doi.org/10.1371/journal.pone.0185417.t001

The HDP mixture model is a non-parametric Bayesian approach to data processing. It models grouped data jointly: each group (the segment features of one FHR recording) is associated with a mixture model, and the mixing components are shared across the groups (different FHR recordings share mixture components). We assume that each dish type ϕ_k defines a mixing component that is used for generating the actual observations (features). We denote the generating distribution of the features by F(ϕ_k). In summary, each observed feature vector x_j,i (the features of segment i of recording j) is generated by

$$x_{j,i} \mid z_{j,i}, (\phi_k)_{k=1}^{\infty} \sim F(\phi_{z_{j,i}}), \tag{5}$$

where ϕ_{z_j,i} is the parameter of the feature distribution and z_j,i is the index that selects it. By setting F to be a Gaussian distribution, we obtain a Gaussian mixture model with an HDP prior.

Chinese restaurant franchise with finite capacity.

Now we consider a modified version of the CRF that was proposed in [18]. Assume that each restaurant can accommodate only a limited number of customers, which, without loss of generality, we take to be N for all restaurants. Before the number of customers reaches that limit, the process is the same as in the CRF metaphor. After a restaurant is “full,” a new customer can enter and be seated only after the “oldest” customer leaves. Then, the ith customer in the jth restaurant, where i > N, chooses a dish according to

$$\theta_{j,i} \mid \theta_{j,i-N+1}, \ldots, \theta_{j,i-1}, \alpha, G_0 \sim \sum_{t=1}^{m^{*}_{j,\cdot}} \frac{n^{*}_{j,t,\cdot}}{N-1+\alpha}\,\delta_{\psi_{j,t}} + \frac{\alpha}{N-1+\alpha}\,G_0, \tag{6}$$

where the * notation denotes counts after the oldest customer (the (i − N)th customer of restaurant j) has left. As before, the table counts are updated after the table is chosen. In addition, the probability that table t in restaurant j serves a particular dish type is

$$\psi_{j,t} \sim \sum_{k=1}^{K} \frac{m^{*}_{\cdot,k}}{m^{*}_{\cdot,\cdot}+\gamma}\,\delta_{\phi_k} + \frac{\gamma}{m^{*}_{\cdot,\cdot}+\gamma}\,H. \tag{7}$$

After the dish is selected, the dish counts are updated accordingly.

We call this new process “Chinese restaurant franchise with finite capacity” (CRFC). The CRFC is designed to model grouped time-varying data, and capture the underlying dynamics. Simulation results on how the CRFC mixture model finds the cluster assignments of data over time can be found in [17].
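A small sketch of what changes under finite capacity: before customer i enters a full restaurant, the oldest customer leaves, so the table probabilities of Eq (6) are computed from counts over the current window only. The function names admit and crfc_table_probs and the capacity N = 4 are hypothetical; a table emptied by the departure simply gets probability zero here, whereas a sampler would remove it.

```python
from collections import deque

import numpy as np


def admit(window, capacity):
    """Before customer i enters a full restaurant, the oldest customer (i - N) leaves."""
    if len(window) == capacity:
        window.popleft()
    return window


def crfc_table_probs(window, n_tables, alpha):
    """Prior over tables for the next customer, in the form of Eq (6).

    window holds the table index of each customer still seated (N - 1 of them after
    admit() on a full restaurant), so the counts n*_{jt.} reflect only the window.
    """
    counts = np.bincount(np.array(list(window), dtype=int), minlength=n_tables)
    w = np.append(counts.astype(float), alpha)   # existing tables ~ n*_{jt.}, new ~ alpha
    return w / w.sum()                           # normalizer equals (N - 1) + alpha


# Usage with a hypothetical capacity N = 4: the first customer (at table 0) leaves.
window = deque([0, 0, 1, 2])
admit(window, capacity=4)
p = crfc_table_probs(window, n_tables=3, alpha=1.0)
```

After the departure, table 0 has one customer instead of two, so all three tables and a new table end up equally likely.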

Inference

We describe a Markov chain Monte Carlo (MCMC) sampling scheme for estimating the parameters of the HDP and CRFC mixture models. This is a Gibbs sampling scheme based on the CRF [13]. To simplify inference, the base distribution H is assumed to be conjugate to the data distribution F. For the non-conjugate case, the sampling approach can be adapted from techniques developed for non-conjugate DP mixtures [19]. In addition, here we assume that the concentration parameters α and γ are known; the case when they are unknown is treated in the section Model priors. In the sequel, the superscript −j,i signifies that the corresponding element is excluded, i.e., x^−j,i = (x_j′,i′ : all (j′, i′) ≠ (j, i)) = x∖x_j,i. Similarly, t^−j,i = t∖t_j,i and k^−j,t = k∖k_j,t. To make sampling more efficient, instead of dealing directly with the θ_j,i’s and ψ_j,t’s, we sample their index variables t_j,i and k_j,t. We first describe the sampling of t and then the sampling of k.

Sampling t.

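The prior probability of t_j,i taking an occupied table t is proportional to $n^{-j,i}_{j,t,\cdot}$ according to Eq (3), where, as before, the superscript −j,i means that the corresponding variable is removed from a set or a count. The prior probability of t_j,i taking a new value is proportional to α. More specifically, t_j,i is sampled from

$$p(t_{j,i}=t \mid \mathbf{t}^{-j,i}, \mathbf{k}) \propto \begin{cases} n^{-j,i}_{j,t,\cdot}\, f^{-x_{j,i}}_{k_{j,t}}(x_{j,i}) & \text{if } t \text{ previously used},\\ \alpha\, p(x_{j,i} \mid \mathbf{t}^{-j,i}, t_{j,i}=t^{\mathrm{new}}, \mathbf{k}) & \text{if } t = t^{\mathrm{new}}, \end{cases} \tag{8}$$

where $f^{-x_{j,i}}_{k_{j,t}}(x_{j,i})$ is the likelihood of sample x_j,i under the existing mixture component k_j,t given all the other data, and is given by

$$f^{-x_{j,i}}_{k}(x_{j,i}) = \frac{\int f(x_{j,i} \mid \phi_k) \prod_{j'i' \neq ji,\; z_{j',i'}=k} f(x_{j',i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji,\; z_{j',i'}=k} f(x_{j',i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}, \tag{9}$$

with f(· | ϕ) and h(·) denoting the densities of F(ϕ) and H. If a new table is chosen, i.e., t_j,i = t^new, we need to draw a dish k_{j,t^new} for t^new, with probability

$$p(k_{j,t^{\mathrm{new}}}=k \mid \mathbf{t}, \mathbf{k}^{-j,t^{\mathrm{new}}}) \propto \begin{cases} m_{\cdot,k}\, f^{-x_{j,i}}_{k}(x_{j,i}) & \text{if } k \text{ previously used},\\ \gamma\, f^{-x_{j,i}}_{k^{\mathrm{new}}}(x_{j,i}) & \text{if } k = k^{\mathrm{new}}, \end{cases} \tag{10}$$

where

$$f^{-x_{j,i}}_{k^{\mathrm{new}}}(x_{j,i}) = \int f(x_{j,i} \mid \phi)\, h(\phi)\, d\phi \tag{11}$$

is the prior density of x_j,i. Therefore, the likelihood of a customer choosing a new table is

$$p(x_{j,i} \mid \mathbf{t}^{-j,i}, t_{j,i}=t^{\mathrm{new}}, \mathbf{k}) = \sum_{k=1}^{K} \frac{m_{\cdot,k}}{m_{\cdot,\cdot}+\gamma}\, f^{-x_{j,i}}_{k}(x_{j,i}) + \frac{\gamma}{m_{\cdot,\cdot}+\gamma}\, f^{-x_{j,i}}_{k^{\mathrm{new}}}(x_{j,i}). \tag{12}$$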

During sampling, some n_j,t,⋅ may become zero, i.e., the corresponding table t may become unoccupied. The table is then removed and the corresponding dish count m_⋅,k is decremented; if m_⋅,k becomes zero, mixture component k is deleted.
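To make Eqs (8) to (12) concrete, the sketch below computes the normalized conditional of t_j,i under an assumed conjugate one-dimensional model, F(ϕ) = N(ϕ, σ²) with known σ and H = N(μ₀, τ₀²), for which the integrals in Eqs (9) and (11) reduce to a closed-form Gaussian predictive. The observations in the paper are q-dimensional PCA features, so this is a simplification; the function names and hyperparameter values are illustrative.

```python
import numpy as np


def predictive_pdf(x, data, mu0=0.0, tau0=3.0, sigma=1.0):
    """f_k^{-x}(x) of Eq (9) for the assumed model F = N(phi, sigma^2), H = N(mu0, tau0^2).
    With data=[] it reduces to the prior predictive of Eq (11)."""
    data = np.asarray(data, dtype=float)
    prec = 1.0 / tau0**2 + data.size / sigma**2          # posterior precision of phi
    mean = (mu0 / tau0**2 + data.sum() / sigma**2) / prec
    var = 1.0 / prec + sigma**2                           # predictive variance
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def table_probs(x, table_sizes, table_dish, comp_data, dish_tables, alpha, gamma):
    """Normalized conditional of t_ji, Eqs (8)-(12); x must already be removed from all counts.

    table_sizes: n^{-ji}_{jt.} per table of restaurant j; table_dish: k_jt;
    comp_data[k]: the other points currently assigned to dish k; dish_tables: m_{.k}.
    The last entry of the result is the probability of a new table.
    """
    f = np.array([predictive_pdf(x, d) for d in comp_data])
    f_new = predictive_pdf(x, [])
    m = np.asarray(dish_tables, dtype=float)
    like_new = (m @ f + gamma * f_new) / (m.sum() + gamma)            # Eq (12)
    w_old = np.asarray(table_sizes, dtype=float) * f[np.asarray(table_dish, dtype=int)]
    w = np.append(w_old, alpha * like_new)                             # Eq (8)
    return w / w.sum()


# Usage: two tables in restaurant j, serving dishes 0 (near -2) and 1 (near +2).
comp_data = [[-2.1, -1.9, -2.0], [2.0, 2.2]]
p = table_probs(2.1, table_sizes=[3, 2], table_dish=[0, 1],
                comp_data=comp_data, dish_tables=[1, 1], alpha=1.0, gamma=1.0)
```

A Gibbs sweep would draw t_j,i from p and, if the last entry is chosen, draw the new table's dish from Eq (10).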

Sampling k.

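The likelihood of setting k_j,t = k is given by $f^{-\mathbf{x}_{j,t}}_{k}(\mathbf{x}_{j,t})$, where x_j,t represents all the x_j,i such that t_j,i = t, and $f^{-\mathbf{x}_{j,t}}_{k}(\mathbf{x}_{j,t})$ is the conditional density of x_j,t given all the data associated with component k except x_j,t. For the conditional probability of k_j,t, we can write

$$p(k_{j,t}=k \mid \mathbf{t}, \mathbf{k}^{-j,t}) \propto \begin{cases} m^{-j,t}_{\cdot,k}\, f^{-\mathbf{x}_{j,t}}_{k}(\mathbf{x}_{j,t}) & \text{if } k \text{ previously used},\\ \gamma\, f^{-\mathbf{x}_{j,t}}_{k^{\mathrm{new}}}(\mathbf{x}_{j,t}) & \text{if } k = k^{\mathrm{new}}. \end{cases} \tag{13}$$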
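For Eq (13), the whole block x_j,t of customers at table t is scored jointly. Under an assumed one-dimensional Normal-Normal conjugate model (F(ϕ) = N(ϕ, 1), H = N(0, 3²)), the joint predictive can be built with the chain rule, one point at a time; the sketch works in log space to avoid underflow when a table holds many segments. Names and values are illustrative, not the paper's implementation.

```python
import numpy as np


def log_predictive(x, data, mu0=0.0, tau0=3.0, sigma=1.0):
    """log f_k^{-x}(x) for the assumed conjugate model F = N(phi, sigma^2), H = N(mu0, tau0^2)."""
    data = np.asarray(data, dtype=float)
    prec = 1.0 / tau0**2 + data.size / sigma**2
    mean = (mu0 / tau0**2 + data.sum() / sigma**2) / prec
    var = 1.0 / prec + sigma**2
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_block_predictive(block, data):
    """log f_k^{-x_jt}(x_jt): joint predictive of all customers at table t, via the chain
    rule p(x_1..x_n | D) = prod_i p(x_i | D, x_1..x_{i-1})."""
    total, seen = 0.0, list(data)
    for xi in block:
        total += log_predictive(xi, seen)
        seen.append(xi)
    return total


def dish_probs(block, comp_data, dish_tables, gamma):
    """Normalized Eq (13); comp_data[k] and dish_tables[k] (= m^{-jt}_{.k}) must exclude table t."""
    logw = [np.log(m) + log_block_predictive(block, d) for m, d in zip(dish_tables, comp_data)]
    logw.append(np.log(gamma) + log_block_predictive(block, []))
    logw = np.array(logw)
    w = np.exp(logw - logw.max())          # subtract the max for numerical stability
    return w / w.sum()


# Usage: the two customers at table t resemble the points of dish 1.
q = dish_probs([2.0, 2.3], comp_data=[[-2.0, -1.8], [1.9, 2.1, 2.2]],
               dish_tables=[2, 1], gamma=1.0)
```

Because the conjugate marginal is exchangeable, the result does not depend on the order in which the block's points are processed.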

Inference for the CRFC mixture models is obtained by replacing the prior probabilities of the indicator variables t_j,i and k_j,t with those of the CRFC, Eqs (6) and (7).

Features for classification

Here we present the complete list of features of FHR traces that we used for classification. As mentioned before, feature extraction has attracted much attention in the field of FHR analysis. The features can roughly be divided into three categories: time domain, frequency domain, and non-linear features. Time-domain features measure the variability of FHR signals in various forms, whereas the frequency domain features usually describe the powers in different frequency bands. Non-linear features quantify the non-linearity of FHR, e.g., with entropy and fractal dimension.

In our experiments, we divided the FHR series into non-overlapping segments with lengths ranging from 40 to 120 samples (10 to 30 s at 4 Hz). From each segment, we extracted one feature vector. This vector did not contain nonlinear features; instead, it had nine features from the time domain and five from the frequency domain. We did not include non-linear features because their reliable estimation usually requires much longer segments [9]. For example, the approximate entropy is applicable when the data series are longer than 100 samples [20].

In summary, we used only linear features from the time and frequency domains that are known from the literature on fetal heart rate processing. In total, we used 14 features, which are described in the next two subsections. However, the classifier operated in a reduced-dimensional feature space obtained via principal component analysis (PCA), as explained in the next section.

Time-domain features

They include the mean and the standard deviation of the segment s_ji. In addition, we also use the short-term variability (STV) and long-term variability (LTV), which are defined in [8] as (14) (15) where s(k), k = 1, …, K represents one segment of FHR series, K is the number of samples in each segment and M is the number of minutes of the segment. STV and LTV essentially quantify the changes of FHR series in different forms.

On the feature list, we also have the short-term irregularity (STI) and long-term irregularity (LTI) from [21] and defined by (16) (17) where IQR stands for inter-quartile range with k = 1, …, K. In essence, STI and LTI describe the variability of FHR series too.

The other features are the standard descriptors of the Poincaré plot, SD1 and SD2, and the complex correlation measure (CCM) proposed in [22]. The Poincaré descriptors are defined by

$$\mathrm{SD1} = \sqrt{\gamma_{RR}(0) - \gamma_{RR}(1)}, \tag{18}$$

$$\mathrm{SD2} = \sqrt{\gamma_{RR}(0) + \gamma_{RR}(1) - 2\,\overline{RR}^{2}}, \tag{19}$$

where γ_RR(0) and γ_RR(1) are the values of the autocorrelation function of the RR intervals at lags 0 and 1, and $\overline{RR}$ is the mean of the RR intervals. The RR intervals (beat-to-beat intervals) are another representation of FHR and can be obtained by

$$RR(k) = \frac{60000}{s(k)}, \tag{20}$$

with s(k) in beats per minute and RR(k) in milliseconds. CCM, on the other hand, is a function of several lags of the autocorrelation function of the RR intervals, or more specifically, (21) where C_n is a normalizing constant defined as C_n = π × SD1 × SD2, and m is an integer. In our experiments, we set m = 1. These features are different types of descriptors of FHR variability.
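The Poincaré descriptors are straightforward to compute from an FHR segment. A NumPy sketch of Eqs (18) to (20) follows; as an assumption, the autocorrelations and the mean are estimated over the consecutive pairs (RR_n, RR_{n+1}), which keeps both square-root arguments non-negative. CCM is omitted because its exact form, Eq (21), is not reproduced here, and the synthetic segment is a placeholder.

```python
import numpy as np


def fhr_to_rr(fhr_bpm):
    """Eq (20): RR interval in milliseconds from FHR in beats per minute."""
    return 60000.0 / np.asarray(fhr_bpm, dtype=float)


def poincare_sd(rr):
    """SD1 and SD2 via Eqs (18)-(19), using pairwise (lag-1) estimates of the moments."""
    rr = np.asarray(rr, dtype=float)
    a, b = rr[:-1], rr[1:]
    g0 = 0.5 * (np.mean(a * a) + np.mean(b * b))   # lag-0 autocorrelation gamma_RR(0)
    g1 = np.mean(a * b)                             # lag-1 autocorrelation gamma_RR(1)
    mean = 0.5 * (a.mean() + b.mean())
    sd1 = np.sqrt(max(g0 - g1, 0.0))
    sd2 = np.sqrt(max(g0 + g1 - 2.0 * mean**2, 0.0))
    return sd1, sd2


# Usage on a synthetic 30-s segment sampled at 4 Hz (120 samples).
rng = np.random.default_rng(0)
fhr = 140.0 + np.cumsum(rng.normal(0.0, 0.5, 120))
rr = fhr_to_rr(fhr)
sd1, sd2 = poincare_sd(rr)
```

With this estimator, SD1 equals the root-mean-square of the successive differences divided by √2, and SD2 equals the standard deviation of (RR_n + RR_{n+1})/√2, which matches the geometric reading of the Poincaré plot.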

Frequency-domain features

These features represent powers in four frequency bands: very low frequency (VLF: 0–0.06 Hz), low frequency (LF: 0.06–0.3 Hz), medium frequency (MF: 0.3–1 Hz) and high frequency (HF: 1–2 Hz). In addition, they include the power ratio LF/(MF+HF). The frequency-domain features reflect underlying physiological activity of either the mother or the fetus. It is worth noting that there is no consensus on how to define the frequency bands; in our experiments, we used the ranges from [23].
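The band powers could be computed as in the following sketch. The spectral estimator is not specified in this section, so a plain FFT periodogram of the mean-removed segment is used here as an assumption. For a 40-sample (10-s) segment the frequency resolution is 0.1 Hz, so the VLF band then contains only the zero-frequency bin.

```python
import numpy as np

# Band edges in Hz, as listed in the text (ranges from [23]).
BANDS = {"VLF": (0.0, 0.06), "LF": (0.06, 0.3), "MF": (0.3, 1.0), "HF": (1.0, 2.0)}


def band_powers(segment, fs=4.0):
    """Band powers of one FHR segment from a plain periodogram (an assumed estimator).

    Each frequency bin is assigned to the band [lo, hi); the top band also keeps the
    Nyquist bin fs/2 = 2 Hz.
    """
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                                   # remove the baseline (DC) level
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    out = {}
    for name, (lo, hi) in BANDS.items():
        upper = freqs <= hi if hi >= fs / 2 else freqs < hi
        out[name] = float(power[(freqs >= lo) & upper].sum())
    out["LF/(MF+HF)"] = out["LF"] / (out["MF"] + out["HF"])
    return out


# Usage: a 30-s segment (120 samples) with a 0.5-Hz oscillation falls in the MF band.
t = np.arange(120) / 4.0
bp = band_powers(140.0 + 3.0 * np.sin(2.0 * np.pi * 0.5 * t))
```

A windowed or averaged estimator (e.g., Welch's method) would reduce leakage and variance, at the cost of resolution on such short segments.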

The complete list of features is shown in Table 2.


Table 2. List of all the features.

https://doi.org/10.1371/journal.pone.0185417.t002

Experimental settings

In this section, we describe in detail our experiments of classifying FHR signals using non-parametric Bayesian models.

Database

In our work, we used the open-access cardiotocography (CTG) database from the Czech Technical University (CTU) and the University Hospital in Brno (UHB) [24]. This database contains 552 CTG recordings, each comprising an FHR time series and a uterine contraction (UC) signal, both sampled at 4 Hz. All recordings start at most 90 minutes before delivery. Fetal outcome data, which include measurements from umbilical artery blood samples and Apgar scores evaluated at 1 and 5 minutes after delivery, are available for assessment purposes. Additional fetal and delivery information, such as sex, weight, and type of delivery, is also provided. More details on the data collection can be found in [25].

Pre-processing and segmentation

The acquisition of FHR signals suffers from different kinds of artifacts, which are generally caused by maternal and fetal movements or by displacement of the transducer. There are two types of artifacts: either the measured samples are incorrect, or they are missing (their values are equal to 0). Therefore, the FHR signals have to be pre-processed before they are analyzed.

In practice, any successive samples that differ by more than 25 bpm are considered artifacts. All artifacts, including missing data, with a duration of less than 15 seconds are interpolated with piecewise cubic Hermite polynomials. Artifacts lasting longer than 15 seconds are simply discarded. Fig 1 shows an example of an FHR series before and after pre-processing.
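A sketch of these pre-processing rules, using NumPy and SciPy's PchipInterpolator for the piecewise cubic Hermite interpolation. Two details are assumptions not stated in the text: a jump larger than 25 bpm is flagged on the later of the two samples (and only when both are non-missing), and a gap of exactly 15 s is discarded. The function name and the test signal are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def preprocess_fhr(fhr, fs=4.0, max_jump=25.0, max_gap_s=15.0):
    """Flag artifacts, interpolate gaps shorter than max_gap_s, and drop longer ones."""
    fhr = np.asarray(fhr, dtype=float).copy()
    bad = fhr == 0                                            # missing samples
    jump = (np.abs(np.diff(fhr)) > max_jump) & (fhr[:-1] > 0) & (fhr[1:] > 0)
    bad[1:] |= jump                                           # implausible beat-to-beat change
    # Start (inclusive) and end (exclusive) indices of each run of bad samples.
    edges = np.diff(np.concatenate(([0], bad.astype(int), [0])))
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    good = np.flatnonzero(~bad)
    interp = PchipInterpolator(good, fhr[good])               # piecewise cubic Hermite
    keep = np.ones(fhr.size, dtype=bool)
    for s, e in zip(starts, ends):
        if (e - s) < max_gap_s * fs:
            fhr[s:e] = interp(np.arange(s, e))                # short gap: interpolate
        else:
            keep[s:e] = False                                 # long gap: discard
    return fhr[keep]


sig = np.full(400, 140.0)
sig[50] = 200.0          # spike: flags samples 50 and 51
sig[100:110] = 0.0       # 2.5 s missing, interpolated
sig[200:300] = 0.0       # 25 s missing, discarded
clean = preprocess_fhr(sig)
```

Flagging the later sample of each large jump also flags the first good sample after a spike; the pair is then bridged by interpolation, which is harmless for short spikes.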


Fig 1. Comparison between a raw FHR signal and the signal obtained after pre-processing.

https://doi.org/10.1371/journal.pone.0185417.g001

Out of the 552 FHR recordings, we selected a balanced dataset with equal numbers of recordings labeled healthy and unhealthy. The labels were defined by the following criteria: an FHR recording is labeled healthy if the associated umbilical cord pH value is greater than a threshold τ₀, and unhealthy if the pH value is less than or equal to τ₁. Because there is no consensus on the exact values of the thresholds, we used τ₀ = 7.2 together with either τ₁ = 7.05, as in [9], or τ₁ = 7.1, as in [10]. The resulting datasets contained N = 88 and N = 122 recordings, respectively.

In our experiments, the last M minutes of each FHR recording were analyzed. Each recording was divided into non-overlapping segments of l seconds, where l ranged from 10 to 30 seconds. Thus, each series contained m = M × 60/l segments. From each segment s_ji, the ith segment of the jth recording, a feature vector x_ji of dimension d = 14 was extracted.
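The labeling and segmentation steps amount to a few lines of NumPy. In the sketch below, the function names are hypothetical and the defaults follow the text: τ₀ = 7.2, τ₁ = 7.05, M = 30 min, l = 10 s and a 4 Hz sampling rate, which give m = 180 segments of 40 samples each.

```python
import numpy as np


def label_from_ph(ph, tau0=7.2, tau1=7.05):
    """0 = healthy (pH > tau0), 1 = unhealthy (pH <= tau1), None = left out of the dataset."""
    if ph > tau0:
        return 0
    if ph <= tau1:
        return 1
    return None


def segment_last_minutes(fhr, M=30, l=10, fs=4):
    """Split the last M minutes into m = M*60/l non-overlapping segments of l*fs samples."""
    m, n = int(M * 60 // l), int(l * fs)
    fhr = np.asarray(fhr, dtype=float)
    if fhr.size < m * n:
        raise ValueError("recording is shorter than M minutes")
    return fhr[-m * n:].reshape(m, n)             # row i is segment s_ji


segs = segment_last_minutes(np.arange(10_000.0))  # about 41.7 min at 4 Hz
```

Recordings with pH between τ₁ and τ₀ are excluded, which is what produces the two dataset sizes reported above.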

Dimensionality reduction

As described in the section Features for classification, each feature vector in our experiments has 14 dimensions. High dimensionality is usually difficult to deal with, in particular because of the computational cost and the slow convergence of Gibbs sampling. Hence, before training the models, we reduced the dimension of the feature space from 14 to q by principal component analysis (PCA) [26].

Since PCA is sensitive to the scales of the input dimensions, and the ranges of feature values vary widely across dimensions, we scaled each feature into the interval (−1, 1) before applying PCA. After scaling, we computed the explained variance ratio of each principal component. An example of PCA results on all the data, for N = 88 recordings and segment length l = 10 s, is shown in Fig 2. The gray bars are the explained variance ratios of the individual principal components, and the blue line represents the cumulative explained variance ratio.


Fig 2. Explained variance ratio as a function of the number of principal components.

https://doi.org/10.1371/journal.pone.0185417.g002

Based on this preliminary analysis of all the data, we concluded that most of the variance lies in the first four principal components. Therefore, we experimented with q = 2, 3, and 4. Note that in each iteration of cross-validation, only the training data were used to obtain the linear transformation matrix, and the test data were transformed accordingly.
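The scaling and PCA steps, fitted on the training fold only, can be sketched with NumPy's SVD. Applying the training-fold scaling to the test fold, as done here, is our assumption (the text states this only for the PCA transformation matrix); test features may therefore fall slightly outside (−1, 1). The data are random placeholders and the function name is hypothetical.

```python
import numpy as np


def fit_scaler_pca(X_train, q):
    """Fit min-max scaling to [-1, 1] and PCA on the training fold only.

    Returns a transform applying the same scaling and projection to any data, and the
    explained variance ratio of each principal component.
    """
    X_train = np.asarray(X_train, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)            # guard against constant features

    def scale(X):
        return 2.0 * (np.asarray(X, dtype=float) - lo) / span - 1.0

    Z = scale(X_train)
    center = Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(Z - center, full_matrices=False)
    explained = s**2 / np.sum(s**2)                    # explained variance ratio, descending
    W = Vt[:q].T                                       # d x q projection matrix

    def transform(X):
        return (scale(X) - center) @ W

    return transform, explained


rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(100, 14)), rng.normal(size=(20, 14))
transform, explained = fit_scaler_pca(X_train, q=4)
Z_test = transform(X_test)
```

np.cumsum(explained) gives the cumulative curve of the kind plotted in Fig 2.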

Model priors

The HDP and CRFC mixture models both have two concentration parameters, γ and α, as described in the section Hierarchical Dirichlet process mixture models. Instead of assigning fixed values to them, we implemented the auxiliary-variable sampler described in [13] to infer them. In our experiments, the concentration parameters were given gamma priors, γ ∼ gamma(1, 1) and α ∼ gamma(10, 1). Therefore, we needed to choose only two variables: the segment length l and the feature dimension q after PCA.
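The auxiliary sampler can be sketched as follows. For γ, the update is the Escobar and West auxiliary-variable scheme, with the K dishes and m_⋅,⋅ tables playing the roles of clusters and observations; for α, one Beta and one Bernoulli auxiliary variable per restaurant are used, following our reading of [13]. We assume the gamma priors are in shape-rate form, and the counts in the usage lines are made up.

```python
import numpy as np


def resample_gamma(gamma, K, n_tables, rng, a=1.0, b=1.0):
    """Auxiliary-variable update for gamma: K dishes from n_tables = m_{..} tables.
    Prior gamma ~ Gamma(shape=a, rate=b)."""
    eta = rng.beta(gamma + 1.0, n_tables)
    rate = b - np.log(eta)
    odds = (a + K - 1.0) / (n_tables * rate)
    shape = a + K if rng.random() < odds / (1.0 + odds) else a + K - 1.0
    return rng.gamma(shape, 1.0 / rate)            # NumPy uses scale = 1 / rate


def resample_alpha(alpha, n_customers, n_tables, rng, a=10.0, b=1.0):
    """Auxiliary-variable update for alpha shared by J restaurants; n_customers[j] = n_{j..}."""
    n = np.asarray(n_customers, dtype=float)
    w = rng.beta(alpha + 1.0, n)                    # one w_j per restaurant
    s = rng.random(n.size) < n / (n + alpha)        # s_j ~ Bernoulli(n_j / (n_j + alpha))
    return rng.gamma(a + n_tables - s.sum(), 1.0 / (b - np.log(w).sum()))


rng = np.random.default_rng(1)
g, al = 1.0, 10.0
for _ in range(100):                                # a few sweeps with fixed, made-up counts
    g = resample_gamma(g, K=6, n_tables=40, rng=rng)
    al = resample_alpha(al, n_customers=[180] * 10, n_tables=40, rng=rng)
```

In a full sampler these two updates would be interleaved with the t and k sweeps, using the current counts each time.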

Classification process

The process of using HDP-based models to classify FHR tracings is as follows. The last 30 minutes of data were used in the classification tasks. During the training stage, two HDP Gaussian mixture models (HDPGMs) were constructed from the FHR recordings labeled healthy and unhealthy, respectively. To estimate the models’ parameters, we implemented the collapsed Gibbs sampler proposed in [13]. During the testing stage, a new FHR tracing x_j is classified by comparing the likelihoods L₀ and L₁, which are defined by (22). If L₀ > L₁, the FHR series is classified as healthy, and otherwise as unhealthy. Note that here we assume equal prior probabilities for the two classes.
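Because the exact form of Eq (22) is not reproduced here, the following is only a stand-in: it scores the m segment feature vectors of a recording under two fixed diagonal-covariance Gaussian mixtures (for instance, one posterior sample from each trained model) and compares the summed log-likelihoods with equal class priors. The mixture parameters and function names are hypothetical.

```python
import numpy as np


def gmm_loglik(X, weights, means, variances):
    """sum_i log sum_k pi_k N(x_i; mu_k, diag(var_k)) over segment features X (m x q)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    diff = X[:, None, :] - means[None, :, :]                                  # (m, K, q)
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
                      + (diff**2 / variances[None, :, :]).sum(axis=2))        # (m, K)
    log_mix = np.log(weights)[None, :] + log_pdf
    top = log_mix.max(axis=1)                                                 # log-sum-exp
    return float(np.sum(top + np.log(np.exp(log_mix - top[:, None]).sum(axis=1))))


def classify(X, model0, model1):
    """Equal class priors: healthy (0) if the healthy-model likelihood is larger, else 1."""
    return 0 if gmm_loglik(X, *model0) > gmm_loglik(X, *model1) else 1


# Hypothetical 2-D mixtures standing in for the trained healthy and unhealthy models.
model0 = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.full((2, 2), 0.5))
model1 = (np.array([1.0]), np.array([[3.0, 3.0]]), np.ones((1, 2)))
rng = np.random.default_rng(0)
label_a = classify(rng.normal(0.0, 0.5, size=(180, 2)), model0, model1)
label_b = classify(rng.normal(3.0, 1.0, size=(180, 2)), model0, model1)
```

Working with log-likelihoods matters here: with 180 segments per recording, the raw likelihoods would underflow to zero.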

When using the CRFC Gaussian mixture models, we first set a window length M_win of 30 minutes, which corresponds to the restaurant capacity in the CRF metaphor. We analyzed the last 45 minutes of the FHR recordings. Two models were initialized from the first 30 minutes of these data (i.e., from 45 to 15 minutes before the end of the original FHR series) of the respective groups. At each time instant, we moved the window by one segment and updated the models by adding the new data and removing the oldest data. The likelihoods of being healthy and unhealthy, L₀ and L₁, were computed similarly to (22).
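The sliding-window schedule can be made explicit. Assuming l = 10 s (one of the segment lengths considered), the last 45 minutes contain 270 segments and the 30-minute window holds 180, so after initialization there are 90 one-segment moves, each adding the newest segment and removing the oldest. The function name is hypothetical.

```python
def crfc_schedule(n_segments, window_segments):
    """Segments used to initialize the models, then one (added, removed) pair per time step."""
    init = list(range(window_segments))
    steps = [(t, t - window_segments) for t in range(window_segments, n_segments)]
    return init, steps


# 45 min of 10-s segments (270) with a 30-min window (180 segments).
init, steps = crfc_schedule(n_segments=45 * 60 // 10, window_segments=30 * 60 // 10)
```

Each step maps directly onto the CRFC metaphor: the added segment is the arriving customer and the removed one is the oldest customer who leaves.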

We define the probabilities of FHR series corresponding to healthy or unhealthy fetuses, denoted as p₀ and p₁, by (23) where m is the number of segments in each FHR series. We call this method the “naïve approach”. A modified version of the probabilities is defined as follows. (24) where (25) and w_i’s are weights defined as (26) where u_i is the percentage of data that are not interpolated in the i-th segment, which is a measure of signal quality. We call this method the “weighted approach”.

Performance assessment

We assessed the classification performance of the models with the standard metrics true positive rate (TPR) and true negative rate (TNR). We also used the weighted relative accuracy (WRA) [27], which is defined by WRA = 4 × cost × (TPR − FPR)/(1 + cost)², where FPR = 1 − TNR is the false positive rate. In this study, we set cost = 1, in which case WRA = TPR − FPR.
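The metrics can be computed as below. Which class counts as positive is not stated in the text; here the unhealthy class (label 1) is assumed to be positive, and the labels in the usage lines are made up.

```python
import numpy as np


def rates(y_true, y_pred):
    """TPR and TNR with 'unhealthy' (label 1) taken as the positive class (an assumption)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return float(tpr), float(tnr)


def wra(tpr, fpr, cost=1.0):
    """Weighted relative accuracy: 4 * cost * (TPR - FPR) / (1 + cost)^2."""
    return 4.0 * cost * (tpr - fpr) / (1.0 + cost) ** 2


tpr, tnr = rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0])
score = wra(tpr, 1.0 - tnr)        # FPR = 1 - TNR; with cost = 1 this equals TPR - FPR
```

Here TPR = TNR = 0.75, so FPR = 0.25 and WRA = 0.5.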

To fully utilize the dataset and avoid the bias caused by randomly selecting training/testing data, we used the 5-fold cross-validation (CV) method for performance assessment. At each iteration, 80% of the data were used for training and the rest for testing. The outcome metrics were averaged across all iterations and the mean values were reported.
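A plain (unstratified) 5-fold split can be sketched as follows; the text does not say whether folds were stratified by class, so stratification is left out. With N = 88 recordings, each test fold has 17 or 18 recordings. The function name is hypothetical.

```python
import numpy as np


def kfold(n, k, rng):
    """Shuffle indices once and split into k folds; each fold serves as the test set once."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]


splits = list(kfold(88, 5, np.random.default_rng(0)))
```

The reported metrics would then be averaged over the five (train, test) pairs, with PCA refitted on each training fold.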

Results

In this section, we first provide the classification performance of HDPGMs and the comparison with that of SVMs, which achieved the best performance in studies [9, 12]. Then we show the real-time classification of FHR tracings by models based on CRFC.

Performance by HDPGMs

As described in the section Pre-processing and segmentation, we experimented with two different thresholds τ₁ that delineate the non-healthy group of fetuses. For τ₁ = 7.05, the number N of recordings with total length exceeding 30 minutes is 88, and for τ₁ = 7.1, N equals 122. After segmentation, feature extraction and PCA, each dataset was transformed into N groups of data, each containing m observations of dimension q. We experimented with different choices of the segment length l and the dimension q. The results are provided in Table 3, with the best performance highlighted in bold.


Table 3. Performance of HDPGMs.

https://doi.org/10.1371/journal.pone.0185417.t003

The same datasets were used to test the SVM-based method. The classification process was as follows: instead of segmenting the recordings, we extracted the 14 features from the whole FHR series of the last 30 minutes. The feature vectors were scaled to the range (−1, 1) and then used as inputs to the SVM classifier. The SVM classifier had two free parameters: the cost C and the kernel parameter γ. We searched for the combination of these parameters that optimized the testing performance metric, WRA. Five-fold CV was used to reduce bias. The results obtained by SVMs are shown in Table 4.


Table 4. Performance of SVMs.

https://doi.org/10.1371/journal.pone.0185417.t004

By comparing the results in Tables 3 and 4, we conclude that in both cases of τ₁, the proposed method outperformed the SVM-based method.

Real-time classification

In this experiment, we set the threshold τ₁ = 7.05. The number of FHR recordings with total length greater than 45 minutes is 70. We randomly chose 60 recordings for training the CRFC Gaussian mixture models and used the rest for testing. Because labels of the FHR recordings are not available at each time instant, we assumed that each training series stayed in the same group for its whole duration. At each time instant, we computed the probability of the FHR series being associated with a healthy fetus, using both the naïve and the weighted approaches. Fig 3 shows how the probabilities of different FHR recordings being healthy change over time. The corresponding pH values are given in the legends. The three panels on the left show the probabilities obtained by the naïve approach, and those on the right the probabilities obtained by the weighted approach. Panels in different rows correspond to different experimental settings.


Fig 3. One instance of real-time classification results.

The X axis represents time and the Y axis represents probability.

https://doi.org/10.1371/journal.pone.0185417.g003

From the results, we observe only small differences between the two approaches. The probabilities obtained with different experimental settings are not identical, but they agree in their overall trend.

Conclusion

In this paper, we applied the hierarchical Dirichlet process mixture model and its variation to the classification of fetal heart rate tracings. In our method, we employed 14 features that have been used in the literature before. We showed that our method outperformed the state-of-the-art algorithm in terms of weighted relative accuracy when using the same feature set. Furthermore, we demonstrated how our method can be adapted to online learning and to computing the probability of a fetus being healthy in real time.

As our experiments show, a merit of non-parametric Bayesian models is that they require neither parameter tuning nor model selection. In addition, the experimental results suggest that our methods model the FHR data accurately. Furthermore, the Chinese restaurant franchise with finite capacity models can process data of a fixed length sequentially. Therefore, if applied in real-world scenarios, the CRFC model could evaluate the FHR data as it is recorded and provide physicians with real-time estimates of the fetal status. However, because the true online fetal health status was unavailable in our experiments, we could not validate the performance of the real-time classification.
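The idea of finite capacity can be illustrated with a single Chinese restaurant in which only a limited number of customers are seated at any time. The Python sketch below is a simplified, hypothetical version and not the CRFC model used in the paper: it omits the franchise level (dishes shared across restaurants), and it assumes that once `capacity` customers are present, the earliest-arriving one leaves before the next is seated. The names `crp_finite_capacity`, `alpha`, and `capacity` are illustrative.

```python
import random
from collections import deque
from typing import Deque, Dict, List


def crp_finite_capacity(n_customers: int, alpha: float, capacity: int,
                        seed: int = 0) -> List[int]:
    """Table assignments from a Chinese restaurant process with finite capacity.

    Hypothetical simplification: once `capacity` customers are seated,
    the earliest-arriving customer leaves before the next one is seated.
    """
    if alpha <= 0 or capacity < 1:
        raise ValueError("need alpha > 0 and capacity >= 1")
    rng = random.Random(seed)
    seated: Deque[int] = deque()  # table of each customer present, oldest first
    counts: Dict[int, int] = {}   # table id -> number of customers present
    next_table = 0
    assignments: List[int] = []
    for _ in range(n_customers):
        if len(seated) == capacity:
            old = seated.popleft()  # the oldest customer departs
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]     # an empty table disappears
        # With n customers present, existing table k is chosen with probability
        # counts[k] / (n + alpha) and a new table with probability alpha / (n + alpha).
        r = rng.uniform(0.0, len(seated) + alpha)
        table = None
        for k, c in counts.items():
            if r < c:
                table = k
                break
            r -= c
        if table is None:
            table = next_table
            next_table += 1
        counts[table] = counts.get(table, 0) + 1
        seated.append(table)
        assignments.append(table)
    return assignments


if __name__ == "__main__":
    print(crp_finite_capacity(20, alpha=1.0, capacity=5, seed=1))
```

Because departed customers no longer contribute to the table counts, tables that stop receiving new customers eventually empty and disappear, which is what allows a model of this kind to follow changes in the data over time.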

Acknowledgments

The authors gratefully acknowledge the support of NIH under Award 1R21HD080025-01A1.

References

1. Freeman RK, Garite TJ, Nageotte MP, Miller LA. Fetal heart rate monitoring. Lippincott Williams & Wilkins; 2012. p. 1–7.
2. Martin JA, Hamilton BE, Sutton PD, Ventura SJ, Menacker F, Munson ML, et al. Births: final data for 2002. National vital statistics reports. 2003;52(10):1–113. pmid:14717305
3. Macones GA, Hankins GD, Spong CY, Hauth J, Moore T. The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: Update on definitions, interpretation, and research guidelines. Journal of Obstetric, Gynecologic, & Neonatal Nursing. 2008;37(5):510–515.
4. Rooth G, Huch A, Huch R. FIGO News: Guidelines for the use of fetal monitoring. Int J Gynecol Obstet. 1987;25:159–67.
5. Hruban L, Spilka J, Chudáček V, Janků P, Huptych M, Burša M, et al. Agreement on intrapartum cardiotocogram recordings between expert obstetricians. Journal of evaluation in clinical practice. 2015;21(4):694–702. pmid:26011725
6. Garite TJ, Dildy GA, McNamara H, Nageotte MP, Boehm FH, Dellinger EH, et al. A multicenter controlled trial of fetal pulse oximetry in the intrapartum management of nonreassuring fetal heart rate patterns. American journal of obstetrics and gynecology. 2000;183(5):1049–1058. pmid:11084540
7. Banta HD, Thacker SB. Historical Controversy in Health Technology Assessment: The Case of Electronic Fetal Monitoring. Obstetrical & gynecological survey. 2001;56(11):707–719.
8. Gonçalves H, Rocha AP, Ayres-de Campos D, Bernardes J. Linear and nonlinear fetal heart rate analysis of normal and acidemic fetuses in the minutes preceding delivery. Medical and Biological Engineering and Computing. 2006;44(10):847–855. pmid:16988896
9. Spilka J, Chudáček V, Koucký M, Lhotská L, Huptych M, Janků P, et al. Using nonlinear features for fetal heart rate classification. Biomedical Signal Processing and Control. 2012;7(4):350–357.
10. Georgieva A, Payne SJ, Moulden M, Redman CW. Artificial neural networks applied to fetal monitoring in labour. Neural Computing and Applications. 2013;22(1):85–93.
11. Dash S, Quirk JG, Djurić PM. Fetal heart rate classification using generative models. IEEE Transactions on Biomedical Engineering. 2014;61(11):2796–2805. pmid:24951678
12. Xu L, Redman CW, Payne SJ, Georgieva A. Feature selection using genetic algorithms for fetal heart rate analysis. Physiological measurement. 2014;35(7):1357. pmid:24854596
13. Teh YW, Jordan MI, Beal MJ, Blei DM. Hierarchical Dirichlet processes. Journal of the American Statistical Association. 2012;101:1566–1581.
14. Teh YW, Jordan MI. Hierarchical Bayesian nonparametric models with applications. Bayesian Nonparametrics. 2010;1:158–207.
15. Yu K, Quirk JG, Djurić PM. Fetal heart rate analysis by hierarchical Dirichlet process mixture models. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2016. p. 709–713.
16. Yu K, Quirk JG, Djurić PM. Fetal heart rate classification by non-parametric Bayesian methods. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2017.
17. Yu K, Djurić PM. Dirichlet process mixture models for time-dependent clustering. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2016. p. 4383–4387.
18. Djurić PM, Yu K. On generative models for sequential formation of clusters. In: Signal Processing Conference (EUSIPCO), 2015 23rd European. IEEE; 2015. p. 2786–2790.
19. Neal RM. Bayesian mixture modeling. In: Maximum Entropy and Bayesian Methods. Springer; 1992. p. 197–211.
20. Pincus S. Approximate entropy (ApEn) as a complexity measure. Chaos: An Interdisciplinary Journal of Nonlinear Science. 1995;5(1):110–117.
21. De Haan J, Van Bemmel J, Versteeg B, Veth A, Stolte L, Janssens J, et al. Quantitative evaluation of fetal heart rate patterns: I. Processing methods. European Journal of Obstetrics & Gynecology. 1971;1(3):95–102.
22. Karmakar CK, Khandoker AH, Gubbi J, Palaniswami M. Complex Correlation Measure: a novel descriptor for Poincaré plot. Biomedical engineering online. 2009;8(1):1.
23. Signorini MG, Magenes G, Cerutti S, Arduini D. Linear and nonlinear parameters for the analysis of fetal heart rate signal from cardiotocographic recordings. IEEE Transactions on Biomedical Engineering. 2003;50(3):365–374. pmid:12669993
24. Chudáček V, Spilka J, Burša M, Janků P, Hruban L, Huptych M, et al. Open access intrapartum CTG database. BMC pregnancy and childbirth. 2014;14(1):1.
25. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. Physiobank, Physiotoolkit, and Physionet components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–e220. pmid:10851218
26. Jolliffe I. Principal component analysis. Wiley Online Library; 2002.
27. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI. vol. 14; 1995. p. 1137–1145.
