A novel algorithm based on ensemble empirical mode decomposition for non-invasive fetal ECG extraction

Non-invasive fetal electrocardiography appears to be one of the most promising fetal monitoring techniques during pregnancy and delivery nowadays. This method is based on recording electrical potentials produced by the fetal heart from the surface of the maternal abdomen. Unfortunately, in addition to the useful fetal electrocardiographic signal, there are other interference signals in the abdominal recording that need to be filtered. The biggest challenge in designing filtration methods is the suppression of the maternal electrocardiographic signal. This study focuses on the extraction of fetal electrocardiographic signal from abdominal recordings using a combination of independent component analysis, recursive least squares, and ensemble empirical mode decomposition. The method was tested on two databases, the Fetal Electrocardiograms, Direct and Abdominal with Reference Heartbeats Annotations and the PhysioNet Challenge 2013 database. The evaluation was performed by the assessment of the accuracy of fetal QRS complexes detection and the quality of fetal heart rate determination. The effectiveness of the method was measured by means of the statistical parameters as accuracy, sensitivity, positive predictive value, and F1-score. Using the proposed method, when testing on the Fetal Electrocardiograms, Direct and Abdominal with Reference Heartbeats Annotations database, accuracy higher than 80% was achieved for 11 out of 12 recordings with an average value of accuracy 92.75% [95% confidence interval: 91.19–93.88%], sensitivity 95.09% [95% confidence interval: 93.68–96.03%], positive predictive value 96.36% [95% confidence interval: 95.05–97.17%] and F1-score 95.69% [95% confidence interval: 94.83–96.35%]. When testing on the Physionet Challenge 2013 database, accuracy higher than 80% was achieved for 17 out of 25 recordings with an average value of accuracy 78.24% [95% confidence interval: 73.44–81.85%], sensitivity 81.79% [95% confidence interval: 76.59–85.43%], positive predictive value 87.16% [95% confidence interval: 81.95–90.35%] and F1-score 84.08% [95% confidence interval: 80.75–86.64%]. Moreover, the non-invasive ST segment analysis was carried out on the records from the Fetal Electrocardiograms, Direct and Abdominal with Reference Heartbeats Annotations database and achieved high accuracy in 7 from in total of 12 records (mean values μ < 0.1 and values of ±1.96σ < 0.1).


Single channel methods
The study [31] presented a combination of the EMD and correlation analysis. The main idea was to decompose the abdominal ECG (aECG) signal into individual oscillatory functions using the EMD and then determine the correlation between the reference fECG signal and the oscillatory functions of the aECG signal. Subsequently, the oscillatory functions that correlated with the reference signal the most, were found. The sum of the oscillatory functions provided the enhanced fECG. The authors tested the method on real data and achieved an average accuracy of 100% when detecting the fQRS complexes.
A single channel method combining singular value decomposition and the ICA method was tested in [37]. The objective was to project the input signal onto higher dimensions and, thanks to the statistical independence between the source signals, separate them from each other in order to estimate the fECG. When testing on synthetic data, they did not perform any statistical evaluation, but based on visual comparison, they stated that the algorithm achieved effective fECG extraction.
A combination of singular value decomposition and polynomial classifiers was introduced in [38]. The mECG signal was estimated using the singular value decomposition and subsequently mECG and aECG signals were fed to the input of the polynomial classifiers. The fECG signal was extracted using the dynamics and nonlinearities of the estimated mECG signal. The method was tested on real and synthetic data. They evaluated the extraction quality of synthetic data using the signal-to-noise ratio (SNR) parameter and real data by means of visual comparison. The method was evaluated as suitable for single-channel fECG extraction.
The authors of the study [39] tested a combination of the extended Kalman smoother (EKS) and ANFIS. In the first step, the maternal component was estimated from the aECG signal using the EKS. Subsequently, the estimated component was centred with the aECG signal using the ANFIS. The fECG signal was extracted based on subtracting the estimated mECG from the aECG. The evaluation of filtration efficiency was performed using SNR in synthetic records whereas in real records, it was evaluated by the accuracy of fQRS detection achieving an accuracy of 90.20%.

Multichannel methods
The multichannel method combining ICA, ensemble empirical mode decomposition (EEMD) and wavelet shrinkage (WS), was tested in [32]. In the first step, the ICA method, which separated the fECG signal from the aECG signals, was applied. Since the fECG signal still contained residues of the mECG component, it had to be further filtered using the EEMD method. In the last step, the residual high-frequency noise was removed using the WS method. The authors evaluated the effectiveness of the method using SNR and mean squared error (MSE) and stated that the limitation of the method was its low computational speed and the fact that it causes changes in signal morphology.
Another multichannel method, based on a combination of periodic component analysis and generalized eigenvalue decomposition was tested in the study [40]. The detection of Rpeaks was based on the idea of time-varying periodicity. The cross-correlation between timedelayed samples in different channels represented by a constant time variable was replaced with a calculated phase variable. The method was tested on real aECG recordings. No statistical results were presented in the study, but the proposed algorithm was, according to the authors, effective and time-efficient in fECG extraction.
The combination of ICA and WT was presented in the study [41]. The method is based on the principle of separating the fECG component from the aECG signals using the ICA and the subsequent removal of the residual noise in the fECG signal using WT. The evaluation of the results was performed by visual comparison and authors stated that in addition to the efficient extraction of fECG, the morphology of the signal was preserved.
In the study [42], the FastICA and the quality index optimization (QIO) methods were combined. First, the FastICA, which decomposed the signal into individual components, was applied. Since the fECG contained residues of the mECG signal, the mECG component was subtracted from the fECG. In the last step, the quality of the fECG signal was improved using the QIO method. When evaluating the accuracy of fQRS detection using the F1-score, an accuracy of 99.38% was achieved when tested on real records and an accuracy of 98.78% when tested on synthetic records.
The authors of the study [43] dealt with a machine learning approach integrating an echo state network (ESN) with dynamic programming. The idea was based on the detection of fQRS complexes using the recurrent ESN. The resulting positions of the fQRS complexes were obtained by dynamic programming interpreting the outputs of the ESN. The algorithm was tested on real records, in which it was able to effectively suppress the maternal component and the extraction of fECG was thus efficient. The method was relatively fast, but further improvement should be possible by increasing the ESN contribution.
An artificial neural network and correlation-based technique were combined in study [44]. First, the mECG component was extracted from the aECG signals using the artificial neural network. Based on the correlation, the mECG component was centred with the aECG signals.
Finally, the mECG component was subtracted from the aECG signals, and the fECG signal was obtained. The method was tested on real records and an accuracy of 93.75% was achieved in the detection of fQRS complexes.
Finally, a combination of correlation analysis and the FastICA was presented in [45]. First, the self-correlation analysis was applied to reduce the temporal correlation. Subsequently, the FastICA method was applied, which decomposed the input signal into individual components, one of which was the resulting fECG signal. The authors did not use any statistical parameters to evaluate the extraction efficacy, but according to the visual evaluation, the extracted fECG signals contained almost no maternal residues.
This work is based on the methods we proposed in our previous studies. In [46], we presented the ICA-ANFIS-WT and ICA-RLS-WT procedures and, in [47], we tested the ICA-EMD, ICA-EMD-WT and ICA-RLS-EMD approaches. When determining the fHR, all methods provided better results than using its individual components separately. However, the disadvantage of the methods, which includes the WT, was the deformation of the fQRS complexes, which would not allow the subsequent morphological analysis of the signal, such as the ST analysis. The goal of this study is to introduce an algorithm composed of ICA, RLS and EEMD methods, which could provide better results in determining the fHR than the previously presented approaches, it would be effective in a wide range of signals and would allow the subsequent morphological analysis of the fECG.

ST segment analysis
In addition to fHR, which is the main parameter assessing the fetal well-being, new techniques focusing on the changes of the fECG waveform (P-R interval, QT interval, T/QRS ratio and ST segment) caused by the hypoxic states have been developed and presented in the literature [41,[48][49][50][51][52][53][54][55]. However, the STAN 1 Method (Neoventa Medical, Sweden) is the only analysis tool that is available in the clinical practice. The combination of ST-Analysis provided by the STAN 1 device and standard CTG parameters (the trace of both fHR and uterine contractions) provides extended and more accurate assessment of fetal well-being during the labor than CTG alone.
The principle: The analysis of the fECG using STAN 1 is possible only during the labor (after membrane rupture) since it requires the scalp electrode to be placed on the fetal body. The STAN device analyses 30 consecutive ECG complexes and compares them with the 'baseline value', which was calculated over the first four to five minutes for each fetus. Each analysis is marked by 'X' on the lower part of the monitor (see Fig 1). In the presence of an 'ST event' (value differing significantly from the "baseline value"), it is necessary to classify the CTG trace according to STAN guidelines and then to determine if any action is needed [56].

Material and methods
This section describes the component methods that were selected based on the literature review and the results of our previous research. Within the system, the following methods were implemented: ICA, RLS and EEMD. The databases on which the algorithm was tested, and the description of statistical parameters used to determine the effectiveness of the filtration method in fQRS complexes detection, are also described.

Independent component analysis
Independent component analysis is a statistical method based on separation of mixed independent signals and extraction of the original source components [57]. In order to provide good results, when using ICA, two assumptions must be met: the source signals must be independent, and only one original component can have a Gaussian distribution. If the method is applied to signals where there are multiple Gaussian sources, then it cannot extract these sources [27,58]. In recent years, many improved ICA algorithms have been proposed [26,[59][60][61][62]. One of them, the FastICA algorithm, is widely used due to its fast convergence. FastICA is a fixed-point iterative algorithm, minimizing mutual information between the estimated components. There are various types of FastICA algorithms, including those based on kurtosis, maximum likelihood, and maximum negentropy [25]. A detailed mathematical description of the method can be found in studies [25,27,57,58,63].

Recursive least squares
Recursive least squares is an adaptive method based on minimizing the error defined as the difference between the desired output and the real output of the adaptive algorithm. The minimization of the error function is achieved by automatic adjustment of the adaptive filter coefficients [1]. The RLS algorithm is remarkable due to its fast convergence and provides excellent results in time-varying environments [64]. Nevertheless, in its basic form it has higher computational demands and, in some cases, it may have stability problems [1]. To reduce the computational demands, the RLS algorithm uses "adaptation" or "forgetting" factor λ, whose task is to decrease the contribution of "old" error values. The forgetting factor value ranges from zero to one [1,64]. To reduce the computational complexity, a filter order which indicates the final number of the previous values processed, is also selected. Proper setting of the RLS algorithm parameters has a great effect on the stability of the filter [1]. Further information about RLS can be found in studies [1,[64][65][66].

Ensemble empirical mode decomposition
The ensemble empirical mode decomposition is an improved version of the empirical mode decomposition (EMD) method. The EMD is based on the decomposition of the non-stationary signal, such as the biomedical signal, into oscillatory functions called intrinsic mode functions (IMFs) [67]. The EMD algorithm works on the principle of identifying local extremes of the signal and creating the upper and lower envelopes of the signal. The mean of these envelopes is subtracted from the input signal, thus obtaining the first IMF. The entire process is repeated, but instead of the input signal, the residual signal which was obtained by subtracting the IMF from the input signal, is used. The iterative process terminates when the final residual signal is extracted [47,68]. A more detailed description of the EMD method can be obtained in studies [31,32,47,67].
Despite many advantages, the EMD has a limitation called mode mixing, which can significantly affect the decomposition efficiency. This phenomenon occurs when one of the IMFs contains several components with very different frequencies, or when components with very similar frequencies are contained in different IMFs [32]. To prevent this phenomenon, EEMD was proposed in the study [69], based on adding different series of white noise into the input signal in several trials [70]. Further information and a more detailed description of this method can be found in studies [32,67,[69][70][71]. A brief description of the method can be summarized in the following steps: 1. Set the number of ensemble trials N and the standard deviation of the added noise series N std .
2. Add a white noise series w(t) into the input signal s(t): 3. Decompose s 1 (t) signal using the EMD algorithm.
4. Repeat step 1. and 2. N-times, but with different white noise series each time.
5. The final IMF j (t) of the EEMD is obtained by averaging all IMFs related to N trials: where i is a trial number and j is the IMF scale. The principle of the EEMD algorithm is shown in Fig 2. These methods were selected for following reasons: 1. ICA-the ICA method is a separation technique able to decompose a composite input signal, such as aECG, into its components, such as fECG, mECG, and noise [46,47,72]. The advantage of the FastICA variant is that it requires little memory space, is conceptually simple and computationally efficient, as it uses a fixed iteration scheme [25,73]. Previous studies [27,46,47,74] have shown that when processing fECG, the ICA method could highlight the fetal component in the aECG signal, but the maternal component was not sufficiently eliminated. On the other hand, ICA was able to extract the mECG signal very accurately. The advantage of the ICA application in fECG processing is that it requires only signals from the abdominal electrodes as input and there is no need to record the reference mECG signal by means of the chest electrodes [24]. This brings benefits, such as greater comfort and mobility, especially for the mother. Compared to the PCA method, which is also often used to estimate the maternal component, ICA is able to obtain a more accurate estimate, and in addition, ICA produces both maternal component (mECG signal) and an aECG signal with an enhanced fetal component [27].
2. RLS-the advantage of the RLS algorithm is its adaptability and high performance in time-varying environments which makes it suitable for processing fECG [5,75]. However, its performance is highly dependent on the quality of the input signals. When high quality aECG and mECG signals enter the algorithm, RLS usually achieves very promising results in fECG extraction. There are two different approaches to achieve the fECG extraction. The first approach uses an abdominal signal as a primary input, whereas the signal is recorded using the chest electrode as reference. The second approach is based on using solely abdominal recordings and an algorithm that is able to separate the maternal reference from an aECG signal [5]. When using only the latter approach, the obtained mECG reference is more similar to the maternal component in corresponding aECG signal, and thus the adaptive system can achieve better results. The thoracic signal may be of poor quality, its continual measurement is uncomfortable for the mother, and change in the sensor location leads to changes in the resulting signal's morphology, which may differ from the maternal component than in the abdominal area and extraction by the RLS algorithm is thus more difficult and less effective [1,5]. Frequently used adaptive algorithms also include the least mean squares methods, however, the experimental results show that the RLS algorithm outperforms it [1,76], although at the cost of higher computation time.
3. EEMD-the EEMD algorithm is suitable for processing non-stationary and non-linear signals, such as fECG [32,67]. In most studies [32,47,[77][78][79] that deal with the extraction of fECG using EMD-based methods, these methods are used in combination with other methods, as they are not able to extract either fECG or mECG precisely enough. However, EMD-based methods excel in the final smoothing of the residuals of the maternal component in the fECG signal, which was previously extracted by another more efficient method such as adaptive RLS. In addition, the methods allow the subsequent morphological analysis of the resulting signal to be performed. This makes them suitable as part of diagnostic systems that, in addition to accurately determining fHR, require a more detailed morphological analysis of the fECG waveform [47]. Compared to the EMD method, which is also very often used for the final smoothing of the fECG signal, EEMD is more efficient because it is less limited by the mode mixing problem. However, the disadvantage of EEMD is its lower computing speed [80,81].

PLOS ONE
To achieve accurate results, this study proposed a combination of these three methods, by incorporating them in a way that their aforementioned benefits are utilized: • ICA alone is not able to estimate fECG accurately enough, but it can excellently estimate mECG and aECG signal with an enhanced fetal component. At the same time, the ICA generates input signals for the RLS method, using solely aECG signals.
• As the RLS is prone to poorly scanned reference mECG signals, using high-quality input signals estimated by the ICA method, the RLS can extract high-quality fECG with minor maternal residues.
• These maternal residues are eventually smoothed by the EEMD method, which as such is not able to completely extract either fECG or mECG. In addition, it allows the subsequent morphological analysis of the resulting signal to be performed.

Dataset
The study protocol was approved by the Ethical Committee of the Silesian Medical University, Katowice, Poland (NN-013-345/02). Subjects read the approved consent form and gave written informed consent to participate in the study. In this study, two databases were used to test the performance of the ICA-RLS-EEMD extraction method. The first one is called the Fetal Electrocardiograms, Direct and Abdominal with Reference Heartbeats Annotations (FECGDARHA) and is available in the figshare repository [82]. It contains 12 real recordings acquired from 12 different women, in 38-42 week of pregnancy. The recordings contain four abdominal signals and one signal from the scalp electrode forming the reference. The signals of five minutes length each were digitized with a 16-bit resolution at 500 Hz sampling frequency for abdominal signals and 1000 Hz for a direct reference fECG [83]. The recordings were registered using the KOMPOREL system. This system automatically detected and marked R-peaks, whose accuracy was verified offline by a group of cardiologists, providing the reference annotations [82,83]. A detailed description of the database can be found in [83].
The second dataset considered is the Physionet Challenge 2013 database [84], from which we analysed the first 25 out of 75 recordings. This database was created for the Computing in Cardiology Challenge 2013 in order to improve the detection of fQRS complexes and the determination of the fQT interval from a non-invasive abdominal recording [85]. Each recording contains four abdominal signals with a sampling frequency of 1000 Hz and annotations indicating the positions of R-peaks. The length of each recording is one minute. The recordings come from different sources and are acquired using a variety of devices with different resolutions and configurations. The fetuses do not have the same gestational age and no attention was paid to the placement of the electrodes [84].

Extraction system design
The ICA-RLS-EEMD algorithm was based on several steps. The first was the decomposition of input aECG signals into three components using the ICA. However, only two components were used for further processing. The first was the mECG component and the second was the component in which the mECG signal was most suppressed and the fECG signal was most enhanced. These two signals were used as the inputs of the adaptive RLS system. The RLS output comprised an fECG signal which was still corrupted with residual noise. Therefore, the EEMD method was applied in order to remove the noise and obtain an enhanced fECG � . The block diagram of the algorithm and the output signals are shown in Fig 3. Algorithms settings-to obtain accurate results, it was necessary to select the most suitable combination of aECG signals and optimal hyperparameters setting of the RLS (filter order) and EEMD (N, N std , and IMFs combination) algorithms. The most suitable combination of aECG signals and parameters setting was selected on the basis of an automated algorithm. It compared all combinations of the input aECG signals and parameters setting and selected the combination that provided the highest accuracy according to the ACC measure. The comparison was carried out using the reference annotations. The best parameters values of the RLS and EEMD methods are presented in Tables 1 and 2, for each recording separately. Collumn n Block diagram explaining the ICA-RLS-EEMD method: a) input aECG signals; b) input aECG FIR signals preprocessed by the FIR filter; c) three source components extracted by the ICA method; d) ICA components assigned to the source signals, which were time and amplitude centred and served as inputs to the RLS algorithm; the fECG signal was the output of the ICA-RLS algorithm; e) the first five IMFs that were obtained after the application of the EEMD method; f) a reference scalp electrode recording and the resulting fECG � , which was extracted after applying the ICA-RLS-EEMD method. https://doi.org/10.1371/journal.pone.0256154.g003

PLOS ONE
represents For the sake of clarity, the principle of the extraction system is described in the following steps: 1. First, the most appropriate combination of the aECG signals was selected for each recording. Since the ICA method is multichannel, at least two abdominal signals had to be selected. For each recording, it was investigated which combination of electrodes gave the best results based on the calculation of the ACC measure. Four abdominal signals were available for each recording; thus, 11 combinations of signals were tested. If all four signals were used for some recordings, less accurate extraction was achieved, because not all signals were recorded in high quality.
2. Subsequently, the aECG signals were filtered using a bandpass finite impulse response filter (FIR) with a range of 3-150 Hz and filter order of 500. The purpose of the filter was to remove the fluctuations of isolines.
3. The ICA method was further applied to the aECG signals. The FastICA algorithm based on kurtosis was used. The number of iterations was set to 20. The input abdominal signals (aECG) were divided into three components. In most cases, one component corresponded to the mECG signal, the second one to the aECG signal (denoted as aECG � ) with the enhanced fECG component, and the third one was noise (the assignment of the components was based on the number of peaks detected). When using the ICA method, the polarity of the individual components may be inverted, see mECG in Fig 4a). This problem was  solved by means of an algorithm controlling the positive and negative maxima in the interval of 50 samples from the detected R wave. In this way, the algorithm was able to detect and adjust the polarity (see Fig 4b).
4. The mECG and aECG � signals, after time and amplitude centering, were fed to the RLS system. The RLS algorithm adjusted the mECG component to the form of the parent component contained in the aECG � signal. Finally, this RLS algorithm output (denoted as mECG RLS ) was subtracted from the aECG � signal. The output of the RLS algorithm comprised an fECG signal in which the mECG component was suppressed. The forgetting factor was set to one in order to optimize the RLS algorithm. For each recording, values from 2 to 100 with a step of two were tested to find the most appropriate filter order. 5. The EEMD method, which decomposed the signal into 12 IMFs, was applied to remove the residual noise. An enhanced fECG � was obtained as the sum of the suitable IMFs. The value of standard deviation of the added noise series N std was tested in the range of 0.1-0.9 with a step of 0.1, and the number of ensemble trials N was set to 10, 30 and 60. Testing for higher values of N was not necessary as there was no further improvement in the fECG extraction.
6. Finally, the fHR was determined by detecting R-peaks. The detection was done using an R-peaks detector, which was based on the continuous WT [86]. The first derivative of the Gaussian function was used as the mother wavelet. The comparison of fQRS complexes detection accuracy was done using annotations containing precisely marked positions of

PLOS ONE
R-peaks. The following statistical parameters were used to evaluate the performance of the method: false positive (FP), false negative (FN) and true positive (TP). A detected fQRS complex was considered correct if its R-peak was found within 50 ms time frame before or after the reference (annotated) R-peak position [42]. Using FP, FN and TP values, it was possible to further determine the accuracy (ACC), indicating the correctness of the fQRS complexes detection, the sensitivity (SE), the positive predictive value (PPV) and F1-score indicating the overall accuracy of the filtration [27,42]. The relevant equations are given The RLS algorithm structure with examples of the input and output signals: a) aECG � signal, referred to as primary input or desired signal d(n), b) mECG signal that needed to be adjusted by an adaptive filter, denoted as x(n). Example c) represents an mECG RLS component that has been adjusted by the filter into a shape of the mECG component in the aECG � signal, denoted as y(n). This modified mECG RLS signal was subtracted from the aECG � signal, thus generating the fECG signal, denoted as error signal e(n). https://doi.org/10.1371/journal.pone.0256154.g005

PLOS ONE
below: 7. Bland-Altman plots were used to graphically evaluate the accuracy of fHR determination.
The results of the fHR determination were compared with the values in the annotations. The 95% limits of agreement are most often used for this graph. This is an estimate of the interval μ±1.96σ in which 95% of the difference values can be expected. If the values of the differences determined lie within this interval, the difference between the methods compared is negligible [87,88]. Also, graphical evaluation using fHR traces was done to determine the accuracy of the fHR over time. To plot the traces, it was necessary to determine the current heart rates between the individual R-peaks and to apply a moving average. The most suitable size of the moving average window was 30 samples.
8. Finally, ST segment analysis was performed on records from the FECGDARHA database. First, the signal magnitude was normalized according to the R-peak maximum, and then 30 individual cycles of fECG were averaged. This was inspired by the STAN device used in clinical practice, which also averages the T/QRS values of thirty consecutive fECG cycles. A fixed window of 600 ms (determined to cover all the physiological changes occurring) was used for averaging, namely 190 ms from the R wave to the left and 410 ms from the R wave to the right. To determine the T/QRS ratios, it was necessary to detect the R, S, and T waves. The beginning and end of the fQRS complex were detected using the same continuous WT detector that was used to detect R-peaks. The detector was set to four levels of decomposition and first order Gaussian mother wavelet. A different approach was used for the T wave detection. The signal was first duplicated, then a Butterworth bandpass filter was applied at frequencies of 0.5-10 Hz, (in this range, T wave has the most significant frequency representation). Subsequently, the QRS complex was zeroed and the T wave was found by thresholding. ST analysis was obtained by determining T/QRS ratios for each of the averaged fECG cycles.

Results
The results of the ICA-RLS-EEMD method are provided in this section. The evaluation of the performance was conducted in terms of the correctness of the fQRS complexes detection and the fHR determination on recordings from the FECGDARHA and Physionet Challenge 2013 database. Bland-Altman plots and fHR traces are also presented. This section also provides the results of ST segment analysis, which was carried out on the recordings from the FECG-DARHA database. In the last subsection, the results of the ICA-RLS-EEMD method are compared with the results obtained in other studies.

Detection of fetal QRS complexes
Statistical evaluation of the fQRS complexes detection accuracy was performed using annotations. First, TP, FP and FN values were determined, and based on them, the ACC, SE, PPV and F1-score indices were calculated. When testing the proposed method on the FECG-DARHA database (see Table 1), the ACC higher than 95% was not noticed for recordings r04, r06, r07, r11 and r12. The SE was higher than 95% for most of the recordings except r04, r07, and r11, the PPV parameter exceeded the value of 95% for all recordings except r11, and, the F1-score was over 95% for most of the recordings except r04 and r11. For all recordings except r11, all considered quality indices exceeded the value of 80%. In the case of r01, r02 and r05 recordings, all fQRS complexes were extracted correctly and there were no false positive or negative detections, consequently all quality parameters achieved a value of 100%. When testing the method on the Challenge database (see Table 2), the limit value of 95% was not achieved for the recordings a01, a02, a06, a07, a09, a10, a11, a16, a18, a20, and a21 when considering ACC, SE and F1-score, and for the recordings a02, a06, a07, a09, a11, a16, a18, and a21, when taking into account PPV. Values of ACC, SE and F1-score higher than 80% were achieved for most of the recordings except a02, a06, a07, a09, a11, a16, a18 and a21, and the PPV higher than 80% was achieved for most of the recordings except a02, a07, a09, a11, a16 and a18. For recordings a04, a05, a08, a15, a17, a22 and a25, all fQRS complexes were correctly detected, zero value of FP and FN were achieved, resulting in 100% value of all considered quality indices. For clarity and easier visualization, we provide a summary of the records from both databases were reaching threshold values (80% and 95%), see Table 3.

Determination of fetal heart rate
Bland-Altman plots are used to show the differences between two measurements. The vertical axis of the plot shows the differences between paired signal values, while the horizontal one shows their arithmetic mean. The middle horizontal line indicates the mean μ of all differences. Based on this line, a 95% limits of agreement that lie in the interval μ ± 1.96σ, are plotted [87]. The results can be interpreted as follows: the smaller the range of the limits of agreement, the smaller the difference between the fHR, which is determined by the method, and the annotated signal [88]. The mean values μ and the values of limits of agreement determined by the According to Table 4, it can be noticed, that the ICA-RLS-EEMD method was effective for all recordings included in the FECGDARHA database except the recording r11, since high values of μ and limits of agreement were achieved. Based on the results from Table 5, it can be concluded that the method was effective for most of the recordings (except a02, a06, a09, a10, a11, a16, a18, a20, and a21) when considering values of μ. For the recordings a01, a04, a07 and Table 4  ACC > 95% r01, r02, r03, r05, r08, r09, r10 a03, a04, a05, a08, a12, a13, a14, a15, a17, a19, a22, a23, a24, a25

PLOS ONE
A novel algorithm based on EEMD for NI-fECG extraction Table 5. a15, small values of μ were achieved as well, however, accompanied with high values of limits of agreement. Fig 8 shows Bland-Altman plots for recordings from the FECGDARHA database. Plot a) presents a recording characterized by high quality of fHR determination, while plot b) shows an example of the recording for which the ICA-RLS-EEMD method was not effective. Fig 9 illustrates Bland-Altman plots for recordings from the Challenge database. As in the previous case, plot a) presents a recording of high quality of fHR determination, whereas plot b) is an example of recording in which the ICA-RLS-EEMD method was not effective.

Rec. μ (bpm) Upper limit of agreement (bpm) Lower limit of agreement (bpm)
In order to plot fetal heart rate traces, it is necessary to determine the fHR value between the consecutive R-peaks. Based on the moving averages, fHR traces were created and compared with annotations of R-peaks determined by experts. The graphical representation used is inspired by FIGO classification [89]. The interval of fHR physiological values (110-150 bpm) is marked in white, the intervals of fHR with an increased risk of hypoxia (90-110 bpm and 150-180 bpm) are marked in yellow and the area of fHR with a high risk of hypoxia (below 80 bpm and above 180 bpm) is marked in pink [46].
A comparison of fHR traces for both databases is shown in Fig 10. The top arrows indicate the names of the recordings related to the given section of the graph. With the FECGDARHA   Fig 10a) shows the fHR traces determined using the signals from the FECGDARHA database. It can be stated that all of the signals except the recording r11 copy the trend of the reference trace, and thus the determination of fHR was accurate. On the other hand, for the recording r11, the deviation of the resulting fHR trace from the reference one can be noticed, which means that the determination of fHR was not accurate. Moreover, based on the analysis of Fig 10b) we can state that the traces representing recordings a02, a06, a07, a09, a10, a11, a16, a18, and a21 from the Challenge database do not copy the trend of the reference trace and the method was thus not effective in determining the fHR for these recordings. For the recordings a01 and a20, a slight deviation of the resulting curve from the reference one can be noticed, which means that the fHR was not determined accurately using this method in some parts of these recordings.

ST segment analysis
For evaluation of the ST segment analysis accuracy, we compared the T/QRS ratios estimated using the method with those determined using the reference signals acquired by means of fetal scalp electrode. For the comparison, we used the mean values μ and values of limits of agreement which are summarized in Table 6. Low mean values μ and values of limits of agreement were obtained for recordings r01, r02, r03, r05, r08, r09, and r10, showing a high accuracy of ST segment analysis. Contrary, in recordings r04, r06, r07, r11, and r12, the mean values μ and values of limits of agreement reached high values and thus the ST segment analysis was inaccurate.

PLOS ONE
These results were confirmed by plotting the estimated and reference T/QRS ratios of the averaged fECG complexes over time, see Fig 11. Accurate ST segment analysis was achieved in 7 of the 12 records tested (r01, r02, r03, r05, r08, r09 and r10). For the remaining five records (r04, r06, r07, r11, and r12), the ST segment analysis was not effective-the estimated values significantly differ from the reference ones. For records r06, r07, r11, and r12, the deviations of the estimated T/QRS ratios from the reference T/QRS ratios were so large that most of the red data points in these records were outside the y-axis range and are therefore not visible.

Comparison of the results
In this part, the results provided using the ICA-RLS-EEMD method are compared with the results of other methods reported in other studies. Such a comparison is very difficult, as a large number of various databases and different recordings are used for testing. They include, for example, the Database for the Identification of Systems (DaISy) [90], used in [29,33,39,40], the Fetal ECG Synthetic Database (FECGSYNDB) [50], used in [91,92], or the Non-Invasive Fetal ECG Database (NIFECGDB) available on the PhysioNet website [84], applied in [36,44,67]. There are also studies [26,37], in which the authors do not specify on which data they tested their methods, and, in some studies, i.e. [38], authors use their own set of signals. In

PLOS ONE
terms of evaluating the effectiveness of the method, various quality indices are applied. In [38] the SNR was used, in [93] the MSE was applied, while in [94] authors used the correlation coefficient. In addition, a number of studies [28,29,35,67,95], do not use any statistical evaluation of the effectiveness, presenting examples of extracted signals only. Other studies, like [31], test their methods only on a limited number of signal samples. For these reasons, we made both an objective and a subjective comparison of the results summarized in Table 7.
• A database containing the same recordings as the FECGDARHA is called Abdominal and Direct Fetal ECG Database (ADFECGDB) and is available in Physionet [84], but contains only five recordings (r01, r04, r07, r08 and r10). In [96], the authors tested a combination of compressive sensing (CS) and the ICA method. The method was tested only on these five recordings from the ADFECGDB database and the SE, PPV and F1-score were used to evaluate the extraction quality. The study achieved 92.20% average accuracy of R-peaks detection according to F1-score parameter. This is a worse result compared to the ICA-RLS-EEMD when tested on the FECGDARHA database, but better when tested on the Challenge database.
• In [97], the authors applied a combination of WT and a clustering-based technique (CT). The effectiveness of the method was also tested only on the recordings from the ADFECGDB database (whereas we used seven additional signals published later in [83]), and on 26 recordings from the Challenge 2013 database, where 12 of those records were the same as in

PLOS ONE
our study. The effectiveness of the method was evaluated using ACC, SE, PPV, and F1-score parameters. The authors of the study achieved an average accuracy of 98.63% in the R-peak detection according to F1-score when tested on the ADFECGDB database, and 94.77% when tested on the Challenge database, which is a better result in both cases than we achieved. In general, however, the WT-CT method was less effective for recordings with higher noise levels.
• In [98], a combination of de-shape short time Fourier transform (STFT) and nonlocal median (NM) was investigated. The method was tested on recordings r01, r04, r07, r08 and r10 from the ADFECGDB database and on 75 recordings from Challenge 2013 database (including 25 records that we used herein). The authors achieved and average accuracy of 98.86% in R-peaks detection according to F1-score when tested on the ADFECGDB database; when tested on the Challenge database, they achieved an average accuracy of 86.31%, both slightly outperforming our method. However, the authors state that the P and T waves "were buried in the noise" when using the STFT-NM and thus their extraction would be nearly impossible.
• In [32], the authors tested the combination of ICA-EEMD-WS on five real recordings from the ADFECGDB database and on 500 synthetic recordings generated using the Fetal ECG Synthetic (FECGSYN) generator introduced in [101]. Unfortunately, the authors used other parameters (SNR, MSE) to evaluate the accuracy of the extraction. However, according to the visual evaluation, it can be stated that the ICA-EEMD-WS was as effective as the method we proposed. According to the authors, the limitations of the ICA-EEMD-WS method include low computational speed and significant noise suppression affecting the morphology of the output signal, which can affect the clinical information contained in it.
• In [99], the authors introduced a combination of CS and non-negative matrix factorization (NMF). The method was tested on five recordings from the ADFECGDB database and 60 recordings from the Challenge 2013 database (including 25 records that we used for testing). The effectiveness of the method was evaluated using the parameters SE, PPV, and F1-score. The authors achieved an average accuracy of 94.80% in R-peaks detection according to F1-score when tested on the ADFECGDB database; when tested on the Challenge database, they achieved an average accuracy of 84%, which is the same result we achieved.
• The combination of EKS, ANFIS and differential evolution (DE) was tested in [100]. The authors tested the method on 75 records from the Challenge 2013 database and 55 records from the NIFECGDB database. The effectiveness of the method was evaluated according to the parameters ACC, SE, PPV, and F1-score. When testing on the Challenge database, the authors achieved an average accuracy of 91.82%, which is a better result than ours, and when testing on the NIFECGDB database, the average accuracy was 95.12%. According to the authors, the limitation of the method is that it may perform worse in the extraction of the fECG at lower sampling frequency compared to the extraction of the fECG at higher sampling frequency.
• Authors of the study introduced in [42] tested the combination of ICA a QIO methods. They used records from the Challenge database and from the FECGSYNDB database for testing. They used the parameters SE, PPV, and F1-score to evaluate the quality of extraction. The authors achieved an average accuracy of 99.38% according to the F1-score when testing on the Challenge database and an average accuracy of 98.78% when testing on the FECG-SYNDB database, thus outperforming our method. However, the authors state that this is a low computational speed algorithm that cannot extract fECG signals in multiple pregnancy.
• In [45] the authors tested the combination of ICA and self-correlation analysis (SCA) methods on the recordings from the ADFECGDB database. The authors did not use any statistical parameters for the evaluation, and in addition, they tested the method on only five signals. However, according to the visual comparison of the extracted signals, it can be stated that our method was able to suppress the mECG component better.
• The combination of the EMD and correlation analysis (CA) was introduced in [31] and tested on signals from the ADFECGDB database. The authors evaluated the effectiveness of the method using the ACC, SE, and PPV parameters. However, since they reached 100% for all parameters, it can be assumed that the F1-score would also acquire an accuracy of 100%. In this case, however, these are very inconclusive results, as the method was tested only on four signals with a length of ten seconds.
We also compared the results obtained using the conventional ICA method alone, for all recordings from both the FECGDARHA and Challenge databases were better results achieved using the proposed procedure. Using the ICA-RLS-EEMD method and the FECGDARHA database, ACC >80% was achieved for 11 out of 12  Finally, we performed a further comparison using methods which were presented in our previous studies. In [46], we introduced the ICA-ANFIS-WT and ICA-RLS-WT methods and, in [47], we tested the ICA-EMD, ICA-EMD-WT and ICA-RLS-EMD procedures. We applied the same testing routine as presented in this work, i.e. using the FECGDARHA and Challenge databases, and the same set of quality indices i.e., ACC, SE, PPV and F1-score. Using the ICA-RLS-EEMD method, we achieved higher values of considered indices for most of the recordings. The exceptions are the recording a07 from the Challenge database, where better results were provided for the ICA-ANFIS-WT method, and the recording a21, for which the ICA-ANFIS-WT and ICA-RLS-WT methods provided better results. The average values of the ACC, SE, PPV and F1-score parameters, which were obtained using all our previous methods along with results of the conventional ICA, are summarized in Table 8 (the FECGDARHA database) and Table 9 (the Challenge database). The second column of the tables shows the number of recordings for which the ACC parameter value was higher than 80%.

Discussion
The results presented in the previous section demonstrate the ICA-RLS-EEMD method capability to extract high-quality fECG signals. For most of the recordings tested, the method eliminated the maternal component very effectively and determined the fHR with high accuracy. The maternal component was eliminated mainly with the help of the RLS method, and the subsequent application of the EEMD further improved the resulting filtration. Unfortunately, for some signals we were not able to sufficiently suppress the maternal component. In this section we discuss the reasons behind the differences in the quality of the extraction outcomes.
Studies [1,102,103] demonstrated a significant impact of electrode placement and data acquisition quality on the final fECG extraction. An example of such influence is shown in Fig  12. Example a) shows the recording r08 from the FECGDARHA database and example c) is an illustration of recording a15 from the Challenge database. In both of them, it can be seen that the magnitude of the fetal component is sufficient compared to the maternal one. In these cases, extraction by the system was highly effective, and, in both cases, ACC >99% was achieved when detecting the fQRS complexes. Example b) illustrates the recording r11 from the FECGDARHA database being an example of aECG signals of lower quality. In this case,

PLOS ONE
the magnitude of the fetal component is very low compared to the maternal component and, in addition, the signal contains noise. From such input signals, it is difficult to extract a good quality fECG signal and to accurately detect the fQRS complexes. As a result, the reported ACC was less than 53%. Finally, example d) of recording a18 from the Challenge database is presented. Here, the ICA-RLS-EEMD method completely failed to extract the fECG as the ACC of the QRS detection and the determination of the fHR was below 13%. In such aECG signals the magnitude of the fetal component is almost negligible and hence, the fECG extraction is difficult or practically impossible. Such input aECG signals are not usable for the methods of NI-fECG extraction. Fig 13 shows the influence of the quality of aECG on the extracted signal by comparing the fHR traces determined using the ICA-RLS-EEMD method with the reference fHR signal derived from the scalp electrode. Examples of abdominal, extracted and reference ECG signals from the scalp electrode are also presented. To best illustrate this effect, the recording r10, characterized by an overall high-quality extraction (ACC >95%), was selected from the FECG-DARHA database. Examples a), b) and c) correspond to the signal sections where the fHR determination achieved high accuracy. These parts of aECG signals contain well recorded fetal component. Examples d), e) and f) correspond to sections where the estimated fHR trace deviates slightly from the annotation curve and where, as a result, the fHR determination was less accurate. This could be due to the noise which caused a slight decrease in the quality of the extraction. However, these are insignificant deviations that do not affect the final accuracy of fetal hypoxia determination. Fig 14 shows another example of the impact of the quality of the aECG signals on the resulting extraction quality and the subsequent fHR determination. The recording a16, for which unsatisfactory results were obtained (ACC <30%), was selected from the Challenge database. In this case, only the aECG signal and the signal extracted using the ICA-RL-S-EEMD method are presented, as the scalp electrode reference recording is not available. Only the annotation used to plot the fHR trace is included. Subfigures a), b) and c) correspond to sections where the method curve deviates the least from the annotation curve, but, even in these sections, sufficient extraction quality was not achieved. This is due to the very low magnitude of the fetal component . In examples d), e) and f), the complete failure of the proposed extraction method can be seen, as the fetal component is almost invisible in the aECG signal. Again, it is clear that such signals are not suitable for fECG extraction and their filtration is not feasible.
As shown, the quality of aECG signals is one of the most important factors affecting the resulting filtration quality and the accuracy of fHR determination. It can be stated that the extraction is effective if the ratio between the fetal and maternal component in the aECG record is sufficiently high and the amount of noise in the recording is low. One of the factors in aECG signal quality is the electrode placement. It affects the signal's polarity, morphology, ratio between the maternal and fetal component or even the ability of the system to suppress the unwanted signals (e.g. movement and muscle artifacts). Unfortunately, there is currently no standardization for non-invasive fECG measurement in terms of the number, location of electrodes, or system configuration, as is the case of conventional ECG. The recordings in the publicly available databases (including the FECGDARHA and Challenge 2013) differ in the number and location of the electrodes on the maternal body. As a result, the quality of the signals may differ significantly. This effect can be observed especially in the case of the results achieved using the data from the Challenge 2013 database, which differ significantly from case to case (e.g. for the record a04 ACC = 100% whereas for a18, ACC = 12.61%). These significant differences may be caused by the fact that this database is known to be very inconsistent; it includes records of different quality, from different sources and acquired using a variety of devices with different resolutions, configurations and electrode placements. Large amount of the records are also synthetic. In addition, almost no information on the setting of the measuring system, gestational age or the position of the fetus for individual records is publicly available.

PLOS ONE
The future research should therefore aim to find the optimal number and location of electrodes, which would be standardized with respect to the position and gestational age of the fetus. It can be assumed that the more electrodes used for sensing, the more likely it is that high quality aECG signals will be captured. However, this assumption is only theoretical, as

PLOS ONE
from a practical point of view, recording many signals simultaneously is both clinically and technically demanding. In addition, for the signals to be of high quality, the patient's skin needs to be properly cleaned and the sensor attached well, which is not only time consuming in the case when many electrodes are used, but also stressful and uncomfortable for the pregnant woman. The electrodes can be placed on the abdominal area (for aECG measurement) or also on the chest (to obtain mECG). Purely abdominal sensing is beneficial for the clinical staff due to decreased requirements for preparation but also for the mother since it offers greater comfort and mobility. On the other hand, the combination of abdominal and thoracic electrodes allows easier interpretation of mECG and fECG signals. However, the effectiveness of the combined electrode placement is strongly dependent on the quality of the chest signals, which is often insufficient in clinical practice. In addition, change in the location of the sensor leads to variations in the morphology of the mECG signal which can then differ from the maternal component obtained in the abdominal area. Thus, the extraction may not be a sufficiently accurate [1,5]. This is especially true for adaptive algorithms such as RLS, which are very susceptible to the quality of input signals, especially mECG.
The influence of the performance of the ICA, RLS, EEMD individual algorithms and selection of a combination of aECG signals on the resulting quality of the outcome signal is also presented. Fig 15 shows an example of the effect of the input aECG signal selection on the resulting signal quality. Example a) presents the optimal selection of input aECG signals (aECG2 and aECG4), when the resulting extracted signal was of very high quality and ACC > 99% was achieved, while b) shows an example of inappropriate combination of input signals (aECG2 and aECG3) causing the method to fail to extract fECG signals (ACC < 30%).
In addition to the subjective evaluation of the input aECG waveforms and their influence on the resulting extraction quality, we also performed an objective statistical evaluation, see the summary in Table 10. We have introduced four objective parameters and evaluated their influence on the accuracy of fQRS complex detection:

PLOS ONE
1. Number of input channels-this factor was related to the theoretical assumption that the quality of the resulting extraction does not depend on the number of channels used, but on their quality and suitability in terms of ECG curve morphology. The resulting correlation coefficient was -0.28 [95% confidence interval: -0.55 to 0.05], indicating a very low negative correlation between the number of input aECG signals and the resulting accuracy in Table 10. Analysis of the influence of four factors (number of input aECG signals, average ratio between mR and fR oscillations, average value of sSQI and kSQI) on the resulting extraction quality. The influence of each factor on the accuracy was evaluated using a correlation coefficient (the 95% confidence interval is reported in parenthesis). • First, all mR and fR peaks were detected in each aECG signal for each recording and their voltage levels were determined.

Recordings Number of input aECG signals (-) mR:fR ratio (-) sSQI (-) kSQI (-)
• The average amplitude of the mR and fR oscillations was determined and the average mR: fR ratio was calculated for each aECG signal.
• The resulting ratio was determined as the average of the ratios of all aECG signals used. The value of the correlation coefficient was -0.79 [95% confidence interval: -0.89 to -0.62], indicating a high negative correlation between the average mR:fR ratio and the resulting accuracy in detecting fQRS complexes according to ACC (i.e. the smaller the ratio between mR and fR peaks, the higher the ACC value).
3. Skewness signal quality index (sSQI)-this parameter is often used in the ECG signal quality evaluation in adults, see [104,105]. The resulting value for a given record was determined as the average value of all aECG signals used. In the case of sSQI, a slight positive correlation was observed with the ACC values with a correlation coefficient value of 0.59 [95% confidence interval: 0.32 to 0.76].
4. Kurtosis signal quality index (kSQI)-similarly to sSQI, this parameter is used to assess the quality of ECG signal in adults. Both parameters were designed as indices to identify outliers in the ECG that are equivalent to noise, see [104,105]. In the case of kSQI, a high negative correlation was observed with ACC values with a correlation coefficient value of -0.82 [95% confidence interval: -0.90 to -0.67].
Based on the statistical results, it can be stated that the number of input signals does not affect the final quality of extraction, but the quality of input aECG signals plays an important role, especially in terms of noise presence and the ratio between the maternal and fetal components.
For the RLS algorithm, the quality of the mECG and aECG � components extracted using the ICA method is crucial. These components, forming the reference and primary inputs to the RLS system, and also the output fECG signal extracted using the ICA-RLS method, are shown in Fig 16. Examples of well-extracted fECG signals are presented for the recordings: a) r01 and c) r05 from the FECGDARHA database. It can be seen that if the ratio between the maternal and fetal components is sufficient, the algorithm is able to extract a high quality fECG. However, if the fetal component is of insufficient magnitude, and, in addition, the mECG component estimated using the ICA method contains fECG residues or noise, then the extraction by means of the RLS algorithm is less accurate, see example b) for recording r04 and d) for recording r11.
With the RLS algorithm, it is very important to choose the appropriate filter order so that the input mECG signal resembles as closely as possible the mECG component in the aECG � signal. It is then possible for the RLS method to subtract the modified mECG RLS component from the aECG � signal to obtain the fECG with the suppressed maternal component. The influence of the optimal filter order on the RLS algorithm is presented in Fig 17. Example a) shows the aECG � and mECG signals that were used as inputs to the algorithm in recording r05. Example b) demonstrates too low filter setting (sixth order), which led to an insufficient adjustment of the mECG component to the desired shape in the aECG � signal. The resulting fECG thus contained unnecessarily significant residues of the parent component. Example c) shows the ideal setting (16th order filter), which was able to sufficiently suppress the maternal component. Example d) shows a too high filter order (100th order filter), which was able to suppress the parent component. In addition, the fECG component was also suppressed, which affected the resulting fHR trace.
Moreover, for the same record, we demonstrate the effect of different RLS-based filter setting on the quality of the extraction using a 3D graph, see Fig 18. This graph takes into account Therefore, although the input signals for this recording were of a high quality, improper setting of the RLS filter caused poor outcomes.
When using the EEMD method, the optimal setting of the N and N std parameters is crucial. The influence of these parameters on the resulting extraction quality is shown in Fig 19. The upper signal in all examples is the result of the most unsuccessful setting of the EEMD algorithm, while the lower signal represents the setting of the EEMD algorithm for which the best extraction results and most accurate fHR determination were achieved. Examples a) and b) represent the influence of the optimal setting of parameter N, while keeping N std constant. If 10 ensemble trials were used for the recording a13, then the EEMD method failed to suppress the residues of the maternal component and produced worse results (ACC < 71%) comparing to N = 60 ensemble trials (ACC > 97%). When using 60 ensemble trials the method was able

PLOS ONE
to filter the residues, and consequently, it was possible to detect the fQRS complexes more accurately. The same influence can be seen for the recording a19. However, in this case worse results were obtained using 60 ensemble trials (ACC < 75%), while better were achieved for N = 10 (ACC > 98%). Examples c) and d) present the influence of the optimal setting of N std , while keeping the same value of N. For the recording a22, the EEMD method failed to suppress some maternal components (ACC < 47%) when assuming N std = 0.9. The use of N std = 0.1 for the added noise series was more appropriate in this case, leading to the highest level of the In addition to the optimal setting of the N and N std parameters, the correct selection of extracted IMFs plays an important role since they are used to compose the resulting fECG signal. An example of the optimal selection of IMFs (IMF3 + IMF4) for r02 recording, in which ACC = 100% was achieved in the detection of fQRS complexes, is shown in Fig 21a). Example  Fig 21b) presents an inappropriate selection of IMFs (IMF2 + IMF4), where the selected component IMF2 corresponds to the noise. This noise contained in the resulting signal led to an impaired detection of fQRS complexes (ACC < 92%). Example c) represents an inappropriate selection of IMFs (IMF3 + IMF5) in terms of insufficient removal of mECG residues, which also affected the accuracy of R wave detection (ACC < 94%). The last example d) demonstrates the effect of inappropriate selection of IMFs (IMF5 + IMF6), where both IMFs do not contain a well-extracted useful signal and there is a complete loss of information about the fECG signal. It is almost not recognizable even visually and accurate detection of fQRS complexes is nearly impossible (ACC < 45%).
Another important factor besides the quality of the aECG recordings is the filter setting. Improper setting of the methods either led to insufficient suppression of the interference or, on the other hand, the interference was effectively removed, however, a part of the useful signal was also filtered out along with it. Nevertheless, the functionality of the method and finding the optimal setting were demonstrated only in physiological records. It would be appropriate to verify the effectiveness of the method also on pathological records. Unfortunately, there are currently very few publicly available databases with real records that contain annotations with reference positions of fQRS complexes, and no dataset with pathological fECG records exists. Future research should therefore focus on creating a comprehensive database containing records from clinical practice. The ideal database should contain hundreds of well-scanned

PLOS ONE
multichannel aECG records with a reference scalp fECG record, annotations indicating the positions of fQRS complexes, and annotations with reference T/QRS ratios. The database should contain records that have been obtained from both normal and pathological fetuses of different gestational age and different maternal health conditions. The database should include a description of how the signals were captured, information on electrode placement, gestational age of the fetus, fetal position, maternal age, maternal health, genetic predisposition, information on whether the pregnancy is at risk and whether it is single or multiple pregnancy.

PLOS ONE
Such a database would be used for further research on extraction methods, which could be compared with each other both in terms of fHR determination and for deeper morphological analysis.
As we tried to obtain the most accurate fECG extraction in order to perform a morphological analysis, a relatively wide range of parameters was tested (11 combinations of aECG signals, 50 values of RLS filter order, nine values of N std parameter, three values of N parameter, and 250 combinations of IMFs). In order to simplify the search for optimal parameters and reduce the number of algorithm runs, the number of parameters could be decreased. Such range of parameters could be chosen, for which we know that the method works relatively correctly and is sufficient for a reliable determination of fHR (e.g. four combinations of aECG signals, nine values of the RLS filter order, two values of the N std parameter, one value of the N parameter, and 12 combinations of IMFs would be tested, see Table 11).
Reducing the computational time of the algorithm can be achieved by optimizing the hyperparameters using a shorter section of the signal and applying this setting to the entire signal. A similar approach can then be used in clinical practice. Therefore, we performed the testing on signals from the FECGDARHA database using three variants, where a signal section of 15, 30, and 60 seconds was used for optimization. Subsequently, the whole five minute recording was filtered using the method with these set of parameters and a statistical evaluation of the accuracy of detection of fQRS complexes for the whole recording was performed, see Table 12. A signal section of 15 and 30 seconds cannot be considered sufficient, as poor results were achieved in the statistical evaluation of the whole record. Optimization using a 60-second signal proved to be the most suitable and accurate variant, as the subsequent statistical evaluation of the detection of fQRS complexes for the whole recording achieved almost as accurate results as in the case of parameter optimization for the whole recording length (five minutes). According to ACC, for eight records (r01, r02, r03, r05, r06, r08, r09 and r10), the accuracy difference was less than 1%. For recording r07 the value of ACC differed by 1.17%, for recording r11 by 2.63%, for recording r12 by 2.24% and the most significant difference was for recording r04, where the value of ACC differed by 5.56%.
Another variant, the advantage of which is to maintain the efficiency of filtration in the case of fetal or maternal movement, is optimization at regular intervals. The filter would be optimized for 60 seconds, following 60 seconds would be then filtered with these optimized parameters, followed by further optimization lasting 60 seconds. In clinical practice, parameter optimization based on fHR evaluation would also be beneficial. The parameters would be again first optimized for 60 seconds and with these set of parameter values the signal would be filtered. The algorithm would monitor whether the fHR is in the expected interval. If not, the optimization would be performed again. Of course, a combination of both mentioned

PLOS ONE
approaches is also possible, i.e. optimization at regular intervals and optimization based on the deviation of fHR from the expected interval. This approach also solves the problem that arises in practical use, when we do not have annotations with reference positions of fQRS complexes. The algorithm would receive both mHR information (which can be easily obtained from the aECG record) and fHR information. Based on the information on the mR-mR positions and the estimated fR-fR, the algorithm would check whether the fHR is in the expected interval or whether it has deviated and optimization is needed. A limitation of this approach is its use in pathological records, where fHR and mHR may vary.
The problem of the absence of information on the position of fetal heart beats can be further solved by combining NI-fECG with fetal phonocardiograph. It is possible to detect fS1 fetal heart sounds in the fetal phonocardiographic signal, which correlate with R-peaks with a slight time delay.
Based on the results presented, one can notice that the ICA-RLS-EEMD method was able to extract the fECG better and to determine the fHR more accurately than the conventional ICA method as well as all the approaches presented in our previous studies. Inaccurate extraction was the result of processing aECG recordings that were not well recorded and thus contained the fetal component of too low magnitude. Therefore, in the case of clinical application, it is essential to obtain high quality aECG signals. The extraction results can be negatively influenced, in particular, by insufficient adhesion of the measuring electrodes and their placement, fetal position and maternal movements. If the aECG signals contain fetal components with a sufficient magnitude, and do not contain a large amount of noise, it will be possible to perform a highly accurate determination of a non-invasive fHR. In addition, very promising results were achieved when performing non-invasive ST segment analysis. However, the computational complexity and the need to individually set the parameters of the RLS and EEMD algorithms for each recording, are disadvantages of the ICA-RLS-EEMD algorithm. If uniform parameters were used for all recordings, extraction would be less efficient.

Conclusion
This study develops our previous ideas on the ICA-RLS-EMD method introduced for extracting the NI-fECG from abdominal recordings. Promising results were observed, however the applied EMD method is burdened with the intrinsic limitation called mode mixing, significantly affecting the decomposition results. To prevent this phenomenon, an extended EEMD method was applied in this study. Our investigations focused on the possibility of improving extraction results using the ICA-RLS-EEMD procedure, based on the appropriate settings of the EEMD parameters. The comparison with the ICA-RLS-EMD and the other previously presented methods was shown. The evaluation was carried out by examining the ability to accurately detect the fQRS complexes and to determine the high-quality fHR. Two benchmark databases (FECGDARHA and PhysioNet Challenge 2013) were applied and the evaluation of the considered algorithms was performed using statistical parameters such as ACC, SE, PPV and F1-score. When testing the proposed ICA-RLS-EEMD on the FECGDARHA database an overall accuracy of over 80% was achieved for 11 out of 12  and F1-score = 84.08% [95% confidence interval: 80.75-86.64%] were achieved. The ICA-RLS-EEMD method provided better results than the conventional ICA method and the previously presented approaches, including the ICA-RLS-EMD. In addition, the effectiveness of the method was also demonstrated for the ST segment analysis, which was highly accurate in 7 of 12 records. However, the main disadvantage of the ICA-RLS-EEMD method is its computational complexity and the need to individually set the parameters of the RLS and EEMD algorithms for each recording. Future research will include tests and optimization of the algorithm on signals captured using our proposed NI-fECG system. All relevant data (extracted signals from both used databases) are in S1 File named Extracted signals.