A systematic scoping review to identify the design and assess the performance of devices for antenatal continuous fetal monitoring

Background Antepartum fetal monitoring aims to assess fetal development and wellbeing throughout pregnancy. Current methods utilised in clinical practice are intermittent and only provide a ‘snapshot’ of fetal wellbeing, thus key signs of fetal demise could be missed. Continuous fetal monitoring (CFM) offers the potential to alleviate these issues by providing an objective and longitudinal overview of fetal status. Various CFM devices exist within the literature; this review aimed to provide a systematic overview of these devices, and specifically to map the devices’ design, performance and the factors which affect this, whilst determining any gaps in development.
Methods A systematic search was conducted using MEDLINE, EMBASE, CINAHL, EMCARE, BNI, Cochrane Library, Web of Science and PubMed databases. Following the deletion of duplicates, the articles’ titles and abstracts were screened and suitable papers underwent a full-text assessment prior to inclusion in the review by two independent assessors.
Results The literature searches generated 4,885 hits from which 43 studies were included in the review. Twenty-four different devices were identified utilising four suitable CFM technologies: fetal electrocardiography, fetal phonocardiography, accelerometry and fetal vectorcardiography. The devices adopted various designs and signal processing methods. There was no common means of device performance assessment between different devices, which limited comparison. The device performance of fetal electrocardiography was reduced between 28 and 36 weeks’ gestation and during high levels of maternal movement, and increased during night-time rest. Other factors, including maternal body mass index, fetal position, recording location, uterine activity, amniotic fluid index, number of fetuses and smoking status, as well as factors which affected alternative technologies, had equivocal effects and require further investigation.
Conclusions A variety of CFM devices have been developed; however, no specific approach or design appears to be advantageous due to high levels of inter-device and intra-device variability.

mortality rate (RR 2.05, 95% CI 0.95 to 4.42, 4 studies of 1,627 participants) [12]. These findings suggest the current gold-standard antenatal methods have failed to reliably detect fetal compromise. One possibility is that this is due to their intermittent nature, as these current forms of antenatal monitoring are rarely used for longer than 90 minutes, and hence only provide a 'snapshot' of fetal wellbeing, thus delivering a false sense of reassurance.
Continuous fetal monitoring (CFM) offers the potential to provide an objective and longitudinal overview of fetal wellbeing by monitoring the fetus for prolonged periods of time (i.e. longer than 90 minutes but ideally 24 hours a day). As current methods used in clinical practice do not significantly improve perinatal outcomes, it is hypothesised that longer-term monitoring will increase the chances that signs of fetal compromise are detected [1]. Unfortunately, US-based technologies (US scans, Doppler and CTG) cannot be used for a sustained period of time due to concerns about the unknown thermal effects of high intensity ultrasonic waves [15]. Therefore, novel technologies to continuously monitor the fetus for prolonged periods of time have had to be developed. To date, non-invasive devices with limited safety concerns have primarily focused on monitoring the FHR or FM pattern for extended time periods. In order to implement such devices into clinical practice, the views of healthcare professionals and pregnant women must be considered, both of whom have indicated that the use of CFM devices would be acceptable [16,17].
The potential benefit of CFM devices in clinical practice is hypothetical and unproven; to date no clinical trials have compared CFM to intermittent monitoring. For such comparative trials to go ahead, CFM technology needs to be proven to be reliable. This ensures that the standard of antenatal care will not be affected by the use of CFM devices, and the hypothesis that it may improve clinical outcomes can be tested. Various CFM devices exist within the literature; however, a clinical review of these technologies has not yet been performed. Therefore, this scoping review aimed to describe the available evidence to provide an overview of CFM devices developed for use in antenatal care to date, and to determine areas for improvement. This review included devices which can monitor the fetal heart rate and/or the pattern of FM for long periods of time. The specific aims were to: 1) describe the design and detection technology employed in the devices; 2) compare the device performance of the different CFM devices; and 3) investigate factors which affect the devices' performance.

Search strategy
A scoping review was conducted using an adaptation of the methodology provided by the Joanna Briggs Institute [18]. A preliminary literature search was conducted by the primary review author (KT) using MEDLINE and PubMed to ensure no previous scoping reviews had been conducted to assess current CFM devices developed for use in antenatal care. This also identified relevant search terms related to CFM and antenatal care.
Following preliminary searches, a systematic search strategy was developed to identify full-text articles using relevant headings (e.g. MeSH) and fields (e.g. ti,ab); adaptations were made according to the relevant databases. Eight electronic databases (MEDLINE, EMBASE, CINAHL, EMCARE, BNI, Cochrane Library, Web of Science and PubMed) were searched by the primary review author (KT). The search strategy is described in detail in the S1 Table. There was no specified timeframe and articles of all languages were included; non-English papers were translated into English using a hospital-translation service. The search for guidelines was limited to national and international guidelines from the United Kingdom. Devices that were designed to monitor fetal wellbeing in labour were excluded. Further exclusion criteria were: articles not containing information relating to the aims of the review, review articles, expert opinions, commentaries, letters to the editor, or articles for which the full text could not be retrieved.
Following the removal of duplicated articles, two reviewers with clinical experience (KT and AH) independently screened all titles and abstracts to determine their eligibility with regard to the inclusion and exclusion criteria, as stated previously; the articles were then categorised into one of three groups ('included', 'excluded' or 'uncertain'). The same definition for CFM was applied as in our previous systematic review [17]; only devices which can be used in the antenatal period, are non-invasive and can potentially be used safely for a sustained period of time were included in this review. Ultrasound-based technologies were excluded due to heating concerns associated with prolonged use [15].
All articles in the 'included' or 'uncertain' categories underwent independent full-text assessment by two review authors (KT and AH). Where disagreements between the reviewers arose, a decision was made between the reviewers following a discussion. The reasons for excluding articles after full-text assessment were recorded.

Data extraction
A data extraction form was created and both quantitative and qualitative information was recorded from each study. Relevant information extracted from each study included the country of origin, study's aim(s), clinical context, cohort population size and characteristics, duration of fetal monitoring recordings, type of fetal monitoring device used (including the name, if relevant), the device's design and detection technology, the device performance (as reported by the authors), problems identified with the device and suggested areas for development. Data extraction was performed by the primary review author (KT), apart from information regarding the signal processing which was documented by SC.
The device performance was reported in the studies as either: (1) device accuracy, (2) signal quality (SQ) or (3) success rate. For the purpose of this review, the SQ was defined as the percentage of total recording time in which a valid FHR trace was recorded, and the success rate refers to the proportion of successful traces using pre-defined study-specific criteria for success.
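The two definitions above can be illustrated with a minimal Python sketch. It assumes a recording is represented as a list of equally spaced samples flagged as valid or invalid, and that "success" means an SQ at or above a chosen threshold. Both the data representation and the threshold criterion are illustrative assumptions; the included studies each used their own criteria for success.

```python
from typing import Sequence


def signal_quality(valid_flags: Sequence[bool]) -> float:
    """Return SQ: the percentage of recording time with a valid FHR trace.

    Assumes equally spaced samples, so the share of valid samples
    equals the share of valid recording time (illustrative assumption).
    """
    if not valid_flags:
        raise ValueError("recording is empty")
    return 100.0 * sum(valid_flags) / len(valid_flags)


def success_rate(sq_values: Sequence[float], threshold: float) -> float:
    """Return the percentage of traces that meet a success criterion.

    The criterion here (SQ >= threshold) is hypothetical; the review
    notes that criteria for success were study-specific.
    """
    if not sq_values:
        raise ValueError("no traces supplied")
    successes = sum(1 for sq in sq_values if sq >= threshold)
    return 100.0 * successes / len(sq_values)


if __name__ == "__main__":
    # One short recording: three of four samples carry a valid FHR value.
    print(signal_quality([True, True, False, True]))
    # Four traces judged against a hypothetical 60% SQ threshold.
    print(success_rate([90.0, 40.0, 75.0, 60.0], threshold=60.0))
```

Because the success criterion is a parameter rather than a fixed rule, the same trace can count as a success in one study and a failure in another, which is one reason success rates are not directly comparable between devices.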

Data presentation and synthesis
The results are reported separately for devices primarily concerned with recording the FHR and those primarily concerned with recording FM.
For studies that presented data on the signal quality of FHR devices, meta-analysis was performed using the metaprop command [19] in STATA version 14 (StataCorp, TX, USA); the signal quality and 95% confidence interval (CI) were calculated for each study. I², a statistic derived from Cochran's chi-squared statistic Q, was calculated to describe the percentage of variability between the studies that is due to heterogeneity rather than chance [20]. An I² value of <30% was classified as low heterogeneity between studies, 30-59.9% as moderate, 60-89.9% as substantial and ≥90% as considerable [21]. A random effects meta-analysis was used in anticipation of heterogeneity due to differences in study design and populations. When studies presented non-parametric data, estimated values of the mean and standard deviation were calculated using established methods [22], in order to enable comparison using meta-analysis; where data were converted, this is stated in the results. Where meta-analysis was not possible, descriptive statistics were used.
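As a minimal sketch of how I² relates to Cochran's Q, the standard relation I² = max(0, (Q − df)/Q) × 100%, with df equal to the number of studies minus one, can be written in Python as below. The function names are illustrative and this is not the STATA metaprop implementation used in the review; the classification bands are those stated above.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return I² (%) from Cochran's Q and the number of studies.

    Uses I² = max(0, (Q - df) / Q) * 100 with df = n_studies - 1.
    """
    df = n_studies - 1
    if df < 1:
        raise ValueError("at least two studies are required")
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def classify_heterogeneity(i2: float) -> str:
    """Map an I² value (%) onto the bands used in this review."""
    if i2 < 30:
        return "low"
    if i2 < 60:
        return "moderate"
    if i2 < 90:
        return "substantial"
    return "considerable"


if __name__ == "__main__":
    # Hypothetical example: Q = 20 across 5 studies (df = 4).
    i2 = i_squared(20.0, 5)
    print(i2, classify_heterogeneity(i2))
```

When Q is no larger than its degrees of freedom, I² is truncated at zero, meaning the observed spread is no more than would be expected by chance.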

Results
The electronic search generated 4,865 hits; 22 additional studies were found by hand searching (Fig 1). Following the deletion of duplicates, 3,194 records were screened by reading their titles and abstracts. This resulted in 3,088 records being excluded, most often because they were unrelated to the topic of interest, were only applicable to intrapartum fetal monitoring or described a device that could not be used for prolonged periods of time. In total, 106 records underwent full-text review for their eligibility, resulting in 63 papers being excluded. Finally, 43 papers were included in the review. Characteristics of the included papers are shown in Table 1, and S2 Table lists the papers excluded following full-text assessment alongside their reason for exclusion.

Technical description
The technical details of the FHR devices are described in Table 2. With regard to fECG, there were 16 different reported device designs and technologies. Although the devices used different types and quantities of electrodes, ranging from 2 to 16 electrodes, all devices required a reference electrode. In addition, the arrangement of the sensors varied; all electrodes were placed abdominally, generally around the umbilicus, with the exception of four devices [32,35,41,43] which required additional thoracic electrodes. A variety of signal processing approaches were used; the most frequently described process involved the removal of the mECG trace from the device signals.
With regard to fPCG, there were five different device designs and detection technologies. All the devices used between one and four abdominal phonocardiographic sensors, and two studies stated a specific placement of the sensors, either directly over the fetal heart [37] or in a predetermined abdominal position [41]. Furthermore, two studies securely attached the sensors to the maternal abdomen using either belts [32] or a 3-D printed plastic harness [41]. All fPCG devices utilised different signal processing methods. ECG data was collected in parallel for three studies [32,37,40] for either validation or to aid signal recovery, with one study using a US-based Doppler instrument for reference purposes [45].
The single combined fECG/fPCG device used a combination of electrodes and acoustic sensors, secured to the abdomen using a wireless belt, to monitor the FHR [46]. The FHR was independently detected by both methods and the signals combined to produce a reliable FHR trace.
Performance of fECG devices. There was no common means of assessment to determine the accuracy or success rates of the devices in the included studies. Therefore, the accuracy or success of different named devices is not directly comparable; however, studies which used the same named device were compared.
Eight out of twelve studies utilising the Monica AN24 device reported the SQ and the variability. The mean SQ was 68% (95% CI 48-87%); however, there was considerable heterogeneity between the studies (I² = 97.94%) (Fig 2). There was no significant relationship between study size and the estimated SQ (p = 0.53, r² = 0.06).

PLOS ONE
Systematic scoping review of devices for antenatal continuous fetal monitoring
A sub-group analysis was performed to look at the effect of converted data on heterogeneity; however, no significant effect was seen (p = 0.093) between the studies with unconverted data (mean SQ 86%, 95% CI 79-94%) and those with converted data (mean SQ 59%, 95% CI 28-90%). The remaining four studies could not be included in the meta-analysis as they either did not provide an overall SQ [36,58] or they did not provide all of the required data (e.g. SD or IQR) [38,39]. Both Telefetalcare studies [29,53] reported identical values for the accuracy (91.3%) and sensitivity (92.9%) of the device at detecting fetal QRS complexes. The prototype of the Telefetalcare device had a reported accuracy of 95% following testing on a single participant [28]. The FECGV1 device detected 72% of fECG complexes (P and QRS waves) overall [35]. The Cardiolab Babycard device had both a sensitivity and specificity of 100% when detecting fetal distress in women suffering from pre-eclampsia [43].
Ten studies which utilised unnamed fECG devices described the proportion of successful traces using pre-defined criteria for success (Fig 3). The criterion was either: traces being above a certain level of quality [25,27,31,44,48,57]; traces with successful signal separation [55]; traces with successful estimation of the FHR [32,37]; or traces with successful measurement of the T/QRS ratio [30]. The success rates have a broad range (42.9-100%), and there is no obvious association between the year of the study and the success rate, indicating no obvious improvement over time.

Fig 2. Estimated average signal quality of fetal electrocardiography recordings from eight studies which used the Monica AN24 device. Black markers represent the signal quality with 95% confidence intervals (CI) (whiskers). The size of each grey square represents the relative weight in the meta-analysis. The diamond represents the signal quality summary value. Asterisk (*) represents studies which had the median and interquartile range/range data converted to the mean and standard deviation.
https://doi.org/10.1371/journal.pone.0242983.g002
Performance of combined fECG/fPCG devices. Mhajna et al. [46] assessed the accuracy of the Invu system at recording the FHR using simultaneous CTG; a highly significant correlation was reported between the two modalities (r = 0.92, p<0.0001).
Factors which affect performance of fECG. Studies reported data on a variety of factors which could affect the performance of fECG devices: gestational age (15 studies), BMI (8 studies), time of day (5 studies), maternal movement (4 studies), fetal position (5 studies), location of recording (3 studies), maternal-fetal complications (5 studies), uterine activity (1 study), amniotic fluid index (1 study), multiple fetuses (1 study) and smoking status (1 study). These factors will be addressed in turn.
Fifteen studies investigated the association between the fetal gestational age at the time of the recording and device performance. The gestational age was broken down into distinct but unstandardized categories (e.g. 16+0 to 19+6 weeks' gestation) in ten of these studies [25,33,34,36,38,40,44,49,52,55], as shown in Fig 4. Overall, there was a reduction in the device performance in the middle gestational age categories, roughly from the start of the third trimester (28+0 weeks) until 35-36 weeks' gestation. Moreover, the device performance was greatest at term or in post-term fetuses. Of the five studies not represented in Fig 4, a further two studies reported a decrease in the success between 24-34 weeks' gestation [58] and difficulty detecting a fECG signal at gestational ages 31 and 34 weeks [32]. The remaining studies reported opposing findings: one found the majority of unsuccessful traces were between 37-39 weeks' gestation [30]; a single study found a weak positive correlation between the SQ and gestational age (p = 0.05) [26]; and one study determined that the relative gap duration between successful traces significantly decreased with increasing gestational age (p = 0.04) [56].
The effect of maternal BMI on the success rates was investigated in eight studies. One study found that the majority of participants whose fECG traces were unsuccessful had a BMI greater than 24.9 kg/m² [30], whereas another determined that BMI had no effect on the SQ apart from fetuses with a gestational age of 20 to 25+6 weeks, where BMI negatively correlated with the fECG SQ (p = 0.04) [34]. Furthermore, Van Leeuwen et al. [56] reported participants with higher BMIs had longer durations between valid FHR traces (p = 0.009) and a greater percentage of recording time with gaps (p = 0.03), as well as there being a trend to a lower proportion of valid fECG data being obtained (p<0.10). The remaining five studies reported that BMI had no significant effect on the device performance [26,33,36,52,54].
Three studies [26,38,51] provided comparable data investigating the association between time of day and SQ; SQ is significantly greater when recordings are taken at night in comparison to the overall SQ (Fig 5). A trend towards significance is also observed in the SQ of recordings taken 'at rest' or 'at night and at rest' compared to overall SQ. A further two studies [33,36] reported a greater SQ during the night.
When assessed, maternal movement always negatively affected the fECG device performance. Both Crawford et al. [26] and Huhn et al. [36] quantified the level of maternal movement using an arbitrary scale and reported significance levels of p<0.05 to p<0.0001. A further two studies reported the effect of signal loss due to maternal movements, although the fECG signals returned after a period of inactivity [27,29].
The position of the fetus within the maternal abdomen was found to have no significant effect on the SQ in three studies [33,34,36]; however, the study conducted by Graatsma et al. [33] had a trend towards statistical significance (p = 0.06). Conversely, two studies stated that the fetal position had an effect on the fECG device [28,56]. One determined that fetuses in a breech or transverse position, when compared against those with cephalic presentation, had a greater proportion of gaps in their fECG traces (mean 65±30% vs. 29±23%; p = 0.008), a longer duration between valid FHR traces (2.2±0.5 seconds vs. 1.8±0.2 seconds; p = 0.02) and a trend towards a lower proportion of valid recording time (3±3% vs. 20±22%; p = 0.06) [56]. The other study simply stated "the quality of the fECG signals strongly depends on the position of the fetus inside the maternal abdomen" [28]; however, this study only used a single participant and did not provide any quantitative data alongside this statement.
Two studies found no statistical difference in the SQ of recordings taken at home and those in the hospital [33,34]; however, these studies simply compared the overall SQ against location. Following analysis of the SQ throughout 24 hours, a higher SQ was found in the hospital group during the day (daytime: hospital 43.3% vs home 40.2%) but a lower SQ during the night (night-time: hospital 71.1% vs home 86.8%), when compared to home SQ (p<0.001) [36].
An estimated fetal weight less than the 10th percentile was not reported to have any effect on the performance of fECG devices [36,39,56]; these three studies used different terminology for the small fetal size (fetal-growth restriction, small-for-gestational age and intrauterine growth restriction). Kapaya et al. [39] did report a large difference in the mean success rate of SGA fetuses (48.6%) compared to appropriate-for-gestational-age fetuses (75.7%); however, no statistical analysis was performed on this specific data to determine any significance. Graatsma et al. [34] stated that no maternal-fetal conditions affected the fECG device; however, the authors did not define these conditions and this cohort of hospitalised women was compared to women who had home recordings, hence it is unclear whether this was a confounding variable. In addition, the success rate of fECG recordings was not affected by structural heart disease [58]. On the other hand, participants suffering from pre-eclampsia had a greater proportion of valid recording time (p = 0.01) and fewer gaps in the FHR trace per hour (p = 0.04) [56].
Only Crawford et al. [26] investigated the relationship of SQ and uterine activity and determined there was a strong negative correlation between these variables (p<0.001; r² = 0.79). One study [36] reported that women with a low amniotic fluid index (≤5th percentile), also known as oligohydramnios, had a lower SQ compared to women with a normal index, although this difference was not significant (mean SQ: 12.0% vs. 48.5%; p = 0.096). This study did not report any effect of a high amniotic fluid index (≥95th percentile) as no participants satisfied this criterion.
Taylor et al. [55] reported the fECG signal separation in singleton, twin and triplet pregnancies was successful in 85%, 78% and 93% of fetuses, respectively. All fetuses with separation success displayed clear P, Q, R and S waves, and T waves could be identified in 63%, 59% and 57% of successful traces, respectively. The authors did not report whether or not there was a trend between the number of fetuses and the success of fECG trace analysis.
A single study [38] investigated the relationship between the maternal smoking status and the SQ: non-smokers had a significantly greater SQ in comparison to women who were current smokers (median SQ: 57.2% vs. 37.5%; p = 0.05). Overall, the included studies show that gestational age, maternal movement and the time of day have a clear effect on the performance of fECG devices (Table 3). The effect of the remaining factors is unclear, due to conflicting data in some cases and limited evidence from single studies in others.

FM devices
Eight out of the 43 studies were specifically concerned with a FM device. Five of these studies utilised accelerometers, specifically the fetal movement acceleration measurement (FMAM) device [59,60,62,63] or the accelerometer-based fetal activity monitor (AFAM) device [66]. The remaining three studies utilised fVCG devices to quantify FM, all of which were unnamed [61,64,65].
Technical description. The technical details of the FM devices are described in Table 4. Two accelerometer device designs and detection technologies were presented, the FMAM device and the AFAM. The devices differed in the number of sensors used (the former required two sensors and the latter used four) as well as in the arrangement of their sensors. Both devices required adhesive tape to attach the sensors to the maternal abdomen; however, the signal processing methods varied, with the FMAM device having a sampling rate five times greater than the AFAM.
Three fVCG devices were presented, all of which appear to be from the same Dutch-based research group and hence required eight electrodes placed in a circle on the abdomen with a central reference electrode at the umbilicus, although one study [65] did not report the specific placement of electrodes. The signal processing methods greatly differed; however, all required removal of the MHR trace.
Device performance of accelerometers. The four studies concerning the FMAM device described either the device success rate or the accuracy. Kamata et al. [59] and Ryo et al. (2018) [62] defined a successful recording as one with greater than four hours of recording time; the respective success rates were 94.3% and 75.3%. The remaining two studies [60,63] reported the accuracy using prevalence-adjusted bias-adjusted kappa (PABAK), a measurement of the agreement between FM detected by the FMAM device and simultaneous US. Both reported similar mean PABAK values of 0.83 (SD±0.04) (n = 44) [60] and 0.79 (SD±0.12) (n = 45) [63] from 30-minute recordings. Mesbah et al. [66] reported the AFAM had an average accuracy of 55%, with a sensitivity and specificity of 59% and 54% respectively (n = 3).
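For readers unfamiliar with PABAK, the following Python sketch shows the standard two-category form, PABAK = 2 × Po − 1, where Po is the observed proportion of agreement. The epoch-based 2×2 layout (device versus ultrasound, movement present or absent per epoch) and the function name are illustrative assumptions, not the analysis code of the cited studies.

```python
def pabak(both_present: int, device_only: int,
          reference_only: int, both_absent: int) -> float:
    """Return prevalence-adjusted bias-adjusted kappa for a 2x2 table.

    Arguments are epoch counts (hypothetical layout):
      both_present   - device and ultrasound both detect movement
      device_only    - device detects movement, ultrasound does not
      reference_only - ultrasound detects movement, device does not
      both_absent    - neither detects movement
    """
    total = both_present + device_only + reference_only + both_absent
    if total == 0:
        raise ValueError("no epochs supplied")
    observed_agreement = (both_present + both_absent) / total
    # For two categories, PABAK reduces to 2 * Po - 1.
    return 2.0 * observed_agreement - 1.0


if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 epochs agree, so Po = 0.9.
    print(pabak(both_present=40, device_only=5,
                reference_only=5, both_absent=50))
```

PABAK ranges from −1 (complete disagreement) to 1 (complete agreement) and, unlike Cohen's kappa, does not depend on how often movement occurs in the recording.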
Performance of fVCG devices. Two studies reported the sensitivity and specificity of fVCG devices against simultaneous US monitoring; one study [61] had a mean sensitivity of 67% (SD±24%) and a specificity of 90% (SD±8%) (n = 4) and the other [65] reported values of 47% and 87% respectively (n = 8). The remaining fVCG study [64] did not provide any quantitative information.
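Sensitivity and specificity against a reference method can be sketched as below. The example assumes that the device output and the simultaneous ultrasound reference are both reduced to one binary label per epoch (movement present or absent); this representation and the function names are illustrative assumptions rather than the method of any included study.

```python
from typing import Sequence, Tuple


def confusion_counts(device: Sequence[int],
                     reference: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true/false positives and negatives, epoch by epoch."""
    if len(device) != len(reference):
        raise ValueError("device and reference must have equal length")
    tp = fp = tn = fn = 0
    for d, r in zip(device, reference):
        if d and r:
            tp += 1      # movement detected by both
        elif d and not r:
            fp += 1      # device detection not confirmed by reference
        elif r:
            fn += 1      # reference movement missed by the device
        else:
            tn += 1      # no movement on either
    return tp, fp, tn, fn


def sensitivity_specificity(device: Sequence[int],
                            reference: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) as proportions between 0 and 1."""
    tp, fp, tn, fn = confusion_counts(device, reference)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical 8-epoch recording.
    device_labels = [1, 1, 0, 0, 1, 0, 0, 0]
    ultrasound_labels = [1, 0, 0, 0, 1, 1, 0, 0]
    print(sensitivity_specificity(device_labels, ultrasound_labels))
```

With very small cohorts, such as the n = 4 and n = 8 reported above, these proportions carry wide uncertainty, which is consistent with the large standard deviations quoted for sensitivity.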
Factors which affect performance of accelerometers. Due to the limited number of studies including FM devices there is little data on the factors which affect the performance of these devices. Nonetheless, there is some degree of information regarding accelerometers, specifically the impact of gestational age (3 studies) and the positioning of the sensors on the maternal abdomen (1 study). Such information was not available for fVCG devices.
Two studies [60,63] which utilised the FMAM device reported increases in the accuracy (PABAK values) in late pregnancy; although neither study verified significance with statistical analysis. The study using the AFAM [66] showed a greater sensitivity achieved at 35 weeks' gestation in comparison to 32 weeks' gestation (76% vs. 50% and 52%, n = 3).
The position of the sensor on the maternal abdomen did not greatly alter the correlation (PABAK value) between gross FM detected using the FMAM device and US; a sensor positioned where the mother most strongly perceived FM had an overall correlation of 0.79 (SD ±0.12), in comparison to 0.76 (SD±0.15) on the opposite site across the abdominal midline [63].

Discussion
This is the first review which studies the use of CFM devices in antenatal care, with specific interest in the performance of the devices and relevant factors which affect this, as well as the devices' design and the technologies employed to detect FHR or FMs. Forty-three relevant articles were included which identified 24 different devices using four suitable technologies for CFM.
This review was strengthened by the use of a systematic search strategy using multiple databases and a broad scope which enabled the inclusion of a wide range of study designs, with no restrictions placed on the language or country of origin. However, any review is susceptible to publication bias and the omission of relevant articles, and quality assessments were not performed, in keeping with the design and conduct of a scoping review. In addition, it is important to acknowledge that patented technologies which are not currently available in the public domain could exist, but for this reason could not be included. The lack of common device assessment in the source publications limited statistical analysis, and where meta-analysis was performed some non-normally distributed data was converted to approximated normally distributed values; the impact of this data alteration on the pooled estimates is unknown. Furthermore, the relative weights in the meta-analysis did not consider the replication of data in studies conducted by the same clinical research group as this information was not explicitly mentioned in any studies.
A variety of device designs and technological approaches were identified, however due to the reporting of data, we were unable to deduce a technology that appeared to be advantageous in terms of the reported device performance. However, fECG devices have been more widely investigated and hence offer a greater opportunity to be optimised and implemented into routine clinical practice than the other technologies. This may be because fECG devices are well established and the included fECG studies date back to 1980 [25], in comparison to more recent advances made in alternative devices such as fPCG (2008) [45], fVCG (2008) [64], accelerometers (2011) [66] and combined fECG/fPCG devices (2020) [46]. There has been increasing interest in CFM over the years; Fig 6 shows the studies included in this scoping review only. It is anticipated future CFM research will equally investigate both FHR and FM devices.

Devices which monitor FHR
CFM devices which monitored the FHR comprised numerous fECG devices, fPCG devices and a single combined fECG/fPCG device. Currently only the Monica AN24 device, a fECG device, is in widespread use and was utilised in numerous studies; however, its SQ exhibited high levels of variability between the different studies (mean SQ 68%, 95% CI 49-87%). Furthermore, studies with the largest cohort size showed the highest SQ, suggesting that there is a possible learning curve in obtaining fECG recordings of optimal quality; however, there was no statistically significant relationship between study size and SQ (p = 0.53, r² = 0.06). Whilst other FHR devices appear to have a better device performance, many have only been tested for short periods of time (median of 30 minutes) and hence their results must be viewed with caution with respect to CFM, which would require longer-term monitoring. Variables which could alter device performance were only evaluated in fECG device studies. A widely documented reduction in the performance of fECG devices was noticed between roughly 28 and 36 weeks' gestation. This has been attributed to electrical impedance from the vernix caseosa, a protective insulating layer which surrounds the fetus from 28 to 32 weeks' gestation and completely dissolves by 37 weeks' gestation [67]. The vernix caseosa has an impedance 500-1000 times higher than that of amniotic fluid [68,69] and hence significantly reduces the fECG amplitude; however, the FHR can still be weakly detected due to signal transmission through the umbilical cord, oronasal cavity and gaps within the vernix caseosa [67,69]. Therefore, as the impact of gestational age on fECG trace quality is widely understood, further studies are not required to test this specifically, but gestational age should be taken into consideration when evaluating the impact of other factors.
Regarding novel fECG devices, ideally these should be initially tested after 37 weeks' gestation to minimise signal disruption, as this will elucidate whether the devices can accurately and reliably extract FHR and/or FM data.
In the majority of studies, BMI had no effect on the signal quality of fECG devices, hence fECG appears advantageous over conventional US-based technologies, which are negatively affected by increasing maternal BMI [70]. Critically, some studies deviated from this finding, reporting that BMI did affect device performance.
Maternal movement is a clear limitation of current CFM technology. Abdominal wall muscle contractions cause fECG noise interference with frequencies up to 500 Hz, often preventing reliable detection of FHR traces [36,71]. This helps to explain the observed decrease in daytime success rates compared to night-time, given the high levels of maternal movement in women's day-to-day lives. Moreover, although the location (hospital vs. home) of fECG recordings does not affect the overall success rates, daytime recordings are more successful in the hospital setting as women will move less and are often confined to their bed, whereas night-time recordings are less successful in hospital than at home as women could have a lower quality of sleep due to the novel environment, ambient noise and interruptions by health care professionals and other patients [36,72].
Antenatal fECG devices have been widely extrapolated from those developed for use in intrapartum care; in theory, the devices should therefore provide a sufficient SQ regardless of the level of uterine activity. Nonetheless, Crawford et al. [26] reported a strong negative correlation between uterine activity and SQ (p<0.001; r² = 0.79). One explanation could be that uterine activity mimics the effect of maternal physical activity, disrupting the FHR signals. However, uterine contractions can trigger FHR accelerations [73], a reassuring sign of fetal wellbeing. It is important that CFM FHR devices have the ability to detect these significant changes in FHR variability, as a lack of accelerations indicates fetal compromise requiring possible clinical intervention. Whilst other factors may or may not affect the quality and success of fECG devices, further research is required to deduce those which have a significant impact.
All FHR devices in the presented studies had different designs and technological approaches. The studies failed to provide a clear rationale for the quantity of electrodes or sensors and the specific arrangement used. With respect to fECG devices, the electrode configuration can significantly influence the FHR signal quality; the number of electrodes must be optimised to maximise the signal-to-noise ratio whilst minimising power consumption, and the electrode orientation and placement must also be considered [71]. A Dutch research group has proposed that five electrodes are optimal in the third trimester, whilst an extra electrode is required earlier in pregnancy because fetal position is more variable [71].
Whilst fECG devices have been the subject of a large proportion of studies, this technology does not appear to be advantageous over fPCG, which is relatively understudied. Mhanja et al. [46] successfully designed and tested a device which incorporated both fECG and fPCG technologies, the Invu system. Although this device demonstrated the ability to combine two FHR technologies and showed a highly significant correlation with CTG (r = 0.92, p<0.0001), further evaluation with larger sample sizes and longer recording durations is required to determine its clinical utility.

Devices which monitor FM
FM devices comprised fVCG technology and accelerometers; however, device performance was not comparable between the two technologies. In addition, the factors which affect device performance were only assessed in accelerometers.
The correlation between FM detected by accelerometers and FM detected by simultaneous US was greater in late pregnancy. This relationship has been attributed to the increasing strength of FM as pregnancy progresses, which causes greater abdominal wall oscillations [63]. On the other hand, a recent study conducted by Verbruggen et al. [74] reported that the fetal kick force increases until 30 weeks' gestation and subsequently reduces until birth due to mechanical stress and strain. The included studies suggest FM devices may be more effective towards the end of pregnancy, although this was observed in a small cohort (n = 92) with a limited number of gestational age categories. Future use of larger cohorts will provide a clearer overview of the association (or lack thereof) between the performance of accelerometers and fetal gestational age. Although this has not been investigated using accelerometers, concerns have been raised about the impact of respiratory artefacts caused by sleep apnoea, a common co-morbidity of obesity, on the quality of accelerometer recordings [60].
The studies concerning the FMAM device [59,60,62,63] provided a clear rationale for the use of two sensors and their specific placement on the maternal abdomen and thigh. The studies of the other accelerometer device, the AFAM, and of the fVCG devices did not justify the specific device design. Currently, the only FM device which has been studied for prolonged periods of time is the FMAM, which was used overnight in three studies [59,62,63]. The FMAM therefore appears to be more technologically advanced, being able to record FM overnight, than the remaining FM devices, which have only been studied for a maximum of 30 minutes. Substantial changes in the FM pattern can be indicative of fetal distress and often act as a 'warning sign' prior to stillbirth [1,2]. However, FM patterns vary significantly between individuals and can alter weekly as pregnancy progresses [60]. At present, this is a key drawback of FM devices, and considerable research is required to develop reliable and widely applicable FM count indices to ensure CFM devices can accurately detect fetuses whose FM pattern deviates from normality.

Further advancements required in CFM
With increasing interest in CFM devices, it would be beneficial to develop a standardised and systematic format of device assessment and reporting to aid comparison between the various devices. One proposition would be to determine the sensitivity, specificity and accuracy against concurrent use of current gold standard methods using short recording periods (e.g. 20-30 minutes), and to report the signal quality in longer (i.e. >90 minutes) recordings. This approach would be applicable to both FHR and FM devices and would determine the suitability of devices for CFM, ensuring the FHR or FM pattern can be reliably recorded for long periods of time. It is anticipated that a standardised method of reporting will highlight favourable approaches. For CFM devices to be implemented into clinical practice, additional issues must first be addressed. Whether the device is primarily analysing the FHR or FM pattern, analysis must be individualised to each fetus and account for changes which occur in these parameters as pregnancy progresses [60]. The theoretically 'normal' patterns must also be known, including normal fetal sleep patterns, the response to uterine contractions and the normal fetal movement pattern throughout the day. This knowledge will ensure any deviation from the normal FHR or FM pattern is detected by the CFM device.
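The assessment format proposed above can be illustrated with a minimal sketch. It assumes that a short recording has been divided into epochs in which the device and the gold standard each report an event as present or absent, and that signal quality in a long recording is expressed as the percentage of epochs with a usable FHR value. Neither the epoch-wise comparison nor this definition of signal quality is taken from the included studies, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Agreement:
    sensitivity: float
    specificity: float
    accuracy: float


def agreement(device: Sequence[bool], reference: Sequence[bool]) -> Agreement:
    """Epoch-wise agreement of a device with a gold standard (assumed scheme)."""
    if len(device) != len(reference) or len(device) == 0:
        raise ValueError("recordings must be non-empty and of equal length")
    pairs = list(zip(device, reference))
    tp = sum(1 for d, r in pairs if d and r)          # both report an event
    tn = sum(1 for d, r in pairs if not d and not r)  # both report no event
    fp = sum(1 for d, r in pairs if d and not r)      # device only
    fn = sum(1 for d, r in pairs if not d and r)      # gold standard only
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both event and non-event epochs")
    return Agreement(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


def signal_quality(fhr: Sequence[Optional[float]]) -> float:
    """Percentage of epochs with a usable FHR value (None marks signal loss)."""
    if len(fhr) == 0:
        raise ValueError("recording must be non-empty")
    return 100.0 * sum(1 for v in fhr if v is not None) / len(fhr)
```

Reporting both sets of figures for every device, on short and long recordings respectively, would allow results from different studies to be placed side by side.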
A significant reduction or sudden alteration in FM acts as a 'warning' sign prior to fetal death [75]; this pathological change is detected by maternal instinct in 31-55% of cases in the week preceding stillbirth [75-77]. Therefore, another potentially important development would be an interactive component allowing mothers to report significant events via a mobile phone application, to aid clinical analysis and to reassure mothers that their concerns are being monitored. This could include detailing periods of gross fetal movement or lack thereof, instances of uterine contractions and other symptoms such as abdominal pain.
The current use of intermittent monitoring throughout high-risk pregnancies undoubtedly provides reassurance to pregnant women and helps to relieve anxiety. The implementation of CFM devices into clinical practice could reduce or potentially replace the need for current methods. However, women have raised concerns about the sole use of CFM in antenatal care, and many currently perceive CFM as an 'add-on' form of monitoring to current methods [26]. Nonetheless, the experimental use of CFM devices has already prevented adverse outcomes in a number of cases, specifically using the Monica AN24 [50] and the FMAM [78]. As well as reassuring women that their baby is being actively monitored [17], the use of a CFM device increases maternal awareness of fetal wellbeing [60]. This demonstrates the multiple benefits which can be achieved through the use of CFM in clinical practice, enabling timely identification of compromised fetuses, which in turn could assist in the reduction of the stillbirth and neonatal mortality rates. Nonetheless, whilst clinical studies assessing the reliability of CFM are still under way, it is important that intermittent monitoring continues to take place so there is no deviation from current 'gold-standard' forms of practice.

Conclusions
In conclusion, CFM could alleviate the intermittent nature of current antenatal fetal monitoring methods, providing an objective and longitudinal overview of fetal wellbeing. To date, numerous different CFM devices have been developed to address this need; however, there is a high level of inter-device and intra-device variability and currently no approach appears to be advantageous. In addition, there appear to be numerous factors which affect the quality of CFM recordings, although these have only been investigated in fECG monitors and accelerometers. Gestational age, maternal movement and the time of day clearly alter device performance; however, the evidence base for other factors such as BMI, uterine activity and the amniotic fluid index is sparse. Consequently, additional studies are required to establish the impact of such factors, as this will aid the development of better devices and identify pregnancies in which the device's quality and diagnostic ability are reduced.
Overall, although CFM appears to be a viable form of fetal monitoring, at present the use of CFM devices in routine clinical care cannot be strongly recommended due to the wide disparities between studies alongside the unclear impact of certain maternal and fetal factors. For this recommendation to be reviewed, the devices must first show less variable performance and undergo further rigorous testing to ensure they can detect alterations in the FHR and/or the FM pattern, enabling prompt detection of fetal compromise.
Supporting information S1