Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction

Maternal health is an important aspect of women’s health during pregnancy, childbirth, and the postpartum period. Specifically, during pregnancy, different health factors like age, blood disorders, heart rate, etc. can lead to pregnancy complications. Detecting such health factors can alleviate the risk of pregnancy-related complications. This study aims to develop an artificial neural network-based system for predicting maternal health risks using health data records. A novel deep neural network architecture, DT-BiLTCN, is proposed that uses decision trees, a bidirectional long short-term memory network, and a temporal convolutional network. Experiments involve using a dataset of 1218 samples collected from maternal health care, hospitals, and community clinics using the IoT-based risk monitoring system. Class imbalance is resolved using the synthetic minority oversampling technique. DT-BiLTCN provides a feature set to obtain high accuracy results, which in this case are provided by the support vector machine with a 98% accuracy. Maternal health exploratory data analysis reveals that the health conditions which are the strongest indications of health risk during pregnancy are diastolic and systolic blood pressure, heart rate, and age of pregnant women. Using the proposed model, timely prediction of health risks associated with pregnant women can be made, thus mitigating the risk of health complications, which helps to save lives.


Introduction
Maternal health issues for women arise during pregnancy, childbirth, and the postnatal period [1]. At the time of pregnancy, women are at a higher risk of health complications which may lead to miscarriage and death in many cases. Every pregnancy stage must have a positive experience to ensure the good health of babies and women. Women who maintain their health women's pregnancy. In addition, an ensemble model, BiLTCN comprising BiLSTM and TCN is also used.
• The maternal health exploratory data analysis (MHEDA) is performed to study the health conditions which serve as the strongest indicators to predict different maternal health risks during pregnancy.
• The performance of the DT-BiLTCN is further enhanced by two factors in essence including data resampling for data balancing and fine-tuning of hyperparameters of the proposed model. The synthetic minority oversampling technique (SMOTE) is utilized to balance the health dataset in this study while hyperparameters are tuned to select the set of best-fit hyperparameters for better performance.
• A comparative analysis of several existing models is carried out with the proposed approach to analyze its efficiency. The receiver operating characteristic (ROC) accuracy curve analysis is conducted to analyze the performance of the proposed BiLTCN model at different risk thresholds.
The rest of this study is organized as follows. Related work is discussed in Section 2. Section 3 describes the architecture and methodological details of the BiLTCN model. Results and discussions are given in Section 4. In the end, Section 5 provides the conclusion.

Related work
The importance of maternal health has led many researchers to devise models and approaches for the timely prediction of health risks during pregnancy using both traditional and machine learning techniques. Some studies focus on observing pregnant women's conditions to analyze and record health risk factors, while others aim at prediction. For example, a pregnancy risk detection system (PRDS) is constructed to detect the pregnancy risk level based on the symptoms experienced by the pregnant woman [16]. The research observations are conducted continuously over time at Panimbang health center. Several health risk factors are observed, such as age over 35 or under 20 years, having given birth before, history of disorder during pregnancy, and miscarriage. The pregnancy risk level is divided into three categories: high risk, moderate risk, and low risk.
Pregnant women's health risk levels are predicted and monitored by using cloud-based machine learning techniques in Bangladesh [17]. The technique identified the maternal health risk intensity by analyzing the pregnant women's health factors. A total of 1014 samples have been collected from numerous sources by utilizing wearable sensing devices. Python and WEKA toolkit platforms are used for model building. The maternal health risk levels are divided into low risk, mid risk, and high risk with the help of medical experts. Health factors like age, blood pressure, and blood sugar are identified as the key factors for high risk. Experimental results show that DT achieved a 97% accuracy score using hyperparameter tuning [18]. The pregnancy data is analyzed and utilized to build a machine learning classifier for health risk prediction during pregnancy in [19]. The pregnancy data of 600 samples are collected from three medical centers in Bangalore. The employed DT model achieved a 71% accuracy score. The relative absolute error value is 99% and the root relative squared error value is 99% for the DT classifier. The authors of [20] predict health complications by using two machine learning-based classifiers. The study aims to reduce the fetal and maternal mortality rate by analyzing the pregnancy-related dataset. The classifiers are applied to identify the maternal health risk of pregnant women. Similarly, Naive Bayes (NB) and decision tree (DT) are employed in [21]. DT achieved an accuracy score of 66%, while the relative absolute error value is 74% and the root relative squared error value is 110%.
Maternal mortality rate health risk factors are identified and observed using different data repositories in [22]. The study uses the root mean square error (RMSE) based on k-fold cross-validation to evaluate the proposed model performance. The world indicator 2015 (WDI-2015) dataset is utilized for the model building, which contains 1350 samples from 1960 to 2015. The linear regression showed an RMSE of 0.709. The study asserts the need to reduce rural population growth and to ensure necessities for better child care in order to reduce the pregnancy-related mortality rate. An efficient classification and regression tree (CART) binary DT approach is proposed to predict high pregnancy risk using fetal health status in [23]. The cardiotocography dataset from UCI consisting of 2126 fetal cardiotocograms is utilized for risk prediction. Using 5-fold cross-validation, it achieves an 88% accuracy.
Besides using machine learning models on the collected data, some approaches devise telemedicine or electronic online monitoring systems for data collection. For example, a telemedicine framework is proposed to help doctors in predicting pregnancy health risks in the Philippines [24]. The cloud-based dataset [25] of 97 samples was utilized to train and test four machine learning techniques, namely DT, random forest (RF), k nearest neighbor (KNN), and support vector machine (SVM). RF achieved a 90% accuracy score by employing hyperparameter tuning. On inputting a pregnant woman's data, the telemedicine framework displayed negative or positive cases concerning the possibility of high-risk pregnancy factors. Similarly, an Internet of things (IoT) based integrated system is developed in [26] to monitor fetal and maternal signals for high-risk pregnancy predictions. IoT devices are used for fetal and maternal monitoring. A 1D convolutional neural network (CNN) [27] is utilized to predict the health risks during pregnancy. The maternal clinical factors and the fetal heart rate are monitored. The utilized maternal factors are diastolic/systolic blood pressure, oxygen saturation, temperature, heart rate, and uterine tonus activity. The proposed system achieved an accuracy score of 92% for fetal and maternal emergencies. Similarly, CNN is reported to have superior performance for distracted driver detection in [28].
Ensemble models show better performance than individual models, so they have been adopted for pregnancy health risk prediction as well. An ensemble classifier-based [29] approach is utilized to predict the birth mode in [30]. The study provides proper identification of the health risk levels associated with delivery for pregnant women and helps reduce the mortality rate in Bangladesh. The research dataset consisting of 4493 samples is collected from the Demographic and Health Survey (BDHS-2014) in Bangladesh. The ensemble classifier achieved an 86% accuracy score for birth mode classification. Maternal health risk prediction based on electronic health registries using an RF is proposed in [31]. The relevant features are selected by recursive feature elimination (RFE) for the decision-making task. A systematic process of data analysis, preparation, and modeling is developed. The study uses several machine learning-based models with a grid search strategy using k-fold cross-validation and hyperparameter tuning. RF achieves an accuracy score of 93% for maternal health risk prediction.
The discussion of the above-cited studies indicates that several aspects need further research efforts. First, the prediction accuracy is comparatively low for the majority of the works, except for a few single and ensemble models. Second, studies predominantly focus on binary classification and do not consider the risk levels. Third, deep learning models are seldom studied. This study employs two deep learning models and classifies the health risks related to pregnant women into low, medium, and high levels.

Materials and methods
This section elaborates on steps related to employed techniques and methods to predict maternal risk. The flow diagram of the adopted methodology is visualized in Fig 1. It contains several steps which are followed sequentially.
• Step 1: The maternal health-related dataset is acquired in this step. The dataset is obtained from the Kaggle repository and is publicly available. Originally, Marzia et al. created the dataset using the IoT-based risk monitoring system [32]. The doctors structured the dataset and made it available for research purposes. The collected data contain various prominent health factors related to maternal health risk prediction.
• Step 2: MHEDA is applied to examine the pregnancy risk factors and get useful insights from the collected data. MHEDA contains numerous analysis graphs. The statistical data analysis, violin graphs, pie charts, feature relation analysis, and many more graphs are analyzed to get useful health factors and health conditions during pregnancy.
• Step 3: Dataset resampling is implemented in this step. The collected health dataset is imbalanced, which hinders the full potential of the proposed approach. Training on class-imbalanced data may cause the proposed model to overfit. To balance the dataset, SMOTE is utilized.
• Step 4: The target category of the health dataset is encoded in this step. Label encoding converts the low risk, medium risk, and high risk labels to the numeric forms 0, 1, and 2, respectively, making the target machine-readable. The dataset is then in a structured format for the next research steps.
• Step 5: Dataset splitting is carried out in this step. The data are split into two portions, one for training the proposed model and the other for model testing and evaluation. 80% of the data is used for training and 20% for model evaluation.
• Step 6: The DT-BiLTCN deep learning-based model is built in this step. The DT-BiLTCN is a hybrid of DT, bidirectional LSTM, and TCN models. It is tested and evaluated with 20% of the dataset. The DT-BiLTCN model is used for feature extraction. The extracted feature set is later used to train all the machine learning models, and the evaluation is carried out using accuracy, classification report, F1 score, ROC curve accuracy, precision, and recall score.
• Step 7: Based on the model performance evaluation results, hyperparameter tuning is employed in this step to get more accurate results from the proposed model. The number of epochs, model layers, vocabulary size, loss function, accuracy metric, loss optimizer, and the total trainable parameters involved are the hyperparameters of the proposed model.

Maternal health dataset
The maternal health data has been collected from different hospitals, community clinics, and maternal health care facilities through the IoT-based risk monitoring system [33]. Marzia et al. from Daffodil International University, Dhaka, Bangladesh created the dataset [32], and the benchmark dataset is publicly available on Kaggle [34]. The dataset contains 7 features including age, SystolicBP, DiastolicBP, BS, BodyTemp, HeartRate, and RiskLevel as the target class. Table 1 describes the maternal health dataset using feature-related information. The total maternal health data sample size is 1218.

Maternal health exploratory data analysis
PLOS ONE

MHEDA refers to the process of discovering dataset patterns, testing hypotheses, and checking assumptions by utilizing graphical data representations and a statistical summary of the dataset. MHEDA helps to summarize the main dataset characteristics and to analyze feature relations. The feature relation analysis and data visualization methods help in the proposed model's prediction process. Results of the statistical analysis of dataset features are given in Table 2. The feature statistics are based on the count, mean, standard deviation (std. dev.), minimum (min), 25%, 50%, 75%, and maximum (max) values. The analysis demonstrates that the dataset contains 1218 rows for each feature. The minimum values show the lowest limit, and the maximum values show the highest limit for all features. The minimum age of involved patients is 10, and the maximum age is 70 years. The analysis demonstrates that with the maximum age of 70 there is an indication of low risk during pregnancy. The systolic BP has a minimum value of 70, and the highest value is 160. Diastolic BP has a minimum value of 49, and the highest value is 100. Patients' minimum and maximum heart rate values are 70 and 90. All the dataset features are analyzed statistically to check the behavior of each patient's values in predicting the maternal health risk.
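The summary statistics named above (count, mean, std. dev., min, quartiles, max) can be sketched with the Python standard library. This is only an illustration under stated assumptions: the function name `describe` and the age values are hypothetical and are not taken from the dataset, and the study does not state which tool produced Table 2.

```python
import statistics

def describe(values):
    """Return the summary statistics listed for one feature column."""
    ordered = sorted(values)
    # "inclusive" quartiles interpolate linearly between the data points.
    q25, q50, q75 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": ordered[-1],
    }

# Hypothetical ages for illustration only (not values from the dataset).
ages = [10, 20, 30, 40, 50]
summary = describe(ages)
```

Applying such a function to every column would give one row of a Table 2 style summary per feature.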

The distribution analysis between the age of pregnant women and the risk level shows that from the age of 10 to 13 there is a high level of risk during pregnancy. In the age range of 18 to 30, the risk level is normal in all aspects. Between the ages of 40 and 60, pregnant women face a high risk level. The distribution analysis between the systolic blood pressure of pregnant women and the risk level is shown in Fig 2b. The analysis indicates that a systolic blood pressure of 140 corresponds to a high risk level, whereas a systolic blood pressure of 80 to 100 corresponds to a low risk level. The distribution analysis between the diastolic blood pressure of pregnant women and the risk level is shown in Fig 2c. The analysis shows that a diastolic blood pressure of 100 corresponds to high pregnancy risk levels.
Correlation analysis is a statistical technique used to measure how one feature changes with respect to another feature of the dataset. It gives a measure of the degree of association between different dataset variables. The analysis demonstrates that the body temperature feature is highly negatively correlated with all other dataset features. The systolic BP and diastolic BP features have a low negative correlation with only the heart rate feature. Other dataset features have a positive correlation. A high positive correlation exists between the systolic BP and diastolic BP features in the context of the classification problem. The maternal health dataset feature correlation analysis is shown in Fig 3.
The 3-dimensional (3D) relations of prominent dataset features with the maternal pregnancy risk level and the age of the women are also examined. 3D scatter plots place data points on three axes to show the relationship between three variables. The conducted analysis is visualized in Fig 4. The blood glucose (BS) levels in terms of the molar concentration, age, and risk level are analyzed in Fig 4a. We take the age, BS, and risk levels on the x, y, and z-axis, respectively, to show the maternal health risk. It is observed that the risk level is low when the BS is between 6 and 8 and the age varies from 10 to 70. The risk level changes to medium when the BS is between 8 and 18 and the age is between 10 and 50. The risk level is high when the BS is between 10 and 18 and the age is between 10 and 50.
In Fig 4b, age, heart rate, and risk level are taken on the x, y, and z-axis, respectively. The graph shows that the risk level is low when the heart rate is above 60 and the age is from 10 to 70. There is a low risk level when the age is 10 to 35. The risk level is medium when the heart rate is above 80 and the age is between 10 and 50. The risk level is high when the heart rate is above 80 and the age is between 10 and 70.
In Fig 4c, age, diastolic blood pressure (DBP), and risk level are taken on the x, y, and z-axis, respectively, to show the maternal health risk. The risk level is low when the age is under 70 and the DBP is above 50. The chances of remaining at a low risk level are higher when the age is between 10 and 30 and the DBP is above 60. When the age is under 50 and the DBP is above 50, the risk level is medium. The risk level is high when the DBP is above 70 and the age is under 60. In Fig 4d, age, body temperature, and risk level are taken on the x, y, and z-axis, respectively. When the body temperature is 97 and the age is between 10 and 70, the risk level is low. The graph also shows that when the age is under 30, the body temperature varies from 96 to 103 while the risk level remains low. When the body temperature is 97 and the age is under 40, the risk level is medium; the graph also shows that when the body temperature varies from 97 to 103 under the age of 40, the risk level remains medium. The risk level is high when the body temperature is above 100 and the age is under 30.
The pair plot is utilized to find the best set of features that explains the relationship between two features and forms the most separated clusters. The pair plot analysis among all the dataset features with the risk levels is visualized in Fig 5. This analysis demonstrates the distribution of every single feature against all other dataset features. The feature plots are arranged in matrix form, in which each row shares a y-axis variable and each column shares an x-axis variable. For each feature, the main diagonal shows the univariate distribution. In this analysis, the hue parameter is set to the risk level feature. The scatter subplots are utilized to find the pairwise relationships among the distributions.
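The correlation analysis reported in Fig 3 can be illustrated with a small sketch. It assumes the Pearson coefficient, which the study does not explicitly name, and the readings below are hypothetical values chosen only to show a strong positive and a negative association.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical readings for five patients (not taken from the dataset).
systolic = [90, 100, 120, 130, 140]
diastolic = [60, 65, 80, 85, 100]
body_temp = [101, 100, 98, 98, 98]

r_bp = pearson(systolic, diastolic)    # strongly positive for these values
r_temp = pearson(systolic, body_temp)  # negative for these values
```

A value near +1 indicates that two features rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear association.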

Maternal health data resampling
The maternal data resampling is conducted to achieve the best results from the proposed BiLTCN approach. Data resampling is applied to generate samples of equal weights and quantify the uncertainty of risk levels. SMOTE [35] is utilized to balance the maternal health dataset. From Fig 6, it can be observed that the number of samples for each class is balanced. The actual dataset contains 406 instances for low risk, 336 for mid risk, and 272 for high risk. By applying the data balancing technique, all classes have an equal 406 instances.
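The oversampling step can be sketched as follows. This is a minimal illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest same-class neighbours), not the implementation used in the study; the function name `smote_oversample`, the neighbour count `k=5`, and the two-dimensional random data are assumptions made for the example. Only the class counts 406, 336, and 272 come from the text.

```python
import math
import random

def smote_oversample(samples, target_count, k=5, seed=42):
    """Grow one class to `target_count` rows with synthetic samples.

    Each synthetic row lies on the line segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    result = [list(s) for s in samples]
    while len(result) < target_count:
        base = rng.choice(samples)
        # k nearest neighbours of `base` within the class, excluding itself
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        result.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return result

# Class sizes from the text; the feature values themselves are hypothetical.
data_rng = random.Random(0)
counts = {"low risk": 406, "mid risk": 336, "high risk": 272}
data = {
    label: [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(n)]
    for label, n in counts.items()
}

target = max(counts.values())
balanced = {label: smote_oversample(rows, target) for label, rows in data.items()}
balanced_counts = {label: len(rows) for label, rows in balanced.items()}
```

After balancing, each class holds 406 rows, so the three classes together contain 3 × 406 = 1218 rows, which equals the total sample size reported for the dataset.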

Maternal risk label encoding
Label encoding is performed to convert the target category (label) into a machine (model) readable form [36]. The maternal risk level categories low risk, mid risk, and high risk are encoded as 0, 1, and 2, respectively. These encoded labels are used for maternal risk prediction by the BiLTCN model.
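A minimal sketch of this encoding is given below. The study does not state which tool performed the encoding, so an explicit mapping is assumed here; the names `RISK_TO_CODE` and `encode_labels` and the sample labels are hypothetical. An explicit dictionary is used because an encoder that sorts labels alphabetically would assign 0 to "high risk" rather than to "low risk".

```python
# Explicit mapping that follows the order stated in the text.
RISK_TO_CODE = {"low risk": 0, "mid risk": 1, "high risk": 2}
CODE_TO_RISK = {code: label for label, code in RISK_TO_CODE.items()}

def encode_labels(labels):
    """Map risk level strings to integer codes, ignoring case and padding."""
    return [RISK_TO_CODE[label.strip().lower()] for label in labels]

# Hypothetical target column for illustration.
risk_levels = ["high risk", "low risk", "mid risk", "Low Risk"]
encoded = encode_labels(risk_levels)
decoded = [CODE_TO_RISK[code] for code in encoded]
```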

Proposed DT-BiLTCN feature engineering
The proposed approach combines DT and BiLTCN for feature engineering on two rationales. First, DT and BiLTCN individually perform well for the task at hand. Owing to their performance, these models are combined in the proposed approach. In addition, we experimented with other combinations; however, the performance of those combinations was inferior. Second, it is empirically found that the use of prediction probabilities as features for model training yields better results than using raw features. Therefore, this study proposes the use of prediction probabilities as the features for maternal health risk prediction.
Machine learning and deep learning-based feature engineering approaches are deployed in this study to improve the performance of models. Feature engineering is performed using a combination of DT and BiLTCN. BiLTCN is the hybrid of Bidirectional LSTM [37] and TCN deep learning techniques. DT-BiLTCN approach is utilized as a feature engineering technique in this study. The feature set extraction and formation mechanism from the dataset using the DT-BiLTCN is shown in Fig 7. The use of the DT-BiLTCN model for feature extraction, in the perspective of maternal health risk prediction, is illustrated in Fig 8. DT-BiLTCN approach is based on the hybrid of DT and BiLTCN models, so a brief description of each of these models is provided here.
Decision tree. DT is a supervised machine learning model and is commonly used for classification tasks. Classification or prediction is based on decision rules inferred from training data. DT has a tree-structured model, where the features of data are represented by internal nodes, the decision rules are represented by branches, and leaf nodes represent the outcomes. To predict the target class, DT compares the value of the root attribute of the tree with the corresponding attribute of the record. The comparison of values continues down the tree until a leaf node is reached. The DT model is selected based on the reported performance in existing literature [38,39]. Tree split criteria are important because they determine the level of impurity, and they differ between classification and regression tasks. For example, Gini and Entropy are the commonly used criteria for classification, while for regression tasks, mean squared error and half Poisson deviance are used. This study uses Entropy with the following equation

$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ is the proportion of samples in node $S$ that belong to class $i$ and $c$ is the number of classes.

BiLTCN model. A deep learning-based novel BiLTCN model is also proposed in this study for performance comparison with the DT-BiLTCN model. The BiLTCN is the hybrid of Bidirectional LSTM and TCN deep learning networks. LSTM is an improved form of recurrent neural network (RNN) [40,41] that allows information patterns to persist for a long time. An LSTM has three parts, which are referred to as gates: the forget gate is the first part, the input gate is the second part, and the output gate is the last part. They are defined as

$$\text{ForgetGate}: \quad f_t = \sigma(X_t U_f + H_{t-1} W_f)$$

$$\text{InputGate}: \quad i_t = \sigma(X_t U_i + H_{t-1} W_i)$$

$$\text{OutputGate}: \quad o_t = \sigma(X_t U_o + H_{t-1} W_o)$$

where $X_t$ is the input for the current timestamp, $U_f$ is the weight attached to the input, $H_{t-1}$ is the hidden state of the previous timestamp, $W_f$ is the weight matrix attached to the hidden state, $U_i$ is the input weight matrix, and $W_i$ is the weight matrix attached to the hidden state for the input gate; $U_o$ and $W_o$ are the corresponding weight matrices of the output gate, and $\sigma$ is the sigmoid function.
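Both building blocks described above can be checked numerically with a short sketch. It assumes the usual base-2 Shannon entropy for the DT split criterion and the common gate form sigmoid(X_t U + H_(t-1) W) without bias terms; the function names and all weight values are hypothetical and chosen only for illustration.

```python
import math

def entropy(class_counts):
    """Shannon entropy (base 2) of a node with the given class counts."""
    total = sum(class_counts)
    return -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(x_t, h_prev, U, W):
    """Compute sigmoid(x_t U + h_prev W) for row vectors x_t and h_prev.

    U has shape (input_dim, units) and W has shape (units, units).
    """
    units = len(U[0])
    return [
        sigmoid(
            sum(x * U[i][j] for i, x in enumerate(x_t))
            + sum(h * W[i][j] for i, h in enumerate(h_prev))
        )
        for j in range(units)
    ]

# Entropy of the class distribution before resampling (counts from the text).
class_entropy = entropy([406, 336, 272])

# Hypothetical forget gate with 2 inputs and 3 hidden units.
x_t = [0.5, -1.0]
h_prev = [0.1, 0.0, -0.2]
U_f = [[0.2, -0.1, 0.0], [0.4, 0.3, -0.5]]
W_f = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [-0.3, 0.2, 0.1]]
forget_gate = gate(x_t, h_prev, U_f, W_f)
```

Every gate value lies strictly between 0 and 1, which is what lets a gate scale how much information is kept or passed on. The input and output gates follow the same computation with their own weight matrices.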
A BiLSTM [42] model contains two LSTM models, where one LSTM takes the input information sequence in the forward direction (past to future) and the other LSTM takes it in the backward direction (future to past). The BiLSTM input flow in both directions is used to preserve the future and past sequence information. The input information sequence X is computed in the forward and backward directions as shown in Eqs 6 and 7. The final output cell Y at time t is constituted from both directions as in Eq 8.
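The bidirectional processing can be sketched as below. To keep the example short, a plain tanh recurrent cell stands in for the LSTM cell, so this is not the layer used in the study; the helper names and the random weights are hypothetical. The sizes (6 timesteps, 64 input features, 128 units per direction) follow the shapes reported for the BiLSTM layer.

```python
import math
import random

def simple_rnn(sequence, W_in, W_rec):
    """Stand-in recurrent layer (tanh cell, not a full LSTM)."""
    units = len(W_rec)
    h = [0.0] * units
    outputs = []
    for x in sequence:
        h = [
            math.tanh(
                sum(x[i] * W_in[i][j] for i in range(len(x)))
                + sum(h[k] * W_rec[k][j] for k in range(units))
            )
            for j in range(units)
        ]
        outputs.append(h)
    return outputs

def bidirectional(sequence, fwd_weights, bwd_weights):
    """Concatenate forward and backward hidden states at every timestep."""
    forward = simple_rnn(sequence, *fwd_weights)
    # Process the reversed sequence, then restore the original time order.
    backward = simple_rnn(sequence[::-1], *bwd_weights)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(42)

def random_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

timesteps, features, units = 6, 64, 128
sequence = [[rng.uniform(-1, 1) for _ in range(features)] for _ in range(timesteps)]
fwd = (random_matrix(features, units), random_matrix(units, units))
bwd = (random_matrix(features, units), random_matrix(units, units))

output = bidirectional(sequence, fwd, bwd)
output_shape = (len(output), len(output[0]))
```

Because the two directions are concatenated, 128 units per direction give 256 values per timestep, which agrees with the (None, 6, 256) output shape listed for the BiLSTM layer.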
The TCN [43] model is a variation of the CNN and is commonly used for sequence modeling tasks. The TCN encodes information from an input sequence. The TCN model has a longer memory than an RNN of equal capacity. TCN utilizes causal convolutions, where the output at time t is convolved only with elements from time t and earlier in the previous layer [44]. TCN utilizes a fully convolutional network (FCN) architecture [45], in which each hidden layer has the same length as the input layer. In the causal convolutional layer, zero padding of length (kernel size − 1) is added to keep subsequent layers the same length as the previous layer.
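The causal convolution with left zero padding can be illustrated with a one-channel sketch. The function name `causal_conv1d`, the kernel, and the input values are hypothetical; dilation and multiple channels, which a full TCN would normally use, are left out for brevity.

```python
def causal_conv1d(sequence, kernel):
    """Causal 1D convolution: y[t] = sum_i kernel[i] * x[t - i].

    The input is padded on the left with (kernel size - 1) zeros, so the
    output has the same length as the input and y[t] never sees x[t + 1:].
    """
    pad = len(kernel) - 1
    padded = [0.0] * pad + list(sequence)
    return [
        sum(kernel[i] * padded[t + pad - i] for i in range(len(kernel)))
        for t in range(len(sequence))
    ]

# Hypothetical sequence of 6 timesteps and a kernel of size 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.5]
y = causal_conv1d(x, kernel)

# Changing only the last input must leave all earlier outputs unchanged.
x_changed = x[:-1] + [100.0]
y_changed = causal_conv1d(x_changed, kernel)
```

With this kernel each output is the average of the current and previous input, and the first output uses a padded zero in place of the missing previous value.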
The dataset samples are fed into the DT and BiLTCN models for training, which provide the predicted class probabilities as the output. However, instead of using the class probabilities for prediction directly, they are used as features to train machine learning models. The feature sets obtained from DT and BiLTCN are combined to make the hybrid feature set, which is represented by $V_{hf}$

$$V_{hf} = \{DT_f,\; BiLTCN_f\}$$

where $DT_f$ and $BiLTCN_f$ are the feature sets provided by the DT and BiLTCN, respectively, and are calculated as

$$DT_f = \{P(c_1)_{DT},\; P(c_2)_{DT},\; P(c_3)_{DT}\}$$

$$BiLTCN_f = \{P(c_1)_{Bi},\; P(c_2)_{Bi},\; P(c_3)_{Bi}\}$$

where $P(c_1)_{DT}$ and $P(c_1)_{Bi}$ show the probability of class 1 from the DT and BiLTCN models, respectively, when the input data X is fed into these models. X denotes the input dataset features (independent variables) over all records.
The build configuration parameters of the proposed BiLTCN model are given in Table 3. The table lists the layers, the number of neuron units, the layer activation function, the output shape of each layer, and the total parameters used. The first layer of the model is the embedding layer [46], which contains 50000 neuron units, has a (None, 6, 64) output shape, and uses 3200000 parameters. The second layer is the BiLSTM layer with 128 neuron units, a (None, 6, 256) output shape, and 197632 parameters. The next layer is the TCN layer with 64 neuron units, a (None, 64) output shape, and 201536 parameters. The next layers are a family of dense layers. The 3 dense layers have 32, 16, and 8 neuron units, output shapes of (None, 32), (None, 16), and (None, 8), ReLU as the activation function, and 2080, 528, and 136 parameters, respectively. The last layer is the output layer with 3 output neurons, Softmax as the activation function, a (None, 3) output shape, and 27 parameters.
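The formation of the hybrid feature set can be sketched as follows. It is assumed here that "combined" means concatenating the two probability vectors of each sample, which gives six features for three classes; the function name `hybrid_features` and the probability values are hypothetical.

```python
def hybrid_features(dt_probs, biltcn_probs):
    """Concatenate per-sample class probabilities from the two models."""
    if len(dt_probs) != len(biltcn_probs):
        raise ValueError("both models must score the same samples")
    return [list(p_dt) + list(p_bi) for p_dt, p_bi in zip(dt_probs, biltcn_probs)]

# Hypothetical probabilities for two samples, ordered (low, mid, high).
dt_probs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
biltcn_probs = [[0.8, 0.15, 0.05], [0.1, 0.2, 0.7]]

v_hf = hybrid_features(dt_probs, biltcn_probs)
```

Each row of `v_hf` would then replace the raw health features of that sample as the input to a downstream classifier such as the SVM.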
The layered architectural analysis [47] of our proposed approach is illustrated in Fig 9. The first layer of the architecture is the input layer, with (None, 6) as both input and output shape. The second layer is the embedding layer, with a (None, 6) input shape and a (None, 6, 64) output shape. Next in the layer stack is the Bidirectional LSTM layer, with an input shape of (None, 6, 64) and an output shape of (None, 6, 256). The next layer in the stack is the Temporal Convolutional Network layer, with an input shape of (None, 6, 256) and an output shape of (None, 64). Then a family of 4 dense layers follows, the last of which is the output layer of the architecture. The output layer has an input shape of (None, 8) and an output shape of (None, 3). This analysis demonstrates the layered stack of the proposed BiLTCN approach.
Hyperparameter tuning is applied to find a set of best-fit hyperparameters [48] for the proposed BiLTCN model. The hyperparameters involved are the number of epochs, number of layers, model vocabulary size, loss function, accuracy metric, loss optimizer, and the total trainable parameters. The final best-fit hyperparameters for the BiLTCN model are given in Table 4. The proposed model is trained for 50 epochs. The model contains seven layers and the vocabulary size is 50000. Categorical cross-entropy is used as the loss function. The loss optimizer is Adam with a learning rate of 0.001. The total number of model parameters is 3601939.
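The reported parameter counts can be cross-checked with the standard formulas for embedding, LSTM, and dense layers. In this sketch the helper names are hypothetical, and the TCN count is taken as reported rather than derived, because it depends on TCN settings (kernel size, dilations, stacks) that are not listed in the text.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four weight blocks (three gates plus the cell candidate),
    # each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units

TCN_PARAMS = 201536  # taken as reported, not derived here

layer_params = {
    "embedding": embedding_params(50000, 64),
    "bilstm": 2 * lstm_params(64, 128),  # forward and backward LSTM
    "tcn": TCN_PARAMS,
    "dense_32": dense_params(64, 32),
    "dense_16": dense_params(32, 16),
    "dense_8": dense_params(16, 8),
    "output": dense_params(8, 3),
}
total_params = sum(layer_params.values())
```

The per-layer values reproduce the reported counts, and their sum matches the reported total of 3601939 parameters.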

Results and discussions
This section contains the experimental results of machine learning models with respect to different perspectives. The performance of models is evaluated using imbalanced and balanced datasets. BiLTCN and DT are both evaluated as feature extractors, as well as classifiers. The performance of the proposed DT-BiLTCN is compared with BiLTCN as well regarding the feature extractor.

Experiment setup
All experiments are performed on an HP EliteBook 8440p with a 2.53 GHz dual-core Intel Core i5 CPU and 4.00 GB of RAM. The Python TensorFlow tool [49] is utilized for proposed model building and performance evaluations. Data splitting helps to optimize hyperparameters of models, estimate the model generalization performance, and avoid model overfitting [50]. We split the maternal health data into 80% for training and 20% for testing. The random state value is 42 for data splitting. The utilized evaluation metrics are recall, precision, accuracy, F1 score, and ROC curve accuracy score values.
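The 80:20 split with a fixed seed can be sketched with the standard library. This is an assumption-laden illustration: the study does not name the splitting function it used, the helper `train_test_split_indices` is hypothetical, and a seeded shuffle here will not reproduce the exact row assignment of any particular library. The test portion is rounded up.

```python
import math
import random

def train_test_split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and cut off the test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)
    return indices[n_test:], indices[:n_test]

# 1218 samples as reported for the dataset.
train_idx, test_idx = train_test_split_indices(1218)
split_sizes = (len(train_idx), len(test_idx))
```

Fixing the seed makes the split repeatable, so every model is trained and evaluated on the same rows.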

Performance metrics
Accuracy is the proportion of correct predictions made by the predictive model out of all predictions. It is calculated as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of true positives, i.e., positive samples correctly identified by our predictive model, FP is the number of false positives, i.e., samples incorrectly identified as positive, TN is the number of true negatives, i.e., correctly predicted negative samples, and FN is the number of false negatives, i.e., samples incorrectly identified as negative. Precision is the proportion of correctly predicted positive samples among all samples predicted as positive by the proposed model. It is calculated using

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall shows the completeness of the model, i.e., the proportion of actual positive samples that are correctly identified, and is calculated using

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1 score performance metric combines recall and precision and is calculated as their harmonic mean

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
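These metrics can be computed per risk class by treating one class as positive and the remaining classes as negative. The sketch below assumes this one-vs-rest view; the function name `per_class_metrics` and the label vectors are hypothetical.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical encoded labels: 0 = low, 1 = mid, 2 = high risk.
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]
high_risk = per_class_metrics(y_true, y_pred, positive=2)
```

In this example the high-risk class has 2 true positives, 0 false positives, and 1 false negative, so precision is 1.0 while recall is 2/3.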
The ROC curve accuracy performance metric is based on the probability curve that plots the true positive rate (TPR) against the false positive rate (FPR) at different threshold values. The higher the ROC value, the better the classification performance of the proposed learning technique.
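How such a curve and its score are obtained can be sketched for a single class. The sketch assumes that the "ROC accuracy" score corresponds to the area under the ROC curve computed with the trapezoidal rule, which the text does not state explicitly; the function names and the scores are hypothetical. For a multi-class problem, each risk class would be treated as positive in turn.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the decision threshold.

    `y_true` holds 1 for the positive class and 0 otherwise.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ordered = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Consume all samples that share the same score before adding a point.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical predicted probabilities for one class.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = auc(roc_points(y_true, scores))
```

Here 8 of the 9 positive-negative pairs are ranked correctly, which is why the area equals 8/9; a value of 1.0 would mean that every positive sample scores higher than every negative one.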

Results of machine learning models using original features
Initial experiments are carried out using the imbalanced data with different train and test split ratios to analyze the performance of deployed models. SVM, ETC, LR, and DTC are employed for these experiments, and results are provided in Table 5. Among the employed models, the DTC achieves the highest accuracy of 80% each for the 80:20 and 85:15 train-test split ratios. Due to the imbalanced dataset, the models predominantly perform poorly for maternal health risk prediction. Further experiments are performed using the SMOTE-balanced dataset, which has an equal number of samples for the three classes, and results are displayed in Table 6. With an accuracy of 84%, BiLTCN outperforms other models when used with the balanced dataset. This is higher than the accuracy of the models with the imbalanced dataset, so only the balanced dataset is used for further experiments.
Performance evaluation of BiLTCN. Experiments are performed to analyze the accuracy and other performance metrics for the proposed BiLTCN model. Epoch-wise results regarding the training accuracy and loss are presented in Table 7, indicating an average accuracy of 0.88 for the BiLTCN. Experimental results in Table 8 demonstrate the classification report for each maternal health risk category. The low-risk level category (0) has an 80% precision score, 82% recall score, and 81% F1 score. The medium-risk level category (1) has an 80% precision score, 76% recall score, and 78% F1 score. The high-risk level category (2) has a 90% precision score, 89% recall score, and 90% F1 score. Averaged scores for precision, recall, and F1 are also provided. The precision score for micro, macro, and weighted average is 83% each, while the samples average score is 82%. Averaged scores for recall and F1 also correspond to the precision scores and show better results using the balanced dataset.
The BiLTCN approach evaluation results with hyperparameter tuning are demonstrated in Table 9. The 50 epochs are used for model training. The training accuracy achieved by the model is 88% while the testing accuracy, precision, recall, and F1 scores are 84% each. BiLTCN obtains a ROC accuracy (micro) score of 97% for maternal health risk prediction.
The ROC accuracy curve analysis of the BiLTCN approach is examined in Fig 10.It shows that the low-risk class achieves a 95% ROC accuracy, the mid-risk class achieves a 94% ROC accuracy, and the ROC accuracy for the high-risk class is 99%. These results show that the BiLTCN approach is better than other employed machine learning models for predicting maternal health risks.
The results of the BiLTCN are summarized in Fig 11 regarding the average recall, precision, F1 score, and ROC. The higher the ROC accuracy score, the better the classification model. The micro-ROC accuracy score of the employed model is 97% which demonstrates the best performance score.

Performance analysis using proposed DT-BiLTCN feature engineering
The performance analysis with the proposed DT-BiLTCN technique is carried out in this section. Experiments are performed using both the imbalanced and balanced datasets. Table 10 shows the results with the imbalanced dataset. Even with the imbalanced dataset, SVM outperforms the other models when the proposed approach is used and obtains a 95% accuracy. The feature set extracted by DT-BiLTCN is the input to the machine learning models for maternal risk prediction. Table 11 shows a comparative analysis of the accuracy scores of the applied learning techniques. Results show that the performance of the models is substantially improved by the DT-BiLTCN-based features. The highest accuracy, 98%, is obtained by the SVM. Its precision, recall, and F1 scores are also 98% each, which shows superior performance compared to the other models.
The category-wise classification report of the SVM is given in Table 12. Results show that SVM achieves a 100% precision for the low-risk class, and 98% and 97% for the medium-risk and high-risk categories of maternal health risk, respectively. Recall and F1 scores are similarly high for all categories of maternal health risk, which indicates the superiority of the proposed DT-BiLTCN feature engineering approach.
In addition to the proposed DT-BiLTCN approach, DT and BiLTCN are also used individually as feature engineering approaches for performance comparison, and the results are provided in Table 13. Results suggest that models perform considerably better when trained on the features extracted using the DT model. The class probabilities produced by the DT are used to train the models, which improves their learning process and raises their maternal health risk prediction performance. Results using DT features are far better than those using the BiLTCN features.
Despite the better performance of machine learning models with DT features, the performance of DT-BiLTCN features is still better, as shown in Fig 12. Results show that the highest accuracy of 98% is achieved when DT-BiLTCN features are used. Since the proposed approach combines features from better-performing DT and BiLTCN models, it contains an appropriate feature set that improves the training of the models and elevates their performance.
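A minimal sketch of how the class probabilities from the two models could be joined into one feature set is given below. Row-wise concatenation is an assumption, as are the function name `combine_probability_features` and the toy probability values.

```python
def combine_probability_features(dt_probs, biltcn_probs):
    """Concatenate, per sample, the class probabilities from both models."""
    if len(dt_probs) != len(biltcn_probs):
        raise ValueError("both models must score the same samples")
    return [list(a) + list(b) for a, b in zip(dt_probs, biltcn_probs)]


# Hypothetical class probabilities (low, medium, high risk) for two samples.
dt_probs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
biltcn_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
features = combine_probability_features(dt_probs, biltcn_probs)
```

Under this assumption, each sample is described by six probability values, which would then be passed to a downstream classifier such as the SVM.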

Validation of proposed DT-BiLTCN
The proposed approach is validated with 10-fold cross-validation using the best-performing SVM model, which obtained a 98% accuracy score. The results of all employed models are provided in Table 14. The results confirm the superior performance of SVM, which obtains a 94% cross-validation accuracy with a standard deviation of only ±0.02.
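The cross-validation procedure can be sketched as follows: the data are split into k folds, each fold serves once as the test set, and the mean and standard deviation of the fold accuracies are reported. The sketch uses contiguous folds without shuffling and a stand-in nearest-neighbour classifier in place of the SVM. Both choices, along with all names and the toy data, are assumptions made to keep the example self-contained.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit_predict, k=10):
    """Return the mean and standard deviation of accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def nearest_neighbour(train_X, train_y, test_X):
    """Stand-in 1-NN classifier (the study uses SVM at this step)."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        preds.append(train_y[best])
    return preds


# Toy data (hypothetical): two well-separated classes, alternating order.
X = [[float(i % 2) + 0.01 * i] for i in range(20)]
y = [i % 2 for i in range(20)]
mean_acc, std_acc = cross_validate(X, y, nearest_neighbour, k=10)
```

A small standard deviation across folds indicates that the accuracy does not depend strongly on which samples end up in the test fold.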

Comparison of probability-based and original features
The original maternal health dataset has few features, and they are not linearly separable to a great extent. This lack of linear separability results in poor performance of the machine learning models. The proposed feature engineering technique is used to build a probability-based feature set from DT and BiLTCN. The generated feature set is more linearly separable and distinguishes the target classes with a higher margin.
The experimental results demonstrate that the DT-BiLTCN technique greatly enhances the performance of the machine learning models. A visual representation of the feature distribution, given in Fig 13, is analyzed to verify the enhanced performance. Fig 13a demonstrates that the dataset with the original features is not linearly separable. However, with the proposed feature engineering technique, the feature space becomes more linearly separable. We also determined the computational cost of the applied machine learning models with the original and probability-based feature sets. The analysis demonstrates that the proposed approach performs well in terms of both accuracy and efficiency. The computational cost of the proposed technique is significantly lower than that of the other applied techniques. The highest accuracy is achieved with the lowest computational time, as shown in Table 15.

Performance comparison with existing approaches
The comparative analysis of the existing state-of-the-art approaches in the context of maternal health risk prediction is demonstrated in Table 16. For this purpose, recent research works are compared with the proposed approach. The selected works include both machine learning and deep learning models. Performance analysis corroborates the superior performance of

Discussions
This study proposes a novel deep neural network architecture that leverages DT, BiLSTM, and TCN networks to form an ensemble for feature engineering. Because these models perform well individually, they are combined for feature engineering to predict maternal health risk during pregnancy. Empirically, prediction probabilities are found to yield better performance than the original features, so the feature set comprises the prediction probabilities from these models. SMOTE is utilized to address class imbalance. Experiments are performed using imbalanced and balanced data, with both the original features from the data and the features extracted using the proposed approach. Extensive experiments and analyses show that the proposed approach is superior for maternal health risk prediction. SMOTE-balanced data tend to show better results for a few models, while the performance of the proposed approach is substantially elevated. Experimental results using DTC, LR, ETC, KNN, RFC, and SVM with the proposed DT-BiLTCN feature engineering approach show that accuracy, precision, recall, and F1 scores of 0.98 can be obtained when SVM is used with the proposed approach. This performance is significantly better than that obtained with the original features from the dataset. In addition, k-fold cross-validation results validate this performance. Performance comparison with existing state-of-the-art approaches further corroborates these results.

Conclusions and future work
Maternal health analysis and risk detection during pregnancy are very important to reduce the probability of health complications during pregnancy, childbirth, and the postpartum period. This study presents a novel approach for the automatic prediction of maternal health risks associated with pregnancy. A novel approach, DT-BiLTCN, is proposed for feature extraction, which provides better features for the effective training of machine learning models. Of the employed models, SVM shows an excellent accuracy of 98% for the low, medium, and high-risk categories of maternal health risk. When compared with the performance of DT and BiLTCN features, the proposed approach outperforms them regarding ROC and other metrics. Regarding maternal health exploratory data analysis, diastolic and systolic blood pressure, heart rate, and the age of pregnant women are found to be the key health factors that can cause high-risk levels during pregnancy. We believe that the findings of this study are helpful for doctors, maternal health professionals, and the government to predict the risk factors associated with maternal mortality and can be used to alleviate such outcomes.