Fig 1.

A summary of the study workflow.

a) Data cleaning, where the datasets in tabular form were merged, instances not meeting the stated criteria were removed, and the remaining instances were split into a development set and a test set. b) Feature filtering and selection, where the five features most relevant to the outcome variable were selected for the learning process. c) Internal validation, where two learners competed to predict the occurrence of plasma leakage based on the selected sets of features. The learners were then trained on the whole development set and combined by average stacking into an ensemble that served as the final model. d) Evaluation and interpretation, where the final model was evaluated on the test set and the contributions of the selected features were elucidated using the Shapley method.
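The average stacking step in panel c can be illustrated with a minimal sketch. It assumes that "average stacking" means an unweighted mean of the two learners' predicted probabilities for each instance; the function name and the example probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def average_stack(prob_a: Sequence[float], prob_b: Sequence[float]) -> list[float]:
    """Average two learners' predicted probabilities, instance by instance.

    Assumption: equal weights for both learners (not confirmed by the source).
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both learners must score the same instances")
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]


# Hypothetical plasma leakage probabilities from the two learners.
rf_probs = [0.20, 0.70, 0.90]
gbm_probs = [0.40, 0.50, 0.80]
ensemble_probs = average_stack(rf_probs, gbm_probs)
```

With equal weights, the ensemble probability always lies between the two learners' probabilities, so a disagreement between them is pulled towards the middle rather than resolved in favour of one learner.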


Table 1.

Summary statistics of the cohort, grouped by dengue and non-dengue patients.


Table 2.

Benchmarking results for random forest (RF), gradient boosting machine (LightGBM), and their average stacking (Ensemble) using nested cross-validation on the development set.


Fig 2.

An overview of the machine learning results.

a) ROC curves for gradient boosting machine (LightGBM), random forest (RF) and their average stacking (Ensemble) using prediction probability thresholds varying from 0 to 1 (step size = 0.01). b) PR curves for the learners using the same thresholding. c) ROC curve for the final model, DENV5F-AS (5-Featured Average Stacking of LightGBM and RF), on the test set (AUC = 0.80). d) PR curve for the final model on the test set (PRAUC = 0.69). e) Confusion matrix for DENV5F-AS on the test set (green: all instances; orange: earliest instance of each patient). PL: plasma leakage; noPL: no plasma leakage. The colour intensity is proportional to the ratio of the instances in a matrix cell to the total number of instances. The percentages in each direction give the proportion of instances in a cell relative to its row or column total. f) Prediction probabilities during the observation period for patients in each class, where each point indicates an instance and connected points belong to the same patient. The trend of the predicted probabilities for each class is shown by a fitted linear regression line.
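The thresholding described for panels a and b can be sketched as follows. This is an illustration only: it assumes an instance is predicted positive when its probability is at or above the threshold and that the area is computed with the trapezoidal rule. The function names and the toy labels and probabilities are hypothetical.

```python
def roc_points(y_true, y_prob, step=0.01):
    """Return (FPR, TPR) pairs for thresholds 0, step, ..., 1.

    Assumes labels are 0/1 and that both classes are present.
    """
    n_steps = round(1 / step)
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for i in range(n_steps + 1):
        threshold = i * step
        # An instance is called positive when its probability reaches the threshold.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under the curve through the points, using the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data, not from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]
auc = trapezoid_auc(roc_points(y_true, y_prob))
```

A fixed grid of 101 thresholds can miss operating points that fall between grid values when many probabilities are close together, so the resulting curve is an approximation of the one obtained by using every distinct predicted probability as a threshold.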


Table 3.

Prediction performance of the final model in predicting plasma leakage on the test set.


Fig 3.

An overview of the model fairness of the final model: DENV5F-AS.

a) ROC curves for the dengue and non-dengue subsets of the test set. b) Confusion matrices (in blue) for DENV5F-AS on the dengue and non-dengue subsets of the test set. The percentages in each direction give the proportion of instances that belong to each row or column of the confusion matrix. PL: plasma leakage; noPL: no plasma leakage. The colour intensity is proportional to the ratio of the instances in a matrix cell to the total number of instances. c) Confusion matrices (in orange) including only the earliest instance of each patient for the dengue and non-dengue subsets of the test set.
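The row and column percentages in the confusion matrices can be reproduced with a short sketch. It assumes rows hold the actual class and columns the predicted class, which the caption does not state; under that layout the row proportion of the PL/PL cell is the sensitivity and its column proportion is the positive predictive value. The labels below are toy data, and the same function could be applied separately to each subgroup.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts indexed as matrix[actual][predicted], with 0 = noPL and 1 = PL."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def row_proportions(matrix):
    """Each cell divided by its row total (share of each actual class)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


def column_proportions(matrix):
    """Each cell divided by its column total (share of each predicted class)."""
    totals = [matrix[0][j] + matrix[1][j] for j in range(2)]
    return [[row[j] / totals[j] if totals[j] else 0.0 for j in range(2)] for row in matrix]


# Toy labels, not from the study: 1 = PL, 0 = noPL.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
by_row = row_proportions(cm)
by_col = column_proportions(cm)
```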


Fig 4.

SHAP decision plot for DENV5F-AS on the test set for correct and incorrect predictions.

The features are sorted from top to bottom by their mean absolute SHAP values in correct predictions. Each point represents an instance, and the lines connecting points across features belong to the same instance. For each feature, the points are scattered perpendicular to the horizontal line to minimise overlap. Feature values are normalised to [0, 1] by min-max normalisation and colour-coded (grey points are missing values); outliers were squished into this range using a Hampel filter. Each line takes the colour of the feature value it connects to in the downward direction. The x-axis is the SHAP value computed for each instance, expressed as the log-odds of the predicted plasma leakage probability. The plot is centred on the x-axis at the baseline SHAP value determined by the algorithm.
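One plausible reading of the colour scaling is sketched below. It assumes that the Hampel filter clamps values lying outside median ± 3 × 1.4826 × MAD to those bounds before min-max normalisation is applied, and that missing values (here `None`) are passed through unchanged. The constants, the order of operations, and the function names are assumptions rather than details confirmed by the caption.

```python
from statistics import median


def hampel_bounds(values, k=3.0):
    """Bounds median +/- k * 1.4826 * MAD (a common Hampel identifier; assumed here)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    half_width = k * 1.4826 * mad
    return med - half_width, med + half_width


def squish_and_scale(values):
    """Clamp outliers to the Hampel bounds, then min-max scale to [0, 1].

    Missing values (None) are kept as None so they can be drawn in grey.
    """
    present = [v for v in values if v is not None]
    low, high = hampel_bounds(present)
    clamped = [None if v is None else min(max(v, low), high) for v in values]
    kept = [v for v in clamped if v is not None]
    vmin, vmax = min(kept), max(kept)
    span = vmax - vmin
    return [
        None if v is None else (0.0 if span == 0 else (v - vmin) / span)
        for v in clamped
    ]


# Toy feature values with one extreme value and one missing value.
raw = [1.0, 2.0, 3.0, 4.0, 100.0, None]
scaled = squish_and_scale(raw)
```

Clamping before scaling keeps a single extreme value from compressing all other values towards one end of the colour scale.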
