Glycemic-aware metrics and oversampling techniques for predicting blood glucose levels using machine learning

doi:10.1371/journal.pone.0225613

Fig 1.

A sample of a few hours of CGM data from one patient.

The horizontal red line indicates the boundary between normoglycemia and hyperglycemia according to [15]. Note (i) the two gaps in the trace, one shorter and one longer; and (ii) the maximum possible sensor reading of 400 mg/dl, even though glucose levels can exceed this amount. This patient experienced hypoglycemia just after 1am, followed by severe hyperglycemia later in the morning.

Table 1.

Number of examples in each training set after processing the CGM traces with the sliding window technique, along with the start and end dates (randomly shifted) for each dataset.

Fig 2.

Frequency histogram showing counts of CGM sensor readings for all patients in the training data.

Different colours indicate whether or not the sensor reading is normoglycemic.

Fig 3.

Illustration of a decision tree used for regression.

Intermediate nodes represent tests of the features, and leaf nodes give the predicted values.

Fig 4.

Illustration of a MLP with a single hidden layer of size five.

Fig 5.

Illustration of SMOTE’s artificial example generation technique.
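
As a rough illustration of the generation step, the sketch below follows the standard SMOTE recipe: pick a minority example, pick one of its k nearest minority neighbours, and place a new point at a random position on the line segment between them. The function name `smote_sample` and its parameters are hypothetical, and the sketch interpolates feature vectors only; how the study handles the regression target for synthetic examples is not stated in this caption.

```python
import math
import random
from typing import List, Optional, Sequence


def smote_sample(minority: Sequence[Sequence[float]],
                 k: int = 5,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Generate one synthetic example from a list of minority-class points.

    Hypothetical helper: standard SMOTE interpolation, features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = rng or random.Random()

    # Pick a base example at random.
    base = rng.choice(minority)

    # Rank the remaining minority examples by Euclidean distance to the base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))

    # Choose one of the k nearest neighbours.
    neighbour = rng.choice(others[:k])

    # Interpolate: new = base + u * (neighbour - base), with u in [0, 1).
    u = rng.random()
    return [b + u * (n - b) for b, n in zip(base, neighbour)]
```

Every synthetic point therefore lies between two real minority examples, which is what the illustration depicts.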

Fig 6.

Clarke error grid analysis, reproduced from [28].

Table 2.

Examples of feature vectors constructed from the CGM traces.

Features x₁ to x₂₄ are consecutive CGM sensor readings occurring over a period of 120 minutes; bg_t+30 is the glucose value observed 30 minutes after x₂₄. The pseudolabel for each example, which is only used if an oversampling method is employed, is also shown.
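
The construction described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a regular 5-minute sampling interval (24 readings over 120 minutes), so that the 30-minute horizon corresponds to 6 steps, and it assumes that windows touching a gap in the trace are simply skipped. The name `make_examples` is hypothetical, and the pseudolabel is omitted because the caption does not define how it is assigned.

```python
from typing import List, Optional, Tuple

WINDOW = 24   # x1..x24: 24 consecutive readings (120 min at an assumed 5-min interval)
HORIZON = 6   # 30 minutes ahead, i.e. 6 steps at the same assumed interval


def make_examples(trace: List[Optional[float]],
                  window: int = WINDOW,
                  horizon: int = HORIZON) -> List[Tuple[List[float], float]]:
    """Slide a fixed-size window over one CGM trace.

    `trace` holds readings in mg/dl at a regular interval; None marks a gap.
    Windows that contain a gap are skipped (an assumption, not from the paper).
    """
    examples = []
    last_start = len(trace) - window - horizon
    for start in range(last_start + 1):
        span = trace[start:start + window + horizon]
        if any(v is None for v in span):
            continue
        features = span[:window]               # x1..x24
        target = span[window + horizon - 1]    # reading 30 minutes after x24
        examples.append((features, target))
    return examples
```

For a gap-free trace of n readings this yields n - 29 examples, each pairing 24 inputs with one target.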

Table 3.

Top five model and resampler combinations based on overall MARD.
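
For reference, MARD (mean absolute relative difference) is conventionally computed as the mean of |predicted - reference| / reference, expressed as a percentage. The sketch below shows that conventional definition only; the function name `mard` is hypothetical, and it does not attempt to reproduce the alternative, glycemic-aware variants.

```python
from typing import Sequence


def mard(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute relative difference, in percent.

    `reference` holds the observed glucose values (mg/dl, all non-zero);
    `predicted` holds the model outputs for the same time points.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / r for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)
```

For example, predictions of 110 and 180 mg/dl against observed values of 100 and 200 mg/dl are each off by 10%, so the MARD is 10%.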

Table 4.

Top five model and resampler combinations based on alternative MARD metrics.

Table 5.

Top five model and resampler combinations based on EGA metrics (excludes the dummy predictor).

Table 6.

Friedman test p-values for the null hypothesis that all classifiers perform equally well.

Fig 7.

Example of a prediction made by linear SVR for one patient.

The first 120 minutes of the plot (unfilled circles) are the inputs to the model; at 150 minutes, the unfilled circle is the prediction and the filled circle is the value actually observed. This figure depicts a Type A error.

Fig 8.

An example similar to that in Fig 7, but depicting a Type E error.
