Machine learning-based glucose prediction with use of continuous glucose and physical activity monitoring data: The Maastricht Study

Background Closed-loop insulin delivery systems, which integrate continuous glucose monitoring (CGM) and algorithms that continuously guide insulin dosing, have been shown to improve glycaemic control. The ability to predict future glucose values can further optimize such devices. In this study, we used machine learning to train models to predict future glucose levels based on prior CGM and accelerometry data.
Methods We used data from The Maastricht Study, an observational population-based cohort that comprises individuals with normal glucose metabolism, prediabetes, or type 2 diabetes. We included individuals who underwent at least 48 h of CGM (n = 851), most of whom (n = 540) simultaneously wore an accelerometer to assess physical activity. A random subset of individuals was used to train models to predict glucose levels at 15- and 60-minute intervals based on either CGM data alone or both CGM and accelerometer data. In the remaining individuals, model performance was evaluated with the root-mean-square error (RMSE), Spearman's correlation coefficient (rho), and the surveillance error grid. For a proof-of-concept translation, CGM-based prediction models were optimized and validated with data from individuals with type 1 diabetes (OhioT1DM Dataset, n = 6).
Results Models trained with CGM data were able to accurately predict glucose values at 15 minutes (RMSE: 0.19 mmol/L; rho: 0.96) and 60 minutes (RMSE: 0.59 mmol/L; rho: 0.72). Model performance was comparable in individuals with type 2 diabetes. Incorporation of accelerometer data only slightly improved prediction. The error grid results indicated that model predictions were clinically safe (15 min: >99%, 60 min: >98%). Our prediction models translated well to individuals with type 1 diabetes, as reflected by high accuracy (RMSEs for 15 and 60 minutes of 0.43 and 1.73 mmol/L, respectively) and clinical safety (15 min: >99%, 60 min: >91%).
Conclusions Machine learning-based models are able to accurately and safely predict glucose values at 15- and 60-minute intervals based on CGM data only. Future research should further optimize the models for implementation in closed-loop insulin delivery systems.


Introduction
The increasing prevalence of diabetes entails an increase in debilitating complications, such as retinopathy, neuropathy, and cardiovascular disease [1][2][3]. Maintaining plasma glucose levels within the reference range is essential for the prevention of diabetes-related complications, which are generally attributable to chronic hyperglycaemia, although hypoglycaemia has been suggested to contribute to cardiovascular disease risk as well [3][4][5]. One of the most promising developments to minimize hyperglycaemia and hypoglycaemia (and, hence, to increase time in range) in individuals with diabetes who require insulin treatment is the closed-loop insulin delivery system (also known as the artificial pancreas). Such a system integrates continuous glucose monitoring (CGM), insulin (with or without glucagon) infusion, and a control algorithm to continuously regulate blood glucose levels [6,7]. Multiple studies have shown the merit of incorporating the artificial pancreas into the clinical care of individuals with type 1 or type 2 diabetes [8,9].
Despite prior efforts, numerous points still need to be addressed to improve the individual components of closed-loop systems [6,10]. With regard to CGM, these include overcoming sensor delay (i.e., the inherent ~10-minute discrepancy between interstitially measured and actual plasma glucose values) and sensor malfunctions (i.e., periods during which no glucose values are recorded) [6,10,11]. Continuous glucose prediction is a potentially viable strategy to both handle sensor delay and bridge periods of sensor malfunction. The use of machine learning has yielded encouraging glucose prediction accuracy in relatively small study populations (mostly individuals with type 1 diabetes) or in silico studies, as extensively reviewed elsewhere [12]. Large human study populations are now needed to reliably assess to what extent and within what time interval (i.e., prediction horizon) glucose values can be accurately predicted with machine learning. Additionally, incorporation of physical activity, which is considered an important factor for glucose control in daily life, could further improve glucose prediction [6].
In this study, we investigated to what extent glucose values can be accurately predicted at intervals of 15 and 60 minutes by a machine learning model that has been trained with a sliding time window of glucose values preceding the predicted values at a fixed interval. Additionally, we studied whether glucose prediction can be further improved by incorporation of accelerometer-measured physical activity, and to what extent the results differ in a subgroup analysis of individuals with type 2 diabetes only. For this, we used a large population of individuals with either normal glucose metabolism (NGM), prediabetes, or type 2 diabetes who simultaneously underwent CGM and continuous accelerometry during a one-week period. Last, we used the publicly available OhioT1DM Dataset to explore whether CGM-based prediction models would translate to individuals with type 1 diabetes, the primary target population for closed-loop insulin delivery.

Study population and design
We used data from The Maastricht Study, an observational, prospective, population-based cohort study. The rationale and methodology have been described previously [13]. In brief, The Maastricht Study focuses on the aetiology, pathophysiology, complications and comorbidities of type 2 diabetes, and is characterized by an extensive phenotyping approach. All individuals aged between 40 and 75 years and living in the southern part of the Netherlands were eligible for participation. Participants were recruited through mass media campaigns and from the municipal registries and the regional Diabetes Patient Registry via mailings. For reasons of efficiency, recruitment was stratified according to known type 2 diabetes status, with an oversampling of individuals with type 2 diabetes. In general, the examinations of each participant were performed within a time window of three months. From 19 September 2016 until 13 September 2018, participants were invited to also undergo CGM [14]. During this period, a selected group of recently included participants was invited to return for CGM. In these participants only, there was a median time interval of 2.1 years between CGM and all other measurements. The present report includes cross-sectional data of the 851 participants who had at least 48 h of CGM data available and were classified with NGM, prediabetes, or type 2 diabetes. The Maastricht Study has been approved by the institutional medical ethical committee (Medisch-ethische toetsingscommissie aZM/UM [METC]; NL31329.068.10) and the Minister of Health, Welfare and Sports of the Netherlands (Permit 131088-105234-PG). All participants gave written informed consent.

Continuous glucose monitoring
The rationale and methodology of CGM (iPro2 and Enlite Glucose Sensor; Medtronic, Tolochenaz, Switzerland) have been described previously [14]. In brief, the CGM device was worn abdominally and recorded subcutaneous interstitial glucose values (range: 2.2-22.2 mmol/L) every five minutes for a seven-day period. For calibration purposes, participants were asked to perform self-measurements of blood glucose four times daily (Contour Next; Ascensia Diabetes Care, Mijdrecht, the Netherlands). Participants were blinded to the CGM recording, but not to self-measured values. Diabetes medication use was allowed and no dietary instructions were given. We included only individuals with at least 48 h of CGM data, but excluded the first 24 h of CGM from analysis because of insufficient calibration. For the glucose prediction analyses, all remaining glucose data points were used. We additionally calculated mean sensor glucose, standard deviation (SD), and coefficient of variation (CV) with the Glycemic Variability Research Tool (GlyVaRT; Medtronic) software.
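The summary statistics described above can be sketched in NumPy as follows. This is a minimal illustration, not GlyVaRT's implementation: it assumes a gap-free series sampled every five minutes (12 readings per hour), drops the first 24 h as described, ignores missing readings (NaN), and uses the sample SD (ddof=1). The function name `summarize_cgm` is hypothetical.

```python
import numpy as np

SAMPLES_PER_HOUR = 12  # CGM records one value every 5 minutes


def summarize_cgm(glucose_mmol, burn_in_hours=24.0):
    """Mean sensor glucose, SD and CV (%) after discarding the calibration period.

    Hypothetical sketch: assumes a 5-minute series in mmol/L; NaN marks missing readings.
    """
    start = int(burn_in_hours * SAMPLES_PER_HOUR)
    kept = np.asarray(glucose_mmol, dtype=float)[start:]
    kept = kept[~np.isnan(kept)]  # ignore missing readings
    mean = kept.mean()
    sd = kept.std(ddof=1)  # sample standard deviation
    return {"mean": mean, "sd": sd, "cv_percent": 100.0 * sd / mean}


# Seven days of 5-minute data: an implausible first day, then values alternating 5 and 7 mmol/L.
series = np.concatenate([np.full(288, 20.0), np.tile([5.0, 7.0], 864)])
print(summarize_cgm(series))
```

Because the first 288 readings (24 h) are discarded, the high values of the first day do not affect the mean of 6.0 mmol/L.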

Accelerometry
As described previously, daily physical activity was measured with the triaxial activPAL3 accelerometer (PAL technologies; Glasgow, United Kingdom) [13,15]. Like the CGM device, the accelerometer was attached during the first research visit; participants wore it on the front of the right thigh for eight consecutive days. No physical activity instructions were given. PAL Software Suite version 8 (PAL technologies) was used to convert the event-based accelerometry data files into 15-second interval data files. We used the composite of X, Y, and Z accelerations for each 15-second interval as the measure of physical activity.
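The text does not specify how the X, Y, and Z accelerations were combined. As a hedged illustration only, the sketch below assumes the composite is the Euclidean vector magnitude of each 15-second epoch; the function name `composite_acceleration` is hypothetical, and the PAL software may define its composite differently.

```python
import numpy as np


def composite_acceleration(xyz):
    """Collapse per-epoch X, Y, Z accelerations (shape [n_epochs, 3]) into one value per epoch.

    ASSUMPTION: Euclidean vector magnitude; the source does not define the composite.
    """
    xyz = np.asarray(xyz, dtype=float)
    if xyz.ndim != 2 or xyz.shape[1] != 3:
        raise ValueError("expected an array of shape [n_epochs, 3]")
    return np.linalg.norm(xyz, axis=1)


print(composite_acceleration([[3, 4, 0], [0, 0, 2]]))
```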

Assessment of participant characteristics
As described previously [13], we classified glucose metabolism status (GMS) as either NGM, prediabetes, or type 2 diabetes based on both a standardized 2-hour 75-gram oral glucose tolerance test and the use of glucose-lowering medication [16]. We assessed medication use as part of a medication interview. Additionally, we determined smoking status and history of diabetes based on questionnaires, measured weight and height (to calculate body mass index [BMI]) and office blood pressure during a physical examination, and measured HbA1c as well as the lipid profile in fasting venous blood.

Dataset construction
An overview of data preprocessing, model development, and model evaluation is given in Fig 1. To train our models to predict future glucose values, we constructed two separate datasets (Fig 1, panel a). The first dataset consisted of only the participants' six-day, five-minute interval CGM data (n = 851). The second dataset consisted of both CGM and accelerometry data (n = 540). To synchronize CGM (determined at 5-minute intervals) and accelerometry data (determined at 15-second intervals) in the second dataset, we linearly interpolated glucose values between consecutive glucose data points at a frequency of 15 seconds. Consistent and aligned frequency intervals across these parameters are a statistical precondition for this type of model development [17]. The study populations were randomly split into a training (70%), tuning (10%), and evaluation (20%) dataset such that data from a given individual were present in only one set. The training set was used to train the proposed models. The tuning set was used to iteratively improve the models by selecting the best model architectures and hyperparameters. Finally, the best models were evaluated on the independent evaluation set that was retained during model development.
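The two preprocessing steps above (upsampling CGM to the 15-second accelerometer grid and splitting by individual) can be sketched as follows. This is an assumption-laden illustration, not the authors' code: `upsample_cgm` and `split_participants` are hypothetical names, timestamps are assumed to be in seconds, and the split shuffles whole participant IDs so that no individual contributes data to more than one set.

```python
import numpy as np


def upsample_cgm(cgm_times_s, cgm_values, step_s=15):
    """Linearly interpolate 5-minute CGM values onto a 15-second grid."""
    cgm_times_s = np.asarray(cgm_times_s, dtype=float)
    grid = np.arange(cgm_times_s[0], cgm_times_s[-1] + 1e-9, step_s)
    return grid, np.interp(grid, cgm_times_s, cgm_values)


def split_participants(ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole participants (not individual samples) to train/tune/eval sets."""
    rng = np.random.default_rng(seed)
    unique_ids = rng.permutation(np.unique(ids))
    n_train = int(round(fractions[0] * len(unique_ids)))
    n_tune = int(round(fractions[1] * len(unique_ids)))
    train = set(unique_ids[:n_train])
    tune = set(unique_ids[n_train:n_train + n_tune])
    evaluation = set(unique_ids[n_train + n_tune:])
    return train, tune, evaluation


# Two CGM readings 5 minutes apart become 21 points at 15-second spacing.
grid, vals = upsample_cgm([0, 300], [5.0, 6.0])
train, tune, evaluation = split_participants(np.arange(851))
print(len(grid), vals[10], len(train), len(tune), len(evaluation))
```

Splitting on participant IDs rather than on rows prevents temporally adjacent, highly correlated samples from the same person leaking between the training and evaluation sets.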

Model development and design
Our proposed predictive model operates sequentially over CGM and accelerometry data (Fig 1, panel b). At each time point, 30 minutes of prior time series data were provided to the statistical model (e.g., six CGM-based glucose values), based on which it predicted glucose values at specified time intervals. For this study, we set these time intervals at 15 and 60 minutes. This prediction task can be solved by a variety of statistical and machine learning models. In the current study, we assessed autoregressive integrated moving average, support vector regression, gradient-boosting systems, shallow and deep multi-layer perceptron neural networks, and several recurrent neural network (RNN) architectures, including classical RNN [18,19], gated recurrent units [20], long short-term memory (LSTM) networks [21], and their bidirectional variants [22,23] (S1 File).
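The sliding-window formulation above can be sketched for the CGM-only dataset as follows. This is a minimal illustration under stated assumptions: a gap-free 5-minute series, so 30 minutes of history is six samples and the 15- and 60-minute targets lie 3 and 12 samples after the last input value (t0). The function name `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(glucose, history=6, horizons=(3, 12)):
    """Build (input, target) pairs from one participant's 5-minute CGM series.

    history=6 samples = 30 minutes of context; horizons of 3 and 12 samples
    correspond to 15- and 60-minute-ahead targets.
    """
    glucose = np.asarray(glucose, dtype=float)
    n = len(glucose) - history - max(horizons) + 1
    if n <= 0:
        return np.empty((0, history)), np.empty((0, len(horizons)))
    X = np.stack([glucose[i:i + history] for i in range(n)])
    # The last input sample (t0) sits at index i + history - 1; each target is h steps later.
    y = np.array([[glucose[i + history - 1 + h] for h in horizons] for i in range(n)])
    return X, y


X, y = make_windows(np.arange(30.0))
print(X.shape, y.shape, X[0], y[0])
```

Each row of `y` holds both horizons, which is the target layout a multi-task model predicting 15 and 60 minutes simultaneously would consume.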

Model selection and training
The classical RNN architecture had superior performance at the 15-minute prediction interval. Because the performance of the LSTM network at the 15-minute prediction interval was nearly as good as that of the classical RNN, we selected the multi-task LSTM network from among several alternatives as the architecture of choice for further investigation (S1 File and Table 1). This architecture runs sequentially over time series data and is able to implicitly model the historical context of an individual by modifying an internal state through time. Specifically, we designed this architecture to predict both time intervals simultaneously, an approach often referred to as "multi-task learning", which aims to share knowledge among prediction tasks.
Fig 1. Overview of data preprocessing, model development and evaluation. Data were used from The Maastricht Study, an observational population-based cohort that comprises individuals with normal glucose metabolism (NGM), prediabetes, or type 2 diabetes (panel A). We included 851 individuals who underwent continuous glucose monitoring (CGM), most of whom simultaneously wore an accelerometer to assess physical activity (X, Y, and Z accelerations). Models developed with the long short-term memory (LSTM) architecture were trained to predict glucose levels at 15- and 60-minute intervals with either CGM data only (1) or both CGM and accelerometer data (2) (panel B). Finally, model performance was evaluated by glucose profile analysis, performance metrics (root-mean-square error [RMSE]; Spearman's correlation coefficient [rho]; proportions), and clinical error grids (panel C). https://doi.org/10.1371/journal.pone.0253125.g001
Next, we evaluated a broad spectrum of hyperparameter combinations for this network (S1 Table). This resulted in a multi-task LSTM architecture consisting of three layers, including a dropout layer, with a total of 56-104 neurons (S2 Table). During training, we used exponential learning-rate decay with the Adam optimizer [24]. The best validation results were achieved with an initial learning rate with a decay of 0.001 every 1,000 training steps, a batch size of 1024, and back-propagation through time over a window of 30 minutes. This window defines the amount of historical data the model uses to provide a prediction, which in our case translates to six (first dataset) or 120 (second dataset) glucose data points. The training loss was the average of the mean-squared errors of all predictions. The maximum number of epochs was 50,000, with an early stopping criterion (based on 20% hold-out data) set to 250 epochs. We performed data preprocessing, model development, selection, and training using the Python programming language (version 3.7.1) with the packages Numpy (version 1.
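The multi-task loss and the learning-rate schedule mentioned above can be written out explicitly. The sketch below is illustrative only: it computes one mean-squared error per prediction horizon and averages them, and uses the exponential-decay formula lr = initial_lr × decay_rate^(step / decay_steps), which is the formula applied by TensorFlow's `tf.keras.optimizers.schedules.ExponentialDecay`. The function names and the `initial_lr` and `decay_rate` values in the usage lines are hypothetical, because the exact schedule parameters are not fully specified here.

```python
import numpy as np


def multitask_mse(y_true, y_pred):
    """Average of the per-horizon mean-squared errors (columns = prediction horizons)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_task = ((y_true - y_pred) ** 2).mean(axis=0)  # one MSE per horizon
    return per_task.mean()


def exponential_decay(step, initial_lr, decay_rate, decay_steps=1000, staircase=False):
    """lr = initial_lr * decay_rate ** (step / decay_steps); integer division if staircase."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


# Columns: 15-minute and 60-minute targets (mmol/L); only the 60-minute prediction is off.
print(multitask_mse([[1, 2], [3, 4]], [[1, 2], [3, 6]]))
# Hypothetical schedule values, for illustration only.
print(exponential_decay(1000, initial_lr=0.01, decay_rate=0.5))
```

When every horizon has the same number of samples, averaging the per-horizon MSEs gives equal weight to the 15- and 60-minute tasks.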

Translation of the prediction models to the OhioT1DM Dataset
We used data from the OhioT1DM Dataset to explore whether our CGM-based prediction models would translate to individuals with type 1 diabetes. The OhioT1DM Dataset is freely available for scientific purposes and contains data from 6 individuals with type 1 diabetes, all of whom used insulin pump therapy and CGM [25]. The participants provided interstitial glucose values every five minutes for an eight-week period. First, to also include a 30-minute prediction interval, we retrained our main CGM-based models on the main study population with identical hyperparameters and settings (S2 Table). Then, we evaluated the main CGM-based model on the test portion of the OhioT1DM Dataset (20%). Next, we aimed to optimize our main CGM-based model by training it on the training portion of the OhioT1DM Dataset. Specifically, we trained the model using an Adam optimizer with a learning rate of 10⁻⁴, a batch size of 1024, a maximum of 10,000 epochs, and an early stopping criterion (based on 20% of the training data) set to 100 epochs. Last, we evaluated this optimized model on the test portion using performance metrics and safety error grids, as described below.
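The early stopping criteria (250 epochs for the main models, 100 epochs for the OhioT1DM fine-tuning) can be read as a patience rule: training stops once the hold-out loss has not improved for that many consecutive epochs. A framework-agnostic sketch follows; the class and function names are hypothetical, and in Keras the same behaviour is available through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)`.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def simulate_training(val_losses, max_epochs, patience):
    """Replay a precomputed validation-loss curve; return (best epoch, last epoch run)."""
    stopper = EarlyStopping(patience)
    epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if stopper.update(epoch, loss):
            break
    return stopper.best_epoch, epoch


# Loss improves for three epochs, then plateaus: stops 100 epochs after the best one.
print(simulate_training([1.0, 0.8, 0.7] + [0.75] * 200, max_epochs=10_000, patience=100))
```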

Model evaluation and statistical analysis
Model evaluation was performed in the independent evaluation sets of individuals that were not used during model development (Fig 1, panel c). We employed several metrics to assess the performance of our models: root-mean-square error (RMSE), proportion of predicted values within 5% or 10% of actual glucose values, and Spearman's rank correlation coefficient (rho) (S2 File). Bootstrapping was performed to obtain 95% confidence intervals for each of these metrics [26]. In addition, we used error grids that are classically used to assess the safety of blood glucose monitors (i.e., surveillance error grid, Parkes error grid) to evaluate the safety of our glucose prediction models [27,28]. Last, we performed several sensitivity analyses in our main study population by stratifying model performance by: (1) GMS (i.e., separate results for NGM and prediabetes); (2) day (06.00 to 24.00 h) and night (24.00 to 06.00 h); and (3) low or high glucose variability, with high variability defined as a CGM-assessed SD above the 97.5th percentile in individuals with NGM (SD > 1.37 mmol/L) [14].
Normally distributed data are presented as mean ± SD, non-normally distributed data as median and interquartile range, and categorical data as n (%). Statistical analyses were performed using the Statistical Package for the Social Sciences (version 25.0; IBM, Chicago, Illinois, USA) and the Python programming language (version 3.7.1).
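The day/night and glucose-variability sensitivity analyses amount to computing the same metric within strata. A minimal sketch follows, using the cut-offs stated in the Methods (night = 24.00 to 06.00 h; high variability = participant CGM SD > 1.37 mmol/L). The function names are hypothetical, and the hour of day is assumed to be available for each prediction.

```python
import numpy as np

# Cut-off stated in the text: 97.5th percentile of CGM-assessed SD in the NGM group.
SD_CUTOFF_MMOL = 1.37


def day_night_labels(hour_of_day):
    """'night' for 00.00-06.00 h, 'day' for 06.00-24.00 h (hour given as 0-23.99)."""
    hours = np.asarray(hour_of_day, dtype=float)
    return np.where((hours >= 0) & (hours < 6), "night", "day")


def variability_labels(participant_sd):
    """'high' if a participant's CGM SD exceeds the cut-off, else 'low'."""
    return np.where(np.asarray(participant_sd, dtype=float) > SD_CUTOFF_MMOL, "high", "low")


def rmse_by_stratum(actual, predicted, labels):
    """RMSE computed separately within each stratum label."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for lab in np.unique(labels):
        sel = labels == lab
        out[str(lab)] = float(np.sqrt(np.mean((actual[sel] - predicted[sel]) ** 2)))
    return out


print(rmse_by_stratum([5, 6, 7, 8], [5, 7, 7, 10], day_night_labels([1, 10, 23, 5])))
```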

Main study population characteristics
In total, 896 individuals underwent CGM as part of The Maastricht Study's extensive phenotyping approach. We included participants with at least 48 h of CGM data and either NGM, prediabetes, or type 2 diabetes, which resulted in a final study population of 851 individuals. Of this population, 540 participants (63.5%) simultaneously underwent CGM and accelerometry. Table 2 shows the overall and type 2 diabetes-stratified characteristics of the two study populations (CGM-based as well as CGM- and accelerometry-based glucose prediction). The overall participant characteristics of both populations were generally comparable with regard to age, sex, BMI, glycaemic indices, blood pressure, and lipid profile, although the CGM- and accelerometry-based population contained fewer participants with prediabetes or type 2 diabetes. Additionally, the participants with type 2 diabetes in the CGM- and accelerometry-based glucose prediction population were more often newly diagnosed with type 2 diabetes and, accordingly, less often used glucose-lowering medication. Participant characteristics of the NGM and prediabetes subgroups are described in S3 Table.

Overall performance of machine learning-based glucose prediction
We trained two machine learning models (i.e., CGM-based; CGM- and accelerometry-based) to predict glucose levels at 15- and 60-minute intervals. Visually, both models appeared capable of accurately predicting the actual glucose profiles, as illustrated by the representative examples in S1 and S2 Figs. Next, we assessed the performance of our models in the evaluation datasets with a variety of metrics, including an average error term (RMSE), the proportion of predictions within 5% or 10% of the actual value, and correlation (rho). The evaluation datasets comprise 20% of the original or stratified study populations and thus vary in sample size (n = 13-170).
Overall, our models demonstrated high prediction accuracy, as supported by low RMSE values and high proportions of predicted glucose values within 5% and 10% deviation (Table 3). Model performance in the type 2 diabetes subgroup was generally lower than in the overall group, except for the correlation coefficients, which were often higher in individuals with type 2 diabetes. This can largely be attributed to the lower correlation coefficients in individuals with NGM and prediabetes (S4 Table), which result from range restriction (i.e., narrower glucose ranges attenuate correlation coefficients) [29]. Consequently, the correlation coefficients are valid for comparing CGM-based with CGM- and accelerometry-based glucose prediction, but not for comparing the overall study population with the type 2 diabetes subgroup. In addition, we observed short-to-moderate time lags for the 15- and 60-minute predictions (S5 Table).
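The range-restriction effect can be illustrated with a small simulation on hypothetical data (not study data): the prediction error has the same size everywhere, yet Spearman's rho drops sharply when only a narrow glucose band is analysed, as is typical for individuals with NGM. All numbers below are invented for illustration.

```python
import numpy as np


def spearman(a, b):
    # Continuous simulated data have no ties, so a double argsort yields ranks.
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


rng = np.random.default_rng(42)
actual = rng.normal(7.0, 2.0, 20000)              # hypothetical wide glucose range (mmol/L)
predicted = actual + rng.normal(0.0, 0.5, 20000)  # identical error size at every glucose level

narrow = np.abs(actual - 7.0) < 0.5               # keep only a narrow glucose band
rho_full = spearman(actual, predicted)
rho_narrow = spearman(actual[narrow], predicted[narrow])
print(round(rho_full, 2), round(rho_narrow, 2))
```

The RMSE is about 0.5 mmol/L in both cases; only the correlation changes, which is why rho is not comparable across groups with different glucose ranges.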
In general, incorporating accelerometry data into the models only slightly improved performance metrics at both prediction intervals (Table 3). S4 Table shows the model performance in the NGM and prediabetes subgroups. Glucose prediction was most precise in individuals with NGM. Of note, the machine learning-based models substantially outperformed a naive approach that used the glucose value at t0 as the predicted glucose value (S6 Table).
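The naive comparator described above is a persistence forecast: the last observed value (t0) is carried forward as the prediction. A minimal sketch, assuming a gap-free 5-minute series so that the 15-minute horizon is 3 samples ahead; the function names are hypothetical.

```python
import numpy as np


def persistence_forecast(glucose, horizon_steps):
    """Naive baseline: predict that glucose at t0 + horizon equals glucose at t0."""
    if horizon_steps < 1:
        raise ValueError("horizon_steps must be at least 1")
    g = np.asarray(glucose, dtype=float)
    return g[:-horizon_steps], g[horizon_steps:]  # (prediction, actual)


def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))


# A steadily rising profile (+0.5 mmol/L per 5 minutes) is exactly where persistence fails.
pred, actual = persistence_forecast([5.0, 5.5, 6.0, 6.5, 7.0], horizon_steps=3)
print(rmse(pred, actual))
```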

Safety evaluation with clinical error grids
We assessed the safety of our machine learning-based glucose prediction using two clinical error grids (i.e., surveillance and Parkes error grids). Fig 2 depicts the safety results for individuals with type 2 diabetes according to the surveillance error grid. At the 15-minute interval, almost all predictions (>99.9%) were clinically safe (i.e., a risk score between 0 and 1.0) (Fig 2, panels A and B). At the extended prediction window of 60 minutes, clinical safety was slightly lower (98.4-99.2%) (Fig 2, panels C and D). Parkes error grid assessment yielded similar results (S5 Fig). Of note, less accurate predictions fell more often in the vertical B-D zones than in the horizontal B-E zones (e.g., S4 Fig, panel C: 11.80% versus 4.24%), which suggests a model tendency to underestimate rather than overestimate actual glucose values, the latter being more dangerous.
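Once each prediction has a surveillance error grid risk score, the safety summary reduces to binning those scores. The sketch below uses the risk-score bands reported in this article's surveillance error grid legend [27] and the "clinically safe" threshold of 1.0 used above. It does not reproduce the grid's lookup of a risk score from a (reference, predicted) pair, and treating each band's upper bound as inclusive is an assumption.

```python
import numpy as np

# Risk-score bands of the surveillance error grid; upper bounds assumed inclusive.
SEG_BANDS = [
    (0.5, "none"), (1.0, "slight (lower)"), (1.5, "slight (higher)"),
    (2.0, "moderate (lower)"), (2.5, "moderate (higher)"),
    (3.0, "great (lower)"), (3.5, "great (higher)"),
]


def seg_category(score):
    """Map one risk score to its degree-of-risk label."""
    for upper, label in SEG_BANDS:
        if score <= upper:
            return label
    return "extreme"


def proportion_safe(scores, threshold=1.0):
    """Share of predictions with a risk score between 0 and 1.0."""
    return float(np.mean(np.asarray(scores, dtype=float) <= threshold))


scores = [0.1, 0.9, 1.2, 0.3]  # hypothetical risk scores
print([seg_category(s) for s in scores], proportion_safe(scores))
```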

Additional analyses
To gain further insight into our model predictions, we assessed performance metrics stratified by day and night (S7 Table). Fifteen-minute predictions did not materially differ between day and night. By contrast, the accuracy of 60-minute predictions was lower during the day than at night. In addition, we stratified the results by high or low glucose variability (i.e., SD cut-off of 1.37 mmol/L) (S8 Table). Model performance was slightly lower at higher glucose variability, at both the 15- and 60-minute intervals.

Translation of the prediction models to the OhioT1DM Dataset
The prediction accuracy of the CGM-based model that was developed with our main study population was moderate in individuals with type 1 diabetes (RMSEs at 15 … Table). Accordingly, clinical safety was substantial, as shown by the high percentages of clinically safe predictions (15-minute: >99%, 30-minute: >97%, and 60-minute: >91%; Fig 3).

Discussion
In this study with 851 individuals and almost 1.4 million glucose measurements, we investigated whether glucose values can be accurately predicted by machine learning-based models that use recently measured CGM and physical activity data, with the prospect of improving closed-loop insulin delivery systems. Our study has several important findings and unique characteristics. First, the machine learning-based models are capable of accurately predicting the actual glucose profiles at 15 minutes, as reflected by several objective performance metrics (e.g., RMSE, rho; Table 3) and visual illustrations (S1 and S2 Figs). Although prediction accuracy was moderately lower at 60 minutes, more than 98% of the predicted values remained sufficiently accurate to be deemed clinically safe based on surveillance error grids (Fig 2). Second, glucose prediction improved only slightly when accelerometer-assessed physical activity data were incorporated into the models. Third, translation of our CGM-based glucose prediction models to individuals with type 1 diabetes yielded encouraging results (i.e., ample prediction accuracy and clinical safety).
Although most research has thus far focused on type 1 diabetes [12], several efforts have been made to use machine learning for glucose prediction in individuals with type 2 diabetes [30][31][32][33][34]. Most of these studies assessed technical aspects of glucose prediction in relatively small (n = 1 to 50) or even virtual, in silico populations. Such studies provide valuable comparisons of models, but show suboptimal and highly variable performance in predicting glucose values. To our knowledge, this is the first study to report this level of performance in a large, population-based sample of individuals with NGM, prediabetes, or type 2 diabetes. Our CGM-based models were able to accurately predict glucose values at 15 minutes (RMSEs, overall/type 2 diabetes: 0.19/0.29 mmol/L) and 60 minutes (RMSEs, overall/type 2 diabetes: 0.59/0.70 mmol/L). These results surpass previously reported RMSE values for a sample of 50 individuals with type 2 diabetes, which were 0.65 and 1.50 mmol/L for 15- and 60-minute CGM-based glucose prediction, respectively [34]. We expect this difference to stem, in part, from our much larger sample size. Our exploratory translation to individuals with type 1 diabetes (S9 Table) showed that our models perform as well as those in recent publications in the field [12,35-38]. For example, the best-performing model of the Blood Glucose Level Prediction Challenge 2018, which was also based on an LSTM architecture and was trained on and evaluated in the OhioT1DM Dataset, reported 30-minute and 60-minute RMSEs of 1.05 and 1.74 mmol/L [35]. Additionally, Kriventsov et al. recently described large-scale application of glucose prediction in a smartphone app (Diabits) and reported a comparable RMSE at 30 minutes (1.04 mmol/L) [36]. We anticipate that further technical development of our prediction models, using a larger sample of individuals with type 1 diabetes, will improve performance further.
We integrated physical activity, assessed via accelerometry, into our glucose prediction model because of its short- and long-term effects on daily glucose patterns. Whereas an acute bout of physical activity can either decrease or increase serum glucose levels, prolonged exercise improves insulin sensitivity, and thus insulin-stimulated glucose uptake [39]. CGM- and accelerometry-based glucose prediction yielded larger improvements relative to CGM-based glucose prediction at the 60-minute interval, most notably during the day (S7 Table) and in individuals with higher glucose variability (S8 Table). Nevertheless, incorporation of physical activity generally improved glucose prediction only marginally. This can be explained by the observation that the models based on CGM data only already performed very well, which limits the room for additional improvement [40]. Also, the effect of physical activity on serum glucose levels is relatively small in people with beta-cell function that is either normal or only mildly deficient. Given the absence of pancreatic glucoregulation in individuals with type 1 diabetes, it is conceivable that incorporation of accelerometry data would improve model performance more substantially in this patient group [40], which we were not able to explore at present. In addition, a prediction interval of 15 or 60 minutes could be too short to capture the long-term effects of physical activity.
Fig 3. The risk score values translate to the following degrees of risk: 0-0.5, none; 0.5-1.0, slight (lower); 1.0-1.5, slight (higher); 1.5-2.0, moderate (lower); 2.0-2.5, moderate (higher); 2.5-3.0, great (lower); 3.0-3.5, great (higher); >3.5, extreme [27]. https://doi.org/10.1371/journal.pone.0253125.g003
The closed-loop insulin delivery system has been shown to improve glycaemic control in individuals with type 1 or type 2 diabetes [8,9,41]. Nevertheless, several aspects of the artificial pancreas require further enhancement [6,10]. Our results indicate that machine learning-based glucose prediction holds promise as a valid and safe strategy to both overcome the ~10-minute sensor delay and bridge prolonged periods of sensor malfunction. Not only were more than 99% of the predicted glucose values in clinically safe zones (i.e., Parkes error grid zones A and B), but the model also tended to slightly underestimate rather than overestimate the actual glucose values. If the prediction model were implemented, this tendency would further reduce the risk of iatrogenic hypoglycaemia. Nevertheless, future research is needed to assess whether incorporation of these prediction models into a closed-loop insulin delivery system safely improves glycaemic control.
This proof-of-principle study has several strengths and limitations. Strengths are 1) the largest well-characterized, population-based study sample thus far, which ensured sufficient statistical power; 2) the unique large-scale combination of CGM and continuous accelerometry, which enabled us to study to what extent incorporation of data on physical activity would improve prediction in this population; 3) the gold-standard assessment of GMS, which allowed for the comparison of performance in NGM, prediabetes and type 2 diabetes; 4) the broad and solid evaluation of various statistical and machine learning architectures for this prediction task; and 5) result robustness, as reflected by the consistency of several statistical and clinical performance metrics.
Our research had certain limitations. First, the main study population comprised individuals with NGM, prediabetes, or type 2 diabetes, who are generally not the target population for closed-loop insulin delivery systems. We therefore explored whether our prediction models would translate to individuals with type 1 diabetes using the OhioT1DM Dataset, which yielded encouraging results. Nevertheless, we underscore the importance of extensive evaluation of the models in a larger sample of individuals with type 1 diabetes, insulin-treated type 2 diabetes, or both. Second, we were unable to factor in other important elements pertaining to glycaemic control (e.g., diet or medication use) [6]. In automated, self-regulatory closed-loop systems, utilization of these kinds of data requires manual input, which is less convenient and reliable than CGM. In addition, since glucose prediction was only slightly improved by incorporating physical activity, we expect relatively little gain from including such factors in our models, at least in individuals with type 2 diabetes. However, given the results of several small studies that have incorporated diet and medication use [12], we acknowledge that this may not hold true for individuals with type 1 diabetes. In this regard, large-scale studies are required to reach more definitive conclusions. If diet, medication use, or other factors were to be incorporated, it would be necessary to evaluate whether LSTM remains the best-performing machine learning architecture.

Conclusion
In this study, we show that our machine learning-based models are able to accurately and safely predict glucose values for up to 60 minutes in individuals with NGM, prediabetes, or type 2 diabetes. In addition, translation of our prediction models to individuals with type 1 diabetes showed encouraging results. We observed particularly high precision at a 15-minute prediction window, which is a clinically relevant timespan for aligning glucose values measured interstitially by continuous glucose monitoring systems with actual plasma glucose values. As such, the prediction model could be used to improve closed-loop insulin delivery systems by overcoming sensor delay. In addition, longer prediction intervals may be used to safely bridge periods of sensor malfunction. Last, our findings question whether accelerometry substantially improves prediction. Future research should validate our findings by replicating the results in a larger sample of individuals with type 1 diabetes and by studying the effects of implementing the prediction model in a closed-loop insulin delivery system.