Machine learning prediction of non-attendance to postpartum glucose screening and subsequent risk of type 2 diabetes following gestational diabetes

Objective: The aim of the present study was to identify the factors associated with non-attendance at the immediate postpartum glucose test following a gestational diabetes mellitus (GDM) pregnancy, using a machine learning algorithm.
Method: This was a retrospective cohort study of all GDM women (n = 607) whose postpartum glucose test was due between January 2016 and December 2019 at the George Eliot Hospital NHS Trust, UK.
Results: Sixty-five percent of women attended the postpartum glucose test. Type 2 diabetes was diagnosed in 2.8% and 21.6% had persistent dysglycaemia at 6–13 weeks post-delivery. Those who did not attend the postpartum glucose test tended to be younger, multiparous and obese, and to have continued smoking during pregnancy. They also had higher fasting glucose at the antenatal oral glucose tolerance test. Our machine learning algorithm predicted postpartum glucose non-attendance with an area under the receiver operating characteristic curve of 0.72. The model achieved a sensitivity of 70% with 66% specificity at a risk score threshold of 0.46. A total of 233 (38.4%) women attended a subsequent glucose test at least once within the first two years of delivery and 24% had dysglycaemia. Women who did not attend the postpartum glucose test had a higher conversion rate to type 2 diabetes than those who attended (11.4% vs 2.5%; p = 0.005).
Conclusion: Uptake of postpartum screening following GDM remains poor. Women who did not attend postpartum screening appear to have higher metabolic risk and higher conversion to type 2 diabetes by two years post-delivery. A machine learning model can identify women who are unlikely to attend the postpartum glucose test using simple antenatal factors. Enhanced, personalised education of these women may improve postpartum glucose screening.


Introduction
Gestational diabetes mellitus (GDM) is associated with adverse maternal and offspring outcomes in both the short and long term. Real-world data indicate that the incidence of type 2 diabetes (T2D) in women with a history of GDM can be 20-fold higher [1]. Conversion to T2D appears to happen early after pregnancy, with the highest risk within 3-6 years of the index pregnancy; by 10-14 years, 50% of women have dysglycaemia [2,3]. GDM is also associated with at least a two-fold greater risk of developing hypertension and cardiovascular disease (CVD) [1,4]. This highlights the importance of identifying women at risk early so that preventive strategies can be implemented [5,6]. Most international guidelines recommend a 75-g oral glucose tolerance test (OGTT) or HbA1c for all GDM women in the postpartum period, and then at least once every 1-3 years [7,8].
In 2015, the National Institute for Health and Care Excellence (NICE) changed the screening guidelines from every 3 years to every year and recommended a fasting plasma glucose (FPG) test at 6 to 13 weeks after delivery, or HbA1c from 13 weeks onwards, instead of OGTT [9]. Despite these recommendations and the evidence of diabetes risk, uptake of postpartum testing for dysglycaemia (by any form of glucose testing: OGTT, FPG or HbA1c) is low and reaches only about 30% by year three following a GDM pregnancy [1,10]. While some centres were able to improve their uptake to 70% by employing dedicated coordinators, this has not been replicated by others and uptake has remained poor [11,12]. System and patient barriers, including lack of awareness ('it does not affect me'), the discomfort and duration of the test (especially for OGTT), other socio-economic factors and poor transfer of information from secondary to primary care, are the primary reasons for poor uptake of postpartum screening [13,14].
We had earlier shown that women who did not attend postpartum glucose testing (ppGT) in three centres from our region had higher metabolic risk factors [10]. Subsequently, in one of the centres, we introduced a dedicated advanced nurse practitioner to encourage women to attend the postpartum testing and to send personalized letters to women and their GPs for annual screening. However, there are limited data on an individualised approach for targeting women who are unlikely to attend the ppGT. The primary aim of this study was to identify patient characteristics of women who did not attend the immediate ppGT in a real-world setting and to assess their subsequent T2D risk within 24 months post-delivery. Further, we built a predictive model to identify women who are less likely to attend ppGT using machine learning.

Study population
Detailed demographic, clinical and anthropometric data were collected for all pregnant women who had GDM and whose ppGT was due between January 2016 and December 2019 (n = 607) in a district general hospital (George Eliot Hospital NHS Trust, Nuneaton, UK). No personal information of the subjects was obtained. This study was conducted as a service evaluation audit and ethical approval was not necessary. The audit was approved by the GEH Diabetes and Audit department. Selective screening was performed based on NICE 2015 criteria, using a 75 g OGTT between 24 and 28 weeks of gestation. GDM was diagnosed if FPG was ≥5.6 mmol/L or 2-hour plasma glucose was ≥7.8 mmol/L. The risk assessment for screening includes women with higher pre-pregnancy BMI (≥30 kg/m²), a family history of diabetes, previous GDM, a previous macrosomic baby (birth weight ≥4500 g) and women from ethnic minority groups [9]. At the time of discharge, all GDM women were instructed to schedule a ppGT at 6 to 13 weeks, and they received a phone reminder from an advanced nurse practitioner in the week before the appointment.
In addition to the detailed antenatal data, data at birth, including initiation of breastfeeding and birth centile of the infant assessed using an ethnicity-specific UK reference chart, were collected [15]. Women received a personalized letter inviting them to attend the ppGT (OGTT for those who had normal FPG and abnormal 2-hr glucose during the antenatal OGTT, or HbA1c for other abnormalities) (S1 Appendix). If they had difficulty in attending the OGTT, they were invited again for a postpartum HbA1c test. Following the results, GDM women and their General Practitioner (GP) received a second letter indicating their results, highlighting their future risk of T2D and CVD and advising them to visit their GP for annual HbA1c tests. All the follow-up glucose/HbA1c data (up to 24 months from the date of delivery) were extracted from the hospital electronic database. During follow-up, dysglycaemia (diabetes and prediabetes) was identified using American Diabetes Association (ADA) criteria [diabetes: HbA1c ≥48 mmol/mol (≥6.5%) or FPG ≥7.0 mmol/L and 2-hr ≥11.1 mmol/L post 75 g OGTT; prediabetes: HbA1c ≥39 and <48 mmol/mol (5.7 to 6.4%) or FPG ≥5.6 and ≤6.9 mmol/L and 2-hr ≥7.8 to ≤11.0 mmol/L post 75 g OGTT] [16].
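The diagnostic cut-offs described in this section can be written as two small rules. The sketch below is illustrative only: the function names `has_gdm` and `classify_hba1c` are hypothetical, the GDM rule uses the NICE 2015 thresholds quoted above (FPG of at least 5.6 mmol/L or 2-hour glucose of at least 7.8 mmol/L), and the follow-up rule uses only the HbA1c bands in mmol/mol, leaving out the FPG and 2-hour OGTT criteria for brevity.

```python
def has_gdm(fpg_mmol_l, two_hour_mmol_l):
    """NICE 2015 rule as quoted in the text: either threshold met means GDM."""
    return fpg_mmol_l >= 5.6 or two_hour_mmol_l >= 7.8


def classify_hba1c(hba1c_mmol_mol):
    """HbA1c-only follow-up category (FPG / 2-hr OGTT criteria are omitted here)."""
    if hba1c_mmol_mol >= 48:
        return "diabetes"
    if hba1c_mmol_mol >= 39:
        return "prediabetes"
    return "normal"


# Illustrative values only, not patient data.
print(has_gdm(5.2, 8.1))
print(classify_hba1c(41))
```

Both "diabetes" and "prediabetes" would count as dysglycaemia in the sense used for the follow-up analysis.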

Statistical analysis
Statistical analysis was performed using IBM SPSS Statistics for Windows, version 27 (IBM Corp., Armonk, NY, USA). Baseline characteristics were expressed as percentages for categorical variables and mean ± standard deviation (SD) for continuous variables. Univariate and multivariate hazard ratios (HR) were estimated using a Cox proportional hazards model in a subgroup of women who were followed for two years from the date of delivery. The final model was adjusted for all the potential predictors of dysglycaemia during the two-year follow-up period (S1 Table).

Machine learning analysis
Machine Learning (ML) analysis was performed in Python version 3.7 (www.python.org). Lasso regularization was used for feature selection. Nested standard 5-fold cross validation (CV1) was used for model evaluation [17]. An internal stratified 10-fold cross validation (CV2) was performed on each of the five training folds of CV1 for optimizing the shrinkage parameter in lasso (S1 Fig). Missing values were imputed using Multivariate Imputation by Chained Equations (MICE) technique, using other non-missing covariates, separately for the training and testing folds of CV1 to avoid leakage of information from the testing data into the training data. The training folds in CV1 were resampled using adaptive synthetic resampling technique to ensure equal representation of both the binary classes. The resampled training data was normalized.
A logistic regression model was fitted on the training folds in CV1 using the features selected by lasso. The model predictions on each of the test folds in the 5 iterations of CV1 were aggregated and the Receiver Operating Characteristic (ROC) curve was plotted for this aggregated set. The area under the ROC curve (AUROC) was used to assess the performance of the method. The concept diagram of this method is illustrated in S2 Fig. After getting assurance of acceptable performance of this method, logistic regression was applied stepwise on the full data to obtain the final model for deployment. Stepwise details of our proposed method are given below:
Step 1: Lasso hyperparameter optimization. Lasso regularization embeds feature selection in the form of an L1 penalty on the magnitude of the feature coefficients, causing irrelevant feature coefficients to shrink to zero. The shrinkage hyperparameter, i.e., the magnitude of the penalty, is optimized using a stratified 10-fold cross validation (CV2) on the training folds of CV1 in each iteration $i$. This process gives us an optimal hyperparameter value $C_i$ for each fold $i$.
Step 2: Feature selection. All baseline features (except history of alcohol in current pregnancy and previous pregnancy GDM history) were considered for feature selection. After converting the categorical features to binary using one-hot encoding, there were 27 features in total. Lasso with the tuned hyperparameter $C_i$ from step 1 was used to select the best feature set $f_i$ from the training data in iteration $i$.
Step 3: Model training. A logistic regression model $m_i$ was learned from the features $f_i$, selected in step 2, for each fold $i$ of CV1.
Step 4: Model evaluation. The logistic regression model $m_i$ from step 3 was used to make predictions on the held-out test data of the corresponding fold $i$ of CV1. The predictions on each of the 5 test folds of CV1 were aggregated to plot the ROC and calculate the AUROC.
Finally, Steps 1 to 3 were applied on the full data to obtain the final model for practical use. That is, the lasso regularization hyperparameter was optimized using stratified 10-fold cross validation on the full data, features were selected from the full data using lasso with the optimized $C$, and a logistic regression model was learned on the selected features of the full data.
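The index bookkeeping behind the nested scheme in Steps 1 to 4 can be sketched with the Python standard library alone. This is not the authors' code: the helper names (`plain_folds`, `stratified_folds`, `nested_cv_splits`) are hypothetical, the lasso tuning and model fitting are deliberately left out, and the label vector is a synthetic stand-in built from the cohort size (607 women, of whom 394 attended, leaving 213 non-attenders by subtraction). The point is only to show that the inner stratified 10-fold splits are drawn entirely from the outer training folds, so the outer test fold never informs hyperparameter tuning or feature selection.

```python
import random


def plain_folds(n_samples, k, seed=0):
    """Standard (unstratified) k-fold: shuffle indices, then deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def stratified_folds(labels, k, seed=0):
    """k folds of positions into `labels`, dealing each class out evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def nested_cv_splits(labels, outer_k=5, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer iteration."""
    outer = plain_folds(len(labels), outer_k, seed)
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        train_labels = [labels[j] for j in train_idx]
        # Inner folds are built from the outer training data only.
        inner = stratified_folds(train_labels, inner_k, seed + i + 1)
        inner_splits = []
        for v, val_pos in enumerate(inner):
            val = [train_idx[p] for p in val_pos]
            fit = [train_idx[p] for w, fold in enumerate(inner) if w != v for p in fold]
            inner_splits.append((fit, val))
        yield train_idx, test_idx, inner_splits


# Synthetic labels: 1 = did not attend ppGT, 0 = attended (counts only, no real records).
labels = [1] * 213 + [0] * 394
for i, (train_idx, test_idx, inner_splits) in enumerate(nested_cv_splits(labels), start=1):
    print(f"outer fold {i}: train={len(train_idx)} test={len(test_idx)} inner={len(inner_splits)}")
```

In a full pipeline, imputation, resampling, normalization, lasso tuning and logistic regression fitting would each be run inside the `fit` or `train_idx` portion before predicting on the corresponding held-out indices.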
Specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), F1-score, and Accuracy of the optimal selected model were analysed at five predetermined values of Sensitivity (60%, 70%, 75%, 80%, and 90%). Using the coefficients from the final logistic regression model fitted on the full data, a composite risk scoring system was developed from the best selected antenatal predictors to predict the probability that a woman with GDM would miss ppGT. The composite risk score was calculated from the equation $P = 1/\left(1 + e^{-(b_0 + b_1 x_1 + \dots + b_n x_n)}\right)$, where $b_0$ is the intercept and $b_n$ is the coefficient of the $n$th predictor ($x_n$). A Decision Curve Analysis (DCA) was used to compare the performance of our model with the 'target all' and 'target none' approaches [18]. Finally, the correctly identified non-attenders (sensitivity) vs follow-ups avoided (the true negatives + false negatives, obtained from the optimal selected model) were used to calculate the number of women requiring enhanced care, to maximize the postpartum follow-up care.
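To make the evaluation quantities in this paragraph concrete, the following sketch computes a logistic composite risk score, the threshold-based metrics (sensitivity, specificity, PPV, NPV, F1 and accuracy) and a rank-based AUROC. It is a minimal illustration under stated assumptions: the function names are hypothetical, the intercept and coefficients passed to `composite_risk_score` are made-up numbers rather than the fitted values of the final model, and the scores and labels are toy values chosen only so that every cell of the confusion matrix is populated.

```python
import math


def composite_risk_score(intercept, coefficients, predictors):
    """Logistic transform of the linear predictor b0 + sum(b_n * x_n)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


def _ratio(num, den):
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def threshold_metrics(scores, labels, threshold):
    """Confusion-matrix metrics when scores >= threshold are flagged as non-attenders."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy pooled test-fold predictions (1 = did not attend ppGT); not study data.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
metrics = threshold_metrics(toy_scores, toy_labels, threshold=0.46)
print(metrics)
print(auroc(toy_scores, toy_labels))

# Hypothetical intercept and coefficients for two predictors, for illustration only.
print(composite_risk_score(-1.0, [0.5, 0.02], [1, 30]))
```

Sweeping `threshold` over a grid and recording the first value that reaches each target sensitivity would reproduce the kind of table described above.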

Results
In total, 607 pregnant women were diagnosed with GDM. GDM was diagnosed at a mean gestational age of 27.9±4.4 weeks. Of these women, 7.4% (n = 45) were diagnosed by FPG alone, 58.3% (n = 354) by 2-hr glucose alone and 12.4% (n = 75) had abnormal values for both. A further 21.9% (n = 133) of women diagnosed with GDM had missing antenatal OGTT data, which were imputed using the MICE technique. The prevalence of large for gestational age (LGA) was 13.8% and that of small for gestational age (SGA) was 14.7%.
Overall, 64.9% (n = 394) of GDM women attended the ppGT, including postnatal OGTT (n = 340) and HbA1c (n = 54). Subsequently, 233 women had follow-up glucose testing at least once within 24 months of the index delivery. At the immediate ppGT, 21.6% had dysglycaemia/impaired glucose regulation (impaired fasting glucose (IFG), impaired glucose tolerance (IGT) or HbA1c ≥39 and <48 mmol/mol) and 2.8% were diagnosed with T2D (Fig 1). A comparison of the anthropometric, clinical, and demographic characteristics of women who did and did not attend the ppGT is shown in Table 1.
Women who did not attend ppGT were younger, more often unmarried and multiparous, had higher BMI and were more likely to have continued smoking during pregnancy, as recorded at the antenatal booking visit, than those who attended ppGT (Table 1). The factors independently associated with non-attendance of ppGT are shown in Table 2. The factors associated with non-attendance of ppGT that were selected in each of the 5 iterations of cross validation are shown in S3 Fig.
The AUROC computed by aggregating the predictions from the 5 test folds was 0.72 (Fig 2). The optimal threshold was determined as 0.46 (sensitivity of 0.70, specificity of 0.66 and maximal F1 score). Forty-six out of 100 women were above this threshold of 0.46, and focusing on these women could improve ppGT uptake (S4 Fig). Table 3 shows the sensitivity, specificity, PPV, NPV, F1 score and accuracy at other probability thresholds. The F1 graph is shown in S5 Fig. In the decision curve analysis, which compared the model with the 'target all' and 'target none' approaches, the ML algorithm obtained a higher standardized net benefit than following up all GDM women for ppGT (Fig 3). Our proposed ML prediction model has the highest benefit across various probability thresholds and specifically identified 46% of GDM women as unlikely to attend ppGT (optimal threshold = 0.464). Based on our proposed final model, the composite risk score, P (non-attendance to ppGT), was calculated as $1/\left(1 + e^{-(b_0 + b_1 x_1 + \dots + b_n x_n)}\right)$, with the fitted intercept $b_0$ and the fitted coefficients $b_n$ of the selected predictors $x_n$.
During the subsequent follow-up period between 4 and 24 months post-delivery, 233 GDM women had at least one HbA1c value (Fig 1). Among those tested (n = 233 out of 607), an additional 24% (n = 56) had dysglycaemia. Twelve women converted to T2D between ppGT and 24 months, with a higher proportion among women who did not attend the immediate ppGT (attended vs non-attended: 2.5 vs. 11.4%; p = 0.005). Survival analysis showed that women of south Asian ethnicity and those with higher booking BMI and antenatal HbA1c had an increased hazard ratio for dysglycaemia in both ppGT groups (S1 Table).
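As a rough consistency check on the threshold results reported in this section, the share of women flagged and the net benefit at the 0.46 threshold can be approximated from three numbers: sensitivity 0.70, specificity 0.66, and a non-attendance prevalence of 213/607 (derived here from the 394 of 607 women who attended). This is a back-of-envelope sketch, not a reproduction of the authors' decision curve: it uses the rounded sensitivity and specificity, the standard net benefit definition (true-positive rate minus false-positive rate weighted by the threshold odds), and hypothetical function names.

```python
def population_rates(sensitivity, specificity, prevalence):
    """Per-woman rates of true/false positives and negatives in the whole cohort."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return tp, fp, tn, fn


def net_benefit(tp_rate, fp_rate, threshold):
    """Net benefit = TP/N - FP/N * pt / (1 - pt), as used in decision curve analysis."""
    return tp_rate - fp_rate * threshold / (1 - threshold)


prevalence = 213 / 607          # non-attenders, derived from 394 of 607 attending
threshold = 0.46                # risk score threshold reported in the text
tp, fp, tn, fn = population_rates(0.70, 0.66, prevalence)

flagged = tp + fp               # fraction of women above the threshold
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

nb_model = net_benefit(tp, fp, threshold)
nb_all = net_benefit(prevalence, 1 - prevalence, threshold)   # 'target all'
nb_none = 0.0                                                 # 'target none'
standardized_nb_model = nb_model / prevalence

print(f"flagged={flagged:.3f} ppv={ppv:.3f} npv={npv:.3f}")
print(f"net benefit: model={nb_model:.3f} all={nb_all:.3f} none={nb_none:.3f}")
print(f"standardized net benefit (model)={standardized_nb_model:.3f}")
```

Under these assumptions roughly 47 women per 100 fall above the threshold, close to the 46 per 100 reported, and at this threshold the model's net benefit is positive while that of 'target all' is negative, in line with the direction of the decision curve result.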

Discussion
Our study highlights that unmarried status, younger age, higher BMI, multiparity, and continued smoking during pregnancy are risk factors for poor attendance at postpartum glucose testing following a GDM pregnancy. Using an unbiased, data-driven machine learning approach, we propose a composite risk score, based on easily available antenatal parameters, for identifying women who are less likely to attend postpartum screening. Worryingly, it appears that those who did not attend the immediate postpartum screening have a higher conversion rate to T2D within two years of the index pregnancy, although the numbers were small.
Attendance at postpartum glucose screening in our centre has improved since the introduction of an individualised letter at the time of discharge in 2015 (64.9% vs 49%) [10]. Although the uptake is similar to [12], or better than [1,19], that reported by others, it is still not satisfactory. Interestingly, 33% of women who did not attend the immediate testing were subsequently tested at least once within two years of the index pregnancy. This suggests that the message of annual screening is reaching patients and/or their GPs, albeit in a small proportion of individuals. However, we did not have information on how many women were invited for this subsequent testing.
Previous studies have attempted to understand the barriers behind poor attendance. Lack of awareness and competing challenges with childcare are some of the common barriers [14]. In addition, younger women may perceive their risk to be low [20]. While obese women may understand their risk, they might not attend for other reasons, including fear or the stigma of a diagnosis of T2D [21]. Antenatal educational interventions highlighting the subsequent risk of T2D, and flexible postnatal lifestyle services that incorporate health visitors for glucose testing, have been suggested as potential strategies to overcome these barriers [13,22]. Although implementation of a recall register system and/or a coordinator enhanced the ppGT uptake rate in some countries, our ML based risk stratification could provide a better basis for personalised education, specifically to capture non-attenders at the time of hospital discharge [11,12]. While our letter is designed to take an individualised approach, and seems to have improved overall testing, it is possible that some women felt the information provided was standard and 'not specific to them'. Our proposed machine learning based individualised probability threshold identified 46% of women as unlikely to attend postpartum testing using simple, routinely available characteristics. Healthcare professionals can identify and target these women antenatally and provide enhanced education on the importance of postpartum glucose testing, both during pregnancy and at the time of discharge from the hospital. This approach may also improve healthcare professionals' perception of ppGT by emphasizing the significance of screening, which may help women to identify their risk and seek professional care for timely intervention. Alternatively, an FPG test just before discharge could be carried out in these women, to ensure that some form of glucose testing is done [23].
To our knowledge, this is the first study to attempt to create an individualised composite risk score for postpartum non-attendance using a machine learning algorithm. A simple, Microsoft Excel based individualised risk score can be easily calculated by healthcare professionals to identify women who are less likely to attend and who may benefit from enhanced education (S2 Appendix).
The overall conversion rate to abnormal glucose tolerance in our study was high: 24.4% at the ppGT and an additional 24% in the subsequent 18 months. This is worryingly higher than in previous reports [1,3], and highlights the importance of ensuring that testing happens not only immediately post-delivery, but annually thereafter. Recent evidence also suggests that cardiovascular risk is higher in these women [4,22], and perhaps the annual screening should include other common CVD risk factors such as smoking, blood pressure and lipid profile. Individualised risk calculation for abnormal glucose tolerance, similar to the one proposed for non-attendance, could potentially improve attendance, use resources effectively and enable targeted education for prevention. As our composite risk score can be calculated in a simple Excel based calculator (S2 Appendix), our approach can also be easily implemented in low resource settings.
Our study had two key strengths: follow-up data on women who did not attend the immediate postpartum testing, and a machine learning based individualised composite risk score for non-attendance. However, although this work included all the women who were diagnosed with GDM in the study period, it was a retrospective study, and we were therefore limited to routinely collected data. In addition to some missing data, we did not have information about treatment modalities, which have been shown to be associated with postpartum screening uptake, albeit with mixed results [24-26].
Other factors could conceivably be relevant in predicting women's participation in follow-up, for example, proximity of the healthcare centre, their employment status and the nature of their employment, and flexibility in scheduling an appointment. However, in this study, all participants lived close to the hospital and had their antenatal care and delivery in the hospital, and women were able to reschedule and pick an appointment date and time that suited them. We also found no statistical difference between employed and unemployed women. Nevertheless, these and other socioeconomic parameters may prove relevant in larger studies.
While this information might have improved the predictive power of the ML algorithm, we believe that using our model it is easy to incorporate additional maternal characteristics based on their availability. We also did not have information on how many women were invited for the subsequent annual testing. Finally, while we are confident about the beneficial role of advanced nurse practitioner for the improvement in our postpartum testing, we were unable to ascertain the role of the standard letter sent to the women.

Conclusion
Our study highlights that a simple machine learning algorithm can accurately identify women who are unlikely to attend postpartum testing, based on routinely collected clinical parameters. Such knowledge can enable healthcare professionals to provide enhanced education throughout the antenatal period, at the time of discharge, and during the immediate postpartum period using health visitors. However, an RCT will be required to confirm the usefulness of ML based risk prediction in improving the postpartum glucose screening uptake rate. In addition, with the improved accuracy of point of care HbA1c testing kits, these kits could also be used to improve the testing and identification of abnormal glucose tolerance [27]. The cost effectiveness of such strategies will also require well designed prospective studies.
Figure caption (nested cross validation scheme): The processed data is divided into 5 folds. The light grey region shows the training data, and the dark grey shows the testing data. Within each iteration i of the outer 5-fold CV, the training folds further undergo internal 10-fold CV in Step 1 for lasso hyperparameter optimization. This is known as nested cross validation. In Step 2, feature selection is performed on the training folds in iteration i using lasso with the optimized regularization hyperparameter. A logistic regression model with the selected features is fit on the training folds in iteration i in Step 3. The fitted model is used for prediction on the exclusively held-out test data in iteration i. The test predictions from all 5 iterations are aggregated to plot the ROC curve and calculate the area under it for evaluating the performance of our method.