Patient clusters based on HbA1c trajectories: A step toward individualized medicine in type 2 diabetes

Aims To identify clinically meaningful clusters of patients with similar glycated hemoglobin (HbA1c) trajectories among patients with type 2 diabetes. Methods A retrospective cohort study using unsupervised machine learning clustering methodologies to determine clusters of patients with similar longitudinal HbA1c trajectories. Cluster stability was assessed, and a supervised random forest analysis verified the clusters' reproducibility. The clinical relevance of the clusters was assessed through multivariable analysis, comparing differences in risk for a composite outcome (macrovascular and microvascular outcomes, hypoglycemic events, and all-cause mortality) at HbA1c thresholds for each cluster. Results Among 60,423 patients, three clusters of HbA1c trajectories were generated: stable (n = 45,679), descending (n = 6,084), and ascending (n = 8,660) trends, which were reproduced with 99.8% accuracy using a random forest model. In the clinical relevance assessment, HbA1c levels demonstrated a J-shaped association with the risk for outcomes. The HbA1c thresholds associated with the lowest risk differed by cluster: 6.0–6.4% for the stable cluster, <8.0% for the descending cluster, and <9.0% for the ascending cluster. Conclusions By applying unsupervised machine learning to longitudinal HbA1c trajectories, we identified clusters of patients with distinct risks for diabetes-related complications. These clusters can serve as the basis for developing individualized models to personalize glycemic targets.


Introduction
Intensive glycated hemoglobin (HbA1c) control is not always recommended, given inconsistent evidence, including findings that lowering HbA1c levels may be associated with an increased risk for mortality and other type 2 diabetes-related outcomes [1][2][3][4][5][6]. Recent guidelines from the American Diabetes Association and the European Association for the Study of Diabetes suggest setting individualized targets to account for the heterogeneity within the population of people with type 2 diabetes [7,8]. While variability in glycemic values over time has been shown to be an important independent risk factor for mortality [9], cardiovascular complications [10], and impaired cognitive performance [11], current guidelines do not address how to incorporate glycemic level trajectories when characterizing individual risk.
Dividing this complex population into sub-groups may help identify shared underlying characteristics and thereby improve the appropriateness of treatment goals. This is supported by a growing body of evidence showing that stratifying a population into more homogeneous sub-groups can improve the predictive performance of individualized models [12,13]. Several methodologies can be used for this purpose, including clustering based on laboratory data trajectories, which has demonstrated utility in differentiating stages of chronic diseases [14,15]. Machine learning algorithms have also been used to discover hidden patterns in complex datasets through unsupervised methods, which can yield clusters of individuals with similar behaviors or characteristics. These techniques may help identify groups of patients with type 2 diabetes whose distinct risk profiles were not captured by previous findings.
As part of a larger study to create a tool that provides individualized HbA1c targets for optimal long-term type 2 diabetes risk management, this sub-study aimed to: (1) characterize clusters of similar patients based on HbA1c trajectories over three years and (2) evaluate the clinical relevance of these clusters by assessing the associated risk for type 2 diabetes outcomes for each cluster.

Setting and data source
All-cause mortality, demographic, clinical, and laboratory data were obtained from the Clalit Health Services (Clalit) healthcare data warehouse. Clalit is the largest of the four payer/provider health funds in Israel, providing healthcare services to over four million patients, approximately 53% of the total Israeli population. This study was performed using deidentified electronic health record (EHR) data from Clalit's fully integrated database, which centralizes data from community clinics, hospital visits, laboratory tests and results, and medication dispensing. This study was approved by Clalit's internal ethics review board.

Study design
This study is a longitudinal retrospective cohort study among patients with type 2 diabetes with a disease duration of three to seven years as of January 1, 2010 (index date). The first part of the study derived clusters by applying unsupervised machine learning techniques to each patient's HbA1c history over the three years before the index date. The second part, which also addresses the first objective, used a supervised machine learning model to assess the clusters' reproducibility. Finally, we tested the clinical relevance of the derived clusters by estimating the five-year risk for type 2 diabetes outcomes in multivariable models, with a baseline period from January 1, 2003 through December 31, 2009 and a follow-up period from January 1, 2010 through December 31, 2014 (Fig 1).

Study population
The study population consisted of Clalit members included in the Clalit diabetes registry, who were identified via a previously described algorithm [16]. Patients with short-to-medium duration type 2 diabetes were selected to allow sufficient time for disease-related complications to develop, but not so much time that irreversible damage had already occurred. Included patients also had at least three years of continuous Clalit membership before the index date. Patients were excluded if they had concurrent chronic conditions such as cancer, chronic infectious disease (AIDS, hepatitis B/C/delta), or hepatic cirrhosis.

Longitudinal HbA1c measures
We assembled HbA1c trajectories from available and imputed laboratory data, excluding extreme outliers (values more than seven standard deviations from the cohort mean). HbA1c trajectories were aggregated into four nine-month time periods (t1, t2, t3, t4), and the average was used when a period contained multiple measurements. The nine-month time frame was chosen because more than 80% of the study cohort had two consecutive HbA1c tests within nine months.
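The outlier screen and nine-month aggregation can be sketched as follows. This is a minimal Python illustration, not the authors' R code; it assumes each measurement is recorded as a pair of (months before the index date, HbA1c %), uses the population standard deviation for the seven-SD rule, and treats t1 as the oldest window. The names `exclude_outliers` and `aggregate_windows` are hypothetical.

```python
from statistics import mean, pstdev

N_WINDOWS = 4        # t1..t4
WINDOW_MONTHS = 9.0  # 4 x 9 months = 36-month look-back before the index date


def exclude_outliers(values, k=7.0):
    """Drop cohort values more than k standard deviations from the cohort mean."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sd]


def aggregate_windows(measurements):
    """Average one patient's HbA1c values within each nine-month window.

    `measurements` is a list of (months_before_index, hba1c_percent) pairs.
    Returns [t1, t2, t3, t4], oldest first; None marks a window with no test.
    """
    buckets = [[] for _ in range(N_WINDOWS)]
    for months_before, value in measurements:
        if not 0 <= months_before < N_WINDOWS * WINDOW_MONTHS:
            continue  # outside the three-year look-back
        # 0-9 months before index -> t4 (index 3); 27-36 months -> t1 (index 0)
        idx = N_WINDOWS - 1 - int(months_before // WINDOW_MONTHS)
        buckets[idx].append(value)
    return [mean(b) if b else None for b in buckets]
```

For example, tests at 30, 22, 20, and 5 months before the index date fill t1, t2 (averaged), and t4, leaving t3 empty.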
Only patients with HbA1c measures in at least three of the four nine-month time periods were included in the unsupervised cluster analysis; all other patients were assigned to a separate group labeled 'Undefined Cluster.' For patients with HbA1c measures in only three of the four time periods, the missing value was imputed using linear models based on the other three HbA1c values. The imputed dataset was randomly divided into a training set (60% of the dataset) and a validation set (40%). Four separate imputation models were created, one for each nine-month time period (S1 Fig).
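Window-specific linear imputation might look like the sketch below. It rests on assumptions not stated in the text: each of the four models is an ordinary least squares regression of one window on the other three, fitted on the 60% training portion of patients with complete trajectories, with the remaining 40% returned for validation. All function names are hypothetical, and the least squares solve uses the normal equations for brevity.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def fit_imputation_models(complete, train_frac=0.6, seed=0):
    """Fit one linear model per window t1..t4 on complete trajectories."""
    data = complete[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)
    train, valid = data[:cut], data[cut:]
    models = []
    for k in range(4):
        xs = [[t[j] for j in range(4) if j != k] for t in train]
        ys = [t[k] for t in train]
        models.append(fit_ols(xs, ys))
    return models, valid


def impute(trajectory, models):
    """Fill the single missing window (None) from the other three values."""
    k = trajectory.index(None)
    coef = models[k]
    others = [v for j, v in enumerate(trajectory) if j != k]
    filled = list(trajectory)
    filled[k] = coef[0] + sum(c * v for c, v in zip(coef[1:], others))
    return filled
```

The held-out validation trajectories can be used to compare imputed against observed values for each window before the models are applied to patients who are missing that window.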

Cluster generation
A longitudinal unsupervised trajectory clustering methodology was implemented using the "traj" R package (version 1.2) [17,18]. The methodology includes feature engineering, generating 24 different features derived from HbA1c trajectories (S1 Table). Factor analysis was used to select the most relevant of these measures, which were then used for cluster generation. The optimal number of clusters was determined with the "NbClust" R package as described by Charrad et al. [19], taking the number most frequently recommended by 26 different methods [19]. Twenty-six is the standard number of methods included in the package, which uses a voting approach to determine the optimal number of clusters.
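The voting step reduces to picking the most frequently recommended number of clusters. The sketch below is a Python illustration of that majority vote, not NbClust itself; the input dictionary of index names and recommendations is hypothetical, and breaking ties toward the smaller number is our own assumption.

```python
from collections import Counter


def vote_number_of_clusters(recommendations):
    """Return the number of clusters recommended by the most indices.

    `recommendations` maps an index name to the k it recommends.
    Assumption: ties are broken in favor of the smaller k.
    """
    counts = Counter(recommendations.values())
    best = max(counts.values())
    return min(k for k, c in counts.items() if c == best)


# Hypothetical votes, for illustration only.
votes = {"KL": 3, "CH": 2, "Hartigan": 3, "CCC": 3, "Scott": 4, "Silhouette": 2}
print(vote_number_of_clusters(votes))
```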
This study applied K-means clustering to the selected features, similar to the approach of Jacob et al. [20], who used K-means clustering to uncover patterns in metabolite data from pregnant patients. The K-means parameters were the defaults defined by the traj functions. K-means characterizes clusters by searching for optimal cluster centers in a multidimensional feature space: starting from randomly chosen initial centers, it iteratively assigns each data point to its nearest center and recomputes each center as the mean of its assigned points, until the assignments stop changing and the distance between points and their cluster centers reaches a (local) minimum. The stability of the clusters was assessed using fixed point cluster analysis as described by Hennig [21]. A Jaccard similarity index of 0.95 or higher indicates that a cluster is highly stable [22,23].
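The K-means iteration described above (Lloyd's algorithm) and the Jaccard index can be sketched in plain Python. This is an illustrative sketch only: it uses a single random start and squared Euclidean distance, and the traj defaults used in the study may differ (for example, in the number of random starts). The function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centers, labels) for one random start."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def jaccard(a, b):
    """Jaccard similarity of two sets, e.g. patient IDs in two clusterings."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

In a resampling-based stability check, `jaccard` would compare the members of an original cluster with the best-matching cluster found on each resampled dataset; values near 1 indicate that the cluster is recovered consistently.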

Reproducibility of clusters using a supervised algorithm
One limitation of unsupervised algorithms is that their results cannot always be reproduced on new data. To support the reproducibility of our findings on new datasets, a supervised random forest algorithm was developed on a training subset (60% of the original data) and validated on a test dataset. A random forest builds an ensemble of decision trees, each grown on a bootstrap sample of the data, and classifies each individual by majority vote across the trees; here it was used to determine how accurately the clusters produced by the unsupervised algorithm could be predicted. By learning decision rules that reproduce the K-means assignments, the random forest classifies individuals into clusters. Accuracy was calculated as the number of correctly predicted cluster assignments divided by the number of patients in the dataset. An advantage of this kind of algorithm is that it typically yields accurate models and, by averaging over many trees, is less prone to overfitting than a single decision tree.
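The evaluation protocol (a 60/40 split and accuracy as correct predictions over patients evaluated) is sketched below in Python. The random forest classifier itself is not shown; any fitted model's predictions on the test set would be passed to `accuracy`. The function names and the fixed random seed are illustrative assumptions.

```python
import random


def train_test_split(ids, train_frac=0.6, seed=0):
    """Randomly split patient IDs into training and test subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Correctly predicted cluster labels divided by all patients evaluated."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


truth = ["stable", "stable", "ascending", "descending"]
preds = ["stable", "stable", "ascending", "stable"]
print(accuracy(truth, preds))  # 3 of 4 correct
```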

Clinical relevance assessment of clusters
The clinical relevance of the clusters was determined by comparing the five-year risk for type 2 diabetes outcomes at various levels of HbA1c across the cluster groups. Post-index HbA1c was defined as the first HbA1c test value recorded after the index date, during the follow-up period. Patients without a post-index HbA1c value were excluded from this clinical relevance assessment.
Type 2 diabetes outcomes were defined as a composite of macrovascular and microvascular complications, hypoglycemic events, and all-cause mortality; the first such event to occur in the follow-up period was counted as the outcome. Macrovascular outcomes were any incident event of the following conditions: myocardial infarction (MI), unstable angina pectoris (UAP), coronary artery bypass graft (CABG), percutaneous transluminal coronary angioplasty (PTCA), and cerebrovascular accident (CVA). Microvascular complications were the first recorded new diagnosis of diabetic retinopathy (DR), diabetic neuropathy (DNeu), diabetic nephropathy (DNeph), lower extremity ulcer (LEU), or lower extremity amputation (LEA) (S2 Table). All diagnoses recorded before the index date were considered prevalent comorbidities and were not counted as outcomes.
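The first-event rule for the composite outcome can be sketched as follows. This Python sketch rests on our own reading of the definitions: the outcome labels are hypothetical strings, only events between the index date and the end of follow-up count, and a microvascular diagnosis already recorded before the index date is treated as prevalent, so a later record of the same diagnosis is not a new outcome.

```python
from datetime import date

INDEX_DATE = date(2010, 1, 1)
FOLLOW_UP_END = date(2014, 12, 31)

# Hypothetical labels for the component outcomes named in the text.
MACROVASCULAR = {"MI", "UAP", "CABG", "PTCA", "CVA"}
MICROVASCULAR = {"DR", "DNeu", "DNeph", "LEU", "LEA"}
OTHER = {"hypoglycemia", "death"}


def first_composite_event(events):
    """Return the first incident composite outcome as (date, kind), or None.

    `events` is a list of (date, kind) tuples for one patient.
    Assumption: a microvascular diagnosis recorded before the index date is
    prevalent, so later records of that diagnosis are not new outcomes.
    """
    known = MACROVASCULAR | MICROVASCULAR | OTHER
    prevalent = {kind for d, kind in events if d < INDEX_DATE}
    incident = [
        (d, kind)
        for d, kind in events
        if kind in known
        and INDEX_DATE <= d <= FOLLOW_UP_END
        and not (kind in MICROVASCULAR and kind in prevalent)
    ]
    return min(incident) if incident else None  # earliest date wins
```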
In addition to post-index HbA1c, the covariates included in our analyses were age (in years), sex, socio-economic status (SES; low, medium, or high), obesity (BMI of 30 kg/m² or higher), smoking status (current smoker, former smoker, non-smoker, or unknown), diagnosis of hypertension, diagnosis of congestive heart failure, history of hypoglycemic episodes, and dispensing of chronic disease medications (at least one dispensed medication from any of the following: cholesterol-lowering drugs, agents acting on the renin-angiotensin system, insulin, and hypoglycemic agents). All variables were collected as of the index date, using the most recent value when multiple values were available.

Statistical analysis
For continuous variables, p values were calculated using ANOVA; when Bartlett's test indicated heteroscedasticity, a White correction was applied. For categorical variables, the chi-square test was used if every cell contained more than 5 observations; when one or more cells had 5 or fewer observations, the Kruskal-Wallis test was used. Multivariable logistic regression models were fitted with post-index HbA1c categories as the exposure, adjusted for all covariates. All analyses were performed using R statistical software version 3.2.2 and the packages named above [24].
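For the categorical comparisons, the cell-count rule and the Pearson chi-square statistic can be sketched as below. This Python sketch follows the text in checking observed cell counts; the function names are hypothetical, and p values (which require the chi-square distribution) are left to a statistics package.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom."""
    exp = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_applicable(table, min_count=5):
    """Rule from the text: every cell must hold more than `min_count` observations."""
    return all(cell > min_count for row in table for cell in row)


# Three clusters (rows) by a binary characteristic (columns); illustrative counts.
table = [[30, 70], [50, 50], [20, 80]]
if chi_square_applicable(table):
    print(chi_square_statistic(table))
```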

Results
We identified 85,783 patients meeting the inclusion criteria, of whom 60,423 (70.4%), with 217,133 associated valid HbA1c measurements, were included in the clustering sample. Table 1 shows the main demographic and clinical characteristics of the overall study population and of each patient cluster. The mean age of the study cohort was 63.6 years, 52.6% of the patients were female, 28.3% had a low SES, and 30.0% had a high SES. The mean post-index HbA1c was 7.5% (58 mmol/mol).
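HbA1c values in this paper are reported both as NGSP/DCCT percentages and as IFCC units (mmol/mol). The paired values are consistent with the standard master equation, IFCC = 10.929 × (NGSP − 2.15); a minimal Python helper, with a hypothetical name, is:

```python
def ngsp_to_ifcc(hba1c_percent):
    """Convert HbA1c from NGSP/DCCT % to IFCC mmol/mol (master equation)."""
    return 10.929 * (hba1c_percent - 2.15)


for pct in (6.0, 7.0, 7.5, 8.0, 9.0):
    print(pct, round(ngsp_to_ifcc(pct)))
```

Rounded to whole numbers, this reproduces the pairs used in the Results, such as 7.5% ↔ 58 mmol/mol and 9.0% ↔ 75 mmol/mol.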

Cluster generation
Of the 24 measures generated through feature engineering [17,18], factor analysis selected three as most relevant: the change in HbA1c values from t1 to t4, the mean of the absolute first differences in HbA1c values, and the ratio of the maximum absolute second difference to the mean absolute first difference of HbA1c values. The NbClust vote indicated that the optimal number of clusters was three (S1 Text and S3 Table).
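For a four-point trajectory, the three selected features can be computed as sketched below. This is a Python illustration of the definitions as worded here, using simple differences between consecutive windows; the traj package's own implementation may scale differences by the time between measurements, and returning a ratio of 0 for a perfectly flat trajectory is our own assumption. The function name is hypothetical.

```python
def trajectory_features(t):
    """Compute the three selected features for a trajectory [t1, t2, t3, t4]."""
    first = [b - a for a, b in zip(t, t[1:])]           # first differences
    second = [b - a for a, b in zip(first, first[1:])]  # second differences
    mean_abs_first = sum(abs(d) for d in first) / len(first)
    max_abs_second = max(abs(d) for d in second)
    return {
        "change_t1_t4": t[-1] - t[0],
        "mean_abs_first_diff": mean_abs_first,
        # Assumption: a perfectly flat trajectory gets a ratio of 0.
        "ratio_max_second_to_mean_first": (
            max_abs_second / mean_abs_first if mean_abs_first else 0.0
        ),
    }


print(trajectory_features([7.0, 7.2, 7.1, 7.5]))
```

Because every trajectory entering the cluster analysis has four values (after imputation where needed), each patient contributes one fixed-length feature vector to K-means.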
The distribution of the clusters was as follows: stable cluster, 45,679 patients with a stable HbA1c trend over time; descending cluster, 6,084 patients with a descending trend over time; and ascending cluster, 8,660 patients with an ascending trend over time. The undefined cluster included 25,360 patients and was also included in the analyses for comparison. The Jaccard similarity indexes for the resulting clusters were 0.99 for the stable cluster, 0.99 for the descending cluster, and 0.98 for the ascending cluster. Fig 2 shows the median trajectory (with the 10th and 90th percentiles) for each cluster.
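A per-window summary like the one plotted in Fig 2 (median with 10th and 90th percentiles) can be sketched with Python's standard library. The sketch assumes complete four-window trajectories for the patients in one cluster; `method="inclusive"` is chosen because it matches R's default quantile definition (type 7). The function name is hypothetical.

```python
from statistics import median, quantiles


def trajectory_band(trajectories):
    """Per-window (10th percentile, median, 90th percentile) across patients."""
    bands = []
    for values in zip(*trajectories):  # one tuple of patient values per window
        deciles = quantiles(values, n=10, method="inclusive")  # 10th ... 90th
        bands.append((deciles[0], median(values), deciles[-1]))
    return bands
```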

Reproducibility of the clusters using a supervised algorithm
The random forest model for classifying patients into clusters had an accuracy of 99.8% in the test dataset.

… no treatment, compared with 8.0% in both the descending and ascending clusters, while the undefined cluster had the highest proportion of untreated patients (46.6%).

Clinical relevance assessment by clusters
The descending and ascending clusters showed higher proportions of baseline prevalent and incident micro- and macrovascular complications than the stable cluster (p<0.001) (Tables 1 and 2). Hypoglycemic events were more frequent in the descending cluster in both the baseline and follow-up periods (p<0.001). Mortality was also higher in the descending cluster (15.3% vs 11.0% and 12.3% for the stable and ascending clusters, respectively; p<0.001). The undefined cluster showed relatively low levels of micro- and macrovascular complications but a higher mortality rate (14.8%). Fig 3 shows the risk for the composite outcome by post-index HbA1c level after adjustment for potential confounders. A total of 3,900 patients (4.5%) with a missing post-index HbA1c value were not included in this analysis, 82% of whom belonged to the undefined (NA) group. A J-shaped association was observed in all clusters: risk was higher at low HbA1c levels and increased with rising HbA1c levels. For the stable cluster, risk was significantly elevated at HbA1c levels below 6.0% (42 mmol/mol) and at 7.0% (53 mmol/mol) and higher. For the descending cluster, risk was significantly elevated at 8.0-8.4% (64-69 mmol/mol) and at 9.0% (75 mmol/mol) and higher; for the ascending cluster, at 9.0% (75 mmol/mol) and higher. The undefined cluster had a significantly higher risk at HbA1c levels of 7.5% (58 mmol/mol) or higher, similar to the stable cluster.

Discussion
By stratifying a population with type 2 diabetes according to patients' HbA1c trajectories, reproducing these clusters through supervised learning techniques, and testing the clusters for clinical relevance in terms of risk for outcomes, this study offers empirically derived patient groups that can be used as a first step toward modeling individualized HbA1c targets. This methodology differentiates clusters of patients with distinct baseline characteristics and differential risk patterns for type 2 diabetes outcomes. The results also emphasize the importance of examining risk factors for chronic diseases such as type 2 diabetes as trajectories over time rather than as single measurements. The reproducibility and stability of the generated clusters suggest that they may be transferable to other populations with characteristics similar to those of the study population, such as moderate diabetes disease duration.
Employing an unsupervised machine learning clustering technique offers an advantage over population-based risk models, which identify the most common characteristics influencing the vast majority of patients at the expense of potentially masking important characteristics relevant to smaller sub-groups of individuals. Such population-based models have been shown to fail when applied to some individual patients to determine individualized risk [25]. As an alternative, it has been proposed that "personalized" predictive models be built for patients based on the information of clinically similar patients [26]. Accordingly, a strength of using an unsupervised algorithm to determine patient clusters is that the patterns of features in the data are not predetermined but are derived from what the methodology uncovers in the data.
The groups of patients generated in the clustering analysis are clinically plausible, as disease progression may differ between patients whose HbA1c is stable and those whose HbA1c is increasing (unstable) or decreasing (responding to treatment). While international guidelines generally recommend targeting an HbA1c value below 7.0%, except for older patients and those with the most comorbidities, this study identified at least two groups of patients for whom the HbA1c value associated with the lowest risk deviates from this recommendation. For patients in the descending and ascending clusters, the associated risk for outcomes suggests that wider HbA1c target ranges may be appropriate. In the ascending cluster, the risk of complications was significant only at the higher levels of HbA1c, which may signal that HbA1c is not the most important risk factor in this group and warrants further exploration to identify more relevant factors. The narrowest target range, 6.0-7.0%, was found for the stable cluster, in agreement with the upper bound of guideline recommendations. However, the significantly higher risk of type 2 diabetes complications associated with HbA1c <6.0% in this cluster deviates from the guideline recommendation and is consistent with the J-shaped risk curves found in previous studies [3].
There are several limitations to this study. First, the algorithm as applied in the Clalit population with type 2 diabetes requires HbA1c measurements in at least three of the four nine-month periods over three years; testing intervals may differ in other populations, and this should be checked before the algorithm is applied elsewhere. Second, a large number of patients in the database had missing HbA1c values and therefore insufficient data to determine trajectories. We analyzed these patients separately, as the undefined cluster, rather than imputing their missing HbA1c values, because the underlying analysis relied on observed HbA1c trajectories and we did not want to introduce excessive bias through too many imputed values. Third, this study only considers the five-year risk for type 2 diabetes outcomes (macrovascular, microvascular, hypoglycemic events, or all-cause mortality) among patients with relatively short disease duration (3-7 years). For some patients, outcome risk may become apparent only with a longer follow-up period, which could yield a more comprehensive risk score. Finally, the study period was not long enough to study fluctuations in HbA1c trajectories; with longer study periods, these clusters would likely be further refined.
Our results confirm the importance of stratifying the heterogeneous population with type 2 diabetes into more homogeneous groups by discovering new patterns in data. This three-step process (unsupervised learning, reproduction of the results with supervised learning models, and testing of the clusters for clinical relevance) can be replicated to identify relevant clusters of patients with moderate diabetes disease duration in other settings. The methodology can be built upon to develop more precise models that identify, for any individual, the HbA1c target range with the lowest associated risk of future complications.

Table. The resulting indexes obtained by running 26 different methods available in the "NbClust" algorithm in R (these results are retrieved using the code 'nbcl$All.index').