Etiologies underlying subtypes of long-standing type 2 diabetes

  • Riad Bayoumi,

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing

    riad.bayoumi@mbru.ac.ae

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Muhammad Farooqi,

    Roles Data curation, Investigation, Methodology

    Affiliation Dubai Diabetes Center, Dubai Health, Dubai, UAE

  • Fatheya Alawadi,

    Roles Data curation, Investigation, Methodology

    Affiliation Endocrinology Department, Dubai Hospital, Dubai Health, Dubai, UAE

  • Mohamed Hassanein,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Endocrinology Department, Dubai Hospital, Dubai Health, Dubai, UAE

  • Aya Osama,

    Roles Data curation, Formal analysis, Investigation, Methodology

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Debasmita Mukhopadhyay,

    Roles Formal analysis, Methodology, Software, Supervision

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Fatima Abdul,

    Roles Data curation, Formal analysis, Methodology, Supervision

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Fatima Sulaiman,

    Roles Formal analysis, Methodology, Resources

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Stafny Dsouza,

    Roles Formal analysis, Investigation, Methodology, Project administration

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Fahad Mulla,

    Roles Data curation, Methodology

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

  • Fayha Ahmed,

    Roles Formal analysis, Investigation, Methodology

    Affiliation Pathology Department, Dubai Hospital, Dubai Health, Dubai, UAE

  • Mouza AlSharhan,

    Roles Data curation, Investigation, Methodology

    Affiliation Pathology Department, Dubai Hospital, Dubai Health, Dubai, UAE

  • Amar Khamis

    Roles Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE

Abstract

Background

Attempts to subtype type 2 diabetes (T2D) have mostly focused on newly diagnosed European patients. In this study, we aimed to subtype T2D in a non-white Emirati ethnic population with long-standing disease, using unsupervised soft clustering based on etiological determinants.

Methods

The Auto Cluster model in the IBM SPSS Modeler was used to cluster data from 348 Emirati patients with long-standing T2D. Five predictor variables (fasting blood glucose (FBG), fasting serum insulin (FSI), body mass index (BMI), hemoglobin A1c (HbA1c) and age at diagnosis) were used to determine the appropriate number of clusters and their clinical characteristics. Multinomial logistic regression was used to validate clustering results.

Results

Five clusters were identified. The first four matched the Ahlqvist et al. subgroups: severe insulin-resistant diabetes (SIRD), severe insulin-deficient diabetes (SIDD), mild age-related diabetes (MARD), and mild obesity-related diabetes (MOD). The fifth was a new subtype, mild early-onset diabetes (MEOD). The Modeler algorithm allows for soft assignments, in which a data point can be assigned to multiple clusters with different probabilities. There were 151 patients (43%) with membership in cluster peaks with no overlap. The remaining 197 patients (57%) showed extensive overlap between clusters at the base of distributions.

Conclusions

Despite the complex picture of long-standing T2D with comorbidities and complications, our study demonstrates the feasibility of identifying subtypes and their underlying causes. While clustering provides valuable insights into the architecture of T2D subtypes, its application to individual patient management would remain limited due to overlapping characteristics. Therefore, integrating simplified, personalized metabolic profiles with clustering holds greater promise for guiding clinical decisions than subtyping alone.

1 Introduction

Diabetes is a heterogeneous disease [1, 2], with well-defined categories such as type 1 diabetes (T1D), latent autoimmune diabetes in adults (LADA), and monogenic types. The remaining patients are pooled together under T2D. However, patients with T2D present with a wide spectrum of clinical symptoms and a range of variables that have a direct impact on glucose homeostasis. Patients may develop T2D at an early age or late in life [3]. They may be lean, overweight, obese, or morbidly obese [4–6]. The disease may be characterized by failure of insulin secretion, insulin resistance, or both; it may progress rapidly or slowly, and may be mild or severe. It may lead to one or more complications with a variety of outcomes [2]. Consequently, the clinical paradigm of one-size-fits-all leads to management and treatment failures in many patients [7]. Thus, there is a need to subtype T2D into distinct, well-defined groups [8] to better understand the underlying mechanisms, treatment responses, and prognoses associated with the disease. Several studies have attempted to identify T2D subtypes in patients of white European origin using various approaches, such as statistical clustering algorithms, clinical characteristics, genetics, and biomarkers [9–17]. T2D clustering was also replicated in other ethnic groups [18–22]. In most of these studies, subtyping of T2D was based on newly diagnosed patients [9–17], with a few exceptions where the temporal stability of clusters was tested in patients with short-term disease [13, 16, 23]. Few studies have attempted T2D subtyping in long-standing disease [3, 21–24], largely to avoid the confounding effects of differing rates of disease progression and the impact of complications [3, 25–29].

Most studies have also used unsupervised K-means hard clustering methods with definitive assignment of data points to single clusters and reported distinct subtypes [9, 11–13, 18–22]. However, some other studies employed unsupervised soft clustering methods with the likelihood of data points belonging to more than one cluster and reported T2D subtypes with considerable overlap [14–17]. In this study, we continue to investigate the heterogeneity of T2D by cluster analysis of Emirati Arab T2D patients with long-standing disease, using unsupervised soft clustering algorithms.

2 Methods

2.1 Study design

This retrospective, cross-sectional, non-interventional study was conducted at the Dubai Diabetes Centre and Dubai Hospital of Dubai Health, Dubai, UAE. Dubai Diabetes Centre is dedicated to the specialized care of patients with diabetes. Dubai Hospital is a specialty hospital equipped with 600 beds and provides surgical and medical facilities. Both follow the American Diabetes Association Standards of Medical Care for Diabetes [2].

This study was approved by the Dubai Scientific Research Ethics Committee of Dubai Health Authority. Approval No. DSREC-12/2019-05 was issued on January 23, 2020. Further IRB extensions were granted by DSREC on 28th April 2021 and 10th May 2022. Written informed consent was obtained from patients during the face-to-face interviews. All relevant clinical and laboratory data were obtained from the Dubai Health Information System “SALAMA”. Information gathered was anonymized to maintain patient privacy and confidentiality. Clinical management and treatment protocols, all laboratory methods, radiographic imaging, and data obtained from the “SALAMA” hospital information system adhered to relevant Dubai Health regulations and guidelines and conformed to the provisions of the Declaration of Helsinki (as revised in Fortaleza, Brazil, October 2013).

2.2 Patients

We aimed to collect enough data to identify genuine underlying disease clusters and avoid creating random ones. We anticipated our analysis to yield 4–5 subgroups, each containing a minimum of 20–30 observations. This ensured sufficient data points to effectively define the characteristics of each cluster.

A cohort of 348 Emirati patients with T2D was recruited from a database of 620 patients who underwent random screening between January 24th, 2020 and December 31st, 2022, at the outpatient departments of the Dubai Diabetes Centre and Dubai Hospital. The selected patients had complete data for all clustering parameters. Patients were tested for GAD antibodies (ELISA Test Kit; Demeditec Diagnostics, GmbH, Germany) to exclude T1D and LADA. The selected patients ranged in age from 18 to 87 years and included 167 men and 181 women. They had an average T2D duration of 14 years and at least two co-morbidities or complications. Each patient had been on two or more medications (metformin, thiazolidinediones, SGLT2 inhibitors, and GLP-1 agonists) for a minimum of two years. Patients with conditions causing secondary diabetes were excluded. For each patient, the clinical and laboratory data were obtained from the SALAMA electronic health record system used by all health facilities affiliated to Dubai Health. The recorded medical history, comorbidities, and complications of the disease were confirmed through face-to-face interviews with the patients.

2.3 Statistics

2.3.1 Cluster analysis.

IBM SPSS Modeler (IBM North America, New York, USA) was used for clustering analysis. The software provides several machine-learning algorithms that can be used for classification, regression, clustering, and anomaly detection. These algorithms are based on artificial neural networks and deep learning.

The Auto Cluster model in IBM SPSS Modeler was utilized in the exploratory phase to determine the optimal clustering solution for the dataset comprising 348 Emirati patients with long-standing T2D. Five T2D variables (FBG, FSI, BMI, HbA1c and age at diagnosis) were standardized and employed as predictors to identify the suitable number of clusters and their characteristics. To mitigate the influence of confounding factors, such as comorbidities and complications associated with the long-standing T2D phenotype, the predictor variables were limited to those five parameters that directly or indirectly influence the disease’s pathophysiology. FBG and FSI levels are two crucial etiological factors that directly reflect the underlying pathophysiology of T2D. We used them in the initial exploratory technique for grouping the data. Furthermore, FBG and FSI were used in the explanatory process to assess peripheral insulin resistance (HOMA-IR) and/or impaired insulin secretion (HOMA-B), that served as descriptors of the clusters, not as part of the clustering process itself.

The BMI is used as a predictor for T2D because it is strongly linked to insulin resistance, a key factor in T2D pathophysiology. While HbA1c isn’t strictly an etiological variable for T2D, it plays a pivotal role in its diagnosis, monitoring, and prognosis. It is a valuable tool for providing long-term insights into glycemic control and the risk of developing long-term complications. Age at diagnosis of T2D reflects the cumulative effect on metabolic dysfunction and the duration of risk factors for complications. Early-onset T2D is associated with a more aggressive course and higher risk of complications, while late-onset disease is usually more benign.
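The standardization step mentioned above can be illustrated with a short sketch. The text does not state which standardization was applied in IBM SPSS Modeler, so the z-score transformation below is an assumption, and the values are hypothetical rather than study data. The point is only that variables measured on different scales (for example BMI and age at diagnosis) become comparable before distances between patients are computed.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical values for illustration only (not study data).
bmi = [24.0, 31.0, 38.0]
age_at_diagnosis = [30.0, 42.0, 54.0]

z_bmi = standardize(bmi)
z_age = standardize(age_at_diagnosis)
```

After standardization each variable has mean 0 and standard deviation 1, so no single predictor dominates the distance calculation merely because of its units.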

The Auto Cluster model operates as a Bayesian Network Model for classification purposes. It sequentially employs three unsupervised soft clustering algorithms:

  1. A two-step process: The initial step involves a single pass through the data to condense the raw input into a manageable set of subclusters. Subsequently, a hierarchical clustering method is utilized in the second step to progressively merge these subclusters into larger clusters. The two-step approach offers the advantage of automatically estimating the optimal number of clusters.
  2. The K-means clustering algorithm: This method defines a fixed number of clusters and iteratively assigns records to clusters while adjusting the cluster centers until further refinement does not enhance the model. Unlike predictive modelling, K-means employs unsupervised learning to uncover patterns within the input fields.
  3. The Kohonen algorithm: This generates a neural network capable of clustering the dataset into distinct groups. Once fully trained, similar records should be close together on the output map, while dissimilar records will be positioned farther apart. This process also aids in determining an appropriate number of clusters.

Following sufficient iteration for each model, the Auto Cluster will produce a Silhouette index, with the model exhibiting the highest index being selected. The Auto Cluster node prioritizes algorithms and allocates data points into the relevant clusters accordingly. For both the two-step and the K-means algorithms the same Silhouette Index of 0.64 was generated.
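The K-means step described in the list above (assign records to the nearest center, then adjust the centers until nothing changes) can be sketched in a few lines. This is a minimal illustration of the general algorithm and not the IBM SPSS Modeler implementation; the two-dimensional points and the hand-picked starting centers are hypothetical.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest center, then recompute centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no record changed cluster, so further refinement does not help
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical standardized values for six records (illustration only).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (3.2, 2.9), (2.9, 3.3)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
```

In practice the procedure would be repeated for several candidate numbers of clusters, and the solution with the highest Silhouette index would be retained, as described above.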

The Bayesian network assigns probabilities of membership to the participants in the five identified clusters. Each participant is represented as a node in the network by predictor variables and becomes an additional node that influences the main cluster assignment node. Conditional probability distributions model probabilistic dependencies, allowing computation of the likelihood of a participant belonging to a specific cluster, given their observed variables. Conditional probability distributions enable accurate inferences and yield probabilities of membership for each participant in one or more clusters. Therefore, the model computes the overlap of an individual within multiple clusters, as displayed in a heatmap [S1 Table].
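The distinction between patients at the non-overlapping apices and patients with overlapping membership can be expressed as a simple filter on the membership probabilities. The sketch below assumes that each participant has one probability per cluster; the function name `non_overlapping` and the probability rows are hypothetical and serve only to illustrate the idea.

```python
def non_overlapping(memberships, threshold=1.0):
    """Indices of participants whose highest membership probability reaches the threshold."""
    return [i for i, probs in enumerate(memberships) if max(probs) >= threshold]


# Hypothetical membership probabilities over five clusters (each row sums to 1).
memberships = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # belongs to a single cluster
    [0.0, 0.6, 0.0, 0.4, 0.0],  # overlaps two clusters
    [0.0, 0.0, 0.5, 0.0, 0.5],  # overlaps two clusters
    [0.0, 0.0, 0.0, 0.0, 1.0],  # belongs to a single cluster
]

apex = non_overlapping(memberships)
share = len(apex) / len(memberships)

# Arithmetic check of the proportions reported in this study.
reported_apex_pct = round(100 * 151 / 348)     # 43
reported_overlap_pct = round(100 * 197 / 348)  # 57
```

Lowering `threshold` below 1.0 admits more participants into the non-overlapping set, at the cost of including individuals with partial membership in other clusters.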

The model also assigns membership and measures the degree of overlap between clusters using the silhouette coefficient. A high silhouette coefficient indicates that the data point is well-matched to its own cluster and poorly matched to neighboring clusters, with less overlap between the clusters. The silhouette coefficient ranges from -1 to 1. A value of 1 indicates that the data point fits perfectly into a single cluster, while a value of -1 indicates that the data point is poorly matched to its assigned cluster and would fit a neighboring cluster better. A value of 0 indicates that the data point lies on the boundary between clusters and is equally suited to its own cluster and a neighboring one. To highlight the clinical characteristics of clusters, data for individuals with a silhouette coefficient of 1.0, and/or probability of 1.0, on the Bayesian Network were selected, as they sat in the non-overlapping apices of clusters. They exhibited the highest degree of dysfunction in the etiological processes governing cluster membership. As one lowers the silhouette coefficient threshold on a sliding scale below 1.0, the degree of overlap decreases and the number of individuals in the non-overlapping apices of clusters increases, but their clinical homogeneity and discreteness drop.
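For reference, the silhouette coefficient of a single data point is commonly defined as s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster. The sketch below implements this standard definition in plain Python on hypothetical one-dimensional data; it is not the code used by IBM SPSS Modeler.

```python
import math


def silhouette(points, labels, i):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for the point at index i."""
    own = [
        math.dist(points[i], p)
        for j, p in enumerate(points)
        if labels[j] == labels[i] and j != i
    ]
    if not own:
        return 0.0  # common convention for a point that is alone in its cluster
    a = sum(own) / len(own)  # mean distance within the point's own cluster
    # b: smallest mean distance to the points of any other cluster.
    b = min(
        sum(math.dist(points[i], p) for p, lab in zip(points, labels) if lab == other)
        / labels.count(other)
        for other in set(labels)
        if other != labels[i]
    )
    return (b - a) / max(a, b)


# Hypothetical one-dimensional data: two tight, well-separated groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]

s_well_placed = silhouette(points, [0, 0, 1, 1], 0)  # close to +1
s_misplaced = silhouette(points, [0, 1, 1, 1], 1)    # negative: a neighbor fits better
```

A point that sits close to its own cluster and far from the others scores near 1, whereas the deliberately mislabeled point in the second call receives a negative score.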

Principal component analysis was used to identify the linear combinations of the original variables that explained most of the variance in the data and to extract the features that were most correlated with the clustering variables. Principal component analysis has also been used to visualize the dataset and help in identifying clusters, as it transforms the dataset into a lower-dimensional space where the clusters are more easily separated.

2.3.2 Multinomial logistic regression.

Multinomial logistic regression in SPSS 29 was used in the explanatory phase of the study to validate the results of clustering performed by the Auto Cluster model in IBM SPSS Modeler and to predict the probability of the categorical dependent variable (Clusters 1–5), given the set of five independent predictor variables (FBG, FSI, BMI, HbA1c, and age at diagnosis). The model was fitted using a maximum likelihood procedure to determine the values of the model parameters that maximized the likelihood of the observed data. The relationship between the predictors and each category of the dependent variable was modelled using the log-odds of that category relative to a reference category (Cluster 5). The outcome of the multinomial logistic regression model was a set of regression coefficients (B) for each predictor variable. The coefficients were then used to rank the importance of the predictors in each cluster. The odds ratio (OR) is a measure of the association between a predictor variable and a cluster, calculated by exponentiating the B value. The larger the B value, the stronger the association. An OR > 1 indicated that the predictor variable was associated with increased odds of the outcome falling into a particular cluster relative to the reference category (Cluster 5).
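The relationship between the regression coefficients, the odds ratios, and the predicted cluster probabilities can be made concrete with a small sketch. It assumes the usual multinomial logit formulation in which the log-odds of the reference category are fixed at zero. The coefficient values, the cluster names, and the two-predictor input are hypothetical and are not taken from Table 4.

```python
import math


def odds_ratio(b):
    """Odds ratio obtained by exponentiating a regression coefficient B."""
    return math.exp(b)


def cluster_probabilities(x, coefficients):
    """Multinomial logit probabilities with a reference category (log-odds fixed at 0).

    coefficients maps each non-reference cluster to (intercept, [B per predictor]).
    """
    log_odds = {
        cluster: intercept + sum(b * xi for b, xi in zip(betas, x))
        for cluster, (intercept, betas) in coefficients.items()
    }
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {cluster: math.exp(v) / denom for cluster, v in log_odds.items()}
    probs["reference"] = 1.0 / denom
    return probs


# Hypothetical coefficients for two standardized predictors (illustration only).
coefficients = {
    "cluster_a": (-1.0, [0.8, 0.2]),
    "cluster_b": (0.5, [-0.3, 0.4]),
}
probs = cluster_probabilities([1.0, 2.0], coefficients)
or_first_predictor = odds_ratio(0.8)  # greater than 1: higher odds than the reference
```

A coefficient of zero gives an odds ratio of exactly 1, meaning that the predictor does not shift the odds of membership relative to the reference cluster.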

2.3.3 HOMA assessment.

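FBG, FSI and other chemistry assays were performed using a Cobas 6000 Analyzer (Hoffmann La Roche Diagnostics, CA, US). Insulin resistance (IR) and β-cell dysfunction (B) were evaluated by homeostatic model assessment for IR (HOMA-IR) and β-cell dysfunction (HOMA-B) [30]. The HOMA indices were derived from FBG and fasting serum insulin levels using the following equations:

$$\text{HOMA-IR} = \frac{\text{FSI}\,(\mu\text{U/mL}) \times \text{FBG}\,(\text{mmol/L})}{22.5} \tag{1}$$

$$\text{HOMA-B} = \frac{20 \times \text{FSI}\,(\mu\text{U/mL})}{\text{FBG}\,(\text{mmol/L}) - 3.5} \tag{2}$$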

The higher the HOMA-IR, the greater the peripheral resistance to insulin, while the lower the HOMA-B, the greater the β-cell dysfunction. Generally, a HOMA-IR value < 1 indicates optimal insulin sensitivity. Levels above 1.9 indicate early resistance; levels above 2.9 indicate significant resistance. A HOMA-B value < 100 indicates β-cell dysfunction.
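As a worked illustration, the sketch below computes both indices with the widely used HOMA1 approximations (HOMA-IR = FSI × FBG / 22.5 and HOMA-B = 20 × FSI / (FBG − 3.5), with FBG in mmol/L and FSI in µU/mL) and applies the interpretation thresholds quoted above. The units, the function names, and the input values are assumptions made for the example and do not represent study data.

```python
def homa_ir(fbg_mmol_l, fsi_uu_ml):
    """HOMA1 insulin resistance index."""
    return fsi_uu_ml * fbg_mmol_l / 22.5


def homa_b(fbg_mmol_l, fsi_uu_ml):
    """HOMA1 beta-cell function index; undefined when FBG <= 3.5 mmol/L."""
    if fbg_mmol_l <= 3.5:
        raise ValueError("HOMA-B is undefined for FBG <= 3.5 mmol/L")
    return 20.0 * fsi_uu_ml / (fbg_mmol_l - 3.5)


def describe_homa_ir(value):
    """Map a HOMA-IR value onto the thresholds quoted in the text."""
    if value > 2.9:
        return "significant resistance"
    if value > 1.9:
        return "early resistance"
    if value < 1.0:
        return "optimal sensitivity"
    return "not classified by the quoted thresholds"


# Hypothetical patient: FBG 7.5 mmol/L, FSI 12 µU/mL.
ir = homa_ir(7.5, 12.0)     # 90 / 22.5 = 4.0
beta = homa_b(7.5, 12.0)    # 240 / 4.0 = 60.0
ir_label = describe_homa_ir(ir)
beta_dysfunction = beta < 100
```

For this hypothetical patient both indices are abnormal (HOMA-IR of 4.0 and HOMA-B of 60), which corresponds to the combined pattern of insulin resistance and secretory dysfunction.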

3 Results

3.1 Demographics

The mean age of the 348 Emirati patients with T2D was 56 years, and the mean duration of diabetes was 14 years. The mean BMI was 31 and the mean age at diagnosis was 42 years. Gender-wise demographic characteristics are shown in Table 1.

Table 1. Demographic characteristics of 348 Emirati T2D patients selected for subtyping of the disease.

https://doi.org/10.1371/journal.pone.0304036.t001

Owing to the long duration and chronicity of T2D, considerable deterioration in the metabolic profiles of selected patients was observed. Of all patients, 90 (26%) had HOMA-IR > 3.0, indicating peripheral insulin resistance, while 140 (40%) had HOMA-B < 100, indicating pancreatic secretion dysfunction. The remaining 118 (34%) exhibited both pathophysiological dysfunctions. The prevalence of comorbidities and complications was also high, with hypertension at 62%, peripheral neuropathy at 53%, retinopathy at 33%, and coronary artery disease at 15% (Table 2). In most patients, at least two comorbidities or complications of diabetes were observed.

Table 2. Prevalence of comorbidities and complications of T2D in 348 Emirati patients recruited for subtyping of the disease.

https://doi.org/10.1371/journal.pone.0304036.t002

3.2 Cluster analysis

Results of the cluster analysis of the cohort of 348 Emirati patients with long-standing T2D are shown in Table 3. No significant differences in cluster results were observed between male and female T2D patients. Therefore, results were reported for the total cohort throughout the manuscript. Five clusters were identified in this study. The first four matched the Ahlqvist et al. [11] subgroups. Cluster 1 had severe insulin-resistant diabetes (SIRD) in 8% of patients. Cluster 2 had severe insulin-deficient diabetes (SIDD) in 16%. Cluster 3 had mild age-related diabetes (MARD) in 25%. Cluster 4 had mild obesity-related diabetes (MOD) in 21%. A fifth new subtype of mild early-onset diabetes (MEOD) was identified in 30% of mostly lean patients.

Table 3. Results of clustering analysis of 348 Emirati T2D patients using IBM SPSS Modeler.

https://doi.org/10.1371/journal.pone.0304036.t003

However, there was extensive overlap between clusters. Cluster 1 (SIRD), with a positive average silhouette score, did not significantly overlap with any of the other clusters. The other four clusters, with negative average silhouette scores, seemed to overlap extensively. There were 151 patients (43%) with membership in cluster peaks with no overlap, as confirmed by a Silhouette Index and Bayesian probability of 1.0 (Table 3). The remaining 197 patients (57%) showed extensive overlap between clusters confirmed by a Silhouette Index and Bayesian probability of <1.0 (S1 Table) with individuals appearing in two or more clusters (Fig 1).

Fig 1. Distribution of five clusters, among 348 Emirati T2D patients, with and without overlap: Distribution of patients is shown as percentage of total patients (N = 348) into overlapping clusters in grey, and percentage of patients exclusive to 1 cluster as light blue (Cluster 1- SIRD), red (Cluster 2- SIDD), dark blue (Cluster 3- MARD), green (Cluster 4- MOD), and yellow (Cluster 5- MEOD).

https://doi.org/10.1371/journal.pone.0304036.g001

Principal component analysis (PCA) was used to visualize the dataset and identify the five T2D clusters, as it transformed the dataset into a lower-dimensional space where the clusters were more easily separated (Fig 2). Multinomial logistic regression was used to explain the relationship between predictor variables and categorical outcomes (clusters) and validate clustering results. We identified predictor variables that were significantly associated with the clusters and quantified the strength of these associations. The higher the regression coefficients (B) and the odds ratio (OR), the stronger the contribution to the cluster (Table 4). The model coefficients ranked the importance of the predictor variables in each cluster [S2 Table].

Fig 2. Display of principal component analysis of clusters of 348 Emirati T2D patients: Each circle represents a patient in a cluster.

The colors represent each cluster as light blue (Cluster 1- SIRD), red (Cluster 2- SIDD), dark blue (Cluster 3- MARD), green (Cluster 4- MOD), and yellow (Cluster 5- MEOD).

https://doi.org/10.1371/journal.pone.0304036.g002

Table 4. Estimation of the contribution of the five independent variables (Age at diagnosis, body mass index, fasting insulin, fasting blood glucose and HbA1c) as predictors for T2D clusters using multinomial logistic regression.

https://doi.org/10.1371/journal.pone.0304036.t004

3.3 Cluster characteristics

The pathophysiological characteristics and laboratory data of the five clusters identified in the cohort of 348 Emirati patients with long-standing T2D are shown in Table 5. Characteristics of four clusters matched those of the Ahlqvist et al. [11] subgroups. Cluster 5 patients had a novel subtype of mild early-onset diabetes (MEOD) in mostly lean patients.

Table 5. The pathophysiological characteristics of T2D subtypes in Emirati patients with long-standing disease in Dubai, UAE.

https://doi.org/10.1371/journal.pone.0304036.t005

3.4 Etiological processes governing T2D subtypes

To highlight the major etiological processes governing membership of subtypes, we selected data of patients at the non-overlapping apices of cluster distributions (N = 151), confirmed by a Silhouette Index of 1.0 (Table 6).

Table 6. Data of the major etiological processes governing T2D subtypes in subsets of individuals at the non-overlapping apices of distribution of clusters.

https://doi.org/10.1371/journal.pone.0304036.t006

In Cluster 1 (SIRD), the primary dysfunction was a markedly increased insulin resistance associated with moderate obesity. The patients had a normal insulin secretory capacity and moderately abnormal glucose homeostasis. Most of the patients had peripheral neuropathy (Table 7).

Table 7. Diabetes comorbidities and complications in five T2D subtypes of 151 Emirati patients in the non-overlapping apices of distribution of clusters.

https://doi.org/10.1371/journal.pone.0304036.t007

In Cluster 2 (SIDD), the primary dysfunction was severe insulin secretory deficiency accompanied by high insulin resistance and the highest level of HbA1c. Patients were obese, with the most severe uncontrolled glucose homeostasis and a higher frequency of complications such as retinopathy, peripheral neuropathy, and ischemic heart disease (Table 7).

Cluster 3 (MARD) patients developed diabetes late in life and had the highest mean age at diagnosis. They were overweight and characterized by moderate insulin resistance, normal insulin secretory capacity, and mildly abnormal glucose homeostasis. However, these patients also had nephropathy, peripheral neuropathy, and ischemic heart disease (Table 7).

Patients in Cluster 4 (MOD) had the highest BMI and developed diabetes early in life. They were characterized by moderate insulin resistance but normal insulin secretory capacity, with mildly abnormal glucose homeostasis. However, these patients also had retinopathy and nephropathy (Table 7).

In Cluster 5 (MEOD), a novel T2D subtype, the patients were lean/overweight and developed the disease early in life. They had moderate insulin secretory dysfunction and mild insulin resistance with mildly abnormal glucose homeostasis (Table 5). These patients had retinopathy, peripheral neuropathy, and ischemic heart disease (Table 7).

4 Discussion

We attempted subtyping T2D in 348 Emirati patients using unsupervised soft cluster analysis with the Auto Cluster model in IBM SPSS Modeler, employing five etiological predictor variables: FBG, FSI, BMI, HbA1c and age at diagnosis. Multinomial logistic regression was used to validate the clustering process and to rank the importance of the predictor variables in each cluster. Five clusters were identified; the first four matched the Ahlqvist et al. [11] subgroups: SIRD, SIDD, MARD, and MOD. A fifth new subtype, MEOD, was identified in our dataset.

However, there was extensive overlap between clusters. Individuals in the non-overlapping apices of distribution of clusters were identified in only 151/348 patients (43%), with individuals appearing only once in a single cluster. The remaining 197/348 patients (57%) showed varying degrees of overlap, with individuals appearing in two or more clusters. As one lowers the silhouette coefficient threshold on a sliding scale below 1.0, the degree of overlap decreases and the number of individuals in the non-overlapping apices of clusters increases, but their clinical homogeneity and discreteness drop. This extensive degree of overlap has been previously reported by different study groups: the Broad Institute of MIT [14, 17]; the Oxford Center of Diabetes, UK [15] and the Exeter Research Group, UK [16].

Ahlqvist et al. [11] employed a data-driven, unsupervised hard clustering method to identify mutually exclusive patient subgroups within large, newly diagnosed T2D cohorts. Their subtyping scheme, although replicated in multiple studies [9–13, 18–24], including our current investigation, has been challenged by soft, unsupervised clustering techniques that revealed overlapping and alternative subgroupings [14–17]. Both approaches rely on continuous (non-discrete) clinical characteristics. Unlike discrete data with distinct categories, continuous data exist on a spectrum, hindering the definition of clear-cut cluster boundaries. Overlapping clusters inherently arise with such data, challenging the traditional concept of distinct, well-separated patient groups. Fuzzy boundaries further complicate cluster interpretation and labeling [14–17]. Hard unsupervised methods like Ahlqvist’s rely on pre-defined subtypes, potentially overlooking unseen biological variations or dynamic processes. Conversely, soft unsupervised clustering avoids preconceived notions, potentially uncovering the true underlying data structure and heterogeneity. However, overlapping clusters, as observed in healthcare domains, can impede clinical decision-making due to ambiguous patient group assignment [31–33].

Previously, most T2D subtyping studies recruited newly diagnosed patients [9–17]. In contrast, in the present study, we tested the feasibility of clinical subtyping in T2D patients with long-standing disease. The mean age of the patients was 56 years with mean diabetes duration exceeding 14 years. They were mostly obese, had various premorbid conditions, and had developed various complications of diabetes. In all five subtypes, the combination of the basic etiological dysfunction could still be identified despite co-morbidities and complications of the disease with advancing age. Our results agree with several studies where subtyping of T2D in long-standing disease has been successfully performed [3, 21–23, 26]. Despite temporal changes in lifestyle and environmental exposure causing decline in β-cell function and/or worsening of insulin resistance with increased frequency of complications, subtyping of long-standing T2D is not obscured [21, 22, 24–26]. This is probably due to genetically determined factors that do not change over a lifetime.

To highlight the major etiological processes governing T2D subtypes, individuals in the non-overlapping apices of the cluster distribution (151/348) were selected to identify the major etiological determinants of a subtype. The four T2D subtypes, SIRD, SIDD, MARD, and MOD, which were identified in this cohort with long-standing disease, were mapped back to the four subtypes of newly diagnosed diabetes patients by Ahlqvist et al [11] in the Scania (ANDIS) study. Patients with SIRD and SIDD suffered severe abnormal glucose homeostasis, whereas patients with MARD and MOD had mild disease. The fifth type is a novel subtype of mild early onset T2D (MEOD) in mostly lean individuals.

In patients with SIRD (Cluster 1), the identified etiological dysfunction was severe peripheral insulin resistance. This is similar to the SIRD in the ANDIS study [11] and Group C in the IMI DIRECT study [15]. In patients with SIDD (Cluster 2), the identified etiological dysfunction was severe β-cell dysfunction, heightened by moderate/severe insulin resistance [29]. This subtype is similar to the SIDD described in the ANDIS Study [11] and the global archetype D, which had the worst glucose control, in the IMI-DIRECT study [15]. In previous studies, some patients with early-onset T2D had worse clinical outcomes and were at higher risk of stroke and myocardial infarction [34]. Patients with SIDD had early-onset disease and displayed characteristics of severe diabetes. Interestingly, the average age of onset for this subgroup (37.9 years) was comparable to that of the mild early-onset diabetes (MEOD) subgroup (36.5 years). However, the MEOD subgroup exhibited a distinctly milder clinical profile compared to the SIDD subgroup.

In patients with MARD (Cluster 3), the identified etiological dysfunction appeared to be mild peripheral resistance to insulin owing to advancing age. This subtype is similar to the MARD described in the ANDIS Study [11] and archetype A described in the IMI-DIRECT study [15]. In patients with MOD (Cluster 4), the identified etiological dysfunction was an obesity-driven peripheral resistance to insulin. This subtype is similar to the MOD described in the ANDIS Study [11] and archetype C described in the IMI-DIRECT study [15]. MEOD (Cluster 5) is a novel subtype that has not been reported in earlier studies. The patients were mostly lean, had early-onset disease, mild/moderate β-cell dysfunction, and mild insulin resistance. This subtype of T2D has been previously identified in non-Caucasian ethnic groups in developing countries [35–37]. It is not surprising, therefore, that this cluster was identified among the Emiratis.

In summary, only the severe SIRD subtype appeared to be an independent disease entity. The statistical properties and clinical characteristics of the patients are distinct. Membership of most patients is restricted to this subtype and does not overlap with that of other subtypes. The next highest probability of being a distinct entity is the severe SIDD subtype. The other three mild subtypes, MARD, MOD, and MEOD, did not qualify as independent disease entities. They exhibited an extensive overlap in subtype membership and high heterogeneity in their clinical characteristics.

The main aim of clustering is to identify patient subtypes with similar characteristics within a larger group of individuals with T2D, to enable clinicians to gain insights into the mechanisms of disease development and progression. This can potentially lead to personalized clinical management and improved patient outcomes. However, in our study, as in some other subtyping studies, it has been recognized that clustering based on continuous variables does not result in mutually exclusive subtypes [14–17, 32]. Therefore, integrating simplified, personalized metabolic profiles with clustering holds greater promise for guiding clinical decisions than subtyping alone [8, 32]. According to our results, the T2D-specific phenotype profile (age at diagnosis, BMI, FBG, HbA1c, HOMA-B, and HOMA-IR) could predict specific outcomes for individual patients:

  • Age at T2D diagnosis: Young age indicates a strong genetic predisposition. The younger the age at diagnosis, the more severe the disease and the higher the risk of complications. The older the age at diagnosis, the milder the disease [3].
  • BMI: The higher the BMI, the higher the peripheral resistance to insulin, the more severe the disease, and the higher the risk of complications [38].
  • FBG: The higher the fasting blood glucose level, the higher the hepatic insulin resistance and hepatic glucose production [1, 39–42].
  • HbA1c: The higher the HbA1c, the greater the disease severity [2].
  • HOMA-B: The lower the HOMA-B score, the more severe the β-cell dysfunction [30].
  • HOMA-IR: The higher the HOMA-IR, the higher the whole-body peripheral resistance to insulin [30].
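For orientation, the two HOMA indices in this profile can be approximated from fasting values with the closed-form equations of the original homeostasis model assessment [30]. The sketch below assumes fasting glucose in mmol/L and fasting insulin in µU/mL. The function names are illustrative, and the indices reported in this study may have been obtained from the computer model rather than from these approximations.

```python
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uu_ml: float) -> float:
    """Approximate insulin resistance: glucose x insulin / 22.5 (Matthews et al. 1985)."""
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5


def homa_b(fasting_glucose_mmol_l: float, fasting_insulin_uu_ml: float) -> float:
    """Approximate beta-cell function (%): 20 x insulin / (glucose - 3.5)."""
    if fasting_glucose_mmol_l <= 3.5:
        # The approximation is undefined at or below 3.5 mmol/L.
        raise ValueError("fasting glucose must exceed 3.5 mmol/L")
    return 20.0 * fasting_insulin_uu_ml / (fasting_glucose_mmol_l - 3.5)


# Reference point of the model: glucose 4.5 mmol/L with insulin 5 uU/mL.
reference_ir = homa_ir(4.5, 5.0)
reference_b = homa_b(4.5, 5.0)
print(reference_ir, reference_b)
```

Because both indices are computed from the same two fasting measurements, they are not independent of each other, which is relevant to the collinearity discussed among the limitations.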

Our study has some limitations. We used FBG and FSI in the exploratory phase, where cluster analysis was performed, and HOMA indices in the explanatory phase, where cluster outcomes and characteristics were identified. We acknowledge the presence of moderate collinearity between the HOMA indices used in the explanatory phase. This can make it difficult to separate the true effect of insulin resistance from the effect of insulin secretion. To mitigate these effects, we used principal component analysis, which introduces new uncorrelated variables derived from the original set. It is also important to note that the computer models generating these indices incorporate additional parameters such as glucagon secretion and liver glucose production. Furthermore, these indices continue to be used in diabetes research for lack of perfect alternatives. We used them only to identify broader trends and clusters, not as precise, rigorous measurements.
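The way principal component analysis removes collinearity can be illustrated with a minimal two-variable sketch. For two standardized variables, the principal components are simply their scaled sum and difference, and these are uncorrelated by construction. The synthetic data, the variable names, and the helper functions below are assumptions made for illustration only and do not reproduce the analysis performed in this study.

```python
import math
import random
import statistics


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def pca_two_variables(x, y):
    """PCA of two standardized variables using the closed-form loadings
    (1, 1)/sqrt(2) and (1, -1)/sqrt(2). Returns both component score lists
    and their shares of the total variance."""
    zx, zy = standardize(x), standardize(y)
    pc_sum = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    pc_diff = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]
    r = pearson(x, y)
    return pc_sum, pc_diff, ((1 + r) / 2, (1 - r) / 2)


# Synthetic, moderately correlated stand-ins for two indices (not study data).
rng = random.Random(0)
ir = [rng.gauss(0, 1) for _ in range(200)]
beta = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in ir]

pc_sum, pc_diff, shares = pca_two_variables(ir, beta)
print("correlation before:", round(pearson(ir, beta), 3))
print("correlation after: ", round(pearson(pc_sum, pc_diff), 3))
```

The components are uncorrelated, but each one mixes both original variables, so they no longer map one-to-one onto insulin resistance or insulin secretion.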

We also used FBG and FSI in the exploratory phase of analysis, which is less concerned with individual variable effects. This phase focuses on pattern identification and data relationships without trying to isolate the impact of specific variables on an outcome. It is about finding the underlying structure, not establishing causal links, and it does not suffer from collinearity of the variables used. Therefore, while the same parameters were used in both exploratory and explanatory techniques, collinearity is a concern only within the explanatory models and not between the two phases of analysis.

The strength of this study is in confirming that T2D subtyping can be performed at any stage of the disease. This provides insight into the stability and evolution of clusters. Although the number of patients was small, the study provided proof-of-principle that soft, unsupervised clustering techniques reveal overlapping subgroupings of T2D and uncover further aspects of the heterogeneity of T2D [43]. Because this was a retrospective study, we relied on existing data collected from medical records and patient interviews, which may have introduced recall bias or missing information. The small number of participants, broad inclusion criteria, and potential bias in data selection may limit the generalizability of the findings. The long duration of illness and unmeasured or unknown confounders, such as diabetes complications and drug responses, make it difficult to establish a clear temporal relationship between exposure and outcome. Yet, despite all that noise, clusters of T2D could be identified.

5 Conclusions

Despite the complex picture of long-standing T2D with comorbidities, complications, and varied therapy, our study demonstrates the feasibility of identifying subtypes and their underlying causes. Five clusters were identified; the first four matched the subgroups of Ahlqvist et al. [11]. Two subtypes were characterized by severe disease and two by mild disease. A fifth, novel subtype, identified among mostly lean individuals, is usually seen in non-white populations. While clustering provides valuable insights into the architecture of T2D subtypes, its application to individual patient management remains limited owing to overlapping characteristics. Therefore, integrating simplified, personalized metabolic profiles with clustering holds greater promise for guiding clinical decisions than subtyping alone. Future studies on the pathogenesis of subtypes and their response to drug therapy are needed. Further longitudinal investigations are also required to clarify subtype stability over time, elucidate the factors influencing transitions between subtypes, and translate these findings into concrete clinical applications.

Supporting information

S1 Checklist. Human participants research checklist.

https://doi.org/10.1371/journal.pone.0304036.s001

(PDF)

S1 Table. Heat map of probabilities of cluster membership showing overlap between clusters.

A Silhouette Index of 1.0 indicates no overlap and <1.0 indicates overlap between clusters.

https://doi.org/10.1371/journal.pone.0304036.s002

(DOCX)

S2 Table. The contribution of the five primary independent variables (Age at diagnosis, BMI, fasting insulin, Fasting Blood Glucose (FBG) and HbA1c) as predictors for T2D clusters using multinomial logistic regression.

https://doi.org/10.1371/journal.pone.0304036.s003

(DOCX)

Acknowledgments

We are indebted to the Mohammed Bin Rashid University of Medicine and Health Sciences for their support of our research and to Dubai Academic Health Corporation for granting access to patients and their records. The contributions of the medical and nursing staff at Dubai Diabetes Centre and Dubai Hospital are highly appreciated.

References

  1. Faerch K, Hulmán A, Solomon TP. Heterogeneity of Pre-diabetes and Type 2 Diabetes: Implications for Prediction, Prevention and Treatment Responsiveness. Curr Diabetes Rev. 2016; 12: 30–41. pmid:25877695.
  2. American Diabetes Association. Standards of Medical Care in Diabetes-2022 Abridged for Primary Care Providers. Clin Diabetes. 2022; 40: 10–38. pmid:35221470.
  3. Bancks MP, Casanova R, Gregg EW, Bertoni AG. Epidemiology of diabetes phenotypes and prevalent cardiovascular risk factors and diabetes complications in the National Health and Nutrition Examination Survey 2003–2014. Diabetes Res Clin Pract. 2019; 158:107915. pmid:31704094.
  4. Mathews AE, Mathews CE. Inherited β-cell dysfunction in lean individuals with type 2 diabetes. Diabetes. 2012; 61:1659–1660. pmid:22723272.
  5. Yaghootkar H, Scott RA, White CC, Zhang W, Speliotes E, Munroe PB, et al. Genetic evidence for a normal-weight "metabolically obese" phenotype linking insulin resistance, hypertension, coronary artery disease, and type 2 diabetes. Diabetes. 2014; 63:4369–4377. pmid:25048195.
  6. Stefan N, Fritsche A, Schick F, Häring HU. Phenotypes of prediabetes and stratification of cardiometabolic risk. Lancet Diabetes Endocrinol. 2016; 4: 789–798. pmid:27185609.
  7. Williams DM, Jones H, Stephens JW. Personalized Type 2 Diabetes Management: An Update on Recent Advances and Recommendations. Diabetes Metab Syndr Obes. 2022; 15: 281–295. pmid:35153495.
  8. McCarthy MI. Painting a new picture of personalised medicine for diabetes. Diabetologia. 2017; 60: 793–799. pmid:28175964.
  9. Misra S, Wagner R, Ozkan B, Schön M, Sevilla-Gonzalez M, Prystupa K, et al. Precision subclassification of type 2 diabetes: a systematic review. Commun Med (Lond). 2023; 3: 138. pmid:37798471.
  10. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med. 2015; 28: 7(311):311ra174. pmid:26511511.
  11. Ahlqvist E, Storm P, Käräjämäki A, Martinell M, Dorkhan M, Carlsson A, et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol 2018; 6: 361–369. pmid:29503172.
  12. Slieker RC, Donnelly LA, Fitipaldi H, Bouland GA, Giordano GN, Åkerlund M, et al. Replication and cross-validation of type 2 diabetes subtypes based on clinical variables: an IMI-RHAPSODY study. Diabetologia 2021; 64:1982–1989. pmid:34110439.
  13. Christensen DH, Nicolaisen SK, Ahlqvist E, Stidsen JV, Nielsen JS, Hojlund K, et al. Type 2 diabetes classification: a data-driven cluster study of the Danish Centre for Strategic Research in Type 2 Diabetes (DD2) cohort. BMJ Open Diabetes Res Care 2022; 10: e002731. pmid:35428673.
  14. Udler MS, Kim J, von Grotthuss M, Bonàs-Guarch S, Cole JB, Chiou J, et al. Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis. PLOS Med 2018; 15: e1002654. pmid:30240442.
  15. Wesolowska-Andersen A, Brorsson CA, Bizzotto R, Mari A, Tura A, Koivula R, et al. Four groups of type 2 diabetes contribute to the etiological and clinical heterogeneity in newly diagnosed individuals: an IMI DIRECT study. Cell Rep Med 2022; 3: 100477. pmid:35106505.
  16. Nair ATN, Wesolowska-Andersen A, Brorsson C, Rajendrakumar AL, Hapca S, Gan S, et al. Heterogeneity in phenotype, disease progression and drug response in type 2 diabetes. Nat Med 2022; 28: 982–988. pmid:35534565.
  17. Kim H, Westerman KE, Smith K, Chiou J, Cole JB, Majarian T, et al. High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease. Diabetologia. 2023; 66: 495–507. pmid:36538063.
  18. Anjana RM, Pradeepa R, Unnikrishnan R, Tiwaskar M, Aravind SR, Saboo B, et al. New and unique clusters of Type 2 diabetes identified in Indians. J Assoc Physicians India 2021; 69: 58–61. pmid:33527813.
  19. Zaghlool SB, Halama A, Stephan N, Gudmundsdottir V, Gudnason V, Jennings LL, et al. Metabolic and proteomic signatures of type 2 diabetes subtypes in an Arab population. Nat Commun 2022; 13: 7121. pmid:36402758.
  20. Abdul-Ghani T, Puckett C, Migahid O, Abdelgani S, Migahed A, Adams J, et al. Type 2 diabetes subgroups and response to glucose-lowering therapy: results from the EDICT and Qatar studies. Diabetes Obes Metab 2022; 24: 1810–1818. pmid:35581905.
  21. Xue Q, Li X, Wang X, Ma H, Heianza Y, Qi L. Subtypes of Type 2 diabetes and incident cardiovascular disease risk: UK Biobank and all of us cohorts. Mayo Clin Proc 2023; 98: 1192–1204. pmid:37422735.
  22. Hwang YC, Ahn HY, Jun JE, Jeong IK, Ahn KJ, Chung HY. Subtypes of type 2 diabetes and their association with outcomes in Korean adults—A cluster analysis of community-based prospective cohort. Metabolism. 2023; 141:155514. pmid:36746321.
  23. Zaharia OP, Strassburger K, Strom A, Bönhof GJ, Karusheva Y, Antoniou S, et al. Risk of diabetes-associated diseases in subgroups of patients with recent-onset diabetes: a 5-year follow-up study. Lancet Diabetes Endocrinol 2019; 7: 684–694. pmid:31345776.
  24. Xing L, Peng F, Liang Q, Dai X, Ren J, Wu H, et al. Clinical Characteristics and Risk of Diabetic Complications in Data-Driven Clusters Among Type 2 Diabetes. Front Endocrinol (Lausanne). 2021; 12: 617628. pmid:34276555.
  25. Fonseca VA. Defining and characterizing the progression of type 2 diabetes. Diabetes Care. 2009;32 Suppl 2: S151–156. pmid:19875543.
  26. Imamura F, Mukamal KJ, Meigs JB, Luchsinger JA, Ix JH, Siscovick DS, et al. Risk factors for type 2 diabetes mellitus preceded by β-cell dysfunction, insulin resistance, or both in older adults: the cardiovascular Health Study. Am J Epidemiol 2013; 177: 1418–1429. pmid:23707958.
  27. Halban PA, Polonsky KS, Bowden DW, Hawkins MA, Ling C, Mather KJ, et al. β-cell failure in type 2 diabetes: postulated mechanisms and prospects for prevention and treatment. Diabetes Care 2014; 37:1751–1758. pmid:24812433.
  28. Wysham C, Shubrook J. Beta-cell failure in type 2 diabetes: mechanisms, markers, and clinical implications. Postgrad Med 2020; 132: 676–686. pmid:32543261.
  29. Ikegami H, Babaya N, Noso S. β-Cell failure in diabetes: Common susceptibility and mechanisms shared between type 1 and type 2 diabetes. J Diabetes Investig. 2021; 12: 1526–1539. pmid:33993642.
  30. Matthews DR, Hosker JP, Rudenski AS, Naylor BA, Treacher DF, Turner RC. Homeostasis model assessment: insulin resistance and beta-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia. 1985; 28: 412–419. pmid:3899825.
  31. van Smeden M, Harrell FE Jr, Dahly DL. Novel diabetes subgroups. Lancet Diabetes Endocrinol. 2018; 6: 439–440. pmid:29803262.
  32. Dennis JM, Shields BM, Henley WE, Jones AG, Hattersley AT. Disease progression and treatment response in data-driven subgroups of type 2 diabetes compared with models based on simple clinical features: an analysis using clinical trial data. Lancet Diabetes Endocrinol 2019; 7: 442–451. pmid:31047901.
  33. Dennis JM. Precision Medicine in Type 2 Diabetes: Using Individualized Prediction Models to Optimize Selection of Treatment. Diabetes. 2020; 69: 2075–2085. pmid:32843566.
  34. Hillier TA, Pedula KL. Complications in young adults with early-onset type 2 diabetes: losing the relative protection of youth. Diabetes Care. 2003; 26: 2999–3005. pmid:14578230.
  35. George AM, Jacob AG, Fogelfeld L. Lean diabetes mellitus: An emerging entity in the era of obesity. World J Diabetes. 2015; 6: 613–620. pmid:25987958.
  36. Yaghootkar H, Whitcher B, Bell JD, Thomas EL. Ethnic differences in adiposity and diabetes risk—insights from genetic studies. J Intern Med 2020; 288: 271–283. pmid:32367627.
  37. Salvatore T, Galiero R, Caturano A, Rinaldi L, Criscuolo L, Di Martino A, et al. Current knowledge on the pathophysiology of lean/normal-weight Type 2 diabetes. Int J Mol Sci 2022; 24: 658. pmid:36614099.
  38. UK Prospective Diabetes Study (UKPDS) Group. Effect of intensive blood-glucose control with metformin on complications in overweight patients with type 2 diabetes (UKPDS 34). UK Prospective Diabetes Study (UKPDS) Group. Lancet 1998; 352: 854–865. pmid:9742977.
  39. Bayoumi RAL, Khamis AH, Tahlak MA, Elgergawi TF, Harb DK, Hazari KS, et al. Utility of oral glucose tolerance test in predicting type 2 diabetes following gestational diabetes: towards personalized care. World J Diabetes 2021; 12: 1778–1788. pmid:34754378.
  40. Ha J, Sherman A. Type 2 diabetes: one disease, many pathways. Am J Physiol Endocrinol Metab 2020; 319: E410–E426. pmid:32663101.
  41. Abdul-Ghani MA, Jenkinson CP, Richardson DK, Tripathy D, DeFronzo RA. Insulin secretion and action in subjects with impaired fasting glucose and impaired glucose tolerance: results from the Veterans Administration Genetic Epidemiology Study. Diabetes 2006; 55: 1430–1435. pmid:16644701.
  42. Knowler WC, Barrett-Connor E, Fowler SE, Hamman RF, Lachin JM, Walker EA, et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med 2002; 346: 393–403. pmid:11832527.
  43. Bowman P, Flanagan SE, Hattersley AT. Future roadmaps for precision medicine applied to diabetes: rising to the challenge of heterogeneity. J Diabetes Res 2018; 2018: 3061620. pmid:30599002.