Development and multi-cohort validation of a clinical score for predicting type 2 diabetes mellitus

Background and aims: Many countries lack the resources to identify patients at risk of developing type 2 diabetes mellitus (diabetes). We aimed to develop and validate a diabetes risk score based on easily accessible clinical data.
Methods: Prospective study including 5277 participants (55.0% women, aged 51.8±10.5 years) free of diabetes at baseline. The score was compared with two other published diabetes risk scores (Balkau and Kahn clinical, with 5 and 8 variables, respectively) and validated in three cohorts (Europe, Iran and Mexico).
Results: After a mean follow-up of 10.9 years, 405 participants (7.7%) developed diabetes. Our score was based on age, gender, waist circumference, family history of diabetes, hypertension and physical activity. The area under the curve (AUC) was 0.772 for our score, vs. 0.748 (p<0.001) and 0.774 (p = 0.668) for the other two. Using a 13-point threshold, the sensitivity, specificity, positive and negative predictive values (95% CI) of our score were 60.5 (55.5–65.3), 77.1 (75.8–78.2), 18.0 (16.0–20.1) and 95.9 (95.2–96.5) percent, respectively. Our score performed as well as or better than the other two in the Iranian [AUC 0.542 vs. 0.564 (p = 0.476) and 0.513 (p = 0.300)] and Mexican [AUC 0.791 vs. 0.672 (p<0.001) and 0.778 (p = 0.575)] cohorts. In the European cohort, it performed similarly to the Balkau score but worse than the Kahn clinical score [AUC 0.788 vs. 0.793 (p = 0.091) and 0.816 (p<0.001)]. The diagnostic capacity of our score was better than that of the Balkau score and comparable to that of the Kahn clinical score.
Conclusion: Our clinically based score shows encouraging results compared with other scores and can be used in populations with differing diabetes prevalence.


Introduction
Diabetes mellitus is an important cause of morbidity, mortality and health-care costs [1]. According to the NCD Risk Factor Collaboration, the number of adults with diabetes worldwide increased from 108 million in 1980 to 422 million in 2014 [1]. Most new cases of diabetes will occur in low- and middle-income countries, mainly due to the escalating prevalence of adiposity, rapidly changing dietary and physical activity behaviors, and the lack of, or late, identification of people at risk of diabetes. Indeed, in low- and middle-income countries such as India, the prevalence of diabetes could be as high as 19.9% [2].
Early identification of people at risk of diabetes is paramount for adequate prevention through lifestyle changes, complemented if necessary by treatment. It is therefore important to have an easily obtainable, inexpensive and reliable diabetes risk score. A review conducted in 2011 [3] identified as many as 145 diabetes risk models or scores and suggested that this number was increasing every month. Among the 94 risk prediction models studied, 40 were based on biological variables [3]. Including biological variables in diabetes risk scores has a dual effect: it slightly improves score performance [4], but it also increases costs (staff and laboratory) and time (blood sampling, waiting for results). In a previous study, we showed that using a diabetes risk score including biological variables cost an additional US$12.02 per patient compared with a score based on clinical data only [5]. Hence, in the countries most affected by the current diabetes epidemic, diabetes risk scores including biological variables may not be usable because of financial or laboratory constraints.
Thus, our first aim was to derive a diabetes risk score based solely on easily obtainable clinical data and to validate it in a similar European population by comparing it with existing clinical scores. Our second aim was to examine the validity of our score in terms of clinical utility and its applicability to populations with different diabetes and obesity prevalence and fewer economic resources, using two cohorts from Iran and Mexico.

Sampling procedure
The CoLaus/PsyCoLaus study is a prospective population-based study designed to assess the prevalence and determinants of cardiovascular disease in the population of Lausanne, Switzerland. Details of the sampling procedure have been published previously [6] and are available online (www.colaus-psycolaus.ch). The source population comprised all adults aged 35 to 75 years in the Lausanne population register. A simple non-stratified random sample of 35% of the source population was drawn and sent an invitation letter. Non-responders received a second letter and, if they still did not respond, several phone calls. Recruitment began in June 2003 and ended in May 2006, enrolling 6733 participants who underwent an interview, a physical examination and a blood analysis. The first follow-up was performed between April 2009 and September 2012, a mean of 5.6 years after baseline data collection. The second follow-up was performed between May 2014 and July 2016, a mean of 10.9 years after baseline data collection.

Competing interests:
The baseline CoLaus/PsyCoLaus study (2003–2006) was supported by an unrestricted research grant from GlaxoSmithKline. The funding source had no involvement in the study design, data collection, analysis and interpretation, writing of the report, or decision to submit the article for publication. This does not alter our adherence to PLoS One policies on sharing data and materials. The Tlalpan 2020 study received funding from AstraZeneca Mexico (no grant number). The funding source had no involvement in our study design, data collection, analysis and interpretation, writing of the report or decision to submit the article for publication. This does not alter our adherence to PLoS One policies on sharing data and materials.

Clinical and biological data
All participants were examined in the morning after a fast of at least 8 hours. They were asked about their personal and family history of cardiovascular disease and cardiovascular risk factors. All prescribed and over-the-counter medications were recorded via questionnaire. Smoking status was categorized as never, former (irrespective of the time since quitting) and current (irrespective of the amount smoked). Educational level was categorized according to the highest level completed as low (primary), middle (apprenticeship), upper middle (high school) or high (university). Physical activity was defined as exercising at least twice per week for at least 20 minutes per session. Body weight and height were measured with participants barefoot and in light indoor clothes. Body weight was measured in kilograms to the nearest 100 g using a Seca scale (Hamburg, Germany). Height was measured to the nearest 5 mm using a Seca (Hamburg, Germany) height gauge. Waist circumference was measured midway between the lowest rib and the iliac crest using a non-stretchable tape, and the mean of two measurements was taken. Blood pressure (BP) was measured using an Omron HEM-907 automated oscillometric sphygmomanometer after at least 10 minutes of rest in the seated position, and the mean of the last two measurements was used. Hypertension was defined as a systolic BP (SBP) ≥130 mm Hg, a diastolic BP (DBP) ≥85 mm Hg, or the presence of antihypertensive drug treatment. Based on the review by Noble et al., the 130/85 mm Hg threshold was preferred to the 140/90 mm Hg one. High resting heart rate was defined as ≥68 beats per minute in men and ≥70 beats per minute in women.
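
The blood pressure and heart rate definitions above can be written as simple classification rules. The Python sketch below is illustrative only: the function names are hypothetical, the inequalities are assumed to be inclusive (≥), and because the text does not state how many BP readings were taken in total, the sketch accepts any series of at least two readings and averages the last two.

```python
from statistics import mean

# Thresholds as stated in the text; inclusive inequalities (>=) are assumed.
SBP_CUTOFF_MMHG = 130
DBP_CUTOFF_MMHG = 85
HIGH_HR_CUTOFF_BPM = {"male": 68, "female": 70}


def mean_of_last_two(readings):
    """Average the last two readings of a series (at least two required)."""
    if len(readings) < 2:
        raise ValueError("at least two readings are required")
    return mean(readings[-2:])


def has_hypertension(sbp_readings, dbp_readings, on_antihypertensive):
    """SBP >= 130 mm Hg, DBP >= 85 mm Hg, or antihypertensive drug treatment."""
    sbp = mean_of_last_two(sbp_readings)
    dbp = mean_of_last_two(dbp_readings)
    return on_antihypertensive or sbp >= SBP_CUTOFF_MMHG or dbp >= DBP_CUTOFF_MMHG


def has_high_resting_hr(heart_rate_bpm, sex):
    """High resting heart rate: >= 68 bpm in men, >= 70 bpm in women."""
    return heart_rate_bpm >= HIGH_HR_CUTOFF_BPM[sex]


# Example: last two SBP readings average 127 and DBP 83, no treatment -> False.
print(has_hypertension([138, 128, 126], [80, 82, 84], on_antihypertensive=False))
```

Because treatment alone is sufficient, a treated participant is classified as hypertensive whatever the measured values.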

Incident diabetes mellitus
Two definitions of incident diabetes were used: 1) fasting glucose ≥7 mmol/L and/or oral antidiabetic or insulin treatment; and 2) HbA1c ≥6.5% (48 mmol/mol) and/or oral antidiabetic or insulin treatment. As HbA1c was assessed only at the last follow-up, analyses were restricted to participants who attended the second follow-up.
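
A minimal sketch of the two incident-diabetes definitions follows. The `FollowUp` record and the function names are hypothetical, and missing laboratory values are assumed not to meet the glucose or HbA1c criterion. The HbA1c unit conversion uses the standard NGSP-to-IFCC master equation, IFCC (mmol/mol) = 10.929 × (NGSP % − 2.15), which shows that 6.5% corresponds to about 48 mmol/mol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical record of one participant's follow-up visit."""
    fasting_glucose_mmol_l: Optional[float]
    hba1c_percent: Optional[float]  # available only at the second follow-up
    on_oral_antidiabetic: bool
    on_insulin: bool


def _treated(visit: FollowUp) -> bool:
    return visit.on_oral_antidiabetic or visit.on_insulin


def incident_diabetes_def1(visit: FollowUp) -> bool:
    """Definition 1: fasting glucose >= 7 mmol/L and/or antidiabetic treatment."""
    g = visit.fasting_glucose_mmol_l
    return (g is not None and g >= 7.0) or _treated(visit)


def incident_diabetes_def2(visit: FollowUp) -> bool:
    """Definition 2: HbA1c >= 6.5 % (48 mmol/mol) and/or antidiabetic treatment."""
    a = visit.hba1c_percent
    return (a is not None and a >= 6.5) or _treated(visit)


def hba1c_percent_to_mmol_mol(ngsp_percent: float) -> float:
    """Convert HbA1c from NGSP (%) to IFCC (mmol/mol) with the master equation."""
    return 10.929 * (ngsp_percent - 2.15)


visit = FollowUp(fasting_glucose_mmol_l=7.2, hba1c_percent=6.1,
                 on_oral_antidiabetic=False, on_insulin=False)
print(incident_diabetes_def1(visit), incident_diabetes_def2(visit))  # True False
print(round(hba1c_percent_to_mmol_mol(6.5)))  # 48
```

As the example shows, the two definitions can disagree for the same participant, which is why both were examined.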

Other clinically-based diabetes mellitus risk scores for comparison
Two diabetes risk scores based solely on clinical data were considered: the score by Balkau et al. [7] derived from a French population and the clinical score by Kahn et al. [8] derived from a United States population (S1 Table). The score by Balkau et al. is based on five variables, and the score by Kahn et al. is based on eight variables. Both scores had been tested previously in our cohort [5].

Inclusion and exclusion criteria
The original inclusion criteria of the CoLaus/PsyCoLaus study were: 1) written informed consent; 2) willingness to take part in the examination and to provide blood samples; and 3) French language ability. For this study, the following exclusion criteria were applied: 1) type 1 or type 2 diabetes at baseline; 2) no follow-up data; and 3) missing data for calculation of the scores.

Statistical analysis
Statistical analyses were conducted using Stata version 15.1 for Windows (Stata Corp, College Station, Texas, USA). Participants' characteristics were expressed as number (percentage) for categorical variables and as mean±standard deviation for continuous variables. Between-group comparisons were performed using the chi-square or Fisher's exact test for categorical variables and Student's t-test for continuous variables. Multivariable analysis was conducted using logistic regression; to facilitate future scoring, all continuous variables (e.g. age, BMI, waist circumference) were categorized. Goodness of fit was assessed using the Hosmer-Lemeshow test with 10 groups, and model quality was estimated using the Akaike and Bayesian information criteria (AIC and BIC, respectively). For anthropometric data, models including only one parameter at a time were computed, and the parameter explaining the highest percentage of variance (pseudo-R²) was selected. The diabetes risk score was created taking into account the contribution of each variable significantly associated with diabetes in the logistic model. The score was built as the sum of assigned points, each defined as the OR rounded to the nearest integer, as done in another setting [9]. The threshold defining a high risk of diabetes was chosen after visual examination of graphs displaying sensitivity, specificity, and positive and negative predictive values according to score values, giving priority to high specificity and high negative predictive value.
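
The point-assignment rule (each category scores its odds ratio rounded to the nearest integer, and the risk score is the sum of a participant's points) can be sketched as follows. The odds ratios below are invented for illustration and are not those reported in S3 Table. How reference categories (OR = 1) were scored is not stated here, so the sketch simply applies the rounding rule to every category. Rounding is done half-up because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math

# Hypothetical odds ratios for illustration only (NOT the values in S3 Table).
HYPOTHETICAL_ORS = {
    "waist": {"low": 1.0, "intermediate": 2.6, "high": 7.4},
    "hypertension": {"no": 1.0, "yes": 2.2},
    "family_history": {"no": 1.0, "yes": 1.8},
}


def round_half_up(x: float) -> int:
    """Round to the nearest integer, halves upwards (round() uses half-to-even)."""
    return math.floor(x + 0.5)


def points_table(odds_ratios):
    """Points per category = odds ratio rounded to the nearest integer."""
    return {var: {cat: round_half_up(value) for cat, value in cats.items()}
            for var, cats in odds_ratios.items()}


def total_score(participant, table):
    """Risk score = sum of the points of the participant's categories."""
    return sum(table[var][cat] for var, cat in participant.items())


table = points_table(HYPOTHETICAL_ORS)
person = {"waist": "high", "hypertension": "yes", "family_history": "no"}
print(table["waist"])              # {'low': 1, 'intermediate': 3, 'high': 7}
print(total_score(person, table))  # 7 + 2 + 1 = 10
```

Integer points make the score easy to compute by hand, at the cost of some precision relative to the fitted model.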
The diagnostic capacity of the different scores was assessed by the area under the ROC (receiver operating characteristic) curve (AUC) and its 95% confidence interval (CI). AUCs were compared using Stata's roccomp command. Sensitivity, specificity, positive and negative predictive values and their 95% CIs were computed using incident diabetes (definition 1) as the reference standard. The number needed to screen (NNS) to detect one case of diabetes was computed as the total number of participants screened divided by the number of detected diabetes cases (i.e. true positives). Statistical significance was set at a two-sided p<0.05.
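
The diagnostic measures above follow directly from a 2×2 table of score classification against incident diabetes. The sketch below uses hypothetical helper names and does not compute confidence intervals. It shows sensitivity, specificity, predictive values, the NNS as participants screened per true positive, the AUC as the Mann-Whitney probability that a case scores higher than a non-case (ties counted as one half), and the predictive values implied by Bayes' theorem for a given prevalence.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table: score classification (high risk or not) vs incident diabetes."""
    tp: int
    fp: int
    fn: int
    tn: int


def diagnostic_metrics(c: Confusion) -> dict:
    n = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # NNS: participants screened per true positive detected.
        "nns": n / c.tp,
    }


def predictive_values(sens: float, spec: float, prevalence: float):
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


def auc_mann_whitney(case_scores, noncase_scores) -> float:
    """AUC = P(case score > non-case score), ties counted as 1/2."""
    total = 0.0
    for x in case_scores:
        for y in noncase_scores:
            total += 1.0 if x > y else (0.5 if x == y else 0.0)
    return total / (len(case_scores) * len(noncase_scores))


print(diagnostic_metrics(Confusion(tp=6, fp=14, fn=4, tn=76))["nns"])  # about 16.7
print(auc_mann_whitney([3, 5], [1, 3]))  # 0.875
```

As a consistency check, `predictive_values(0.605, 0.771, 405 / 5277)` gives a PPV of about 0.180 and an NPV of about 0.959, in line with the 18.0% and 95.9% reported for the 13-point threshold.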
As women with a history of gestational diabetes are at higher risk of developing type 2 diabetes, a final sensitivity analysis was performed after excluding women with a personal history of gestational diabetes.

External validation cohorts
The performance of our diabetes risk score relative to the two other clinically based diabetes risk scores was assessed in three cohorts. The European cohort included data from four countries (France, Germany, the Netherlands and the UK) of the EPIC-Europe cohort study [10]; these four countries were selected because all relevant variables were available. Incident diabetes was ascertained using multiple sources of evidence, including self-report, linkage to primary-care and secondary-care registers, medication use, hospital admissions and mortality data [11]. Tlalpan 2020 is a cohort of Mexico City residents recruited through promotional strategies in Mexico City, Mexico [12]; incident diabetes was defined as fasting plasma glucose ≥7 mmol/L. The Shahedieh cohort study included data from a population-based survey in the province of Yazd, Iran [13], in which incident diabetes was defined as fasting plasma glucose ≥7 mmol/L.
We report our results according to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) statement [14].

Ethical considerations
The institutional Ethics Committee of the University of Lausanne, which later became the Ethics Commission of Canton Vaud (http://www.cer-vd.ch), approved the baseline CoLaus/PsyCoLaus study (reference 16…). All studies were performed in accordance with the Declaration of Helsinki and its subsequent amendments. All participants gave signed informed consent before entering the studies.

Characteristics of participants
Of the initial 6733 participants, 21.6% were excluded, leaving 5277 participants (78.4%) for analysis. The reasons for exclusion are summarized in Fig 1, and the comparison between excluded and included participants is provided in S2 Table. Excluded participants were older and had higher BMI and waist circumference. They were also more frequently men, sedentary, of lower educational level, former or current smokers and heavier alcohol consumers; they more often had a personal history of CVD or hypertension, more often used lipid-lowering drugs, and more often reported a parental or family history of diabetes.

Incidence of diabetes and score components
Between baseline and the second follow-up, 405 participants (7.7%) developed diabetes based on fasting plasma glucose. The bivariate comparison of 19 candidate variables between participants who developed diabetes and those who remained free of diabetes is summarized in Table 1. In both genders, participants who developed diabetes were older and had higher BMI, waist circumference, and waist-to-height and waist-to-hip ratios. They were also of lower educational level, more frequently had hypertension, used lipid-lowering drugs, or reported a personal history of CVD or a family history of diabetes, and less frequently engaged in leisure-time physical activity (Table 1).
The variables significantly associated with incident diabetes in bivariate analysis were entered into a multivariable logistic regression model. Based on the results of the logistic regression and the percentage of variance explained, the following variables were selected for the diabetes risk score: gender, age, waist circumference, hypertension, family history of diabetes and physical activity. The scoring system is provided in Table 2. We first developed separate scoring systems for men and women; however, to keep the score as simple as possible, we then developed a final scoring system applicable to both sexes. The odds ratios used to derive the scores are provided in S3 Table. Waist circumference was the strongest determinant of incident diabetes, whereas risk did not increase across age categories. The threshold of 13 points provided the best combination of sensitivity and specificity.
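
A threshold table of the kind used to examine candidate cut-offs can be generated as below. This is a sketch on toy data with hypothetical function names. The authors selected the 13-point threshold by visual inspection; `best_by_youden` is shown only as one common numerical alternative (maximizing sensitivity + specificity − 1), not as the method used in this study.

```python
import math


def threshold_table(scores, outcomes):
    """Sensitivity, specificity, PPV, NPV when 'score >= t' defines high risk."""
    rows = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, outcomes) if s < t and y)
        tn = sum(1 for s, y in zip(scores, outcomes) if s < t and not y)
        sens = tp / (tp + fn) if tp + fn else math.nan
        spec = tn / (tn + fp) if tn + fp else math.nan
        ppv = tp / (tp + fp) if tp + fp else math.nan
        npv = tn / (tn + fn) if tn + fn else math.nan
        rows.append((t, sens, spec, ppv, npv))
    return rows


def best_by_youden(rows):
    """Cut-off maximizing sensitivity + specificity - 1 (one possible criterion)."""
    valid = [r for r in rows if not (math.isnan(r[1]) or math.isnan(r[2]))]
    return max(valid, key=lambda r: r[1] + r[2] - 1)[0]


# Toy data: integer risk scores and incident diabetes (1 = yes).
scores = [5, 8, 10, 13, 14, 16]
outcomes = [0, 0, 0, 1, 0, 1]
for row in threshold_table(scores, outcomes):
    print(row)
print(best_by_youden(threshold_table(scores, outcomes)))  # 13
```

Plotting these columns against the cut-off reproduces the type of graph described in the statistical analysis, on which a threshold favoring specificity and negative predictive value can be read off.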
The comparison between our scoring system and the other diabetes risk scores in the CoLaus/PsyCoLaus study is provided in Table 3. Based on the AUC, our score performed better than the score by Balkau et al., while no difference was found with the score by Kahn et al. Similar findings were observed when the analysis was stratified by gender. The diagnostic capacity of the different scores, overall and stratified by gender, is provided in Table 4. Compared with the score by Balkau et al., our score had higher sensitivity and negative predictive value but lower specificity. Compared with the score by Kahn et al., our score had comparable diagnostic capacity and slightly better specificity (Table 4). Similar findings were observed when the analysis was stratified by gender (Table 4). The goodness of fit and information criteria are provided in S4 Table. […] Table), and similar findings were observed for the diagnostic capacity (S6 Table). Sensitivity analysis after excluding 30 women with a personal history of gestational diabetes led to similar findings (S7 and S8 Tables).
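
Comparing AUCs computed on the same participants requires accounting for the correlation between the scores. To our understanding, Stata's `roccomp` uses by default the nonparametric method of DeLong, DeLong and Clarke-Pearson (1988). The pure-Python sketch below illustrates that paired test under this assumption; it is not a reproduction of Stata's output, the function names are hypothetical, and it requires at least two cases and two non-cases.

```python
import math


def _psi(x, y):
    """Kernel: 1 if the case scores higher, 1/2 for a tie, 0 otherwise."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _placements(cases, noncases):
    """DeLong structural components V10 (per case) and V01 (per non-case)."""
    v10 = [sum(_psi(x, y) for y in noncases) / len(noncases) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in noncases]
    return v10, v01


def _cov(a, b):
    """Sample covariance with n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def delong_paired(score_a, score_b, outcome):
    """Two-sided test of AUC_a == AUC_b for two scores measured on the same people."""
    cases = [i for i, y in enumerate(outcome) if y]
    noncases = [i for i, y in enumerate(outcome) if not y]
    a10, a01 = _placements([score_a[i] for i in cases], [score_a[i] for i in noncases])
    b10, b01 = _placements([score_b[i] for i in cases], [score_b[i] for i in noncases])
    auc_a = sum(a10) / len(a10)
    auc_b = sum(b10) / len(b10)
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(cases)
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(noncases))
    if var <= 0:
        return auc_a, auc_b, 1.0  # no variability in the difference
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, p_value


outcome = [1, 1, 1, 0, 0, 0]
score_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
score_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
print(delong_paired(score_a, score_b, outcome))  # AUCs 1.0 and about 0.667, then p
```

The covariance terms shrink the variance of the difference when both scores rank participants similarly, which is why paired comparisons can detect small AUC differences such as those reported for the European cohort.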

External validation cohorts
The characteristics of the external validation cohorts are summarized in S9–S11 Tables. The results of our diabetes risk score compared with the other two clinically based diabetes risk scores in each of the three cohorts (European, Tlalpan 2020 and Shahedieh) are summarized in S12 and S13 Tables. Based on the AUC, our score performed better than both scores in the Tlalpan 2020 cohort (significantly so only versus the score by Balkau et al.), better than the score by Kahn et al. in the Shahedieh cohort (not significantly), and less well in the European cohort, where its AUC was similar to that of the score by Balkau et al. and lower than that of the score by Kahn et al. (S12 Table). Our score had higher sensitivity and negative predictive value and lower specificity and positive predictive value than the score by Balkau et al.; its diagnostic capacity was similar to that of the score by Kahn et al. (S13 Table).

Discussion
Our score provides an easy way of screening people at risk of developing diabetes; moreover, unlike many other scores [3], it was replicated in other cohorts in Europe and in two developing countries.

Variables in the model
A previous systematic review identified 29 clinical (i.e. non-biological) variables associated with incident diabetes [3]. In this study, we were able to assess the predictive capacity of 19 of them, six of which were selected for the final model. Age was included in the model, as it is in most risk assessment models [8,15–17]. The risk of diabetes increases with age [18], although no such increase was found in our model. A possible explanation is that other factors, such as waist circumference or physical inactivity, also increase with age [19,20] and thus absorb the age-related increase in diabetes risk.
Gender was also included in the model, as in other scores [7,17]. Indeed, male sex is associated with a higher risk of diabetes independently of other risk factors [21].
Waist circumference was the obesity measure selected for our score, as in many other scores [7,8,15,16]. It was by far the strongest variable in our score, in agreement with the literature, where abdominal obesity has been found to be the strongest adiposity determinant of diabetes [22,23]. Importantly, a high waist circumference alone was enough to classify a subject as at high risk of diabetes, suggesting that waist measurement alone could already provide important information on the risk of diabetes, as is the case in other populations such as those of India [24] and Brazil [25].
Hypertension was included in our score, as in many other diabetes risk scores [7,8,15–17]. Hypertension is known to be associated with the development of diabetes [26–28]. For our score, we defined hypertension as SBP ≥130 mm Hg, DBP ≥85 mm Hg or the presence of antihypertensive drug treatment, but other definitions (measured, self-reported, or based on antihypertensive medication) have been used [7,8,15–17]. We chose to use measured hypertension in our score because this condition is very prevalent in the general population [29] and is a major risk factor for cardiovascular disease [30]. Hence, assessing the risk of diabetes using our score would also help detect (and manage) hypertension.
Family history of diabetes is generally easy to obtain and was included in our score, as it is in most scores [7,8,15,17]. A positive family history points to a genetic component of diabetes but can also reflect the lifestyle or environmental conditions to which people were exposed during their upbringing [31].
Lack of physical activity was included in the score, as it is in some other scores [15,16]. Physical activity is hard to categorize using a standard procedure; indeed, in this study, it was not possible to obtain a homogeneous definition of physical activity across all cohorts (S14 Table) [32,33]. Nevertheless, the results were quite similar between cohorts, suggesting that even a crude definition of physical activity may suffice to predict the risk of diabetes. This should encourage people to be more active, as it underlines that even a small amount of physical activity is better than none [34].

Comparison with other scores
When using CoLaus/PsyCoLaus data, our score had a higher AUC than the score by Balkau et al. and an AUC similar to that of Kahn's clinical score. Its sensitivity was higher than that of Balkau's score and similar to that of Kahn's clinical score, whereas its specificity was lower than Balkau's and similar to Kahn's. The positive and negative predictive values were similar to those of Kahn's score. Although the score by Balkau et al. showed the highest specificity in all cohorts, it led to a considerable underestimation of the prevalence of subjects at risk. For instance, in Iran, the prevalence of subjects at risk according to the score by Balkau et al. was considerably lower than the reported prevalence of diabetes (11.37%) [35].
Overall, our results suggest that our score performs as well as or better than existing ones. Importantly, the number needed to screen was considerably lower than with the score by Balkau et al. and comparable to that of the score by Kahn et al. This has important consequences for screening, as it implies that, for a given number of people screened, more people who will develop diabetes will be detected.

Strengths and limitations
The main strength of our study is that the score was replicated in three cohorts from different continents with contrasting diabetes prevalence [36] (S15 Table), a validation seldom performed for other scores [3]. Indeed, our score's AUC in the Iranian and Mexican cohorts was comparable to or better than those of the scores by Balkau et al. and Kahn et al. Secondly, our score performs similarly to Kahn's clinical score but with fewer variables (six versus eight). Importantly, the previous review indicated that many models and scores were not used because they required tests not routinely available or were developed without a specific user in mind [3]. Our score overcomes both limitations, as it is based on easily accessible data that could be collected by non-medical professionals. Hence, it could be implemented worldwide with little effort. A third strength is that obesity prevalence in women differed across the replication cohorts, which supports the robustness of our score; in men, however, average waist circumference was very similar across cohorts. Fourthly, although part of the type 2 diabetes epidemic is due to environmental factors, our score is also relevant at the individual level: people with an elevated score can act upon its three modifiable variables (hypertension, waist circumference and physical activity). Fifthly, the scoring system could be adapted to the characteristics of a population by changing the threshold and/or the weights of its parameters, as has been done for cardiovascular risk scores [37]. Sixthly, the American Diabetes Association (ADA) recommends testing all people beginning at age 45 years [38]; the CoLaus/PsyCoLaus sample included adults aged 35 to 75 years, which largely overlaps this recommended age range. Finally, our score includes 4 of the 9 criteria of the ADA recommendations for testing for diabetes or prediabetes in asymptomatic adults (overweight, hypertension, family history and physical inactivity) [38], which underlines the importance of these variables in the development of diabetes.
This study also has some limitations. Firstly, participants excluded from the analysis had a higher prevalence of several components of the diabetes risk score. Hence, the impact of some components might have been different had those participants been included in the analysis. Still, external validation in other cohorts led to similar findings. Secondly, the number of incident diabetes cases was relatively low in the Shahedieh cohort, likely because of the short follow-up (1 year), which reduced statistical power. Hence, it would be of interest to repeat the analysis in the coming years, when more incident cases of diabetes are available. Thirdly, we were unable to validate our score in a Southeast Asian country, where waist circumference is considered a major determinant of diabetes [39]. Given the importance of waist circumference in our score, we hypothesize that it would perform relatively well in those countries, although an external validation study in a Southeast Asian country is needed to test this hypothesis, especially in genetically lean Asian men. Fourthly, physical activity was assessed differently in each cohort (S14 Table). Assessing physical activity in a standardized manner is difficult, and standardization is lacking even when physical activity is measured by accelerometry [40]. Hence, we chose a pragmatic solution in which each cohort defined physical activity according to its own standards. We acknowledge that this procedure increases variability between cohorts, but it also allows our score to be used in many other cohorts. Fifthly, our score uses the same weight for each risk factor regardless of country, whereas it has been suggested that the importance of conventional risk factors for predicting diabetes varies between countries [41]. Indeed, diabetes prevalence has been shown to vary according to socio-economic status [42] and ethnicity [43]. However, in our study, neither ethnicity nor education emerged as a significant predictor of type 2 diabetes. Further, waist circumference has been shown to be the best anthropometric predictor across all racial and ethnic groups [44]. Moreover, owing to its simple scoring system, other risk factors can be added to our score if they are considered important in a given setting. Alternatively, it might be necessary to recalibrate our score by country, as has been suggested for cardiovascular risk prediction [37]. Finally, like other scores, ours did not identify all incident cases of diabetes. Still, it requires only minimal clinical data and can thus be applied in settings with limited health resources where no screening for diabetes risk is available.

Conclusion
Our clinically based score shows results comparable to or better than those of other clinical scores and can be used in populations with contrasting diabetes prevalence.
Supporting information
S1 […] Table. Performance of the new score and of two other clinically based scores, in the original cohort (CoLaus/PsyCoLaus) and in the replication cohorts. (DOCX)
S13 Table. Diagnostic capacity of the new score and of two other clinically based scores, in the original cohort (CoLaus/PsyCoLaus) and in the replication cohorts. (DOCX)
S14 Table. Definitions of physical activity in the original and in the replication cohorts. (DOCX)
S15