Development and validation of the CHIME simulation model to assess lifetime health outcomes of prediabetes and type 2 diabetes in Chinese populations: A modeling study

Background
Existing predictive outcomes models for type 2 diabetes developed and validated in historical European populations may not be applicable for East Asian populations due to differences in the epidemiology and complications. Despite the continuum of risk across the spectrum of risk factor values, existing models are typically limited to diabetes alone and ignore the progression from prediabetes to diabetes. The objective of this study is to develop and externally validate a patient-level simulation model for prediabetes and type 2 diabetes in the East Asian population for predicting lifetime health outcomes.


Methods and findings
We developed a health outcomes model from a population-based cohort of individuals with prediabetes or type 2 diabetes: Hong Kong Clinical Management System (CMS, 97,628 participants) from 2006 to 2017. The Chinese Hong Kong Integrated Modeling and Evaluation (CHIME) simulation model comprises 13 risk equations to predict mortality, micro- and macrovascular complications, and development of diabetes. Risk equations were derived using parametric proportional hazard models. External validation of the CHIME model was assessed in the China Health and Retirement Longitudinal Study (CHARLS, 4,567 participants) from 2011 to 2018 for mortality, ischemic heart disease, cerebrovascular disease, renal failure, cataract, and development of diabetes; and against 80 observed endpoints from 9 published trials using 100,000 simulated individuals per trial. The CHIME model was compared to United Kingdom Prospective Diabetes Study Outcomes Model 2 (UKPDS-OM2) and Risk Equations for Complications Of type 2 Diabetes (RECODe) by assessing model discrimination (C-statistics), calibration slope/intercept, root mean square percentage error (RMSPE), and R². CHIME risk equations had C-statistics for discrimination from 0.636 to 0.813 internally and 0.702 to 0.770 externally for diabetes participants. Calibration slopes between deciles of expected and observed risk in CMS ranged from 0.680 to 1.333 for mortality, myocardial infarction, ischemic heart disease, retinopathy, neuropathy, ulcer of the skin, cataract, renal failure, and heart failure; 0.591 for peripheral vascular disease; 1.599 for cerebrovascular disease; and 2.247 for amputation; and in CHARLS outcomes from 0.709 to 1.035. CHIME had better discrimination and calibration than UKPDS-OM2 in CMS (C-statistics 0.548 to 0.772, slopes 0.130 to 3.846) and CHARLS (C-statistics 0.514 to 0.750, slopes −0.589 to 11.411); and small improvements in discrimination and better calibration than RECODe in CMS (C-statistics 0.615 to 0.793, slopes 0.138 to 1.514). Predictive error was smaller for CHIME in CMS (RMSPE 3.53% versus 10.82% for UKPDS-OM2 and 11.16% for RECODe) and CHARLS (RMSPE 4.49% versus 14.80% for UKPDS-OM2). Calibration performance of CHIME was generally better for trials with Asian participants (RMSPE 0.48% to 3.66%) than for non-Asian trials (RMSPE 0.81% to 8.50%). Main limitations include the limited number of outcomes recorded in the CHARLS cohort, and the generalizability of simulated cohorts derived from trial participants.

Conclusions
Our study shows that the CHIME model is a new validated tool for predicting progression of diabetes and its outcomes, particularly among Chinese and East Asian populations, for whom such a tool has been lacking thus far. The CHIME model can be used by health service planners and policy makers to develop population-level strategies, for example, setting HbA1c and lipid targets, to optimize health outcomes.

Author summary
Why was this study done?
• The chronic progression to diabetes-related complications is suitable for computer simulation modeling due to the long-term nature of health outcomes and the time lag for interventions to impact upon patient outcomes.
• Existing predictive outcomes models for type 2 diabetes developed and validated in historical European populations may not be applicable for East Asian populations due to differences in epidemiology and complications.
• A validated tool to predict lifetime health outcomes for prediabetes and type 2 diabetes in the Chinese population is needed.
What did the researchers do and find?
• The CHIME model outperformed the widely used United Kingdom Prospective Diabetes Study Outcomes Model 2 (UKPDS-OM2) and Risk Equations for Complications Of type 2 Diabetes (RECODe) models on real-world data.
• Validation of the CHIME model was more accurate for trials with mainly Asian participants than trials with mostly non-Asian participants.
What do these findings mean?
• Our study showed that the CHIME model is a new validated tool for predicting outcomes in Chinese and East Asian populations with prediabetes and type 2 diabetes.
• Existing diabetes outcomes models developed in European or North American populations may not be applicable to Chinese populations.
• Diabetes outcomes models such as the CHIME model can be used by health service planners and policy makers to develop population-level strategies to optimize health outcomes.

Introduction
China has by far the largest absolute burden of diabetes, with an estimated 116 million adults living with the disease, accounting for one-quarter of patients with diabetes globally [1]. Diabetes-related health expenditure for China alone reached USD 109 billion [1]. Worryingly, the prevalence of prediabetes has risen to 35.7% of Chinese adults [2], and the diabetes epidemic is expected to increase to 147 million adults by 2045. Evaluating the health and economic outcomes of diabetes and its complications is vital for formulating health policy. The chronic progression to diabetes-related complications is apt for computer simulation modeling due to the long-term nature of health outcomes and the time lag for interventions to impact upon patient outcomes. Yet differences in epidemiology and outcomes among East Asian populations with diabetes render application of existing models that were developed and validated in European and North American populations problematic [3]. The most widely used model, United Kingdom Prospective Diabetes Study Outcomes Model 2 (UKPDS-OM2), is underpinned by risk equations from a 1970s UK cohort and overestimates the absolute risks of coronary heart disease and stroke among East Asians [4,5]. The more recent Risk Equations for Complications Of type 2 Diabetes (RECODe) model for 10-year risks was developed from a trial in the United States/Canada and has been validated in both North American trials and cohorts [6,7]. Other existing diabetes models such as CDC-RTI, CORE, and BRAVO were all developed from trials conducted in European or North American settings with few, if any, Asian participants and have rarely been tested by external validation on individual-level data [8–10] (see S1 Table).
We sought to develop and validate an outcomes model for the development of diabetes and related complications derived from Chinese (East Asian) populations and compare this new Chinese Hong Kong Integrated Modeling and Evaluation (CHIME) model to the existing UKPDS-OM2 and RECODe models. Despite the continuum of risk across the spectrum of risk factor values, existing models are typically limited to diabetes alone and ignore the progression from prediabetes to diabetes. The CHIME simulation model integrates prediabetes and diabetes into a comprehensive outcomes model comprising 13 outcomes including mortality, micro- and macrovascular complications, and development of diabetes. The lack of an appropriate simulation model for East Asia and prediabetes is a major gap for economic evaluation of interventions. The CHIME model can be applied as a tool to assist clinical and policy decision-makers in evaluating management strategies over the lifetime horizon.

Methods
The analyses per se were not prespecified but have formed part of a multinational research project studying the long-term costs of diabetes care. As part of that work, we planned to undertake risk prediction modeling to assess the net value of medical spending on diabetes care using longitudinal patient-level data from multiple health systems in Asia, Europe, and North America [11,12].
The analyses for model development were planned after obtaining and reviewing the Hong Kong Hospital Authority Clinical Management System (CMS) data, without which we had neither a priori understanding of nor access to the available data fields and structures. We planned to validate against simulated cohorts from 9 trials determined in advance from validation studies of existing diabetes outcome simulation models and diabetes trials conducted in East Asia (S1 Table). We subsequently obtained individual-level data for model validation from the China Health and Retirement Longitudinal Study (CHARLS) cohort. We planned to compare model performance of CHIME with the existing UKPDS-OM2 model. Further comparison with the recently developed RECODe model and calibration assessments by slope and intercept were made in response to peer review. The analyses used only deidentified data, and the study was approved by the respective institutional review board for each Hospital Authority cluster (Hong Kong East/West, Kowloon Central/East/West, New Territories East/West).
The CHIME model was developed using CMS data and externally validated against the CHARLS cohort and 9 published trials. CMS is one of the largest Chinese electronic health informatics systems with detailed clinical records. CHARLS was chosen for external validation as it is a nationally representative longitudinal cohort of middle-aged and elderly Chinese residents aged 45 and older. We validated against 6 outcome measures recorded in the CHARLS data and an additional 80 endpoints from 9 published trials of diabetes patients using simulated cohorts of 100,000 individuals.

Study populations
CMS. Hong Kong has a population of 7.5 million (92% Chinese) [13]. The estimated prevalence of prediabetes and diabetes in Hong Kong was 8.9% and 10.3%, respectively, in 2014 [14]. In Hong Kong, universal public healthcare is provided by the Hospital Authority, a statutory body modeled after the British National Health Service (NHS) that manages public hospitals and ambulatory clinics. The Hospital Authority system provides care for 95% of people with diabetes in Hong Kong [15].
The Hospital Authority CMS is the health informatics system for the publicly provided healthcare in Hong Kong [16]. Electronic health records in CMS are linked via unique patient identity numbers and include patient demographics, records of deaths, admissions, attendances, diagnoses, procedures, medications, and laboratory tests. Diagnoses are coded according to the International Classification of Disease, ninth revision (ICD-9-CM) and the International Classification of Primary Care, second edition (ICPC-2).
We included adults diagnosed with either prediabetes or type 2 diabetes from January 1, 2006 to December 31, 2017. We defined prediabetes based on American Diabetes Association criteria, namely HbA1c 5.7% to 6.4% (39 to 47 mmol/mol), fasting glucose 5.6 to 7.0 mmol/L (100 to 125 mg/dL), or oral glucose tolerance test 7.8 to 11 mmol/L (140 to 199 mg/dL) [17]. We defined type 2 diabetes according to an algorithm for electronic healthcare records established in previous studies on the Hong Kong dataset [14], namely HbA1c ≥6.5% (≥48 mmol/mol); fasting plasma glucose ≥7.0 mmol/L (≥126 mg/dL); oral glucose tolerance test ≥11.1 mmol/L (≥200 mg/dL); random plasma glucose ≥11.1 mmol/L (≥200 mg/dL) on 2 separate occasions; diagnosis code for diabetes; or prescription of antihyperglycemic medication. We excluded individuals under the age of 20 at the date of onset of diabetes or prediabetes (whichever is earlier) or with a diagnosis code for type 1 diabetes.
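The case definitions above can be read as a simple decision rule. The following is a minimal Python sketch of that rule, not the authors' algorithm: the function name and arguments are hypothetical, it looks at single measurements only, and it omits the criterion of random plasma glucose on 2 separate occasions as well as the age and type 1 diabetes exclusions.

```python
from typing import Optional


def classify_glycemic_status(
    hba1c_pct: Optional[float] = None,
    fasting_glucose_mmol_l: Optional[float] = None,
    ogtt_mmol_l: Optional[float] = None,
    has_diabetes_code: bool = False,
    on_antihyperglycemic: bool = False,
) -> str:
    """Return 'diabetes', 'prediabetes', or 'neither' (hypothetical helper)."""
    # Diabetes: any single criterion from the text is sufficient.
    if has_diabetes_code or on_antihyperglycemic:
        return "diabetes"
    if hba1c_pct is not None and hba1c_pct >= 6.5:
        return "diabetes"
    if fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0:
        return "diabetes"
    if ogtt_mmol_l is not None and ogtt_mmol_l >= 11.1:
        return "diabetes"
    # Prediabetes: only lower bounds are checked here, because the upper
    # bounds are already enforced by the diabetes checks above.
    if hba1c_pct is not None and hba1c_pct >= 5.7:
        return "prediabetes"
    if fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 5.6:
        return "prediabetes"
    if ogtt_mmol_l is not None and ogtt_mmol_l >= 7.8:
        return "prediabetes"
    return "neither"
```

Checking the diabetes criteria first means that a person meeting both definitions on different tests is classified as having diabetes, which is an assumption of this sketch.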
We included 13 outcomes in our model development: all-cause mortality, diabetes-related macrovascular events (myocardial infarction, ischemic heart disease, heart failure, and cerebrovascular disease), microvascular events (peripheral vascular disease, neuropathy, amputation, ulcer of the skin, renal failure, cataracts, and retinopathy), and development of diabetes status. Clinical outcomes were extracted using diagnostic codes from the CMS dataset and mortality records from the Hong Kong death registry (detailed definitions of clinical outcomes with specified diagnostic codes shown in S2 Table).
CHARLS. The overall prevalence of diabetes and prediabetes in mainland China was 12.8% and 35.2% in 2018 [18]. CHARLS is a nationally representative longitudinal cohort of Chinese residents ages 45 and older. Details of the CHARLS cohort profile and biomarkers have been previously published [19,20]. The baseline survey wave was conducted between June 2011 and March 2012 and included 10,000 households in 150 counties/districts and 450 villages/resident committees using multistage stratified probability sampling. The CHARLS survey excluded individuals from collective dwellings such as school dormitories, nursing homes, and military bases. The response rate of the baseline wave was 80.5% [19]. All individuals were followed up in survey waves 2 to 4 conducted in 2013, 2015, and 2018. We derived the sample of prediabetes and diabetes participants from the 2011 baseline wave in line with the case definition algorithm based on measured HbA1c, fasting serum glucose, and self-reported diabetes status previously applied by Zhao and colleagues [21]. The 6 outcomes used for validation in the CHARLS cohort were mortality, ischemic heart disease, cerebrovascular disease, renal failure, cataract, and diabetes status.
Simulated trial cohorts. To further validate the CHIME model against additional outcomes, we compared the predicted against observed rates of 80 endpoints from 9 published trials of diabetes and prediabetes with long-term follow-up data, defined as trial length greater than 4 years. There were 4 trials conducted in East Asia (1 Chinese and 3 Japanese) and 5 trials outside Asia: Acarbose Cardiovascular Evaluation (ACE) [22], Action to Control Cardiovascular Risk in Diabetes (ACCORD) [23], Action in Diabetes and Vascular disease: preterAx and diamicroN-MR Controlled Evaluation (ADVANCE) [24], Diabetes Prevention Program (DPP) [25], Japan Diabetes Complications Study (JDCS) [26], Japan Elderly Diabetes Intervention Trial (J-EDIT) [27], Japanese Primary Prevention of Atherosclerosis with Aspirin for Diabetes trial (JPAD) [28], and UK Prospective Diabetes Study 33 and 80 (UKPDS 33 and 80) [29,30].

Predictors
In order to predict future health outcomes, we selected candidate predictors from a review of existing diabetes outcomes models in the Mount Hood Diabetes registry of simulation models [31] (see S1 Table) and input from clinician experts within the authorship group. They included age, sex, diabetes status, duration of diabetes, smoking status, body mass index (BMI), glycated hemoglobin (HbA1c), systolic blood pressure, diastolic blood pressure, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides, estimated glomerular filtration rate (eGFR), hemoglobin, and white blood cell count; medications (insulin, non-insulin hypoglycemic agent, antihypertensives, and statins); and preexisting medical conditions (atrial fibrillation, myocardial infarction, ischemic heart disease, heart failure, cerebrovascular diseases, peripheral vascular diseases, neuropathy, amputation, renal failure, hemodialysis, retinopathy, cataract, and ulcer of skin). Since the CMS dataset accounts for more than 90% of total bed days in the Hong Kong healthcare system during the study period [32], we assumed that the health outcome data were essentially complete and therefore that missing data in the predictor variables were not dependent on the outcome. Complete case analysis was preferred over multiple imputation as only the predictors had missing values and the probability of being missing did not depend on the outcome [33–35]. Individuals in the derivation cohort with missing predictor data at baseline are shown in S3 Table.

Statistical methods
We used parametric proportional hazard models to analyze our data by fitting multivariable models incorporating time-varying clinical biomarkers and comorbidities for each outcome, in which time since enrollment was employed as the time interval. The model fitting process for the final risk models was based on a combination of a backwards selection process using Akaike's information criterion (AIC) and consultation with clinical experts within the authorship group. Selected variables were assessed for clinical relevance to the outcome and the direction of associations, with final selection based on group consensus among the clinical experts. The parametric form of the underlying hazard was examined graphically, and models were selected by AIC among exponential, log-logistic, log-normal, and Weibull parametric distributions, where lower AIC was considered to indicate a better model fit. We applied moving averages to smooth fluctuations in biomarkers by averaging the parameter values for each year. Continuous variables were modeled as nonlinear using a restricted cubic spline function with 3 knots at the 10th, 50th, and 90th percentiles [36]. We also included the history of previous events, so an event occurring at baseline or during the previous model cycle would be recorded as a history of that event for the current yearly cycle. For internal validation, we calculated the overfitting bias-corrected Harrell's C-statistic and Brier score at 10 years using bootstrap resampling with 100 replications [36]. Harrell's C-statistic is an extension of the receiver operating characteristic statistic for survival data. The Brier score was based on the predicted and observed cumulative incidence at 10 years.
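To make the discrimination measure concrete, the sketch below computes an apparent (uncorrected) Harrell's C-statistic in plain Python. It is an illustration under simplifying assumptions, not the authors' code: the function name is hypothetical, pairs with tied follow-up times are skipped, and the bootstrap correction for overfitting described above is not performed.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Apparent Harrell's C for right-censored data (hypothetical helper).

    A pair (i, j) is comparable when individual i had the event and did so
    before individual j's last follow-up time. The pair is concordant when
    the individual with the earlier event also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of who experiences the event first.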
A schematic of the CHIME model structure is illustrated in Fig 1. The risk equations were applied annually in an individual-level discrete-time simulation model. Model inputs were entered for each individual including their baseline demographics, clinical risk factors, and history of complications. The simulation involved using the risk equations to estimate the probability of each outcome for each individual to determine whether the event occurred or not during the annual cycle. If the simulation predicted that an individual died in that annual cycle, the time to death and time to other outcomes were recorded. If the individual survived the annual cycle, their age, duration, history of events, and risk factor values were updated for entry into the next cycle. Thus, the individual's updated risk factors and history of events are used to predict the occurrence of outcomes and changes in risk factors in the next annual cycle. The discrete-time cycles are then repeated sequentially for the length of the simulation time. The simulation model recorded outputs including time to death and complications, annual incidence of complications and death, and changes in risk factors. For individuals with prediabetes, the simulation also recorded time to progression to diabetes.
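The annual cycle described above can be sketched as a loop over years for one individual. The Python below is a minimal illustration under stated assumptions: the two risk functions are invented placeholders and are not CHIME risk equations, only death and progression to diabetes are modeled, and the order in which events are evaluated within a cycle is an assumption of this sketch.

```python
import random


def annual_death_probability(age: float, has_diabetes: bool) -> float:
    """Placeholder annual mortality risk (hypothetical, NOT a CHIME equation)."""
    base = 0.002 * 1.08 ** (age - 40)
    if has_diabetes:
        base *= 1.5
    return min(base, 1.0)


def annual_diabetes_probability(hba1c_pct: float) -> float:
    """Placeholder annual risk of progressing from prediabetes (hypothetical)."""
    return min(max(0.05 * (hba1c_pct - 5.0), 0.0), 1.0)


def simulate_individual(age, hba1c_pct, has_diabetes, horizon_years, rng):
    """Run one individual through yearly cycles and record the outputs."""
    record = {"died": False, "years_lived": horizon_years, "year_of_diabetes": None}
    duration = 0
    for year in range(1, horizon_years + 1):
        # 1. Decide whether death occurs during this annual cycle.
        if rng.random() < annual_death_probability(age, has_diabetes):
            record["died"] = True
            record["years_lived"] = year
            break
        # 2. For prediabetes, decide whether diabetes develops this cycle.
        if not has_diabetes and rng.random() < annual_diabetes_probability(hba1c_pct):
            has_diabetes = True
            record["year_of_diabetes"] = year
        # 3. Survivors carry updated age and duration into the next cycle.
        age += 1
        if has_diabetes:
            duration += 1
    return record


# Example: one 60-year-old with prediabetes followed for up to 30 years.
result = simulate_individual(60, 6.0, False, 30, random.Random(1))
```

In a full model, step 3 would also update the history of complications and the risk factor values before the next cycle, and the loop would be repeated over a large simulated cohort.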
We predicted the progression of risk factor values (glycated hemoglobin HbA1c, systolic blood pressure, diastolic blood pressure, HDL cholesterol, LDL cholesterol, triglycerides, and BMI) upon completion of each discrete-time cycle. To do this, we modeled the trajectory for each biomarker (continuous variables) over time for the study participants in the CMS dataset using ordinary least squares regression. The biomarker value for each risk factor in the current cycle was predicted by its lagged average values in the previous 2 years, age, sex, duration, and medications (statins, hypoglycemic agents, and antihypertensives), in keeping with other diabetes outcomes models [10]. Model fit was assessed by the root mean square error (RMSE) as the variables are continuous.
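The prediction step for a single biomarker can be sketched as a linear combination of the lagged 2-year average and the other covariates. The Python below is illustrative only: the function name, the covariate names, and the coefficient values are hypothetical and are not the fitted CHIME coefficients (those are reported in S6 Table).

```python
from typing import Dict, Sequence


def predict_next_biomarker(
    history: Sequence[float],
    covariates: Dict[str, float],
    coef: Dict[str, float],
) -> float:
    """Predict next cycle's biomarker value from a fitted linear model.

    `history` holds the yearly values so far (most recent last); the lagged
    predictor is the mean of the last 2 values (or 1 if only 1 is available).
    """
    recent = list(history)[-2:]
    lagged_mean = sum(recent) / len(recent)
    value = coef["intercept"] + coef["lagged_mean"] * lagged_mean
    # Remaining terms: age, sex, duration, and medication indicators.
    for name, x in covariates.items():
        value += coef.get(name, 0.0) * x
    return value


# Hypothetical coefficients for HbA1c, for illustration only.
example_coef = {"intercept": 0.9, "lagged_mean": 0.85, "age": 0.001,
                "female": 0.0, "duration": 0.01, "on_hypoglycemic": -0.1}
example_covariates = {"age": 62, "female": 1, "duration": 5, "on_hypoglycemic": 1}
next_hba1c = predict_next_biomarker([7.2, 7.0, 6.8], example_covariates, example_coef)
```

Coefficients of this form would be estimated by ordinary least squares on the observed yearly trajectories, with RMSE summarizing the typical size of the prediction error.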

Validation
Comparisons to other outcomes models. We identified previous diabetes outcome models registered in the Mount Hood Diabetes registry of simulation models [31]; participant characteristics, model development, and validation strategies are detailed in S1 Table. Almost all identified models were proprietary and did not have publicly available code; only the UKPDS-OM2 and RECODe had user interfaces for comparative performance assessment. The UKPDS equations take various functional forms specific to each outcome (e.g., Gompertz, Weibull, logistic, or exponential), whereas the RECODe equations are Cox proportional hazards models [5,6].
The RECODe model predicts risks at a specified period of 10 years; for comparison, we assessed the CHIME, UKPDS-OM2, and RECODe models against CMS participants enrolled from 2006 to 2008 and followed until December 31, 2017. We compared the CHIME and UKPDS-OM2 models against the CHARLS validation cohort at 6 years of follow-up (wave 4 conducted in July to September 2018). Since UKPDS-OM2 and RECODe do not predict for participants with prediabetes, for our main analyses, we compared all models against CMS and CHARLS participants with type 2 diabetes only (heart rate data were unavailable for CHARLS participants).
We assessed model discrimination using the C-statistic at 10 years for CMS and 6 years for CHARLS with confidence intervals estimated from 100 bootstrap replications. We assessed calibration through the slope and intercept of the line between predicted and observed probabilities of each outcome by deciles of risk, with fewer risk groups than deciles used if fewer than 5 events were observed per group to prevent unstable inferences [37]. We also measured the goodness of fit between predicted and observed endpoints using the root mean square percentage error (RMSPE), where lower scores indicate better fit, and present scatterplots of predicted versus observed endpoints along with the coefficient of determination (R²).
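The calibration slope and intercept can be illustrated with a short Python sketch. It rests on assumptions that the text does not specify: observed rates are regressed on mean predicted risk (rather than the reverse), outcomes are treated as simple 0/1 indicators with no adjustment for censoring, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_by_group(
    predicted: Sequence[float],
    observed: Sequence[int],
    n_groups: int = 10,
) -> Tuple[float, float]:
    """Return (slope, intercept) of observed rate versus mean predicted risk.

    Individuals are sorted by predicted risk and split into `n_groups`
    groups of roughly equal size (deciles by default).
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    xs: List[float] = []  # mean predicted risk per group
    ys: List[float] = []  # observed event rate per group
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        xs.append(sum(p for p, _ in chunk) / len(chunk))
        ys.append(sum(o for _, o in chunk) / len(chunk))
    # Ordinary least squares fit of ys on xs.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted risks do not vary across groups")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return slope, intercept
```

Under this convention, a slope of 1 and an intercept of 0 indicate perfect calibration; passing a smaller `n_groups` mirrors the use of fewer groups when events are sparse.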
Simulation against published trials. Since the CHARLS data only reported mortality, ischemic heart disease, cerebrovascular disease, renal failure, cataracts, and diabetes status, we also performed validation against published trial data, in keeping with the performance assessment strategy used in most diabetes outcomes models that lacked an individual-level validation cohort (see S1 Table for further details) [8–10]. We used the published baseline characteristics of the trial participants to generate a simulated cohort for the duration of each respective trial, with separate cohorts for each arm of the trial to model differing treatment effects between intervention and control arms.
We modeled the entire distribution of risk factors to account for sampling uncertainty, patient heterogeneity, and prior history when extrapolating clinical trial data [38]. For each individual participant at baseline, we took the reported mean and standard deviation for each continuous variable (e.g., age, duration of diabetes, and biomarkers) to randomly generate values assuming a normal distribution. We used the rnorm function (R version 3.6.3) to generate the random values. Upper and lower bounds for generated values were set according to the inclusion and exclusion criteria in the study protocol for each trial. For example, in the treatment arm of the ACE trial, the age value of each participant used a normal distribution centered around mean 64.4 years with a standard deviation of 8.2 years, truncated at a lower limit of 50 years old (inclusion criterion). For binary and categorical variables (sex, smoking status, prescribed medications, and past medical history), we took the percentage of participants with the particular status as the sampling probability. For example, in the treatment arm of the ACE trial, the sampling probability for female was 27%.
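The cohort generation step can be sketched as follows, using the ACE treatment arm values quoted above (age mean 64.4, SD 8.2, lower limit 50; 27% female). This is a Python illustration rather than the authors' R code: the function names are hypothetical, and implementing the truncation by redrawing out-of-range values is an assumption, since the text does not say how the bounds were enforced.

```python
import random


def sample_truncated_normal(mean, sd, rng, lower=float("-inf"), upper=float("inf")):
    """Draw from a normal distribution, redrawing until the value is in bounds."""
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


def simulate_ace_treatment_arm(n, rng):
    """Generate n simulated baseline participants (age and sex only)."""
    cohort = []
    for _ in range(n):
        cohort.append({
            # Continuous variable: reported mean and SD, truncated at age 50.
            "age": sample_truncated_normal(64.4, 8.2, rng, lower=50.0),
            # Binary variable: reported percentage used as sampling probability.
            "female": rng.random() < 0.27,
        })
    return cohort


cohort = simulate_ace_treatment_arm(1000, random.Random(2021))
```

Redrawing keeps the shape of the distribution inside the bounds, whereas clipping values to the bound would pile simulated participants up at exactly 50 years; either choice shifts the simulated mean slightly above the reported 64.4.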
To simulate the trial progression, we assumed that the full treatment effect on each measured biomarker was reached in year 2 and remained stable thereafter for the remainder of the trial. Full treatment effect was defined as the maximal benefit and operationalized as a percentage of the average value at baseline. For example, in the treatment arm of the ACE trial, the full treatment effect on HbA1c was an average decrease of 0.05 percentage points from the average baseline HbA1c value of 5.9. Thus, each participant in the simulated cohort had a relative decrease of 0.05/5.9 or 0.85% from the first year HbA1c value in the subsequent years. We made no attempt to calibrate the model outputs to each individual trial. The point estimates for each predicted endpoint were obtained from simulating at least 100,000 participants per trial.
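The arithmetic of the treatment effect can be written out as a small Python sketch. The function names are hypothetical, and for clarity the sketch holds HbA1c otherwise constant, ignoring the annual risk factor progression that the full simulation would also apply.

```python
def relative_effect(absolute_change: float, baseline_mean: float) -> float:
    """Treatment effect expressed as a fraction of the baseline mean."""
    return absolute_change / baseline_mean


def treated_trajectory(first_year_value: float, rel_decrease: float, n_years: int):
    """Yearly values: untreated in year 1, full effect from year 2 onward."""
    trajectory = [first_year_value]
    for _ in range(2, n_years + 1):
        trajectory.append(first_year_value * (1.0 - rel_decrease))
    return trajectory


# ACE treatment arm: 0.05-point decrease from a baseline mean HbA1c of 5.9.
ace_effect = relative_effect(0.05, 5.9)  # about 0.0085, that is, 0.85%
example = treated_trajectory(6.2, ace_effect, n_years=5)
```

Applying the effect as a proportion means a simulated participant starting at a higher HbA1c receives a slightly larger absolute reduction than one starting lower, while the cohort average matches the reported change.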

Transparency and reporting
Statistical code can be found in the GitHub repository: https://github.com/quan-group/CHIME. All analyses were carried out using R version 3.6 (R Foundation for Statistical Computing, Vienna, Austria) using the rms package [36]. This simulation model has been registered on the Mount Hood Diabetes Challenge Network, a registry that includes a set of reference simulations intended to enable comparisons of models across time; for further details, see [15]. This study was approved by the Institutional Review Boards of all Hong Kong Hospital Authority clusters: HKWC, HKEC, KC/KEC, KWC, NTWC, and NTEC. This study follows the reporting guidelines in the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [39]; see S1 TRIPOD Checklist.

Results
Baseline characteristics of the study participants in CMS and CHARLS cohorts are provided in Table 1. The CMS development cohort had 97,628 participants in Hong Kong with type 2 diabetes (43.5%) or prediabetes (56.5%), with a mean follow-up time of 4.1 years (range of 0 to 12.8 years, accruing 402,250 person-years). The CHARLS validation cohort had 4,567 participants, of whom 216 (4.7%) were missing at 6 years of follow-up (S4 Table). The CHARLS cohort was younger than the CMS cohort (mean 59.5 years versus 61.9 years), had lower HbA1c (5.5% versus 6.7%), lower BMI (23.9 versus 25.3), better renal function, and similar lipid profiles, but consisted of more smokers (28.4% versus 11.1%) and fewer people on medications.
We observed 9,878 deaths in CMS data during the follow-up period, equivalent to an annual rate of 0.025 (the number of events for each outcome during the follow-up period is presented in S5 Table). The predictors included in the CHIME biomarkers and outcomes model are shown with coefficients and standard errors (S6 Table) and survival time ratios (S7 Table). Table 2 shows the validation performance for CHIME, UKPDS-OM2, and RECODe against the CMS and CHARLS individual-level datasets. In internal validations, CHIME C-statistics for discrimination ranged from 0.636 (for retinopathy) to 0.813 (for amputation and renal failure). Calibration slopes between expected and observed outcome rates ranged from 0.680 to 1.333 for mortality, myocardial infarction, ischemic heart disease, retinopathy, neuropathy, ulcer of the skin, cataract, renal failure, and heart failure; 0.591 for peripheral vascular disease; 1.599 for cerebrovascular disease; and 2.247 for amputation (ideal = 1). All calibration intercepts ranged from −0.066 to 0.022 (ideal = 0; Table 2, Fig 2). The performance of the risk prediction models from internal validation overall for prediabetes and diabetes and their functional form are presented in S5 Table. Model performance did not vary substantially when evaluating participants with diabetes compared to prediabetes (S8 Table;

Comparison with alternative risk equations
Our CHIME model had better discrimination and calibration than UKPDS-OM2 in both the CMS development cohort (C-statistics 0.548 to 0.772, slopes 0.130 to 3.846, and intercepts −0.041 to 0.072) and CHARLS validation cohort (C-statistics 0.514 to 0.750, slopes −0.589 to 11.411, and intercepts 0.018 to 0.191; Table 2; Fig 2). CHIME had small improvements in discrimination and better calibration than RECODe for all outcomes in the CMS development cohort (C-statistics 0.615 to 0.793, slopes 0.138 to 1.514, and intercepts −0.053 to 0.104). The predictive error was smaller for CHIME against the CMS development data (RMSPE 3.53% versus 10.82% for UKPDS-OM2 and 11.16% for RECODe at 10 years of follow-up; Table 2), and the CHARLS validation cohort (RMSPE 4.49% versus 14.80% for UKPDS-OM2 at 6 years of follow-up). The predicted event rates from UKPDS and CHIME models against the observed event rates over time for CMS participants with diabetes and prediabetes are shown in Figs 3 and S2 (RECODe model only has risk estimates at 10 years). On external validation, the UKPDS-OM2 overpredicted 4 outcomes (mortality, myocardial infarction, cerebrovascular disease, and amputation), underpredicted 2 outcomes (ischemic heart disease and renal failure), and had close correspondence with heart failure and ulcer of the skin. The CHIME model was derived from the CMS data and displayed close correspondence on internal validation with the exception of development of diabetes. Table 3 shows the validation of the CHIME model against 80 observed endpoints from 9 published trials. All simulation trial cohorts were checked for convergence of outcomes (S3 Fig).
Among the simulated trial cohorts, the calibration performance of the CHIME model was generally better for trials with mainly Asian participants (RMSPE 0.48% to 3.66%, ideal = 0%) than for non-Asian trials (RMSPE 0.81% to 8.50%), with the exception of ADVANCE (RMSPE 0.81%) among non-Asian trials, although ADVANCE had a significant number of Asian participants (Table 3). Compared to UKPDS-OM2 and RECODe, CHIME was the best performing model for JPAD, JDCS, and ADVANCE, and was comparable to UKPDS-OM2 for J-EDIT (RMSPE 3.66% versus 3.08%). The best performing models for the UKPDS and ACCORD trials were the models developed from those trials: UKPDS-OM2 and RECODe, respectively. CHIME was the closest model for Asian trials (RMSPE 1.86%), and UKPDS-OM2 was closest for European and North American trials (RMSPE 4.72%). Calibration performance of each model by clinical outcome is shown in S9 Table and Fig 4, and by individual outcome for each individual trial in S10 Table.
Table 3 abbreviations: RMSPE, root mean square percentage error; CHD, coronary heart disease; CVD, cardiovascular disease; BMI, body mass index; eGFR, estimated glomerular filtration rate; ACE, Acarbose Cardiovascular Evaluation [22]; ACCORD, Action to Control Cardiovascular Risk in Diabetes [23]; ADVANCE, Action in Diabetes and Vascular disease: preterAx and diamicroN-MR Controlled Evaluation [24]; DPP, Diabetes Prevention Program [25]; JDCS, Japan Diabetes Complications Study [26]; J-EDIT, Japan Elderly Diabetes Intervention Trial [27]; JPAD, Japanese Primary Prevention of Atherosclerosis with Aspirin for Diabetes trial [28]; UKPDS, United Kingdom Prospective Diabetes Study [29,30]. https://doi.org/10.1371/journal.pmed.1003692.t003

Output
We developed an online, public, interactive interface for modeling diabetes and prediabetes outcomes, allowing input of demographic and clinical information to calculate risk probabilities. Full risk equation formulas and data visualization are presented online: https://jquan.shinyapps.io/CHIME.

Discussion
In the current study, we developed and externally validated the first integrated prediabetes and type 2 diabetes outcomes model for Chinese and East Asian populations, comprising 13 outcomes including mortality, micro- and macrovascular complications, and development of diabetes. We validated using both individual-level data in the CHARLS cohort and aggregate-level data using simulated cohorts from 9 published trials. We compared the CHIME model to the existing UKPDS-OM2 and RECODe models.
We found that the widely used UKPDS-OM2 was not well calibrated to the Chinese population on external validation against 2 individual-level datasets. The UKPDS-OM2 was developed from a 1970s UK cohort and overpredicted mortality and cerebrovascular disease but underpredicted outcomes that are more common in the Asian population such as renal failure, reflecting the differences in epidemiology of diabetes between East Asian and European/North American populations [3]. The RECODe model, developed from a North American trial, displayed similar patterns of overpredicting myocardial infarction and cerebrovascular disease but underpredicting renal failure. The overprediction of macrovascular outcomes by UKPDS-OM2 could be due to more intensive management in the past decade, such as early initiation and tighter clinical thresholds for antihyperglycemic agents, statins, and antihypertensives, whereas RECODe was developed from a more recent trial conducted from 2001 to 2009 and consequently showed better calibration than UKPDS-OM2.
The RECODe model showed good discrimination, often comparable to the CHIME model, but was less well calibrated to the CMS development cohort. Unlike CHIME and UKPDS-OM2, the RECODe model is restricted to predicting risk at a specific time interval of 10 years and does not incorporate time-varying covariates. This limits its applicability for lifetime projections and its flexibility when validating against samples with varying follow-up periods. As expected, the best performing model for cohorts simulated from the UKPDS trial was UKPDS-OM2, and for the ACCORD trial was RECODe, which were their respective development cohorts, supporting the face validity of this validation approach. The setting of RECODe (ACCORD trial) differs markedly from the social and historical context of the UKPDS trial [7], and both differ even more markedly from the Chinese CMS and CHARLS cohorts.
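Discrimination in this comparison is summarized by the C-statistic. The sketch below shows one common form for censored time-to-event data, Harrell's C, computed by direct pair counting. It is offered only as an illustration: the function name `harrell_c` and the toy data are hypothetical, and the exact estimator used for the reported C-statistics is not assumed here.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of usable pairs ranked correctly by predicted risk.

    times: follow-up time per individual.
    events: 1 if the outcome was observed, 0 if censored.
    risks: predicted risk score (higher means earlier expected event).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i has an observed event strictly
            # before the follow-up time of j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: 4 individuals, the third is censored at year 6.
c_stat = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1])
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is why a model can discriminate well yet still be poorly calibrated.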
The CHIME model showed good calibration between predicted and observed probabilities by deciles of risk against the CMS development cohort for most outcomes (mortality, myocardial infarction, ischemic heart disease, ulcer of the skin, retinopathy, neuropathy, renal failure, and heart failure), with poorer calibration among the higher-risk subgroups for cerebrovascular disease, peripheral vascular disease, and amputation. The CHIME model had good calibration against CHARLS, whereas the UKPDS-OM2 was poorly calibrated.
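Calibration by deciles of risk can be sketched as follows: individuals are sorted by predicted risk, split into equal-sized groups, and the observed event fraction in each group is regressed on the mean predicted risk. This is a simplified illustration under stated assumptions (binary outcomes at a fixed horizon, no censoring, ordinary least squares); the function name `calibration_by_group` is hypothetical and the study's own procedure may differ.

```python
def calibration_by_group(predicted, outcomes, n_groups=10):
    """Return (slope, intercept) of observed vs mean predicted risk per group.

    predicted: predicted probabilities, one per individual.
    outcomes: 1 if the event occurred by the horizon, else 0.
    Ideal calibration gives slope = 1 and intercept = 0.
    """
    pairs = sorted(zip(predicted, outcomes))
    size = len(pairs) // n_groups
    if size == 0:
        raise ValueError("need at least n_groups observations")
    xs, ys = [], []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        xs.append(sum(p for p, _ in chunk) / len(chunk))
        ys.append(sum(o for _, o in chunk) / len(chunk))
    mean_x = sum(xs) / n_groups
    mean_y = sum(ys) / n_groups
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted risks do not vary across groups")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Under this convention, a slope above 1 means observed rates rise faster than predicted rates across risk groups, so the highest-risk groups are underpredicted relative to the lowest-risk groups.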
While in general CHIME performed better in Asian than non-Asian trials, there were 2 notable exceptions in J-EDIT and ADVANCE. The J-EDIT trial enrolled substantially older participants aged 65 to 84 years (mean age 72) compared to the CMS dataset (mean age 61.9). This older age group with higher risk may be closer to the UKPDS-OM2, which tends to overestimate risks in trials and cohorts [4,6,7,24]. CHIME also performed well for the ADVANCE trial, which may be due to the diverse geographical/ethnic mix of the trial participants, with almost 40% of participants from Asia.
Among the various outcomes, the prediction of diabetes status and impaired renal function for participants with prediabetes was notably worse, likely due to the insidious onset of diabetes and impaired renal function and the lack of routine screening at regular intervals in population-based cohorts. More accurate ascertainment was achieved in prediabetes trial settings such as ACE, which employ more rigorous ascertainment of the development of diabetes and impaired renal function as an outcome. Overall calibration was good for the trials, supporting population-level policy assessment and health economic evaluation. The CHIME models can be useful for population-level risk prediction of several endpoints considered together rather than at the individual level.
Previous diabetes simulation models are typically developed from trial data on a limited number of participants in European or North American settings and were not externally validated on an individual-level dataset. Study participants in trials are selected according to strict inclusion and exclusion criteria that may not reflect the real-world generalizability that is essential for useful policy modeling of whole populations. In contrast, we used an extensive population-based health records database based on routine health contact that has the benefit of a larger sample size and generalizability. We conducted validation against real-world observational data from a nationally representative CHARLS cohort. Similar to other validation studies of diabetes outcomes models, we also generated simulated cohorts from reported data on trial participants to further validate against additional outcomes [8–10]. Due to varying inclusion and exclusion criteria and trial protocol-driven practices among different studies, there are likely to be differences between the simulated cohort generated and the characteristics of the actual patients in the trial.
The CHIME model was developed according to American Diabetes Association and ISPOR guidelines on modeling best practice [40–42]. Similar to other diabetes modeling approaches to uncertainty and heterogeneity, we addressed first-order uncertainty (stochastic uncertainty) by performing Monte Carlo simulations with sufficient replications for convergence, second-order uncertainty (parameter uncertainty) by bootstrap resampling with replacement of individuals in the study population and reestimating equations to derive a distribution of parameters for each equation, and patient heterogeneity by using individual-level simulation of a large sample [10,43].
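The two layers of uncertainty described above can be illustrated with a deliberately simplified sketch. Here a single constant-hazard (exponential) event rate stands in for the parametric proportional hazards risk equations, which is an assumption made only to keep the example short; the function names and the toy records are hypothetical.

```python
import math
import random


def simulate_event_fraction(annual_rate, years, n_individuals, rng):
    """First-order uncertainty: Monte Carlo draw of individual outcomes.

    Each simulated individual has the event within `years` with
    probability 1 - exp(-annual_rate * years) (constant hazard assumed).
    """
    p_event = 1.0 - math.exp(-annual_rate * years)
    events = sum(1 for _ in range(n_individuals) if rng.random() < p_event)
    return events / n_individuals


def bootstrap_rates(records, n_boot, rng):
    """Second-order uncertainty: resample individuals, re-estimate the rate.

    records: list of (had_event, follow_up_years) tuples, one per person.
    Returns one re-estimated rate (events per person-year) per resample.
    """
    rates = []
    for _ in range(n_boot):
        # Resample individuals with replacement, same size as the original.
        sample = [rng.choice(records) for _ in records]
        events = sum(e for e, _ in sample)
        person_years = sum(t for _, t in sample)
        rates.append(events / person_years)
    return rates


# Hypothetical usage: more simulated individuals reduce Monte Carlo noise.
rng = random.Random(0)
small_run = simulate_event_fraction(0.025, 10, 1_000, rng)
large_run = simulate_event_fraction(0.025, 10, 100_000, rng)
```

In this toy setting, convergence can be checked by increasing the number of simulated individuals until the simulated event fraction stabilizes, while the spread of the bootstrap rates reflects parameter uncertainty.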
Our study had a number of limitations. While we modeled a broad range of diabetes-related outcomes, some complications such as hypoglycemic episodes could not be included due to lack of data. The development sample was drawn from population-based health records, which have less complete ascertainment of clinical data compared to the idealized settings of clinical trials. Nevertheless, our sample size was far larger, almost 10-fold higher than the UKPDS (n = 5,102) and ACCORD trials (n = 10,251) utilized to develop previous diabetes prediction models [5,6,8–10]. Our electronic medical records covered all public healthcare services across the territory, which increases generalizability compared to the strict inclusion and exclusion criteria of the randomized trials used in the development of other diabetes models [5,8–10]. There was a general lack of long-term Chinese- or East Asian-specific cohorts or trials longer than 4 years. We excluded the China Da Qing Diabetes Prevention Study (CDQDPS) of a 1980s Chinese cohort due to its small sample size and lack of baseline biomarkers [44] and the Japan Diabetes Optimal Treatment study for 3 major risk factors of cardiovascular diseases (J-DOIT3) trial as it only published composite endpoints [45]. The neuropathy endpoint was unavailable from the trial data. Some trials failed to report sufficient details such as rates of existing complications at baseline. Further work on validating against more outcomes, with longer follow-up, and in other East Asian populations is warranted. New predictive biomarkers such as hs-CRP and serum amyloid P component have improved mortality prediction for diabetes, and their inclusion may improve predictions for other outcomes [46,47].
In many health systems, access to interventions is often dependent on evidence of value for money. For diabetes, this will require simulation modeling to estimate the long-term health outcomes and to inform decision analysis such as cost per quality-adjusted life year (QALY) gained. Estimation of inputs including complication-related costs, healthcare utilization, and health state utility values will require further work for East Asian settings. The CHIME outcomes model can be used to evaluate population health status for prediabetes and diabetes using routinely recorded data. By applying the appropriate utility values of the target population for the wide range of diabetes-related complications [48], the CHIME outcomes model can be utilized to assess quality of life and measure QALYs over the long time horizon of chronic disease conditions. This supports economic evaluation of policy guidelines and clinical treatment pathways to tackle diabetes, prediabetes, their associated micro- and macrovascular complications, and life expectancy.
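As a rough illustration of how simulated outcomes could feed a QALY calculation, the sketch below sums yearly survival probability multiplied by a health state utility, discounted to present value. All inputs are hypothetical: the 3% discount rate, the utility values, and the convention that the first year is undiscounted are assumptions for the example, not values taken from this study.

```python
def discounted_qalys(survival, utilities, discount_rate=0.03):
    """Discounted QALYs over a yearly time horizon.

    survival: probability of being alive in each year (year 0 first).
    utilities: mean health state utility among survivors in each year.
    discount_rate: annual discount rate (hypothetical default of 3%).
    """
    if len(survival) != len(utilities):
        raise ValueError("survival and utilities must have equal length")
    total = 0.0
    for year, (alive, utility) in enumerate(zip(survival, utilities)):
        # Year 0 is undiscounted under this convention.
        total += alive * utility / (1.0 + discount_rate) ** year
    return total


# Hypothetical 3-year horizon with declining survival and utility.
qalys = discounted_qalys([1.0, 0.95, 0.90], [0.80, 0.78, 0.75])
```

In a full evaluation, the survival and utility inputs would come from the simulated complication histories and from utility values appropriate to the target population.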
Our study shows that the CHIME model is a validated tool for predicting progression of diabetes and its outcomes, particularly among Chinese and East Asian populations, which has been lacking thus far. This will support the clinical and economic evaluation of therapies related to the long-term management of diabetes. The CHIME model can be used by health service planners and policy makers to develop population-level strategies, for example, setting HbA1c and lipid targets, to optimize health outcomes.
S9 Table. External validation of observed against predicted trial endpoints across all validation trials by outcome for CHIME, UKPDS-OM2, and RECODe models. (DOCX)
S10 Table. External validation of observed against predicted endpoints by outcome and individual trial for CHIME, UKPDS-OM2, and RECODe models.