Identification of Cardiovascular Risk Components in Urban Chinese with Metabolic Syndrome and Application to Coronary Heart Disease Prediction: A Longitudinal Study

Background Metabolic syndrome (MetS) has been proposed as a predictor of cardiovascular disease (CVD). It involves the mechanisms of insulin resistance, obesity, and the inflammatory process of atherosclerosis, together with their complex relationships in the metabolic network. Therefore, more cardiovascular risk-related biomarkers within this network should be considered as components of MetS in order to improve the prediction of CVD. Methods Factor analysis was performed in 5311 (4574 males and 737 females) Han Chinese subjects with MetS to extract CVD-related factors with specific clinical significance from 16 biomarkers tested in routine health check-ups. A logistic regression model, based on an extreme case-control design with 445 coronary heart disease (CHD) patients and 890 controls, was used to evaluate how well the extracted factors identified CHD. Then, a Cox model, based on a cohort design with 1923 subjects followed up for 5 years, was used to validate their predictive effects. Finally, a synthetic predictor (SP) was created by weighting each factor by its risk for CHD, and a risk matrix for predicting CHD was developed from it. Results Eight factors were obtained in both males and females, with a similar pattern. Under the extreme case-control design, the AUC for classifying CHD was 0.994 (95%CI 0.984-0.998) for males and 0.998 (95%CI 0.982-1.000) for females, suggesting that the SP might serve as a useful tool for identifying CHD. In the cohort study, the AUC for predicting CHD was 0.871 (95%CI 0.851-0.889) for males and 0.899 (95%CI 0.873-0.921) for females, indicating that the SP was a powerful predictor of CHD. The SP-based 5-year CHD risk matrix provided a convenient tool for CHD risk appraisal. Conclusions Eight factors were extracted from sixteen biomarkers in subjects with MetS, and the SP adds new insights to studies of CHD risk prediction using data from routine health check-ups.


Introduction
Metabolic syndrome (MetS) is a public health challenge because of its high prevalence and its association with the risk of cardiovascular disease (CVD) [1,2] and type 2 diabetes [3,4]. Several studies have applied MetS as a marker to predict the development of CVD at the population level; however, few have been conducted in Asian populations. According to the criteria recommended by the Diabetes Branch of Chinese Medical Association [5], MetS encompasses a cluster of metabolically related CVD risk factors: overweight or obesity, high blood pressure, dyslipidemia, and hyperglycemia. These criteria differ slightly from the international definition of MetS [6]. In terms of pathogenesis, MetS, whether defined by Chinese or international criteria, involves obesity, diabetes, hypertension, and dyslipidemia. Because these factors are involved in the mechanisms of insulin resistance and in the processes of inflammation and atherosclerosis, this complex relationship has been described as the metabolic network of MetS [7]. Therefore, a generalized definition of MetS could be extended using multiple components within this network. Several studies have suggested that the definition of MetS may further include microalbuminuria, proinflammatory cytokines, prothrombotic and fibrinolytic factors, and oxidative stress [8,9]. However, the structure and inclusion of MetS components remain inconclusive [10]. Some studies found three or four factors underlying the overall correlation between metabolic variables [11,12], while in recent years some researchers have verified a single-factor model that can represent MetS [13,14]. The different patterns of MetS components resulted from differences in data availability, in the number of biomarkers incorporated into specific models, and in the purposes of the studies.
In the present study, we aimed to select several cardiovascular risk biomarkers involved in the above metabolic network, using robust biostatistical modeling techniques, in order to develop a MetS-related synthetic predictor (SP) for classifying subjects with or without CVD and for predicting high risk of CVD, using data from a large-scale routine health check-up sample of urban Chinese residents.

Ethics Statement
This study was approved by the Ethics Committee of the School of Public Health, Shandong University, and all participants gave written informed consent to participate. The data were de-identified before being provided to us. The data cannot be shared with other researchers upon request: although we cooperate with the hospital and have the right to use the data, the hospital is reluctant to let us share them.

Study population
The study population includes a cohort of all participants who received routine health check-ups from 2005 to 2010 at the Center for Health Management of Shandong Provincial QianFoShan Hospital and the Health Examination Center of Shandong Provincial Hospital. These two hospitals are affiliated teaching hospitals of Shandong University, Jinan, Shandong province, China. Participants were urban residents living in Jinan city, the capital of Shandong province, and they were primarily employees. Participants who had completed the physical examinations and the measurements of the study biomarkers were included (n=28200). Of them, 5311 were classified as having MetS (4574 males and 737 females) at their first health check-up using the criteria of the Chinese Medical Association. We used CHD as the study outcome because MetS has been suggested as an independent predictor of CHD [15][16][17]. Of the 5311 MetS subjects, 445 cases (292 males and 153 females) had CHD diagnosed by physicians. For the case-control study design, controls (n=890; 584 males and 306 females) were randomly selected from the individuals without any components of MetS. For the cohort study design, 1923 of the 28200 participants (1263 males and 660 females) who had no CHD at baseline and completed a 5-year follow-up were included. At the end of the follow-up period, 134 incident CHD cases (90 males and 44 females) had been diagnosed, and the cumulative incidence rate was 6.97% (7.13% in males and 6.67% in females) (Table S1).
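The counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch that recomputes the cumulative incidence in the cohort and the share of participants classified as having MetS; the helper name `percent` is only illustrative and is not part of the study's analysis code.

```python
def percent(cases: int, total: int) -> float:
    """Return cases/total as a percentage rounded to two decimals."""
    return round(100.0 * cases / total, 2)

# Incident CHD cases and cohort sizes, as reported in the study population.
cohort = {"all": (134, 1923), "males": (90, 1263), "females": (44, 660)}
incidence = {group: percent(cases, n) for group, (cases, n) in cohort.items()}

# Participants classified as having MetS at their first check-up.
mets_share = percent(5311, 28200)

print(incidence)
print(mets_share)
```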
All participants received a general health questionnaire survey, anthropometric measurements, and routine blood tests. Body mass index (BMI) was calculated as weight/height² (kg/m²). Blood pressure (BP) was measured on the right arm in a sitting position after a 5-minute rest. Blood samples were drawn after overnight fasting (>8 hours) for laboratory tests. All laboratory tests were conducted by certified laboratory specialists using standard protocols at the hospital's Department of Laboratory. WBC, Hb and HCT were measured using a SYSMEX XE-2100 automatic whole blood count system. FBG was measured by an enzymatic method, TC & GGT by an enzymatic colorimetric method, SUA & TG by a colorimetric method, HDL-C & LDL-C by a direct method, ALT according to the criteria of the International Federation of Clinical Chemistry (IFCC), and CREA by the picric acid method, using a Roche Cobas 8000 Automatic Biochemical Analyzer. NAFLD was diagnosed by abdominal ultrasonography on the basis of liver brightness and a diffusely echogenic change in the liver parenchyma; participants with alcoholic fatty liver disease, hepatitis virus infection (hepatitis B antigen or hepatitis C antibody positive), or other causes of steatosis (Wilson disease) were excluded. Based on the diagnostic criteria recommended by the Diabetes Branch of Chinese Medical Association [5], MetS was defined as the presence of three or more of the following four risk factors: 1) overweight or obesity (BMI≥25.0 kg/m²); 2) hypertension (SBP≥140 mmHg, DBP≥90 mmHg, or a history of hypertension); 3) hyperglycemia (FPG≥6.1 mmol/L, 2h plasma glucose≥7.8 mmol/L, or a history of diabetes); 4) dyslipidemia (TG≥1.7 mmol/L, or HDL-C<0.9 mmol/L in males and HDL-C<1.0 mmol/L in females). CHD was diagnosed by physicians using the World Health Organization's criteria (symptoms plus either diagnostic ECG changes or elevated levels of cardiac enzymes) [41].
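The MetS definition above is a simple counting rule, which can be sketched as follows. The class `CheckUp`, its field names, and the two example records are hypothetical and chosen only for illustration; the thresholds are those quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckUp:
    """One check-up record; field names are illustrative, not from the study."""
    sex: str                            # "male" or "female"
    bmi: float                          # kg/m^2
    sbp: float                          # mmHg
    dbp: float                          # mmHg
    fpg: float                          # mmol/L
    tg: float                           # mmol/L
    hdl_c: float                        # mmol/L
    glucose_2h: Optional[float] = None  # mmol/L, if measured
    history_hypertension: bool = False
    history_diabetes: bool = False


def mets_components(c: CheckUp) -> dict:
    """Evaluate the four risk factors of the Chinese criteria."""
    hdl_cutoff = 0.9 if c.sex == "male" else 1.0
    return {
        "overweight_or_obesity": c.bmi >= 25.0,
        "hypertension": c.sbp >= 140 or c.dbp >= 90 or c.history_hypertension,
        "hyperglycemia": (
            c.fpg >= 6.1
            or (c.glucose_2h is not None and c.glucose_2h >= 7.8)
            or c.history_diabetes
        ),
        "dyslipidemia": c.tg >= 1.7 or c.hdl_c < hdl_cutoff,
    }


def has_mets(c: CheckUp) -> bool:
    """MetS is present when three or more of the four risk factors are met."""
    return sum(mets_components(c).values()) >= 3


# Hypothetical records for illustration only.
subject_a = CheckUp("male", bmi=27.0, sbp=145, dbp=85, fpg=5.2, tg=2.0, hdl_c=1.1)
subject_b = CheckUp("female", bmi=23.0, sbp=120, dbp=80, fpg=6.5, tg=1.2, hdl_c=0.95)
print(has_mets(subject_a), has_mets(subject_b))
```

In this sketch, subject_a meets three components (overweight, hypertension, dyslipidemia), whereas subject_b meets only two (hyperglycemia and low HDL-C).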

Strategy of the development of synthetic predictor
In this study, orthogonal exploratory factor analysis (EFA), a standard method for identifying patterns of MetS, was used to extract cardiovascular risk-related factors from the 16 manifest biomarkers above in the MetS population. After EFA, each latent factor was named according to its clinical significance. Next, a logistic regression discrimination (LRD) model based on an extreme case-control design was used to evaluate the discriminant effects of the factors for CHD, and a Cox proportional hazards prediction model based on a cohort design was then used to validate their predictive effects. Finally, the SP was created by weighting each factor by its risk for CHD, in order to develop a risk matrix for predicting CHD in routine health check-up practice.

Statistical analysis
Descriptive analysis. For patients with MetS, Student's t test (for continuous variables) and the χ² test (for categorical variables) were used to test for significant differences in the sixteen biomarkers between males and females. Differences between males and females in the prevalence of the four basic components (obesity, hypertension, dyslipidemia and hyperglycemia) and of their combinations were tested with the χ² test.
Steps of the development of synthetic predictor. EFA with the principal component algorithm and varimax rotation of the correlation matrix was performed to extract independent factors of MetS from the 16 manifest biomarkers above, separately for the male and female MetS groups. The criteria for retaining factors were an eigenvalue>1 and explanation of 70% of the total variation. Only variables that shared at least 15% of the factor variance, corresponding to a factor loading of at least 0.40, were used for further analytical interpretation [42,43]. After EFA, each latent factor was named according to its clinical significance, and the SP was created using a weighted approach: $SP = \gamma_1 F_1 + \gamma_2 F_2 + \cdots + \gamma_k F_k$, where $F_1, F_2, \cdots, F_k$ were the extracted independent factors with specific clinical significance from the 16 manifest biomarkers, and $\gamma_1, \gamma_2, \cdots, \gamma_k$ denoted their risks for CHD, which were the partial regression coefficients in the LRD and Cox regression models described below.
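The retention rule, the loading threshold, and the weighted sum above can be illustrated with a short sketch. It assumes that the eigenvalues of the correlation matrix and the rotated loadings are already available (for example, from a statistics package); all numbers below are made up for illustration and are not results of the study. Reading the two retention criteria as jointly required is also an assumption.

```python
def retained_factors(eigenvalues, min_eigenvalue=1.0, min_cumulative=0.70):
    """Apply the eigenvalue>1 rule and report the cumulative variance share.

    Returns (number of retained factors, their cumulative share of total
    variance, whether that share reaches min_cumulative).
    """
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > min_eigenvalue]
    share = sum(kept) / total
    return len(kept), share, share >= min_cumulative


def salient_variables(loadings, threshold=0.40):
    """Variables whose absolute loading reaches the threshold.

    A loading of 0.40 corresponds to 0.40**2 = 0.16 of shared variance,
    i.e. roughly the 15% quoted in the text.
    """
    return [name for name, value in loadings.items() if abs(value) >= threshold]


def synthetic_predictor(factor_scores, gammas):
    """SP = gamma_1*F_1 + ... + gamma_k*F_k."""
    return sum(g * f for g, f in zip(gammas, factor_scores))


# Illustrative eigenvalues for 16 standardized variables (they sum to 16).
example_eigenvalues = [2.6, 2.0, 1.7, 1.4, 1.3, 1.1, 1.05, 1.02,
                       0.8, 0.7, 0.6, 0.5, 0.45, 0.35, 0.25, 0.18]
k, share, enough = retained_factors(example_eigenvalues)

# Illustrative loadings of three biomarkers on one factor.
example_loadings = {"Hb": 0.93, "HCT": 0.94, "TG": 0.12}
salient = salient_variables(example_loadings)

# Illustrative factor scores and weights for a two-factor SP.
sp = synthetic_predictor([1.0, -0.5], [0.8, 0.4])
print(k, round(share, 4), enough, salient, round(sp, 3))
```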
The Cox model, based on the cohort study design, was used to validate the predictive effects through model (2), where $P(t)$ was the predicted probability of CHD at year $t$ and $B$ denoted the linear predictor estimated by Cox regression, with $B = \beta_1\,\mathrm{age} + \gamma_1 F_1 + \gamma_2 F_2 + \cdots + \gamma_k F_k$ or $B = \beta_1\,\mathrm{age} + \gamma\,SP$. The average probability at time $t$ for the $j$th age, $\bar{P}_j(t)$, which approximately referred to the baseline hazard rate of the general population, was calculated by model (2) through $\bar{B}_j = \beta_1\,\mathrm{age}_j + \gamma\,\overline{SP}_j$, where $\overline{SP}_j$ was the mean of SP at the $j$th age in the general population. Therefore, the absolute risk (AR) of an individual of age $j$ at time $t$ was calculated as $P_j(t)$, the excess absolute risk (EAR) as $P_j(t) - \bar{P}_j(t)$, and the relative absolute risk (RAR) as $P_j(t) / \bar{P}_j(t)$.
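The explicit formula of model (2) is not reproduced in this text. One common way to turn a Cox linear predictor into a t-year probability is $P(t) = 1 - S_0(t)^{\exp(B)}$, where $S_0(t)$ is the baseline survival at time $t$. The sketch below assumes that form, which may differ from the one used in the study (for example, in how covariates are centered). The coefficient values, baseline survival, and SP values are placeholders, not estimates from the study.

```python
import math


def linear_predictor(age: float, sp: float, beta_age: float, gamma_sp: float) -> float:
    """B = beta_1 * age + gamma * SP."""
    return beta_age * age + gamma_sp * sp


def cox_probability(baseline_survival: float, b: float) -> float:
    """Assumed form of the prediction: P(t) = 1 - S0(t) ** exp(B)."""
    return 1.0 - baseline_survival ** math.exp(b)


# Placeholder inputs, for illustration only.
BETA_AGE = 0.05      # hypothetical coefficient of age
GAMMA_SP = 0.9       # hypothetical coefficient of SP
S0_5Y = 0.999        # hypothetical 5-year baseline survival

age = 50
individual_sp = 1.2  # hypothetical SP of one person
mean_sp_at_age = 0.3 # hypothetical mean SP of 50-year-olds

# Absolute risk of the individual, P_j(t).
p_individual = cox_probability(
    S0_5Y, linear_predictor(age, individual_sp, BETA_AGE, GAMMA_SP)
)
# Average probability for the same age, computed from the mean SP.
p_average = cox_probability(
    S0_5Y, linear_predictor(age, mean_sp_at_age, BETA_AGE, GAMMA_SP)
)
print(p_individual, p_average)
```

With a positive coefficient for SP, a person whose SP exceeds the age-specific mean receives a predicted probability above the age-specific average, which is what the EAR and RAR summarize.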
Receiver operating characteristic (ROC) curves were used to evaluate the discriminant effect of LRD model (1) and to validate the predictive effect of Cox model (2). The area under the curve (AUC) of the ROC analysis, together with the sensitivity, specificity, and cutoff of the P value, was calculated using the MedCalc software for ROC curve analysis [44].
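For readers who wish to see how such quantities can be obtained, the sketch below computes the AUC as a Mann-Whitney statistic and selects a cutoff by maximizing the Youden index (sensitivity + specificity - 1). This is not the MedCalc implementation; the choice of the Youden index as the cutoff criterion is an assumption, and the predicted probabilities are toy values.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one half (Mann-Whitney estimate of the area under the ROC curve).
    """
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def youden_cutoff(case_scores, control_scores):
    """Return (cutoff, J, sensitivity, specificity) maximizing the Youden index.

    A subject is called positive when the score is >= cutoff.
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Toy predicted probabilities, for illustration only.
cases = [0.30, 0.12, 0.09, 0.05]
controls = [0.08, 0.04, 0.03, 0.02, 0.06]

area = auc(cases, controls)
cutoff, j, sens, spec = youden_cutoff(cases, controls)
print(area, cutoff, j, sens, spec)
```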
In practice, to calculate the AR of CHD for a person using model (2), the SP needed to be converted back to an expression in the 16 raw biomarkers, where $\alpha_1, \alpha_2, \cdots, \alpha_{16}$ denoted the weights of the 16 biomarkers after destandardization with the mean ($\mu$) and standard deviation ($\sigma$) from the large sample in the routine health check-up database.
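The destandardization step can be sketched under one assumption: each factor score is a linear combination of the standardized biomarkers, $F_k = \sum_i w_{ki}(x_i - \mu_i)/\sigma_i$, with $w_{ki}$ the standardized scoring coefficients. Then $SP = \sum_i \alpha_i x_i + c$ with $\alpha_i = \sum_k \gamma_k w_{ki} / \sigma_i$ and $c = -\sum_i \alpha_i \mu_i$. The constant term $c$ and all numbers below are part of this illustration and are not taken from the study; two biomarkers and two factors stand in for the 16 biomarkers and 8 factors.

```python
def destandardize(gammas, scoring, means, sds):
    """Return (alphas, constant) such that SP = constant + sum(alpha_i * x_i).

    scoring[k][i] is the standardized scoring coefficient of biomarker i on factor k.
    """
    n = len(means)
    alphas = [
        sum(g * row[i] for g, row in zip(gammas, scoring)) / sds[i]
        for i in range(n)
    ]
    constant = -sum(a * m for a, m in zip(alphas, means))
    return alphas, constant


def sp_from_raw(x, alphas, constant):
    """SP computed directly from raw biomarker values."""
    return constant + sum(a * v for a, v in zip(alphas, x))


def sp_from_factors(x, gammas, scoring, means, sds):
    """SP computed the long way: standardize, score the factors, then weight them."""
    z = [(v - m) / s for v, m, s in zip(x, means, sds)]
    factors = [sum(w * zi for w, zi in zip(row, z)) for row in scoring]
    return sum(g * f for g, f in zip(gammas, factors))


# Hypothetical values: 2 biomarkers, 2 factors.
gammas = [0.8, 0.4]
scoring = [[0.6, 0.1],
           [-0.2, 0.7]]
means = [26.0, 1.9]
sds = [3.0, 1.1]
person = [28.5, 2.4]

alphas, constant = destandardize(gammas, scoring, means, sds)
direct = sp_from_raw(person, alphas, constant)
stepwise = sp_from_factors(person, gammas, scoring, means, sds)
print(alphas, constant, direct, stepwise)
```

Both routes give the same SP, so in practice only the weights and the constant need to be stored to score a new check-up record.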
Finally, we calculated the SP of all 28200 individuals to identify high-risk individuals, using the optimal cutoff provided by the ROC curve for predicting risk of CHD.
All data analyses were conducted for males and females separately. The risk matrices for AR and RAR were depicted using ArcGIS 9.1, and all statistical analyses of the predictive models were performed using SAS 9.1. A two-sided p value <0.05 was considered statistically significant.

Results
The prevalence of MetS in the study sample was 18.83%. Table 1 shows the distribution of age and the 16 biomarkers in males and females with MetS. All variables differed significantly between genders. Of these biomarkers, BMI, DBP, SUA, TG, WBC count, ALT, GGT, CREA, Hb, HCT and the prevalence of NAFLD were higher in males than in females, while SBP, FBG, TC, HDL-C and LDL-C were higher in females than in males. Table S2 shows significant correlations among most biomarkers. Table S3 and Figure S1 show that the prevalence of the 4 basic components (BMI≥25 kg/m², high BP, hyperglycemia, and dyslipidemia) and of their combinations differed significantly between males and females. Table 2 shows the explained variance, cumulative variance and loadings of the first eight factors. The first eight factors explained 75.07% and 75.49% of the total variance for males and females, respectively. The patterns of factors were similar between males and females, although their ranks in descending order of explained variance differed slightly. Table S4 shows the standardized scoring coefficients of each factor for the male and female groups. Figure 1_A1 and Figure 1_B1 depict the AUC with the sensitivity, specificity, and cut-off point of the P value (criterion) from the LRD model under the extreme case-control design, using age and SP as discriminant factors.
The AUC was 0.994 (95%CI 0.984-0.998) in males and 0.998 (95%CI 0.982-1.000) in females, suggesting that the SP performed well in identifying subjects with or without CHD. Figures 1_A2 and 1_B2 depict the AUC with the study parameters for predicting the 5-year risk of CHD by the Cox model under the cohort study design, using age and SP. The AUC was 0.871 (95%CI 0.851-0.889) in males and 0.899 (95%CI 0.873-0.921) in females. Figure 2 shows the 5-year AR matrix and RAR matrix for CHD by gender (A1 and A2 indicate the AR and RAR matrices for males, and B1 and B2 those for females). These matrices provide a simple tool for CHD prediction in health management and clinical practice. For example, a 40-year-old male with an AR of 0.082 has an EAR of 0.064 (0.082-0.018) and an RAR of 4.56 (0.082/0.018), while a 60-year-old male with an AR of 0.082 has an EAR of -0.043 (0.082-0.125) and an RAR of 0.656 (0.082/0.125). Thus, although their predicted probabilities of CHD over 5 years are the same, the CHD risk of the younger male is higher than the average risk at his age, about 4.56 times that of his peers, indicating that lifestyle changes and social intervention strategies are needed for him. In contrast, the older male has a lower CHD risk than his peers, only 65.6% of the average risk of the 60-year-old population, indicating that he has a good health status compared with his peers. Figure 3 shows the proportion of individuals identified as being at high risk of CHD among the total study sample (n=28200), using AR cutoff points of 0.0701 for males and 0.0739 for females as the discriminant criterion (see Figures 1_A2 and 1_B2). The proportion of identified high-risk individuals followed an S-shaped curve with age. For example, among people aged 65, more than 90% of both genders were identified as being at high risk of CHD in the next 5 years.
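The worked example above can be reproduced in a few lines. The sketch uses only the AR values and age-specific average risks quoted in the text; the function names are illustrative.

```python
def excess_absolute_risk(ar: float, average_ar: float) -> float:
    """EAR: individual absolute risk minus the average risk at the same age."""
    return ar - average_ar


def relative_absolute_risk(ar: float, average_ar: float) -> float:
    """RAR: individual absolute risk divided by the average risk at the same age."""
    return ar / average_ar


# (individual AR, average AR at that age) for the two males in the example.
examples = {40: (0.082, 0.018), 60: (0.082, 0.125)}

summary = {}
for age, (ar, average_ar) in examples.items():
    summary[age] = (
        round(excess_absolute_risk(ar, average_ar), 3),
        round(relative_absolute_risk(ar, average_ar), 3),
    )
print(summary)
```

The same absolute risk therefore yields a positive EAR and an RAR above 1 at age 40, but a negative EAR and an RAR below 1 at age 60.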

Discussion
The present study is the first to use a large population-based sample of routine health check-up data from urban residents in China. It extends previous studies by adding new biomarkers to the network of MetS. The main findings suggest that a weighted SP developed from 8 latent variables is able to classify patients with and without CHD and to predict which subjects are at high risk of CHD within 5 years.

The components of MetS
The clinical use of criteria for MetS is primarily to predict the risk of CVD and diabetes. MetS has commonly been defined by the presence of obesity, diabetes, hypertension, and dyslipidemia, and these factors are involved in the mechanisms of insulin resistance, inflammation, and atherosclerosis [7]. This complex relationship can be considered the metabolic network of MetS. Therefore, other cardiovascular risks related to this network should be included in the concept of MetS in order to predict the risk of CVD and diabetes more precisely. Several studies have suggested that NAFLD [45,46], SUA [47], microalbuminuria, proinflammatory cytokines, prothrombotic and fibrinolytic factors, and oxidative stress [8,9] should be among the components of MetS. However, the inclusion of extended biomarkers may differ by study population and by the availability of measurements. For example, with data from routine health check-ups, the generalized concept of MetS should be extended using biomarkers that are routinely measured with simple, inexpensive and standardized approaches. In this paper, we extended previous studies by using factor analysis to identify 8 factors from 16 biomarkers that are measured in routine health check-ups. Of the 8 factors, EVF, LMF, HEF, and FAF were contributed by Hb & HCT, TG & HDL-C, ALT & GGT, and NAFLD & BMI in both males and females, standing for erythrocyte viscosity, lipid metabolism, hepatic enzyme metabolism, and fat accumulation, respectively. LVF stood for lipid viscosity, with high loadings in the factor analysis for TC & LDL-C in both genders and additionally for HDL-C in females. BPF reflected SBP & DBP in both genders, while BMI also had a high loading on this factor in females. GMF, with FBG as the main manifestation, stood for glucose metabolism status, while SUA & CREA also had high loadings in males, and WBC count in females.
This finding suggests that serum glucose concentration may be involved in renal function and the inflammatory response. Similarly, IRF stood for inflammation response status with WBC count as the key element, clustered together with CREA in males and with SUA & CREA in females, suggesting that WBC count was related to inflammation response status, which may be linked with renal function.

Synthetic predictor and its application
In CVD prevention, a well-established primary healthcare strategy is to identify populations at higher risk using prediction models in order to prevent CVD at earlier stages [48]. Several prediction models, including the Framingham risk score, have been applied in healthcare and management in different populations [49][50][51][52][53][54]. However, the predictive power of these models is relatively low (AUC usually around 0.80), which may be partially due to the limited number of predictive factors selected in these models. One way to improve and optimize predictive models is to identify and add new predictors. In China, as part of primary healthcare support, routine health check-ups for urban residents have developed rapidly in recent years. A number of convenient and inexpensive measurements of CVD- and diabetes-related biomarkers are included in the health check-up. This provides a unique opportunity to develop new predictive models using these biomarkers and to extend the concept of MetS in a cost-effective manner. In the present study, a weighted SP was created by summarizing eight latent factors. The results show that the SP was not only a good discriminative index (assessed by ROC curves, Figures 1_A1 and 1_B1), but also a significantly improved predictor of CHD (Figures 1_A2 and 1_B2). The AUC for the prediction of the 5-year risk of CHD in the total study population was more than 0.85 in both genders (Figures 1_A2 and 1_B2), demonstrating that the SP could be used as a simple and effective health management tool in routine health check-ups to identify subjects at high risk of CHD. Furthermore, the SP-based 5-year CHD absolute risk matrix and relative absolute risk matrix can be easily applied in practice (Figure 2).
For example, for a male of a given age who receives a health check-up, the matrices show his absolute hazard (A1) and his relative hazard ratio (A2) compared with the average hazard of males in the same age group.

Advantages and limitations
The present study has several strengths. First, the findings were based on a large population-based sample from routine health check-ups. People who receive the health check-up constitute the majority of urban Chinese employees, as they receive full healthcare coverage supported by the Chinese government. Second, a weighted SP was developed using factor analysis from 8 latent factors of 16 measured variables, representing eight metabolic-related factors (i.e., EVF, LVF, LMF, BPF, HEF, GMF, FAF, and IRF). This approach has the advantage of taking into consideration the associations between predictors and outcomes (i.e., through the weighting method), rather than simply summing up the number of predictors [55]. Meanwhile, two main limitations should be kept in mind when interpreting the results. First, although the study included a large population-based sample, the results cannot be generalized to the general population because the participants who received routine health check-ups were urban residents and were employed. Second, the follow-up period in this cohort analysis was relatively short. Therefore, further studies are needed to confirm the findings.
Despite these limitations, the present study, using data from routine health check-ups, extends the previous concept of MetS by including variables related to the MetS network that are readily available from routine health check-ups. The developed SP appears to be a simple and valid predictor for identifying subjects at high risk of CHD.

Supporting Information
Figure S1. The combined proportion of the 4 basic components between male and female metabolic syndrome groups. OB, obesity; HT, hypertension; HG, hyperglycemia; DY, dyslipidemia. (TIF)
Table S1. The incidence of coronary heart disease by follow-up year. (DOC)