Predictors of all-cause mortality among 514,866 participants from the Korean National Health Screening Cohort

Background There is not enough evidence, based on long-term follow-up and large sample sizes, regarding how well information obtained from general health check-ups can predict individual mortality. This study evaluated the applicability of various health information and measurements, consisting of self-reported data, anthropometric measurements and laboratory test results, in predicting individual mortality.
Methods The National Health Screening Cohort included 514,866 participants (aged 40–79 years) who were randomly selected from the overall database of the national health screening program in 2002–2003. Death was determined from the causes of death statistics provided by Statistics Korea. We assessed variables that were collected at baseline and repeatedly measured for two consecutive years, using traditional and time-variant Cox proportional hazards models in addition to random forest and boosting algorithms, to identify predictors of 10-year all-cause mortality. Participants’ age at enrollment, lifestyle factors, anthropometric measurements and laboratory test results were included in the prediction models. We used c-statistics to assess the discriminatory ability and external validity of the models, and the ratio of expected to observed numbers of deaths to evaluate model calibration. Medicaid eligibility and household income levels were used as inequality indexes.
Results By the end of follow-up in 2013, 38,031 deaths were identified. The risk score based on the selected health information and measurements achieved a higher discriminatory ability for mortality prediction (c-statistics = 0.832, 0.841, 0.893, and 0.712 for the Cox model, time-variant Cox model, random forest and boosting, respectively) than those reported in previous studies. The results were externally validated using community-based cohort data (c-statistic = 0.814).
Conclusions Individuals’ health information and measurements based on health screening can provide early indicators of their 10-year death risk, which can be useful for health monitoring and related policy decisions.


Introduction
General health check-ups are screening procedures that target the currently healthy population in order to detect diseases earlier and to intervene to prevent chronic diseases. Check-ups usually include a medical history, anthropometric measurements and laboratory tests such as simple blood and urine tests. These visits might help detect and prevent chronic diseases, but there is insufficient evidence regarding the effectiveness of interventions based on periodic health check-ups and the predictive value of the information obtained. Two prior systematic reviews of clinical trials of general health check-ups found that general check-ups did not reduce all-cause, cancer or cardiovascular disease (CVD) mortality [1,2]. On the other hand, two nationwide population-based cohort studies in Korea and Taiwan reported favorable effects of health check-ups, such as lower all-cause and CVD mortality rates and early treatment of hypertension, diabetes, and dyslipidemia [3,4]. The study populations and areas included in the systematic reviews were limited, as they included studies on European descendants in developed countries, and the average cost of general health check-ups (£423 (near $464) for Eurohealth in 2009 [5]) seemed to be higher than that covered by national health insurance in the latter cohort studies. The Korea National Health Insurance Service (NHIS) provides a mandatory biennial general health check-up for people aged 40 years and over that does not require copayment, and it reimburses approximately $40 to medical providers upon return of each individual's health check-up and report [6,7]. The NHIS also provides health check-ups for blue-collar workers every year. The program covers the entire employed and unemployed population over the age of 40 years. In 2014, 74.8% of the eligible population participated in the biennial health check-up [8].
The difference between the favorable effect of general health check-ups in Korea and Taiwan and the lack of beneficial effect in the reviews of European descendants may be due to differences in whether the general health check-up program was covered by mandatory policies with low or no copayment. The results of 'natural experiments' in entire populations provide compelling evidence [9]. Previous studies have assessed the predictive values of risk factors and developed all-cause mortality predictors with self-reported health status in the UK and US [10,11], but they did not develop a mortality predictor using a combination of self-reported health status and objective test results to represent the mortality risk of the general population.
Although having periodic health check-ups cannot be a mortality predictor by itself, the information collected from health check-ups can be useful for predicting mortality risk and for applying personalized prevention and intervention strategies. This study examined the applicability of information from self-reported questionnaires, anthropometric measurements and laboratory test results collected in the general health check-up program as an effective predictor of mortality among the healthy or asymptomatic population, based on the large-scale nationwide database in Korea.

Data collection
The NHIS database includes various health check-up items based on physicians' counseling and physical examinations, dentists' dental examinations, and health examinees' questionnaire results and anthropometric measurements. In addition, systolic and diastolic blood pressure (SBP, DBP), vision, hearing, and chest x-ray imaging results were collected. Blood and urine samples were collected, and laboratory tests were performed, including dipstick urine tests (occult blood, glucose and protein), complete blood count (CBC), fasting blood glucose (FBG), and serum levels of aspartate aminotransferase/alanine transaminase (AST/ALT), gamma-glutamyl transpeptidase (γ-GTP), total cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides (TG) and creatinine.
Of the people aged 40-79 years who participated in the biennial national health screening program of the Korea National Health Insurance Corporation in 2002 and 2003, 10% were randomly selected to form the National Health Insurance Service-National Health Screening Cohort (NHIS-HEALS). As a result, 514,866 subjects were selected to construct the cohort. Between 2002 and 2008, each subject participated in the national health screening program 1-7 times: 67,737 participants attended one health check-up, whereas 38,222 participants attended 7 health check-ups. The data included information from the repeated health check-ups. The date and cause of death were identified from the records of Statistics Korea.
The health check-up collects survey data, body measurements and blood and urine test results. Between 2002 and 2008, the questionnaire included past medical history and family history (liver disease, stroke, cancer, heart disease, diabetes mellitus, or hypertension), drinking frequency and amount, and smoking frequency and amount. In the same period, anthropometric measurements (weight and height), BP and laboratory test results (fasting blood glucose (FBG), total cholesterol level, ALT, AST, GGT, hemoglobin, urine pH, urine occult blood, and urine protein) were collected. In total, 546 subjects whose body mass index (BMI) values were missing were excluded.
Based on a modified version of the guidelines of the American Diabetes Association (ADA) in 2016 [12], study subjects were classified by measured FBG levels into 5 groups: 'Hyperglycemic crisis' (≥ 200 mg/dL, 11.1 mmol/L); 'Diabetes' (126-199 mg/dL, 7.0-11.0 mmol/L); 'Prediabetes' (100-125 mg/dL, 5.6-6.9 mmol/L); 'Healthy' (50-99 mg/dL); and 'Low FBG' (< 50 mg/dL). Based on a modification of the guidelines of the Third Report of the National Cholesterol Education Program (NCEP) [13], study subjects were classified by their measured total cholesterol levels into 5 groups: 'Extremely high' (> 360 mg/dL); 'High' (240-359 mg/dL); 'Borderline' (200-239 mg/dL); 'Healthy' (120-199 mg/dL); and 'Low' (< 120 mg/dL). Using hemoglobin levels, study subjects were classified into 3 groups: 'Anemia' (< 13.0 g/dL in men and < 12.0 g/dL in women); 'Desirable' (13-14.9 g/dL in men and 12-13.9 g/dL in women); and 'High' (≥ 15 g/dL in men and ≥ 14 g/dL in women). ALT levels were used to determine 3 groups: 'Low' (< 20 U/L), 'Desirable' (20-39 U/L) and 'High' (≥ 40 U/L). Urine occult blood and urine protein detected in the dipstick test were used as surrogate markers of chronic kidney injury (CKI). A disease score was constructed from the subject's number of self-reported diseases at baseline, including heart disease, stroke, diabetes mellitus, liver disease and cancer. We used 18.5, 23, 25, 27.5, and 30 as BMI (kg/m²) cut-off points to enable international comparisons with the WHO [14]. SBP and DBP were used to classify subjects into 3 groups: 'Healthy' (SBP < 140 mmHg and DBP < 90 mmHg); 'Hypertension Stage 1' (SBP of 140-159 mmHg or DBP of 90-99 mmHg); and 'Hypertension Stage 2' (SBP of ≥ 160 mmHg or DBP of ≥ 100 mmHg), based on the guidelines of the Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (JNC 7) [15]. Prehypertension was included in the 'Healthy' group.
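The grouping rules above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, the "M"/"F" sex coding, the handling of values that fall exactly on a boundary, and the use of numeric band indices for BMI (the text gives cut-off points but no band labels) are all assumptions made for the example.

```python
import bisect

# BMI cut-off points (kg/m^2) from the text. Band labels are not given there,
# so bands are returned as indices 0..5 (an illustrative choice).
BMI_CUTS = [18.5, 23, 25, 27.5, 30]


def classify_fbg(fbg_mg_dl):
    """Fasting blood glucose group, using the mg/dL cut-offs in the text."""
    if fbg_mg_dl >= 200:
        return "Hyperglycemic crisis"
    if fbg_mg_dl >= 126:
        return "Diabetes"
    if fbg_mg_dl >= 100:
        return "Prediabetes"
    if fbg_mg_dl >= 50:
        return "Healthy"
    return "Low FBG"


def classify_bp(sbp, dbp):
    """Blood pressure group; prehypertension falls into 'Healthy'."""
    if sbp >= 160 or dbp >= 100:
        return "Hypertension Stage 2"
    if sbp >= 140 or dbp >= 90:
        return "Hypertension Stage 1"
    return "Healthy"


def classify_hemoglobin(hb_g_dl, sex):
    """Sex-specific hemoglobin group; sex is 'M' or 'F' (hypothetical coding)."""
    anemia_cut, high_cut = (13.0, 15.0) if sex == "M" else (12.0, 14.0)
    if hb_g_dl < anemia_cut:
        return "Anemia"
    if hb_g_dl < high_cut:
        return "Desirable"
    return "High"


def bmi_band(bmi):
    """Index of the BMI band: 0 is below 18.5, 5 is 30 or above."""
    return bisect.bisect_right(BMI_CUTS, bmi)


# Example subject with made-up measurements.
print(classify_fbg(130), classify_bp(150, 85),
      classify_hemoglobin(12.5, "F"), bmi_band(24.0))
```

Encoding each measurement as a category rather than a continuous value lets a proportional hazards model capture the U- or J-shaped associations that a single linear term would miss.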
We included the variables age, square of age (age²), sex, smoking (pack-years), drinking frequency, disease score from the questionnaire, BMI, FBG, total cholesterol, ALT, hemoglobin, and the surrogate markers of CKI to develop the risk predictor. The variables were selected to represent general health status, based on current knowledge and on how widely they are used for health screening. The Korean medical insurance system evaluates property and annual household income to provide Medicaid and health insurance services. We used the quintiles of property and annual income as an inequality index.

Statistical methods
Hazard ratios (HRs) and 95% confidence intervals (95% CIs) for the association between each exposure variable and mortality risk were calculated using multiple Cox proportional hazards models, with both traditional and time-variant methods. A risk score assessing the probability of each individual's death was calculated from the product of HRs as follows:

$$\text{risk score} = \sum_{rf} \beta_{rf}\,(X_{rf} - M_{rf})$$

where $\beta_{rf}$ is the beta coefficient of the risk factor, $X_{rf}$ is the individual's value of the risk factor, and $M_{rf}$ is the mean value of the risk factor (the mean of each category of the risk factor for categorical variables).

Equation 1. Calculation of risk score

The absolute risk of each individual was calculated from the risk score and the mean survival rate over 10 years:

$$P = 1 - S(t)^{\exp(\text{risk score}) / ME_{rs}}$$

where $P$ is the absolute risk, $S(t)$ is the survival rate in $t$ years, and $ME_{rs}$ is the mean of exp(risk score).

Equation 2. Calculation of absolute risk from risk score

In addition to the risk prediction models based on Cox proportional hazards models, other risk prediction models based on boosting and random forest [16,17] were used. Boosting and random forest are both decision tree-based machine learning methods. Boosting builds decision trees sequentially, adjusting each new tree according to the classification error of the previous step. Random forest builds many decision trees, each from a randomly selected subset of the data, and combines their predictions.
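The two calculations can be sketched in Python, assuming that the risk score is the sum over risk factors of the beta coefficient times the individual's deviation from the mean value, and that the absolute risk is 1 − S(t) raised to the power exp(risk score)/ME_rs. The function names, coefficients, means, and survival rate below are made-up placeholders for illustration, not estimates from the study.

```python
import math


def risk_score(betas, values, means):
    """Mean-centered linear predictor: sum of beta * (value - mean)."""
    return sum(betas[k] * (values[k] - means[k]) for k in betas)


def absolute_risk(score, survival_t, mean_exp_score):
    """Absolute t-year risk: 1 - S(t) ** (exp(score) / ME_rs)."""
    return 1.0 - survival_t ** (math.exp(score) / mean_exp_score)


# Hypothetical coefficients and population means (illustration only).
betas = {"age": 0.08, "sbp": 0.01}
means = {"age": 53.0, "sbp": 125.0}

average_person = {"age": 53.0, "sbp": 125.0}
older_person = {"age": 68.0, "sbp": 150.0}

for person in (average_person, older_person):
    score = risk_score(betas, person, means)
    # 0.93 is a placeholder 10-year survival rate; ME_rs is set to 1.0 here.
    print(round(score, 3), round(absolute_risk(score, 0.93, 1.0), 3))
```

A subject whose values all equal the population means has a risk score of 0, so with ME_rs = 1 the absolute risk reduces to 1 − S(t), the average mortality over the period.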
We calculated the range of 10-year mortality risk, and the reduction in that risk from eliminating a modifiable risk factor, by age group and sex. The risk reduction for each factor was calculated as the difference between the mortality risk under the highest-risk condition and the risk after removing that factor.
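One possible reading of this risk-reduction calculation is sketched below in Python. It assumes that the highest-risk condition means every modifiable factor is at its highest-risk level, that removing a factor means setting it back to its reference level, and that the survival rate supplied is that of a subject with all factors at reference. These assumptions, the function names, and the hazard ratios are hypothetical and are not taken from the study.

```python
import math


def absolute_risk(score, reference_survival):
    """10-year risk for a log-hazard score relative to the reference profile."""
    return 1.0 - reference_survival ** math.exp(score)


def risk_reduction(log_hrs, factor, reference_survival):
    """Drop in 10-year risk when one factor moves from highest-risk to reference.

    log_hrs maps each modifiable factor to the log hazard ratio of its
    highest-risk level versus its reference level.
    """
    worst_score = sum(log_hrs.values())
    score_without = worst_score - log_hrs[factor]
    return (absolute_risk(worst_score, reference_survival)
            - absolute_risk(score_without, reference_survival))


# Made-up hazard ratios for three modifiable factors (illustration only).
log_hrs = {
    "smoking": math.log(1.8),
    "high_fbg": math.log(2.0),
    "hypertension": math.log(1.3),
}

for name in log_hrs:
    print(name, round(risk_reduction(log_hrs, name, 0.97), 4))
```

Under this reading, the factor with the largest hazard ratio always yields the largest reduction, because all reductions are measured from the same highest-risk starting point.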

Validation and sensitivity analysis
The discrimination of the risk prediction model was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), based on 5-fold cross-validation with bootstrapping and on external validation with the Korean Multicenter Cancer Cohort (KMCC), a community-based cohort of subjects over 40 years of age recruited from four regions [18]. To assess the effect of repeated measurements, we built another model with time-variant Cox regression and compared the two models. The calibration of the model was evaluated with the ratio of the expected to the observed number of deaths (expected/observed ratio). The two highest rates of missing data among the risk factors were 12.6% (smoking pack-years) and 2.9% (exercise frequency); the missing rates of the other risk factors were lower than 1%. For the prediction models using random forest and boosting, missing values were imputed with the median for continuous variables and the mode for categorical variables. Multiple imputation with chained equations (MICE) [19], with 5 imputed datasets and 10 iterations, was used to impute the missing values of the KMCC data. The missing values were treated the same as the category with baseline risk. The statistical software used was R 3.2 (R Core Team, Vienna, Austria) with the mice R package [20], and Python 2.7 (Python Software Foundation, Amsterdam, Netherlands) with the scikit-learn 0.17 Python package [21].
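The two evaluation measures can be illustrated with a short standard-library Python sketch. It treats 10-year death as a binary outcome and ignores censoring, which is a simplification assumed for the example. The c-statistic is computed here as the fraction of (death, survivor) pairs in which the death has the higher predicted risk, with ties counted as one half, which equals the AUC of the ROC curve. The function names and data are made up.

```python
def c_statistic(risks, died):
    """AUC: share of (death, survivor) pairs ranked correctly; ties count 0.5."""
    cases = [r for r, d in zip(risks, died) if d]
    controls = [r for r, d in zip(risks, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                concordant += 1.0
            elif risk_case == risk_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def expected_observed_ratio(risks, died):
    """Calibration: sum of predicted risks over the observed number of deaths."""
    observed = sum(died)
    if observed == 0:
        raise ValueError("no observed deaths")
    return sum(risks) / observed


# Made-up predicted 10-year risks and outcomes (1 = died) for five subjects.
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
died = [1, 0, 1, 0, 0]

print(c_statistic(risks, died))
print(expected_observed_ratio(risks, died))
```

The pairwise loop is quadratic in the number of subjects, so for a cohort of this size a rank-based AUC routine from a statistics library would be the practical choice. An expected/observed ratio below 1 means that the model predicts fewer deaths than were observed.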

Ethical approval
This project was approved by the institutional review board of Seoul National University Hospital (reference number 0909-048-295). No consent forms were collected because the data were analyzed anonymously.

Results
The mean age of the 514,320 study subjects at baseline was 53.15 years. The general characteristics of the study subjects and the associations between potential risk factors and the risk of all-cause death are presented in Table 1. The HRs estimated by the Cox proportional hazards model and by the time-variant Cox regression model did not differ significantly. Age, square of age-40, sex, smoking amount, drinking frequency, past history score, BMI, BP, FBG, total cholesterol, hemoglobin, ALT and surrogate markers of CKI were selected to implement the risk prediction models (S1 Equation). The mean of the HRs of the multiple Cox proportional hazards model was used to calculate the risk score. The all-cause 10-year survival rate was used to calculate the 10-year mortality risk from the risk score (Table 1).
Because the mean risk score is 0, the proportion of subjects decreased as the risk score increased. At higher risk scores (> 1), the proportion was low (< 0.2), and at much higher risk scores (> 2), it was rare (< 0.1) (Figs 1 and 2). The calibration was good for the total population (expected/observed = 0.894), for women (expected/observed = 0.919) and for men (expected/observed = 0.895). Table 2 shows the contribution of each modifiable risk factor to the 10-year mortality risk. Modifying fasting blood glucose yielded the largest reduction in 10-year mortality risk, and modifying blood pressure yielded the smallest.
The c-statistics of the Cox proportional hazards and time-variant Cox proportional hazards models were 0.832 and 0.841, respectively. The c-statistics of the machine learning-based random forest and boosting models were 0.893 and 0.712, respectively. The discriminatory ability of the prediction model decreased with age (c-statistics = 0.82 for subjects aged 40-49 years; 0.78 for those aged 50-59; 0.72 for those aged 60-69; and 0.69 for those aged 70+). The c-statistic for the external validation using a community-based cohort, KMCC, was 0.814 (Table 3). Subjects with a high income level had relatively low 3-year and 10-year all-cause mortality risks and comprised a lower proportion of the high-risk group defined by various risk score cutoffs (S1-S3 Tables).

Discussion
In this large, population-based national cohort study, we evaluated the associations between health information and measurements that can be obtained from routine health check-ups and 10-year all-cause mortality, and we developed a mortality risk predictor. Although the prediction models were developed with health information and measurements based on self-reported data (including smoking, drinking, exercise, and medical history), anthropometric measurements, and blood and urine laboratory test results, the all-cause mortality risk predictor showed excellent discrimination in both cross-validation and external validation. The Korea National Health and Nutrition Examination Survey (KNHANES) [22] and several other health screening programs [23-25] have previously used this health information and these measurements in their health check-up programs. Although general health check-ups have not been shown to reduce all-cause mortality [1,2], repeated check-ups can be used to improve the surrogate markers of mortality [2]. Prior results support our finding that the health information and measurements obtained in general health check-ups can be used to estimate an individual's future risk of death.
The discriminatory ability of our prediction model was higher than that of mortality prediction by general self-rated health alone (c-statistics = 0.74) or by the Vulnerable Elders Survey (VES-13) for the elderly (c-statistics = 0.78) [26,27]. The discriminatory ability of the random forest model for predicting mortality was higher than that of the traditional or time-variant Cox regression models (c-statistics = 0.89), despite the wide confidence intervals of the c-statistics, whereas the discriminatory ability of the boosting model was rather low (c-statistics = 0.71). The major reason for this difference is the difference between the classification algorithms of the two machine learning models; boosting is generally known to work better with shallow trees (5-15 leaves) and by combining many weak learners (classifiers that are only slightly correlated with the true classification) [28]. Therefore, the lower discriminatory ability of boosting is consistent with the nature of the algorithm.
Prior studies have reported U- or J-shaped associations of all-cause mortality with BMI (underweight, normal weight, and obesity) [29]; cholesterol (low, high, total and LDL-cholesterol) [30,31]; FBG (low FBG or impaired glucose levels, diabetes levels) [32,33]; and hemoglobin (low or very high) [34], while for physical activity and cigarette smoking, an increase in all-cause mortality with higher levels of exposure to these variables has been reported [35,36]. Undernutrition may be a highly predictive factor of short-term mortality, especially in the elderly [37], and low cholesterol, FBG, hemoglobin, and BMI may be phenotypes of undernutrition. In our data, those with lower levels of the four surrogate markers had a lower SES, smoked more cigarettes, drank alcohol more frequently, exercised less frequently and had a past history of 0.75 diseases on average. By contrast, those with higher levels of FBG, BMI, cholesterol and hemoglobin had a higher SES, exercised more frequently and had a past history of 0.93 diseases on average, and they were more vulnerable to long-term death than to short-term death.
A previous study developed a risk prediction score using many self-reported health indicators and some blood assays [10], while we developed a risk predictor based on health information and measurements composed of self-reported data, blood and urine assays, and anthropometric measurements that are commonly used in periodic general health check-ups.
Both models can help adults who seek health information by giving them appropriate information about health determinants with which to estimate their individual future risk of death, and by improving their awareness of suitable interventions to maintain good health, despite possible increases in anxiety and overutilization. Additionally, physicians would be able to provide suggestions for modifying lifestyles using the mortality risk predictor as quantitative evidence. Moreover, a reduction in mortality risk at the population level could be expected from targeted interventions for high-risk individuals or groups based on individual mortality risk.
Our study has several limitations. First, the risk predictor was calibrated for individuals who were enrolled in the national health screening program and were 40-79 years old. Additional calibration is required before it can be generalized. Second, we developed a risk predictor for all-cause mortality, and thus additional risk predictors for cause-specific mortality may be developed in the future. Third, since the risk predictor was developed based on general health examinees who are relatively healthy, the predictive ability of the developed risk predictor in various health conditions has to be investigated before it can be generalized. Lastly, we did not include a control group of subjects who did not attend check-ups in the analyses. For this reason, it was difficult to evaluate how periodic check-ups themselves influence individuals' mortality risk. Therefore, the data also need to be interpreted with consideration of potential bias from socio-economic status or employment status.
In summary, we developed a 10-year all-cause mortality risk predictor based on data from a national health screening program conducted with the middle-aged to elderly population in Korea. We developed a risk prediction model based on common measures obtained by questionnaires, basic physical examinations and blood tests. The risk predictor developed in this study showed better discriminatory ability than previous predictors. Further studies are needed to determine the availability of health information and measurements in younger populations, to assess site-specific mortality risk and the risk of disease incidence, and to validate the predictor in other populations.