Table 1.
Summary of previous studies on MetS prediction using noninvasive information.
Fig 1.
Overall model development procedure.
Table 2.
The characteristics of the selected participants (n = 70,370).
Table 3.
List of features extracted from the health checkup records.
Table 4.
List of types and details of synthetic features used in this study.
Table 5.
Characteristics of the training/validation/test datasets.
Fig 2.
Diagram of the feature selection process.
Fig 3.
Feature selection process for each round.
Table 6.
Parameter tuning settings by classifier.
Table 7.
Final selected features.
Table 8.
Raw features used for synthesis.
Table 9.
Descriptions of raw features used for synthesis.
Fig 4.
Optimized classifiers and their important features.
The x-axis shows the relative importance of each feature, and the y-axis lists the names of the features used. For LR, the x-axis shows the exponential of the regression coefficient. Our proposed synthetic features are marked with a red asterisk before the feature name. (CLBE: percentage of energy obtained from carbohydrates; non-smoker: current smoking status; grain: whole grain intake; retinol: retinol intake; kimchi: kimchi (Korean traditional food) intake; fat energy: percentage of energy from fat; green vegetables, leaf tea: green tea intake; lettuce: lettuce intake.) See Table 4 for synthetic features.
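As an illustration of the LR axis described in this caption, the sketch below exponentiates logistic regression coefficients, which gives the odds ratio for a one-unit increase in each feature. The feature names and coefficient values are hypothetical placeholders, not results from this study.

```python
import math

# Hypothetical coefficients for illustration only (not values from the study).
example_coefficients = {"feature_a": 0.0, "feature_b": 0.7, "feature_c": -0.4}


def exponentiate_coefficients(coefficients):
    """Map each logistic regression coefficient to exp(coefficient)."""
    return {name: math.exp(value) for name, value in coefficients.items()}


exp_coefficients = exponentiate_coefficients(example_coefficients)

# Print features from largest to smallest exponentiated coefficient.
for name, value in sorted(
    exp_coefficients.items(), key=lambda item: item[1], reverse=True
):
    print(f"{name}: {value:.3f}")
```

A value above 1 means the feature raises the odds of the positive class, a value below 1 lowers them, and a value of exactly 1 means no effect.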
Fig 5.
Feature diagram for each classifier from raw features perspective.
The number of features used by each classifier is indicated in parentheses.
Table 10.
Parameters of the candidate models after tuning.
Table 11.
Performance of candidate models.
Fig 6.
Calibration plot for each classification model.
The x-axis is the mean predicted value, and the y-axis is the fraction of positives. The closer the curve lies to the diagonal, the better the calibration. A curve that falls below the diagonal indicates that the predicted result overestimates the actual result.
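A minimal sketch of how the points of such a calibration plot can be computed, assuming equal-width probability bins. The function name, the bin count, and the toy labels and probabilities are hypothetical and are not taken from this study.

```python
def calibration_points(y_true, y_prob, n_bins=10):
    """Return (mean predicted value, fraction of positives) per non-empty bin."""
    # Each bin accumulates: sum of predicted probabilities, positives, count.
    bins = [[0.0, 0, 0] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes to last bin
        bins[index][0] += prob
        bins[index][1] += label
        bins[index][2] += 1
    return [
        (prob_sum / count, positives / count)
        for prob_sum, positives, count in bins
        if count > 0
    ]


# Toy data for illustration only.
toy_labels = [0, 0, 1, 1]
toy_probabilities = [0.1, 0.2, 0.8, 0.9]
points = calibration_points(toy_labels, toy_probabilities, n_bins=2)

# A point lies below the diagonal when the mean prediction exceeds the
# observed fraction of positives, i.e. the model overestimates in that bin.
overestimated = [mean_pred > frac_pos for mean_pred, frac_pos in points]
```

Plotting the first element of each pair on the x-axis and the second on the y-axis reproduces the layout described in the caption.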
Table 12.
Calibration results for each classification model.
Table 13.
Final evaluation results of candidate models.
Fig 7.
Example of the execution process of the final model, from raw features to prediction.
The values in the circle are the actual values for an instance.
Fig 8.
Creating a MetS risk map from the decision tree.
(A) shows the DT of this study, which has a depth of 5. Each node is a classification rule for the dataset, with blue representing MetS and orange representing non-MetS; the higher the probability, the darker the color. (B) is a textual representation of the decision rules in (A), restricted to the portion inside the red box. Classification rules are expressed as inequalities; class 1 means MetS, and class 0 means non-MetS. (C) plots the inequalities from (B) on a plane. For example, in (C), the red diagonal at the bottom is the line BPWC_add = 0.66. Because BPWC_add = BP + WC, the rule can be drawn on a plane with WC and BP as axes.
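The step from (B) to (C) can be sketched as follows: because BPWC_add = BP + WC, the threshold BPWC_add = 0.66 becomes the straight line BP = 0.66 − WC on the WC-BP plane. The function names below are hypothetical, and the sketch does not assume which side of the line belongs to which class.

```python
def bpwc_add(bp, wc):
    """Synthetic feature from the caption: BPWC_add = BP + WC."""
    return bp + wc


def boundary_bp(wc, threshold=0.66):
    """BP value on the decision line BPWC_add = threshold for a given WC."""
    return threshold - wc


# Two points on the line, enough to draw it on a plane with WC and BP as axes.
line_points = [(wc, boundary_bp(wc)) for wc in (0.0, 0.66)]

# Every point generated this way satisfies the rule's equality.
on_line = all(abs(bpwc_add(bp, wc) - 0.66) < 1e-9 for wc, bp in line_points)
```

The line crosses each axis at 0.66, which is why it appears as a diagonal in (C).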
Fig 9.
(A) is the final form of the MetS risk map. The plane, with the proposed synthetic features WC and BP as its two axes, is split into regions that correspond to the leaf nodes of the DT. Each region is labeled with how much higher its incidence is than average. The greater the value, the higher the risk of developing the disease. Regions with values of 1 or more are classified as MetS areas and marked in red. In (B), the lines and regions that are important for preventing and managing MetS are emphasized.
Fig 10.
Coordinate space of the composite features WC and BP.
The 0.5 point on each axis corresponds to the diagnostic criterion. WC is a feature synthesized from waist circumference and gender. BP is a feature synthesized from systolic and diastolic blood pressure. α = (diagnostic criterion for systolic blood pressure − diagnostic criterion for diastolic blood pressure) × 0.1; β = diagnostic criterion for waist circumference × 0.1.
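A small sketch of the arithmetic behind α and β as defined in this caption. The function names are hypothetical, and the thresholds in the usage lines (130/85 mmHg and 90 cm) are illustrative assumptions only; the caption does not state which diagnostic criteria the study applied.

```python
def alpha(systolic_criterion, diastolic_criterion):
    """alpha = (systolic criterion - diastolic criterion) * 0.1."""
    return (systolic_criterion - diastolic_criterion) * 0.1


def beta(waist_criterion):
    """beta = waist circumference criterion * 0.1."""
    return waist_criterion * 0.1


# Assumed thresholds for illustration only (not stated in the caption).
example_alpha = alpha(systolic_criterion=130, diastolic_criterion=85)
example_beta = beta(waist_criterion=90)
```

Keeping the criteria as parameters allows the same calculation to be repeated for other thresholds, for example a sex-specific waist circumference criterion.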
Fig 11.
Features and feature importance of integrated and sex-specific individual models.
“grp35” is a synthetic feature related to dairy products.
Table 14.
Performance of integrated and sex-specific individual models using the same test set.
Fig 12.
Risk map from PPV perspective and the risk index distribution in FP cases.
(A) Three zones of the risk map from the PPV perspective: green, yellow, and red zones. (B) Risk index distribution of FP cases in the yellow zone. (C) Risk index distribution of FP cases in the red zone. The value 0.45 (indicated by a red dotted line) is the point at which preemptive MetS management is required.
Table 15.
Misclassification by zone.