Machine learning-based predictive model for prevention of metabolic syndrome

doi:10.1371/journal.pone.0286635

Table 1.

Summary of previous studies on MetS prediction using noninvasive information.

Fig 1.

Overall model development procedure.

Table 2.

The characteristics of the selected participants (n = 70,370).

Table 3.

List of features extracted from the health checkup records.

Table 4.

List of types and details of synthetic features used in this study.

Table 5.

Characteristics of the training/validation/test datasets.

Fig 2.

Diagram for feature selection process.

Fig 3.

Feature selection process for each round.

Table 6.

Parameter tuning settings by classifier.

Table 7.

Final selected features.

Table 8.

Raw features used for synthesis.

Table 9.

Descriptions of raw features used for synthesis.

Fig 4.

Optimized classifiers and their important features.

The x-axis is the relative importance of the features, and the y-axis lists the names of the features used. For LR, the x-axis is the exponential of the regression coefficient. Our proposed synthetic features are marked with a red asterisk before the feature name. (CLBE: The percentage of energy obtained from carbohydrates, non-smoker: Current smoking status, grain: Whole grain intake, retinol: Retinol intake, kimchi: Kimchi (Korean traditional food) intake, fat energy: Percentage of energy from fat, green vegetables, leaf tea: Green tea intake, lettuce: Lettuce intake) See Table 4 for synthetic features.
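
The LR x-axis described in this caption can be illustrated with a minimal sketch, assuming LR denotes logistic regression, in which case the exponential of a coefficient is the odds ratio for a one-unit increase in that feature. The feature names and coefficient values below are hypothetical placeholders, not values from the study.

```python
import math

# Hypothetical coefficients, invented for illustration (not values from the study).
coefficients = {"feature_a": 0.0, "feature_b": 0.7, "feature_c": -0.3}


def odds_ratios(coefs):
    """Map each logistic-regression coefficient b to exp(b)."""
    return {name: math.exp(b) for name, b in coefs.items()}


ratios = odds_ratios(coefficients)

# Print features from the largest to the smallest exponentiated coefficient.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.3f}")
```

A value above 1 corresponds to a positive coefficient, a value below 1 to a negative one, and exactly 1 to a coefficient of zero.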

Fig 5.

Feature diagram for each classifier from the raw-feature perspective.

The number of features used by each classifier is indicated in parentheses.

Table 10.

Parameters of the candidate models after tuning.

Table 11.

Performance of candidate models.

Fig 6.

Calibration plot for each classification model.

The x-axis is the mean predicted value, and the y-axis is the fraction of positives. The closer a curve follows the diagonal, the better the calibration. A curve that falls below the diagonal indicates that the predicted values overestimate the actual results.
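
The two axes of a calibration plot can be computed as in the following sketch. It assumes equal-width probability bins, which is one common choice and not necessarily the binning used in the study; the function name and the toy data are made up for illustration.

```python
def calibration_points(y_true, y_prob, n_bins=5):
    """Return (mean predicted value, fraction of positives) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # keep prob == 1.0 in the last bin
        bins[index].append((label, prob))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(l for l, _ in members) / len(members)
            points.append((mean_pred, frac_pos))
    return points


# Toy labels and predicted probabilities, invented for illustration.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.15, 0.3, 0.35, 0.7, 0.75, 0.9, 0.95]

points = calibration_points(y_true, y_prob, n_bins=2)
for mean_pred, frac_pos in points:
    # A point below the diagonal has frac_pos < mean_pred: the prediction is too high.
    verdict = "overestimated" if frac_pos < mean_pred else "not overestimated"
    print(f"mean predicted={mean_pred:.3f}, fraction of positives={frac_pos:.3f}, {verdict}")
```

In this toy data the upper bin lies below the diagonal, which is the overestimation case the caption describes.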

Table 12.

Calibration results for each classification model.

Table 13.

Final evaluation results of candidate models.

Fig 7.

Example of the execution process of the final model from raw feature to prediction.

The values in the circle are the actual values for an instance.

Fig 8.

Creating a MetS risk map from the decision tree.

(A) shows the DT used in this study, which has a depth of 5. Each node is a classification rule for the dataset, with blue representing MetS and orange representing non-MetS; the higher the probability, the darker the color. (B) is a textual representation of the decision rules in (A), restricted to the portion in the red box. Classification rules are expressed as inequalities; class 1 means MetS, and class 0 means non-MetS. (C) plots the inequalities from (B) on a plane. For example, in (C), the red diagonal at the bottom is the line BPWC_add = 0.66. Because BPWC_add = BP + WC, the rule can be drawn on a plane with WC and BP as axes.
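
The step from a rule on BPWC_add to a line in the WC-BP plane can be sketched as follows. Only the relation BPWC_add = BP + WC and the threshold 0.66 come from the caption; the function names are hypothetical, and the sketch deliberately does not say which side of the line the tree assigns to MetS, since that depends on the rest of the tree.

```python
THRESHOLD = 0.66  # value of BPWC_add quoted in the caption


def bpwc_add(wc, bp):
    """Synthetic feature from the caption: BPWC_add = BP + WC."""
    return bp + wc


def boundary_bp(wc, threshold=THRESHOLD):
    """BP value on the line BP + WC = threshold for a given WC."""
    return threshold - wc


def side_of_rule(wc, bp, threshold=THRESHOLD):
    """Report which side of the line a point (WC, BP) falls on."""
    return "above" if bpwc_add(wc, bp) > threshold else "at_or_below"


# Points on the boundary line, usable for drawing the diagonal in the plane.
line = [(wc / 10, boundary_bp(wc / 10)) for wc in range(0, 7)]
print(line[0], line[-1])
print(side_of_rule(0.5, 0.5), side_of_rule(0.2, 0.2))
```

Because the rule is linear in WC and BP with equal weights, its boundary is a straight line with slope -1, which is why it appears as a diagonal.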

Fig 9.

MetS risk map.

(A) is the final form of the MetS risk map. We split the plane, with the proposed synthetic features WC and BP as its two axes, into regions that correspond to the leaf nodes of the DT. Each region is labeled with how much higher its incidence is than average; the greater the value, the higher the risk of developing the disease. Regions with values of 1 or more are classified as MetS areas and marked in red. In (B), the lines and regions that are important for preventing and managing MetS are emphasized.

Fig 10.

Coordinate space of composite features WC and BP.

The 0.5 point on each axis corresponds to the diagnostic criterion. WC is a feature synthesized from waist circumference and gender. BP is a feature synthesized from systolic and diastolic blood pressure. α = (diagnostic criterion for systolic blood pressure − diagnostic criterion for diastolic blood pressure) × 0.1; β = diagnostic criterion for waist circumference × 0.1.
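
As a worked example of the α and β definitions in this caption, the sketch below evaluates both formulas. The diagnostic criteria passed in (130 and 85 mmHg for blood pressure, 90 cm for waist circumference) are assumptions chosen for illustration; this listing does not state the thresholds the study used, and the function names are hypothetical.

```python
def alpha(systolic_criterion, diastolic_criterion):
    """alpha = (systolic criterion - diastolic criterion) * 0.1, as defined in the caption."""
    return (systolic_criterion - diastolic_criterion) * 0.1


def beta(waist_criterion):
    """beta = waist circumference criterion * 0.1, as defined in the caption."""
    return waist_criterion * 0.1


# ASSUMED example thresholds, not taken from this listing.
example_alpha = alpha(130, 85)  # mmHg
example_beta = beta(90)         # cm
print(round(example_alpha, 3), round(example_beta, 3))
```

With different diagnostic criteria, for example sex-specific waist thresholds, the same functions apply with other arguments.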

Fig 11.

Features and feature importance of integrated and sex-specific individual models.

“grp35” is a synthetic feature related to dairy products.

Table 14.

Performance of integrated and sex-specific individual models using the same test set.

Fig 12.

Risk map from PPV perspective and the risk index distribution in FP cases.

(A) Three zones of the risk map from the PPV perspective: green, yellow, and red. (B) Risk index distribution of FP cases in the yellow zone. (C) Risk index distribution of FP cases in the red zone. A risk index of 0.45, indicated by a red dotted line, is the point at which preemptive MetS management is required.
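
For readers unfamiliar with the terms, PPV (positive predictive value) is the share of positive predictions that are correct, TP / (TP + FP), so FP cases are what lower it. The sketch below computes PPV per zone; the counts are hypothetical and invented for illustration, not results from the study.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: TP / (TP + FP), or None with no positive predictions."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return None
    return true_positives / predicted_positive


# Hypothetical (TP, FP) counts per zone, invented for illustration.
zone_counts = {"yellow": (30, 10), "red": (90, 10)}
zone_ppv = {zone: ppv(tp, fp) for zone, (tp, fp) in zone_counts.items()}

for zone, value in zone_ppv.items():
    print(f"{zone}: PPV = {value:.2f}")
```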

Table 15.

Misclassification by zone.
