Table 1.
Variables included in the machine-learning algorithms.
Fig 1.
Patient cohort data extraction procedures.
Table 2.
Characteristics of patients aged 30 to 84 in the CPRD study cohort who were free from CVD at baseline.
Patients are stratified by first CVD event during the 10-year follow-up period.
Table 3.
Top 10 risk factor variables for CVD algorithms listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines).
Algorithms were derived from a training cohort of 295,267 patients.
Table 4.
Performance of the machine-learning (ML) algorithms in predicting 10-year cardiovascular disease (CVD) risk, obtained by applying the trained algorithms to the validation cohort of 82,989 patients.
A higher c-statistic indicates better algorithm discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparison.
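As an illustration of why a higher c-statistic means better discrimination, the sketch below computes it for a binary outcome as the fraction of (event, non-event) patient pairs in which the patient with the event received the higher predicted risk. The function name `c_statistic` and the toy inputs are hypothetical and are not taken from the study, which may have used a different implementation.

```python
from itertools import product


def c_statistic(risks, events):
    """Concordance (c-statistic) for a binary outcome.

    risks  : predicted risks, one per patient (hypothetical inputs)
    events : 1 if the patient had the event, 0 otherwise
    Tied predictions count as half-concordant.
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    score = 0.0
    for case_risk, control_risk in product(cases, controls):
        if case_risk > control_risk:
            score += 1.0   # concordant pair
        elif case_risk == control_risk:
            score += 0.5   # tie
    return score / (len(cases) * len(controls))


# Toy example: every event patient is ranked above every non-event patient.
perfect = c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

A value of 1.0 means every event patient was ranked above every non-event patient, and 0.5 is what random ranking would give on average. This pairwise form is quadratic in cohort size, so it is meant for illustration rather than for cohorts as large as the one described here.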
Fig 2.
Illuminating the “black box” of machine-learning neural networks: visualization of the risk factors and their association with cardiovascular disease, developed from the CPRD primary care study population.
Green lines are positive predictors, red lines are negative predictors, and the thickness of the line represents the weight (importance) of the risk factor to the outcome.