Table 1.
Descriptive statistics of the study population, grouped by age.
Only age, sex, height and weight were ultimately used in the machine learning models as predictive variables. All variables were, however, used for imputing missing data and constructing synthetic datasets.
Fig 1.
Histograms for each variable, comparing the aggregate demographic characteristics of the real training dataset (n = 2408) with those of synthetic dataset A (n = 2408).
Fig 2.
Histograms for each variable, comparing the aggregate demographic characteristics of the real training dataset (n = 2408) with those of synthetic dataset B (n = 4816).
Table 2.
Statistical analysis comparing synthetic data tables to the real training dataset (n = 2408).
Presented are the propensity score mean-squared error and the standardised ratio of the propensity score mean-squared error.
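For readers unfamiliar with these two utility measures, the following is a minimal sketch of how they are commonly defined. It is not necessarily the computation used for Table 2. It assumes the propensity scores come from a logistic regression fitted to the combined real and synthetic rows. The function names `pmse` and `pmse_ratio`, and the parameter `n_params`, are hypothetical and introduced only for illustration.

```python
def pmse(propensity_scores, n_synthetic):
    """Propensity score mean-squared error for a combined real + synthetic table.

    propensity_scores: predicted probability that each combined row is synthetic.
    n_synthetic: number of synthetic rows in the combined table.
    """
    n_total = len(propensity_scores)
    c = n_synthetic / n_total  # share of synthetic rows in the combined data
    return sum((p - c) ** 2 for p in propensity_scores) / n_total


def pmse_ratio(observed_pmse, n_params, n_synthetic, n_total):
    """Standardised ratio: observed pMSE divided by its expected value under the
    null hypothesis that real and synthetic data are indistinguishable.

    Assumption: a logistic propensity model with n_params coefficients
    (including the intercept), for which the null expectation is
    (n_params - 1) * (1 - c)^2 * c / n_total.
    """
    c = n_synthetic / n_total
    expected = (n_params - 1) * (1 - c) ** 2 * c / n_total
    return observed_pmse / expected
```

Under these definitions, a pMSE close to 0 and a ratio close to 1 indicate that the propensity model cannot tell the synthetic rows from the real ones.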
Table 3.
Results of the machine learning models, trained on real or synthetic datasets.
Each model was tested on the same test dataset (real data). None of the p-values was below 0.05.