Machine learning models of healthcare expenditures predicting mortality: A cohort study of spousal bereaved Danish individuals

doi:10.1371/journal.pone.0289632

Fig 1.

List of all DIORs of healthcare expenditures time series used in prediction models by signal category.


Table 1.

Distribution of socio-demographic variables across validation and holdout study samples.


Fig 2.

Performance of risk prediction models for the non-stratified analysis.

Orange point-ranges show the value of each performance measure with its 95% confidence interval for the holdout set. The vertical dashed black line marks an AUC of 0.5, the value at which a model is not able to discriminate.


Table 2.

Performance of prediction models for 1-year mortality risk stratified on sex in the holdout set.


Table 3.

Performance of “Benchmark + Aggregated Dynamics” model stratified on age groups.


Fig 3.

Mean variable importance for the XGBoost model, calculated using 1000 permutations and the one-minus-AUC loss function.

The bars in the plot indicate the mean values of the variable-importance measure for all explanatory variables. Box plots (dark blue) are overlaid on the bars to give an overview of the distribution of the measure across the permutations.
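The permutation procedure described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the `predict` callable, the toy data, and the helper names are hypothetical, and only the general idea (shuffle one variable, measure the increase in 1 − AUC, average over permutations) is taken from the caption.

```python
import random


def auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs in which the
    positive has the higher score (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_permutations=1000, seed=0):
    """Mean increase in the (1 - AUC) loss after shuffling each column of X.

    `predict` is a hypothetical callable mapping a list of rows to risk scores.
    """
    rng = random.Random(seed)
    base_loss = 1.0 - auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_permutations):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append((1.0 - auc(y, predict(X_perm))) - base_loss)
        importances.append(sum(increases) / n_permutations)
    return importances


# Toy stand-in for a fitted model: the score depends only on the first variable.
def toy_model(rows):
    return [row[0] for row in rows]


X_toy = [[0.1, 5.0], [0.4, 1.0], [0.35, 3.0], [0.8, 2.0], [0.9, 4.0], [0.2, 0.5]]
y_toy = [0, 0, 1, 1, 1, 0]
importance = permutation_importance(toy_model, X_toy, y_toy, n_permutations=200)
```

In this toy setup the second variable never enters the score, so its importance is zero, while shuffling the first variable raises the loss. The per-permutation `increases` values are what a box plot like the one in the figure would summarize.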


Fig 4.

Calibration plot showing risk estimates of all-cause mortality within the year after spousal bereavement against outcome proportions observed in the holdout dataset.

The plot (density type) displays the calibration of the XGBoost model fitted to the validation dataset. The model’s risk predictions for the individuals in the holdout dataset are ordered from low to high and shown on the x-axis (“Predicted 1-year mortality risk after spousal loss”). A fixed bandwidth defines how many holdout individuals count as nearest neighbors of any given risk value p on the x-axis. The y-axis (“Observed Mortality Proportion”) shows the relative frequency of the outcome (mortality) among the holdout individuals in the neighborhood around p. Points on the diagonal line represent perfect calibration of the model.
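The neighborhood-based calibration curve described above can be sketched as follows. This is an illustration under stated assumptions, not the plotting routine used in the study: the neighborhood is assumed to be every individual whose predicted risk lies within a fixed absolute distance (`bandwidth`) of p, and the function name and toy data are hypothetical.

```python
def smoothed_calibration(predicted, observed, grid, bandwidth=0.05):
    """For each risk value p in `grid`, return the observed outcome proportion
    among individuals whose predicted risk is within +/- `bandwidth` of p.
    Returns None for a grid point with no neighbors."""
    curve = []
    for p in grid:
        neighbors = [o for r, o in zip(predicted, observed) if abs(r - p) <= bandwidth]
        curve.append(sum(neighbors) / len(neighbors) if neighbors else None)
    return curve


# Toy holdout data: predicted 1-year risks and observed outcomes (1 = died).
predicted = [0.02, 0.03, 0.05, 0.10, 0.12, 0.30, 0.32, 0.35]
observed = [0, 0, 0, 0, 1, 0, 1, 0]
curve = smoothed_calibration(predicted, observed, grid=[0.05, 0.10, 0.30, 0.60], bandwidth=0.04)
```

Plotting `curve` against the grid and comparing it with the diagonal gives the kind of calibration plot the caption describes. A grid point with no neighbors within the bandwidth yields `None` and would be left out of the plot.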


Fig 5.

Decision curve for validation of the prediction model developed to estimate the risk of all-cause mortality within the year after spousal bereavement.

The x-axis (‘Risk Threshold’) shows the range of risk threshold probabilities, that is, the probabilities above which individuals are classified as being at high risk of dying within 1 year. The y-axis (‘Net Benefit’) shows, for each threshold probability, the proportion of true positives (accurately diagnosed and treated individuals) after subtracting the weighted false positives. A net benefit of 0.02 is equivalent to 2 true positives for every 100 persons in the target population, with no unnecessary interventions. The red line represents the scenario of treating every individual 65 years of age or older who suffers bereavement without consulting a predictive model (‘Treat All’). The brown, flat line depicts the scenario of not intervening at all in newly bereaved individuals (‘Treat None’). The remaining colored lines show the net benefit, that is, the proportion of true positives after subtracting the weighted false positives, based on the predictions of a specified model.
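The net benefit plotted on the y-axis can be computed with the standard decision-curve formula, TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold. The sketch below is a minimal illustration under that assumption. The function names and toy data are hypothetical and not taken from the study.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of treating everyone whose predicted risk exceeds `threshold`:
    true positives per person minus false positives per person, the latter
    weighted by the odds of the threshold."""
    n = len(observed)
    tp = sum(1 for r, o in zip(predicted, observed) if r > threshold and o == 1)
    fp = sum(1 for r, o in zip(predicted, observed) if r > threshold and o == 0)
    weight = threshold / (1.0 - threshold)
    return tp / n - weight * fp / n


def net_benefit_treat_all(observed, threshold):
    """Net benefit of intervening on everyone ('Treat All')."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy holdout data: predicted 1-year risks and observed outcomes (1 = died).
predicted = [0.02, 0.03, 0.05, 0.10, 0.12, 0.30, 0.32, 0.35]
observed = [0, 0, 0, 0, 1, 0, 1, 0]
nb_model = net_benefit(predicted, observed, threshold=0.2)
nb_all = net_benefit_treat_all(observed, threshold=0.2)
nb_none = 0.0  # 'Treat None' has no true or false positives at any threshold
```

Evaluating these functions over a grid of thresholds and plotting the results gives one curve per model alongside the ‘Treat All’ and ‘Treat None’ reference lines.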


Fig 6.

Calibration plot for two different modelling algorithms (logistic regression and XGBoost).

Risk estimates of all-cause mortality within the year after spousal bereavement are shown against the outcome proportions observed in the holdout dataset. Points on the diagonal line represent perfect calibration of the model.
