A Machine Learning Approach for Using the Postmortem Skin Microbiome to Estimate the Postmortem Interval

doi:10.1371/journal.pone.0167370

Fig 1.

A plot of accumulated degree days (ADD) versus percentage composition for the bacterial genus Vagococcus.

Each of the four sample cadavers has a corresponding curve, as indicated in the legend.


Table 1.

Summary of data matrix dimensions for joint data (swabs for both ear and nose).

Each data matrix has 67 rows, and the number of columns equals the number of organisms, as shown. We also provide the logarithm of the number of columns in each dataset, for later reference.


Fig 2.

Panels A, B, and C show how the correlation between ^qD(X) and y depends on the choice of q and the dataset X.

Panel D shows how diversity changes with ADD for the ear, nose, and joint datasets (q = 0.4).
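The notation ^qD is conventionally used for the Hill number (effective number of taxa) of order q. Assuming that is the diversity measure meant in this caption, the following minimal sketch shows how such a value could be computed for one sample. The function name `hill_diversity` and the sample counts are illustrative and not taken from the article.

```python
import math


def hill_diversity(abundances, q):
    """Hill number of order q for a list of non-negative abundances.

    Assumption: ^qD denotes the standard Hill number,
    (sum_i p_i**q) ** (1 / (1 - q)), with the q -> 1 limit equal to
    the exponential of the Shannon entropy.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    # Convert counts to relative abundances and drop absent taxa.
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1, so use its limit.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Made-up counts for a single swab, for illustration only.
sample = [40, 30, 20, 10]
print(hill_diversity(sample, 0.4))
# Four equally abundant taxa give an effective number of taxa of 4.
print(hill_diversity([25, 25, 25, 25], 0.4))
```

Values of q below 1 give more weight to rare taxa than Shannon-based diversity (q = 1) does, which is one reason the order q is worth tuning per dataset.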


Table 2.

The most significant correlation found between ^qD(X) and y for each dataset X, and the optimizing q value.

Kingdom data is omitted.


Table 3.

The top ten models for the nose data, ranked by cross-validation error on the training data.

Errors in columns 1 and 4 are reported as mean absolute error. The NRMSE column gives the root mean squared error on the test set divided by the mean ADD over all nose data.
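The two error measures named in this caption can be sketched as follows. This is a minimal illustration under the definitions stated above (mean absolute error, and test-set RMSE divided by a reference mean ADD). The function names and the numbers are hypothetical and are not results from the article.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ADD."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nrmse(y_true, y_pred, reference_mean):
    """Root mean squared error divided by a reference mean.

    Per the caption, the reference is the mean ADD over all nose data,
    not just the test set, so it is passed in explicitly.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / reference_mean


# Made-up test-set values, for illustration only.
true_add = [100.0, 200.0, 300.0]
pred_add = [110.0, 190.0, 330.0]
mean_add_all_data = 200.0  # hypothetical mean ADD over the full dataset

print(mean_absolute_error(true_add, pred_add))
print(nrmse(true_add, pred_add, mean_add_all_data))
```

Dividing by the mean ADD makes the error dimensionless, so models evaluated on datasets with different ADD ranges can be compared on the same footing.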


Table 4.

The ear equivalent of Table 3.


Table 5.

This table is similar to Table 3, but with joint datasets.


Fig 3.

All 91 models considered for the joint data are plotted according to their cross-validation (training) error and test error, in units of mean absolute error.

The Pearson correlation is r = 0.53, with a p-value of 8.67 × 10⁻⁸.
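For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation between per-model cross-validation errors and test errors in plain Python. The error values are made up for illustration. The p-value is omitted here because it needs a t-distribution; `scipy.stats.pearsonr` returns both the coefficient and the p-value if SciPy is available.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the centered values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (cross-validation error, test error) pairs, one per model.
cv_errors = [55.0, 60.0, 62.0, 70.0, 85.0]
test_errors = [80.0, 75.0, 90.0, 88.0, 110.0]
print(pearson_r(cv_errors, test_errors))
```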


Fig 4.

Panel D displays the classic diagram for the bias-variance tradeoff, showing how overly complex models minimize training error but may have sub-optimal test error.

The other panels show a similar picture for three regressors (SVR, KNeighbors, and ElasticNet) with the dimensionality of the dataset serving as a proxy for model complexity. The horizontal dimension is logarithmic.


Table 6.

The top ten models as ranked by validation error.


Fig 5.

Panel A shows the performance on the validation set of the model with the lowest validation error, plotting true ADD for each element of the test set against the model's prediction.

The identity function is plotted in the same frame for reference. Panel B is a similar plot describing the performance of the model which minimized cross-validation error on the training set.


Table 7.

For each taxon in the leftmost column, this table shows the five most useful organisms for predicting ADD, as determined by three different ranking methods: F-value, a decision-tree-based approach, and mutual information.

Unless otherwise indicated, terms refer to microbes located in the ear.
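As an illustration of the first ranking method, the sketch below scores each organism by a univariate F-value and returns the top entries. It assumes the F-value is the usual F-statistic of a one-variable linear regression of ADD on abundance, F = r² / (1 − r²) × (n − 2). The caption does not spell out the exact procedure, and the taxon names and numbers are hypothetical.

```python
import math


def _pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def f_value(feature, target):
    """Univariate regression F-statistic with 1 and n - 2 degrees of freedom."""
    n = len(target)
    r2 = _pearson_r(feature, target) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect linear fit
    return r2 / (1.0 - r2) * (n - 2)


def rank_by_f_value(features, target, top=5):
    """Return feature names sorted by decreasing F-value, truncated to `top`."""
    scores = {name: f_value(column, target) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Made-up abundances for three hypothetical taxa across five samples.
add = [0.0, 100.0, 200.0, 300.0, 400.0]
features = {
    "taxon_a": [1.0, 2.0, 3.0, 4.0, 6.0],  # rises steadily with ADD
    "taxon_b": [5.0, 1.0, 4.0, 2.0, 3.0],  # weak relationship
    "taxon_c": [3.0, 3.0, 3.0, 3.0, 3.0],  # constant, carries no signal
}
print(rank_by_f_value(features, add))
```

The F-value only captures linear relationships, which is a plausible reason for reporting it alongside tree-based importance and mutual information, both of which can pick up non-linear dependence.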


Fig 6.

Selected high-performing phyla, with ADD plotted against abundance.

The vertical axis is normalized for each organism so that the relative abundances are on a similar scale.


Fig 7.

Selected high-performing organisms from several taxa, with ADD plotted against abundance.

The vertical axis is normalized for each organism so that the relative abundances are on a similar scale.
