Bayesian clustering with uncertain data

doi:10.1371/journal.pcbi.1012301

Fig 2

Accuracy on simulated datasets.

Methods are split according to whether they assume perfectly observed data or allow for uncertainty. mclust and kmeans do not allow for uncertain data by default, so they appear in the second column only when used as the clustering engine for representative clustering. The left two columns show results where K is inferred. For methods that allow a known value of K to be supplied, results with known K are shown for comparison in the right-hand two columns. The top row of plots has the lowest noise N around the cluster mean, and the bottom row has the highest. The x-axis shows increasing uncertainty, which corresponds to a greater difference between the latent data and the observed data. Clustering accuracy is measured by the Adjusted Rand Index (ARI) between the true clustering and the inferred clustering, averaged over 100 simulations. Higher values of ARI are better.
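As an illustration of the accuracy measure, the following is a minimal sketch of how an ARI could be computed from two label vectors and averaged over repeated simulations. It is not the authors' code: the function names `adjusted_rand_index` and `mean_ari` are hypothetical, and the convention of returning 1.0 when both partitions are trivial is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, inferred_labels):
    """Adjusted Rand Index between two clusterings of the same items."""
    if len(true_labels) != len(inferred_labels):
        raise ValueError("label vectors must have the same length")

    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as perfect agreement

    # Contingency counts: joint cells, and the two sets of marginals.
    cells = Counter(zip(true_labels, inferred_labels))
    rows = Counter(true_labels)
    cols = Counter(inferred_labels)

    # Number of item pairs placed together under each view.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)       # agreement of a perfect match

    if max_index == expected:
        return 1.0  # assumption: both partitions are trivial (e.g. one cluster each)
    return (sum_cells - expected) / (max_index - expected)


def mean_ari(runs):
    """Average ARI over an iterable of (true_labels, inferred_labels) pairs."""
    scores = [adjusted_rand_index(t, p) for t, p in runs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy stand-in for repeated simulations; real runs would supply 100 pairs.
    runs = [
        ([0, 0, 1, 1], [1, 1, 0, 0]),  # same partition, labels swapped
        ([0, 0, 1, 2], [0, 0, 1, 1]),  # two true clusters merged
    ]
    print(mean_ari(runs))
```

The index is invariant to relabeling of clusters, so it can compare clusterings with different inferred K, and it can be negative when agreement is worse than expected by chance.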

doi: https://doi.org/10.1371/journal.pcbi.1012301.g002