Shape complexity in cluster analysis

doi:10.1371/journal.pone.0286312

Table 1.

Data sets used and their properties.

Table 2.

Performance of k-means, according to ARI_fnc and AMI_max, on various scaled versions of the data sets in Table 1.

Fig 1.

ARI_fnc versus AMI_max for all partitions resulting from scaling as in the rightmost column of Table 2.

Fig 2.

Results of the random trials with Problem P on Iris (A), BCW (B), BC-DR3 (C), BNA-DR3 (D), and BCW-Diag-10 (E), expanding on the summary given in the rightmost column of Table 2. Each point in each left panel corresponds to a trial and is color-coded, according to the accompanying palette, to reflect the ARI_fnc value it yields under clustering with k-means. The point yielding the highest ARI_fnc value is marked by the crosshair in the panel. Each right panel shows how ARI_fnc is distributed over all the corresponding trials.

Table 3.

Scaling factors used in Figs 3 (Iris) and 4 (BNA-DR3).

The α_k’s are those yielding the highest ARI_fnc values within the intervals given in the rightmost column of Table 2.

Fig 3.

Reference partition for the Iris data set (leftmost column of panels) and the effects of two scaling schemes: scaling by 1/σ_k (middle column) and scaling by α_k/σ_k (rightmost column), with factors as in Table 3. The effects are visible both in the shape of the data set (top row of panels, all plots drawn to the same scale) and in the distribution of distances between samples (the r_ij’s; bottom row, all plots drawn to the same scale).
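
The two scaling schemes named in this caption can be illustrated with a minimal sketch. It assumes that σ_k is the population standard deviation of feature k and that the r_ij's are Euclidean distances between scaled samples; neither assumption is confirmed by the captions. The function names, the toy data, and the α_k values are hypothetical and are not taken from Table 3.

```python
import math
import statistics


def scale_features(samples, alphas=None):
    """Scale feature k of every sample by alpha_k / sigma_k.

    With alphas=None every alpha_k is 1, i.e. plain scaling by 1/sigma_k.
    Assumption of this sketch: sigma_k is the population standard deviation.
    Constant features (sigma_k == 0) are not handled.
    """
    n_features = len(samples[0])
    sigmas = [statistics.pstdev([row[k] for row in samples])
              for k in range(n_features)]
    if alphas is None:
        alphas = [1.0] * n_features
    return [[alphas[k] * row[k] / sigmas[k] for k in range(n_features)]
            for row in samples]


def pairwise_distances(samples):
    """Euclidean distances r_ij for all pairs i < j (assumed metric)."""
    return [math.dist(samples[i], samples[j])
            for i in range(len(samples))
            for j in range(i + 1, len(samples))]


# Hypothetical toy data: four samples, two features on very different scales.
toy = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]

standardized = scale_features(toy)                   # scaling by 1/sigma_k
reweighted = scale_features(toy, alphas=[2.0, 0.5])  # scaling by alpha_k/sigma_k

r_standardized = pairwise_distances(standardized)
r_reweighted = pairwise_distances(reweighted)
```

After scaling by 1/σ_k every feature has unit spread, so each α_k then acts as the relative weight of feature k in the distances that a method such as k-means operates on.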

Fig 4.

As in Fig 3, now for the BNA-DR3 data set.
