Table 1.
Data sets used and their properties.
Table 2.
Performance of k-means, as measured by ARIfnc and AMImax, on variously scaled versions of the data sets in Table 1.
Fig 1.
ARIfnc versus AMImax for all partitions resulting from scaling as in the rightmost column of Table 2.
Fig 2.
Results of the random trials with Problem P on Iris (A), BCW (B), BC-DR3 (C), BNA-DR3 (D), and BCW-Diag-10 (E), expanding on the summary given in the rightmost column of Table 2. Each point in each left panel corresponds to a trial and is color-coded, according to the accompanying palette, by the value of ARIfnc that results from clustering with k-means. The point yielding the highest ARIfnc value is marked by a crosshair. Each right panel shows the distribution of ARIfnc over all corresponding trials.
Table 3.
Scaling factors used in Figs 3 (Iris) and 4 (BNA-DR3).
The αk’s are those yielding the highest values of ARIfnc within the intervals given in the rightmost column of Table 2.
Fig 3.
Reference partition for the Iris data set (leftmost column of panels) and the effects of two scaling schemes: scaling by 1/σk (middle column) and scaling by αk/σk (rightmost column), with factors as in Table 3. The effects are visible both in the shape of the data set (top row of panels; all plots drawn to the same scale) and in the distribution of distances between samples (the rij’s; bottom row; all plots drawn to the same scale).
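The two scaling schemes compared in Fig 3 can be illustrated with a minimal sketch. It assumes that σk is the (population) standard deviation of feature k and that rij is the Euclidean distance between samples i and j. The function names `scale_features` and `pairwise_distances` are hypothetical, no αk values from Table 3 are reproduced, and the k-means step and the ARIfnc/AMImax computations are not included.

```python
import numpy as np

def scale_features(X, alpha=None):
    """Hypothetical helper: multiply feature k of X by alpha[k] / sigma[k].

    With alpha=None every alpha[k] is 1, which gives plain 1/sigma_k scaling.
    Assumes sigma_k is the population standard deviation of column k.
    """
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)  # per-feature standard deviation
    if alpha is None:
        alpha = np.ones(X.shape[1])
    return X * (np.asarray(alpha, dtype=float) / sigma)

def pairwise_distances(X):
    """Hypothetical helper: Euclidean distances r_ij for all pairs i < j."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]       # all pairwise differences
    D = np.sqrt((diff ** 2).sum(axis=-1))      # full distance matrix
    i, j = np.triu_indices(X.shape[0], k=1)    # upper triangle, no diagonal
    return D[i, j]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4)) * [1.0, 5.0, 0.2, 3.0]  # synthetic data
    r_plain = pairwise_distances(scale_features(X))
    r_alpha = pairwise_distances(scale_features(X, alpha=[1.0, 2.0, 0.5, 1.0]))
    print(r_plain.mean(), r_alpha.mean())
```

Scaling by 1/σk gives every feature unit spread, so each contributes comparably to the rij's. Nonuniform αk's then stretch or shrink individual features, which changes both the shape of the point cloud and the distance distribution that k-means operates on.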
Fig 4.
As in Fig 3, now for the BNA-DR3 data set.