Fig 1.
Technical route of the DKCDC algorithm.
Bound(P) and Inter(P) denote the set of boundary points and the set of internal points, respectively. Clust(P) denotes the set of initial cluster labels obtained by CDC. Brid(P) denotes the points on the bridge. denotes the points that clearly deviate from the cluster.
denotes the points on the edge of the cluster.
denotes the points at the edge of the clusters that do not clearly deviate.
denotes the interior points at the edge of the clusters.
Fig 2.
The geometric meaning of bridge points, deviation points and less deviated edge points.
Fig 3.
The four core steps of the DKCDC algorithm and the silhouette coefficients at different steps.
Fig 4.
A fusion strategy that combines voting and distance methods.
(a) The r-domain of the boundary point pi, which is located on the bridge, does not include any internal points. (b) The r-domain of the boundary point pi, which deviates from the cluster, contains one internal point. (c) In a small-scale cluster, the r-domain of the boundary point pi contains most of the boundary points. (d) The r-domain of the boundary point pi, which deviates from the cluster, contains more boundary points, and . (e) The number of boundary points within the r-domain of the boundary point pi, which is located on the cluster's edge, is greater than the number of internal points, and
. (f) The number of boundary points within the r-domain of the boundary point pi, which deviates from the cluster, is greater than the number of internal points, and there are cross-cluster internal points. (g) The r-domain of the boundary point pi, which deviates from the cluster, contains more internal points, but
. (h) The number of boundary points within the r-domain of the boundary point pi, which is located on the cluster's edge, is smaller than the number of internal points, but
. (u) The number of boundary points within the r-domain of the boundary point pi, which deviates from the cluster, is smaller than the number of internal points, and there are cross-cluster internal points. (v) The clustering result under the fusion strategy.
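The panels above all compare how many boundary points and how many internal points fall inside the r-domain of a boundary point pi. A minimal sketch of that count follows. It assumes Euclidean distance, a closed ball of radius r, and that pi itself is excluded. The function name `r_domain_counts` and the `is_boundary` flags are hypothetical and not taken from the paper.

```python
from math import dist

def r_domain_counts(i, points, is_boundary, r):
    """Count boundary and internal neighbours within distance r of points[i].

    Assumptions (not from the paper): Euclidean distance, closed ball,
    and the query point itself is not counted.
    """
    n_boundary = n_internal = 0
    for j, q in enumerate(points):
        if j == i:
            continue  # skip the query point itself
        if dist(points[i], q) <= r:
            if is_boundary[j]:
                n_boundary += 1
            else:
                n_internal += 1
    return n_boundary, n_internal

# Toy data: two boundary points followed by three internal points.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.8), (3.0, 3.0), (0.3, 0.3)]
is_boundary = [True, True, False, False, False]
counts = r_domain_counts(0, points, is_boundary, r=1.0)
```

A caller could then branch on whether the boundary count exceeds the internal count, as the panel descriptions do. The additional conditions attached to panels (d) through (h) are not reproduced here.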
Fig 5.
The silhouette coefficient obtained by the DKCDC algorithm for different values of the parameter r.
Fig 6.
The noise distribution under different r values.
Fig 7.
The overall flow chart of the DKCDC algorithm.
Table 1.
Comparison of clustering results on artificial datasets (using the silhouette coefficient as the evaluation index).
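For reference, the silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. The sketch below computes the mean over all points in plain Python. It assumes Euclidean distance, treats every label as an ordinary cluster (it has no special handling of noise labels), and scores points in singleton clusters as 0. How the paper handles noise points when computing this index is not stated here.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton cluster: s = 0 by convention
        # a: mean distance to the other members of the own cluster
        # (the zero distance from p to itself does not change the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != c
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Two well-separated pairs: a good labelling scores close to 1.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.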
Fig 8.
The results of the DKCDC, CDC, K-Means, DBSCAN, OPTICS, and HDBSCAN algorithms on the Rotundity dataset: (a)–(f).
Fig 9.
The results of the DKCDC, CDC, K-Means, DBSCAN, OPTICS, and HDBSCAN algorithms on the Islands dataset: (a)–(f).
Fig 10.
The results of the DKCDC, CDC, K-Means, DBSCAN, OPTICS, and HDBSCAN algorithms on the Alphabet dataset: (a)–(f).
Table 2.
UCI datasets.
Table 3.
Comparison of clustering results on UCI datasets.
Fig 11.
Comparison of cluster evaluation indices for different algorithms on the UCI datasets: (a)–(c).